…and please stop publishing benchmarks with Pipelining enabled. It’s just lying about real-world performance.
I just found out that one of my favorite sources of HTTP framework benchmarks is using pipelining to score the different programming languages and frameworks, and I’m mad about it:
The first time I saw this was with Japronto, which claimed one freaking million requests per second; of course, this wasn’t replicable unless you used a specific benchmarking method with pipelining enabled.
Before HTTP/2 I was in favor of pipelining because we were so limited on parallel requests and TCP connections were so costly that it made sense. Now, with H2 supported on all major browsers and servers, pipelining should be banned from benchmarks.
What is HTTP pipelining?
In classic HTTP/1, we had to open a TCP connection for every single request: open the socket, send the request, wait for the response and close the socket. TCP connections are costly to open, so this was a real problem back in the day.
With HTTP/1.1 we got keep-alive: after a request completes, we can send another request on the same TCP socket. This alleviated the problem. But still, if your computer is far from the server (it usually is), the server will sit idle waiting for the last packet it sent to arrive at your computer, and then for your next request to travel back. On many connections this means around 80 ms of delay between one request and the next.
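To make that cost concrete, here is a minimal keep-alive sketch using only Python’s standard library, against a throwaway local server (the handler and names are made up for illustration): the client reuses one TCP connection, but still pays a full round trip per request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection alive

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, two sequential requests: the client must wait a
# full round trip for each response before sending the next request.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
statuses = []
for _ in range(2):
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()
    statuses.append(resp.status)
conn.close()
server.shutdown()
print(statuses)
```

Each loop iteration pays one full round trip of idle waiting; that idle time is exactly what pipelining tried to eliminate.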
So here enters HTTP pipelining: we could send another request before the previous response was received, effectively queuing requests on the server and receiving the responses in order. Wikipedia has a nice diagram of this:
This looks nice, but HTTP/1.1 never got pipelining working. It was in the spec, but not mandatory, and only some clients and servers supported it. Since most web servers at the time failed to reply properly when pipelining was used, and there was no reliable way for a client to tell whether a server actually supported it, no major browser ever shipped support. What a shame!
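Pipelining itself is easy to demonstrate with a raw socket, because it needs no special client support: just write the second request before reading the first response. The sketch below (throwaway local server, made-up paths) relies on the fact that Python’s `http.server` reads requests sequentially from its buffered socket, so it happens to serve pipelined requests in FIFO order.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoPath(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so one socket serves both

    def do_GET(self):
        body = self.path.encode()  # echo the path so we can see the order
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), EchoPath)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Pipelining: write the second request before reading the first response.
sock = socket.create_connection(("127.0.0.1", server.server_port))
sock.sendall(
    b"GET /first HTTP/1.1\r\nHost: test\r\n\r\n"
    b"GET /second HTTP/1.1\r\nHost: test\r\nConnection: close\r\n\r\n"
)
data = b""
while chunk := sock.recv(4096):
    data += chunk
sock.close()
server.shutdown()

# Both responses came back on one socket, strictly in request order (FIFO).
print(data.count(b"HTTP/1.1 200"))
```

Real-world HTTP/1.1 servers were never reliably this well behaved, which is exactly why browsers gave up on the feature.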
It was a really good idea, but then HTTP/2 came with multiplexing and this problem vanished. There are still challenges in this area, but nothing that pipelining would solve. So now we’re happy with multiplexing.
HTTP/2 does not have Pipelining
This is a common misunderstanding. Yes, you can send several requests, even several thousand, without waiting to receive anything. This is really good, but it’s not pipelining. Why?
Pipelining, as its name implies, acts like a pipe: First In, First Out. Requests are queued on the server in order and replied to in order.
HTTP/2 instead has multiplexing, which seems similar, but is better. Multiplexing means you get several streams inside one connection at the same time, so you can receive data as it is produced. Requests are not queued, and responses are not returned in request order: they come back concurrently.
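A toy simulation (made-up durations, no real HTTP involved) shows the difference in delivery order: with FIFO delivery a slow response holds back everything behind it, while multiplexed responses arrive as each one is ready.

```python
import asyncio

async def handle(i, seconds, arrived):
    # Simulate the server spending `seconds` producing response i.
    await asyncio.sleep(seconds)
    arrived.append(i)

async def pipelined(durations):
    # FIFO: response i cannot be delivered before responses 0..i-1.
    arrived = []
    for i, d in enumerate(durations):
        await handle(i, d, arrived)
    return arrived

async def multiplexed(durations):
    # Each response is delivered the moment it is ready.
    arrived = []
    await asyncio.gather(*(handle(i, d, arrived) for i, d in enumerate(durations)))
    return arrived

durations = [0.15, 0.05, 0.10]  # request 0 is the slow one
fifo = asyncio.run(pipelined(durations))
mux = asyncio.run(multiplexed(durations))
print(fifo)  # [0, 1, 2]: arrival order forced to match request order
print(mux)   # [1, 2, 0]: fast responses arrive first
```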
Why pipelining gives such good results
Because it’s equivalent to copying a file over the network, especially under synthetic benchmarks where localhost is the target, pipelining removes a lot of the effort of handling the individual packets.
Instead of grabbing one packet per request and processing it, the server can let data buffer up, grab a big chunk in one go that might contain hundreds of requests, and reply back without caring at all whether the client is receiving the data.
Even more: as the benchmark is synthetic, servers can more or less know beforehand what to serve, cutting the time spent inspecting what was requested and just replying with approximately the same data again and again.
The benchmark clients also do far less work, because they only need to fill a connection with the same string repeated millions of times.
If you think about it carefully, this is even faster than copying files over localhost: You don’t even need to read a file in the first place.
HTTP/2 multiplexing is slower
Compared to pipelining, of course. Because you’re not serving one clean stream of data but thousands of interleaved streams, your server has to do more work. This is obvious.
Of course, we could craft a cleartext HTTP/2 server that multiplexes everything into effectively a single stream, replying in order. That would perform much closer to pipelining, because it is, in effect, pipelining.
But deploying that on a production site would be naive, for the same reasons that applied to HTTP/1.1 pipelining. HTTP/2’s proper multiplexing is far superior in real-world scenarios.
And my question is, do you want your benchmark to return higher results or do you want your users to have the best experience possible?
Because if you only care about benchmarks, maybe it’s just easier to change the benchmark so it returns better results for your server, right?
Pipelining will not help serve more requests
I can hear some of you saying, “If we enable pipelining in production, we will be able to serve millions of requests!”. And… surprise!
Why, you might ask? Well, the problem differs by scenario, but it always comes down to the same two points: you need to be able to reply out of order to avoid bottlenecks, and a single user will never produce thousands of pipelined requests the way your benchmark tool does.
First pitfall: Requests are heterogeneous, not homogeneous.
Requests will not have the same size, nor the same response size. They will have different computing times or wait times. Does your production site reply with a fortune cookie for every single request? Even for CSS and JPEG requests? No, I don’t think so.
Why does this matter? Well, say your client asks for a CSS file and a JPEG for the same page and you reply with pipelining. If the JPEG was requested first, the CSS will stall until the image has completed, leaving the page unrendered for some time.
Now imagine we have a REST API and we get thousands of requests from a client. One of the requests involves an expensive database search. While that one is processed, the channel sits idle and your client is frozen.
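A back-of-the-envelope model of that API scenario (the millisecond figures are invented): even if the server computes all responses concurrently, FIFO delivery pins every response behind the slow search, while out-of-order delivery lets the cheap ones leave immediately.

```python
# Invented handling times (ms): one expensive database search among
# cheap requests, all processed concurrently by the server.
finish = [5, 5, 800, 5, 5]

# Pipelined (FIFO) delivery: response i cannot leave before 0..i-1 have.
fifo = []
latest = 0
for f in finish:
    latest = max(latest, f)
    fifo.append(latest)

# Multiplexed delivery: each response leaves the moment it is ready.
mux = finish[:]

print(fifo)  # [5, 5, 800, 800, 800]: everything behind the search stalls
print(mux)   # [5, 5, 800, 5, 5]: only the search itself is slow
```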
Second pitfall: Real users will never pipeline thousands of requests.
Unless your site is really badly designed, you’ll find that more than 50 parallel requests don’t make much sense. I tried HTTP/2 myself with an Angular site aggressively sending requests for tiny resources; the results were quite good, but it stayed under 100 requests in parallel, and the approach was pretty naive. Aside from this, popular servers and browsers lack support for HTTP/1.1 pipelining, so enabling it in your product will not make any difference.
Let’s consider this for a second. Why do we want to pipeline in the first place? Because the client is far from the server and we want to reduce the impact of round-trip time. So, say our ping time to the server is 100 ms (higher than usual), and we pipeline 100 requests at a time.
Effectively, in one round trip we served 100 requests, which works out to 1 ms of RTT per HTTP response. What has a 1 ms RTT? A local network! So once you reach this level of parallelism, the client works as fast as it would from your local network, given the same bandwidth. Try the same math for one thousand and ten thousand pipelined requests: 0.1 ms and 0.01 ms respectively.
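The arithmetic from the paragraph above, spelled out:

```python
rtt_ms = 100.0  # assumed round-trip time to the server

# Amortized RTT per response when N requests share one round trip.
per_response = {n: rtt_ms / n for n in (100, 1_000, 10_000)}
print(per_response)  # {100: 1.0, 1000: 0.1, 10000: 0.01}
```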
So now the question is: are you trying to save 0.9 ms per request for your clients, or are you just trying to make your benchmark numbers look better?
Scenario 1: API behind reverse proxy
Assume we have our shiny Japronto on port 8001 on localhost, but we want to serve it along with the rest of the site on port 80. So we put it behind a reverse proxy; this might be Apache, Nginx or Varnish.
Here’s the problem: None of the popular web servers or reverse proxies support pipelining. In fact, even serving static data they will be slower than what your shiny pipelining framework claims it can do.
Even if they did, when they proxy the request they don’t pipeline it to the upstream server either.
This approach renders pipelining useless.
Scenario 2: Main Web Server
So let’s put our framework directly facing the public internet, on another port; who cares? We can send requests from Angular/React/Vue to whatever port and the user will not notice. Of course, this adds a bit of complexity, as we need to add some headers here and there to tell the browser to trust our application running on a different port than the main page.
Nice! Does this work? Well, yes and no.
The main concern here is that we’re exposing a server that is not well tested to the internet, and this can be incredibly harmful. Bugs are most probably sitting there unnoticed, until someone actually notices and exploits them to gain access to our data.
If you seriously want to do that, please put it inside a Docker container with all permissions cut down and most mount points read-only, including the initial container image itself.
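As a sketch of what “permissions cut down” could look like (the image name and port are placeholders): Docker can mount the root filesystem read-only, drop all Linux capabilities, forbid privilege escalation, and confine writable scratch space to an in-memory tmpfs.

```shell
# Hypothetical hardening for exposing an untrusted framework.
# "myapp" and port 8001 are placeholders for your own setup.
docker run --read-only --cap-drop=ALL \
  --security-opt no-new-privileges \
  --tmpfs /tmp \
  -p 8001:8001 myapp
```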
Did we enable HTTP/2 with encryption? If we’re lucky enough that our framework supports it, then it will consume extra CPU doing the encryption and multiplexing.
HTTP/2 over cleartext does not work in any browser, so if you try it, most users will just fall back to HTTP/1.1.
If we don’t use HTTP/2 at all, 99% of users have browsers that do not use pipelining at all.
For those cases where they do, the routers and middleboxes that make the internet itself work will sometimes mangle the data, because they see HTTP in cleartext and want to “manage” it because “they know the standard”. And they’re pretty old.
Scenario 3: Pipelining reverse proxy
I had an excellent idea: let’s have our main web server collect all requests from different users and pipeline them into a single stream! Then we can open several processes or threads to make further use of the CPU, and with pipelining the number of requests served per second will be astonishing!
Sounds great, and a patch to Nginx might do the trick. In practice this would be horrible. As before, we would have bottlenecks, but now one user could freeze every other user just by asking for a bunch of costly operations.
The only way this can work is if the framework supports encrypted HTTP/2 and is fast at it. In that case, you should have benchmarked the frameworks with HTTP/2 multiplexing.
If your framework does not multiplex properly and instead effectively pipelines the data, users will see unexplainable delays under certain loads that are hard to reproduce.
In some scenarios, the client is not a user’s browser; for example, RPC calls in a microservices architecture. In this case pipelining does work, provided the responses are homogeneous.
But it turns out that HTTP is not the best protocol for those applications. There are tons of RPC protocols, and not all of them use HTTP. In fact, if you look at the fast ones, you’ll see that HTTP is the first thing they drop.
In the past I built an RPC protocol myself, called bjsonrpc. I wanted speed, and dropping HTTP was my main motivation for creating it.
If you need HTTP for compatibility, just keep two ports open, one for each protocol. Clients that can’t understand a specific protocol are unlikely to understand pipelining either. Having a port for each gives you the best performance with the clients that support it, while still allowing other software to connect.
Brief word on QUIC
The new-old QUIC protocol by Google is currently being standardized by the IETF as the base for the future HTTP/3. QUIC supports fast encryption setup (fewer round trips), tolerates packet loss well, and even survives IP address changes.
This is indeed the best protocol possible for RPC calls, except for its massive CPU use compared to raw TCP. I really hope that someone standardizes a non-HTTP protocol on top of it, aimed at application connections, that browsers could support.
The takeaway: managing the protocol takes a lot of CPU, we have to do it in production, and skipping part of that work for the few frameworks that support skipping it is unfair to the others. Please be considerate and disable pipelining when publishing benchmarks; otherwise a lot of people will make the wrong decision based on YOUR results.