Benchmarking Python vs PyPy vs Go vs Rust

Since I learned Go I started wondering how well it performs compared to Python in an HTTP REST service. There are lots and lots of benchmarks out there already, but the main problem with those benchmarks is that they're too synthetic: mostly a simple query, far from real-world scenarios.

Some frameworks like Japronto exploit this by making the connection and the plain response blazing fast, but of course, as soon as you have to do some calculation (and you have to; otherwise, what's the point of having a server?) they fall apart pretty easily.

To set a baseline: Python is about 50 times slower than C++ on most benchmarks, Go is 2-3 times slower than C++ on those, and Rust sometimes even beats C++.

But those benchmarks are purely CPU- and memory-bound, for particular problems. Also, the people who submitted the code used a lot of tricks and optimizations that will not appear in the code we usually write, because safety and readability are more important.

Another common type of benchmark is the HTTP framework benchmarks. There we can get a feel for which languages outperform others, but it's hard to measure. For example, in JSON serialization Rust and C++ dominate the leaderboard, with Go being only 4.4% slower and Python 10.6% slower.

In the multiple-queries benchmark, we can see that the tricks frameworks use to "appear fast" no longer help. Rust is on top here, C++ is 41% slower, and Go is 43.7% slower. Python is 66.6% slower. Some filtering can be done to put all of them under the same conditions.

That last test looks more realistic, and it's interesting to see that Python is 80% slower there, which means 5x slower than Rust. That's far better than the 50x on most CPU benchmarks that I pointed out first. Go, on the other hand, does not have any benchmark there that includes an ORM, so it's difficult to compare speeds.

The question I’m trying to answer here is: Should we drop Python for back-end HTTP REST servers? Is Go or Rust a solid alternative?

The reasoning is that a REST API usually does not contain complicated logic or big programs; it just replies to more or less simple queries with some logic. Such a program can be written in virtually anything. With the container trend, it is even more appealing to deploy built binaries, as we no longer need to compile for the target machine in most cases.

Benchmark Setup

I want to try a crafted example of something slightly more complicated, but I haven't found the time to build it properly yet. For now I have to fall back into the category of "too synthetic benchmarks" and release my findings up to this point.

The goal is to implement the fastest possible version of each of the following tests:

  • HTTP “Welcome!\n” test: Just the bare minimum, to get the actual overhead of parsing and creating HTTP messages.
  • Parse MessagePack: Grab 1,000 pre-encoded strings and decode them into an array of dicts or structs, returning just the number of strings decoded. Aims to measure the speed of a library decoding cached data previously serialized into Redis.
  • Encode JSON: With the previous step cached, encode everything as a single JSON, returning the number of characters in the final string. Most REST interfaces have to output JSON, and I wanted to get a grasp of how fast this step is compared to the others.
  • Transfer Data: With the previous step cached, send this data over HTTP (133,622 bytes). Sometimes our REST API has to send big chunks over the wire, and it contributes to the total time spent.
  • One million loop load: A simple loop over one million iterations doing two simple math operations with an IF condition, returning just a number. Interpreted languages like Python can take a huge hit here; if our REST endpoint has to do some work like an ORM does, it will be affected by this.
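As a rough illustration, the one-million loop test in Go might look like the sketch below. The exact operations and constants are my own guesses for illustration, not the benchmark's actual code:

```go
package main

import "fmt"

// millionLoop mirrors the shape of the "one million loop" test: two
// simple math operations plus an IF condition, returning a single number.
func millionLoop() int {
	total := 0
	for i := 0; i < 1000000; i++ {
		if i%2 == 0 {
			total += i
		} else {
			total -= i / 2
		}
	}
	return total
}

func main() {
	fmt.Println(millionLoop())
}
```

In a compiled language this loop becomes a handful of machine instructions per iteration; in CPython every iteration goes through the interpreter's dispatch loop and boxed integer objects, which is where the large gap comes from.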

The data being parsed and encoded looks like this:

{"id":0,"name":"My name","description":"Some words on here so it looks full","type":"U","count":33,"created_at":1569882498.9117897}

The tests were performed on my old i7-920 capped at 2.53GHz. It's not really rigorous, because I had to have some applications open while testing, so assume a margin of error of 10%. The programs were written with the minimal effort possible in each language, selecting the libraries that seemed fastest according to several published benchmarks.

Python and PyPy were run under uWSGI, sometimes behind NGINX, sometimes with the HTTP server included in uWSGI; whichever was faster for the test. (If anyone knows how to test them with less overhead, let me know.)

The measures have been taken with wrk:

$ ./wrk -c 256 -d 15s -t 3 http://localhost:8080/transfer-data

For Python and PyPy the number of connections had to be lowered to 64 in order to perform the tests without error.

For Go and Rust, the web server in the executables was used directly, without NGINX or similar. FastCGI was considered, but it seems to be slower than raw HTTP.

Python and PyPy used Werkzeug directly with no URL routing. I used the built-in json library and msgpack from pip. For PyPy, msgpack turned out to be awfully slow, so I switched to msgpack_pypy.

Go used “” and “” for serving HTTP with URL routing. For JSON I used “encoding/json”, and for MessagePack I used “”.

For Rust I went with “actix-web” for the HTTP server with URL routing, “serde_json” for JSON, and “rmp-serde” for MessagePack.

Benchmark Results

As expected, Rust won this test; but surprisingly not in all tests, and not by much in some. Because of the big difference in the numbers, the only way to make them properly readable is with a logarithmic scale, so be careful when reading the following graph: each major tick means double the performance.

Here are the actual results in table format: (req/s)

[table: requests/s per language for the HTTP, parse msgpack, encode JSON, transfer data, and 1M loop tests]

Also, for the Transfer Data test, it can be translated into MiB/s:

  • Rust: 2,491.53 MiB/s
  • Go: 2,897.66 MiB/s
  • PyPy: 701.15 MiB/s
  • Python: 897.27 MiB/s

And, for the sake of completeness, requests/s can be translated into mean microseconds per request:

[table: mean microseconds per request for the HTTP, transfer data, parse msgpack, encode JSON, and 1M loop tests]

As for memory footprint (encoding JSON):

  • Rust: 41MB
  • Go: 132MB
  • PyPy: 85MB * 8proc = 680MB
  • Python: 20MB * 8proc = 160MB

Some tests impose more load than others. In fact, the HTTP-only test is very hard to measure, since any slight change in measurement produces a completely different result.

The most interesting result here is Python under the tight loop; for those who have expertise in this language it shouldn't be surprising: pure Python code is about 50 times slower than raw native performance.

PyPy, on the other hand, managed to get really close to Go under the same test, which proves that PyPy's JIT compiler can actually detect certain operations and optimize them close to C speeds.

As for the libraries, we can see that PyPy and Python perform roughly the same, with far less difference from their Go counterparts. This difference is caused by the fact that Python objects have a certain cost to read and write, and Python cannot optimize the types in advance. In Go and Rust I “cheated” a bit by using raw structs instead of dynamically created objects, so they got a huge advantage from knowing in advance the shape of the data they would receive. This implies that if they receive a JSON with less data than expected they will crash, while Python will be just fine.
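To make the "raw structs" point concrete, here is a sketch of how the Go side could map the sample record above onto a fixed type. The field names follow the sample payload; the actual benchmark code may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record declares the payload's shape ahead of time — the "cheat"
// mentioned above: the decoder knows every field and type in advance,
// unlike Python's dynamic dicts.
type Record struct {
	ID          int     `json:"id"`
	Name        string  `json:"name"`
	Description string  `json:"description"`
	Type        string  `json:"type"`
	Count       int     `json:"count"`
	CreatedAt   float64 `json:"created_at"`
}

// parseRecord decodes one JSON record into the static struct.
func parseRecord(data string) (Record, error) {
	var r Record
	err := json.Unmarshal([]byte(data), &r)
	return r, err
}

func main() {
	r, err := parseRecord(`{"id":0,"name":"My name","description":"Some words on here so it looks full","type":"U","count":33,"created_at":1569882498.9117897}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Name, r.Count)
}
```

Because the compiler knows every field offset and type up front, no per-field hash lookups or boxed objects are needed at decode time, which is where the dynamic languages pay their cost.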

Transferring data is quite fast in Python, and given that most APIs will not return huge amounts of it, this is not a concern. Strangely, Go outperformed Rust here by a slight margin. It seems that Actix does an extra copy of the data plus a check to ensure UTF-8 compatibility; a lower-level HTTP server would probably be slightly faster. Anyway, even the slowest at 700 MiB/s should be fine for any API.

In the HTTP connection test, even if Rust is really fast, Python only takes 50 microseconds per request. For any REST API this should be more than enough, and I don't think it contributes much at all.

On average, I would say that Rust is 2x faster than Go, and Go is 4x faster than PyPy. Python is from 4x to 50x slower than Go depending on the task at hand.

What matters most in a REST API is the library selection, followed by raw CPU performance. To get better results I will try another benchmark with an ORM, because that will add a certain number of CPU cycles to the equation.

A word on Rust

Before going all the way into developing everything in Rust because it is the fastest, be warned: it's not that easy. Of the four languages tested here, Rust was by far the most complex, and it took me, untrained, several hours to get it working at the proper speed.

I had to fight for a while with lifetimes and borrowed values; I was lucky to have the Go test doing the same thing, so I could see clearly that something was wrong. If I hadn't had that reference, I would have finished earlier and called it a day, leaving code that copies data many more times than needed and is slower than a regular Go program.

Rust has more opportunities and information to optimize than C++, so its binaries can be faster, and it's even prepared to run in crazier environments like embedded, malloc-less systems. But there is a price to pay.

It requires several weeks of training to gain some proficiency in it. You also need to properly benchmark different parts to make sure the compiler is optimizing as you expect. And there is almost no one on the market with Rust knowledge, so hiring people for Rust might cost a lot.

Also, build times are slow, and in these tests I always had to compile with “--release”; if not, the timings were horribly bad, sometimes slower than Python itself. Release builds are even slower to compile. Rust has a nice incremental build that cuts this time down a lot, but changing just one file still requires 15 seconds of build time.

Its speed is not far enough ahead of Go's to justify all this complexity, so I don't think it's a good idea for REST. If someone is targeting near one million requests per second, cutting the CPU bill in half might make sense economically; but that's about it.

Update on Rust (January 18, 2020): This benchmark used actix-web as the web server, and there has been a huge backlash recently about its use of “unsafe” Rust. I had more benchmarks prepared using this web server, but now I'll redo them with another one. Don't use actix.

About PyPy

I have been pleased to see that the PyPy JIT works so well for pure Python, but it's not an easy migration from Python.

I spent way more time than I wanted making PyPy work properly for Python 3 code under uWSGI. I also ran into the problem of msgpack being slow on it. Not all Python libraries perform well in PyPy, and some do not work at all.

PyPy also has a high load time, followed by a warm-up: the code needs to run a few times for PyPy to detect the parts that require optimization.

I am also worried that complex Python code cannot be optimized at all. The loop that got optimized was really straightforward; under a complex library like SQLAlchemy, the benefit could be slim.

If you have a big codebase in Python and you're willing to spend several hours giving PyPy a try, it could be a good improvement.

But if you're thinking of starting a new project in PyPy for performance, I would suggest looking into a different language.

Conclusion: Go with Go

I managed to craft the Go tests in no time with almost no experience in Go, as I learned it only several weeks ago and had written just one other program. It takes a few hours to learn, so even if a particular team does not know it, it's fairly easy to get them trained.

Go is an easy language to develop with and really productive. Not as much as Python, but it gets close. Also, its quick build times and the fact that it builds statically make it very easy to iterate code-test-code, and attractive for deployments as well.

With Go, you could even deploy source code if you want and make the server rebuild it each time it changes, if that makes your life easier or uses less bandwidth, thanks to tools like rsync or git that only transfer changes.

What's the point of using faster languages? Servers, virtual private servers, serverless, or whatever technology you use incurs a yearly cost of operation, and this cost will have to scale linearly (in the best-case scenario) with user visits. Using a programming language, frameworks, and libraries that consume as few cycles and as little memory as possible keeps this yearly cost low and allows your site to accept far more visits at the same price.

Go with Go. It’s simple and fast.

Why should you learn Go for your next project

I have been hearing about Go for a long time; along with Rust, it is one of the two new programming languages that have been gaining attention in recent years.

After learning Go, it seems to me a good alternative to other programming languages because it is simple, beautiful, hassle-free, and fast compared to everything else that is not C, C++, or Rust. Its simplicity is really appealing because you can start small and grow as big as you want.

Go is ideal for containerized apps and websites. It runs faster than other popular alternatives for the web, with a small memory footprint, and its executables have no dependencies. It's blazing fast to compile, so iterating with new versions and deploying is almost as fast as with interpreted languages.

Being so simple to understand, it requires very little training to become productive in Go.

Comparing Rust with Go

  • Go is developed by Google. Rust is developed by Mozilla.
  • Go's use case is a more practical C or a more scalable Python. Rust is a high-performance alternative to C++.
  • Go has garbage collection. Rust does not; it manages memory through ownership, checked at compile time.
  • Both are compiled.
  • Go is productive. Rust is fast.

In short, Rust would be a better option where speed is key, as it is as fast as C++. It also has features aiming at zero-cost abstractions, so it looks promising for big, complex projects like browsers. It should be able to deliver the same speed as C++ with less complexity in the code.

Five reasons to use Go

  • Simple and beautiful: Go is easy to read and easy to write.
  • Runs fast: Go is faster than JavaScript and Python, and comparable to Java.
  • Compiles fast: As far as I know, Go has the fastest compile times. That's one of its main purposes. It also includes a sort of “interpreter” (go run), so you can run Go programs from source and avoid compiling while developing.
  • Static typing: When a program grows large, having static types can help a lot, also for safety purposes. Go is statically typed, so your programs can grow while you stay confident that they will run as you expect.
  • Explicit but terse: The Go language is explicit, so the meaning is clearly conveyed in the programs. But at the same time it is terse, so we don't spend much time writing Go.

Is Go better than Python?

While Go does not have the flexibility and magic that Python has, it still has basic primitives for flexible arrays and maps (dictionaries), so with Go we don't lose as much productivity as with other compiled languages.

But Go is way faster than Python, so if our program has to do custom, complex calculations, Go can be 30 times faster. Beware, though: since Python links its libraries to plain C, in some (very) specific use cases it could beat Go or other languages.

There's not much difference in the development cycle between Go and Python. Go can also run programs on the fly, so the code-try-code sequence is equally fast.

Go produces final, statically linked binaries for the platform. So, for distributing, you don't need to distribute sources (which can be good or bad depending on your point of view). But you do have to build different binaries for different platforms. Because the binary is statically linked, there's no need to account for different Linux distributions, so the same binary should run across all Unix flavors that support the ELF format and the same architecture.

For distributing on Windows, Go can be easier, as it just produces an executable that runs across many Windows platforms, while with Python you have to take care of packaging for Windows and test it properly, or tell the user to install the whole development stack, which is a hassle.

The main disadvantage of Go versus Python is that Go tries to compile everything statically, so the behavior of the code is set at compile time. Go interfaces can help create the kind of “magic” abstractions that change behavior depending on the scenario, but aside from that it's a bit limited. In contrast, Python is much more flexible.

Is Go better than C++ or Java?

While C++ and Java are more feature-rich, Go is simplified and more productive. Also, Java tends to be memory-hungry, so Go is useful for running programs under constrained memory and disk requirements.

Because Go statically links everything inside, its executables will be bigger than their C++ counterparts, but still way smaller than Java's, where you have to carry the JVM and libraries that use a lot of disk space. This makes Go an excellent candidate for containerized applications.

The downsides of Go are the lack of abstractions, and that it is slower than C++, being more or less as fast as Java (but still a bit slower in some scenarios).

Go is strongly opinionated

While learning Go for the first time I found a lot of things surprising. For me Go is the plain old C language with a new Pythonic style. I like both C and Python a lot so I see a lot of influence from both languages in Go.

When designing Go they weren't afraid of breaking the rules; it is clear that they have a strong opinion on how things should be done. In the same sense that Python deliberately ditched braces for blocks and the “switch” statement, the Go designers were clear that they didn't want classes (in the usual OOP sense) and didn't want exceptions.

Unlike Python, Go doesn't use whitespace for blocks: it uses the classic braces, and it has an extended switch statement.

The braces don't need any explanation, but the switch deserves a mention. The main problem with C's switch statement is that, by default, execution falls through from one case to the next, causing unintended bugs.

In Go they solved it by going the other way around: by default each case is independent unless you add the keyword “fallthrough”. This makes the construct less bug-prone and terser in the common case:
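A minimal sketch of that behavior (the function and strings are made up for illustration):

```go
package main

import "fmt"

// describe shows Go's switch: each case stops on its own by default,
// and "fallthrough" must be spelled out to get C's chaining behavior.
func describe(n int) string {
	out := ""
	switch n {
	case 0:
		out += "zero "
		fallthrough // explicit: continue into the next case
	case 1:
		out += "small"
	case 100:
		out += "big" // no fallthrough: this case stands alone
	default:
		out = "other"
	}
	return out
}

func main() {
	fmt.Println(describe(0)) // runs both the 0 and 1 cases
}
```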

There are also special forms of switch: with no condition, so all the conditions live in the cases; for detecting a variable's type; and finally (the select statement) for processing many asynchronous events at once.
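For instance, a conditionless switch and a type switch might look like this (the helper names are invented for illustration):

```go
package main

import "fmt"

// kind uses a type switch, one of the special switch forms mentioned
// above, to inspect the dynamic type behind an interface value.
func kind(v interface{}) string {
	switch v.(type) {
	case int:
		return "int"
	case string:
		return "string"
	case bool:
		return "bool"
	default:
		return "unknown"
	}
}

// grade shows a conditionless switch: all conditions live in the cases.
func grade(score int) string {
	switch {
	case score >= 90:
		return "A"
	case score >= 50:
		return "pass"
	default:
		return "fail"
	}
}

func main() {
	fmt.Println(kind(42), grade(75))
}
```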

As said before, there are no exceptions in Go. You're expected to return errors using a conventional return statement, so the common approach is to return a pair of (value, error) or (value, ok). This comes from C, where we used to encode errors in the return value, but Go makes it much simpler by allowing multiple values to be returned.

It also has an error primitive that can be used easily to convey error messages as text, and it can be extended to your needs.

This means that your code should check for errors explicitly. Failing to do so means the code will continue running, using a default (zero) value instead; it does not abort execution.
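A sketch of the (value, error) convention described above (divide is an invented example, not code from the benchmark):

```go
package main

import (
	"errors"
	"fmt"
)

// divide returns a (value, error) pair, the conventional Go pattern
// used instead of exceptions.
func divide(a, b int) (int, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	if q, err := divide(10, 2); err == nil {
		fmt.Println(q)
	}
	// Ignoring err here would silently leave the value at zero:
	if _, err := divide(1, 0); err != nil {
		fmt.Println("error:", err)
	}
}
```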

Go programs can also fail completely and stop execution. This is known as panicking. A panic can be started by an internal call or manually by the programmer using the “panic” function. So instead of throwing errors, you can just panic. In this case, checks are the caller's responsibility.

Functions now have two ways of exiting: returning and panicking. So Go added a “defer” statement to run cleanups at the end of a function. This is useful because it runs regardless of how or when the function exits.

Panicking unwinds the stack, calling all deferred statements on the way. It is still possible to keep the program from crashing by using the “recover” built-in. This actually looks like a flavor of try..catch, but it is not the recommended style in Go. Although less performant than error codes, it can be clearer or easier to reason about in some cases.
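A small sketch of defer + recover turning a panic back into an error (safeDiv is an invented name for illustration):

```go
package main

import "fmt"

// safeDiv converts a panic back into an error using defer + recover —
// the flavor of try..catch mentioned above, possible but not the
// idiomatic first choice in Go.
func safeDiv(a, b int) (result int, err error) {
	defer func() {
		// recover returns non-nil only while unwinding from a panic.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil // a/b panics at runtime when b == 0
}

func main() {
	fmt.Println(safeDiv(10, 0))
}
```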

Going back to classes and object-oriented programming: Go does not have classes, but it implements some of the object-oriented ideas, again in a C-flavored style.

It uses structs and interfaces. Structs are like regular C structs: no code, just data. They can be “inherited” in the same sense as in C, by stacking one on top of the other. This is called “struct embedding”, and Go adds syntactic sugar to help with it:
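For example, embedding might be sketched like this (Animal/Dog are illustrative types, not from the article's code):

```go
package main

import "fmt"

// Animal is a plain struct: just data, no code.
type Animal struct {
	Name string
	Legs int
}

// Dog embeds Animal (no field name), stacking it C-style; Go's sugar
// promotes Animal's fields so they can be accessed directly on Dog.
type Dog struct {
	Animal
	Breed string
}

func main() {
	d := Dog{Animal: Animal{Name: "Rex", Legs: 4}, Breed: "collie"}
	fmt.Println(d.Name, d.Legs, d.Breed) // promoted: no d.Animal.Name needed
}
```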

Multiple embedding is possible; it just stacks the structs one on top of the other, much in the style of C. So there is no diamond problem: if a name appears twice, it is stored twice.

The code for these lives outside the struct, by defining functions on types. Much like in Python, self/this is declared explicitly:
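A minimal sketch of a method with an explicit receiver (Counter is an invented type):

```go
package main

import "fmt"

type Counter struct {
	n int
}

// The receiver "c" plays the role of self/this, and it is declared
// explicitly, outside the struct body.
func (c *Counter) Add(delta int) {
	c.n += delta
}

func (c Counter) Value() int {
	return c.n
}

func main() {
	c := Counter{}
	c.Add(3)
	c.Add(2)
	fmt.Println(c.Value())
}
```

Note the pointer receiver on Add: it lets the method mutate the struct, while the value receiver on Value works on a copy.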

And then there are interfaces, to be able to write code that handles diverse types at once. They might resemble Python duck typing, Java interfaces, or C++ virtual functions, but they are a thing of their own.

Interfaces define a set of methods that must exist on a type. A type does not declare whether it adheres to any interface; the fact that the type has all the methods is enough to use it through that interface. So in this sense, it resembles duck typing.
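A sketch of that structural, duck-like behavior (Shape/Rect/Circle are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// Shape is satisfied by any type with an Area() method —
// no explicit "implements" declaration anywhere.
type Shape interface {
	Area() float64
}

type Rect struct{ W, H float64 }
type Circle struct{ R float64 }

func (r Rect) Area() float64   { return r.W * r.H }
func (c Circle) Area() float64 { return math.Pi * c.R * c.R }

// totalArea handles any mix of Shapes without knowing the concrete types.
func totalArea(shapes []Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

func main() {
	fmt.Println(totalArea([]Shape{Rect{W: 2, H: 3}, Circle{R: 1}}))
}
```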

And finally, Go is one of the few compiled programming languages that I know of that supports UTF-8 natively.