Rust vs Python: Rust will not replace Python

I love Python; I have used it for 10+ years. I also love Rust, which I have been learning for the last year. I wanted a language to replace Python; I looked into Go and was disappointed. I’m excited about Rust, but it’s clear to me that it’s not going to replace Python.

In some areas, yes. There are small niches where Rust can be better than Python and replace it. Games and microservices seem to be among the best candidates, but Rust will need a lot of time to get there. GUI programs are also a very good opportunity, but Rust’s model is so different from regular OOP that it is hard to integrate with existing toolkits, and a GUI toolkit is not something easy to write from scratch.

For CLI programs and utilities, Go will probably prevent Rust from gaining much ground. Go is clearly targeted at this particular scenario, is really simple to learn and code in, and does this really well.

What Python lacks

To understand what opportunities other languages have to replace Python, we should first look at Python’s shortfalls.

Static Typing

There are lots of things that Python could improve, but lately I feel that types are one of the top problems that need to be fixed, and it actually looks fixable.

Python, like JavaScript, is dynamically typed: nothing checks types before the code runs. You can’t easily control what the input and output types of a function are, or what the types of local variables are.

There’s now the option to annotate your variables and check them with programs like mypy or pytype. This is good and a huge step forward, but insufficient.
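As a minimal sketch of what this optional typing looks like (the function and its names are hypothetical, made up for illustration): annotations document the contract and a checker such as mypy can verify them before the program runs, but the interpreter itself ignores them.

```python
from typing import Optional


def parse_port(value: str, default: Optional[int] = None) -> int:
    """Convert a string to a TCP port number, falling back to `default`."""
    if value.isdigit() and 0 < int(value) < 65536:
        return int(value)
    if default is None:
        raise ValueError(f"invalid port: {value!r}")
    return default


# A static checker would reject this call because 8080 is not a str,
# but CPython does not enforce annotations: the mistake only surfaces
# at runtime, as an AttributeError raised inside the function.
try:
    parse_port(8080)  # type: ignore[arg-type]
except AttributeError as exc:
    runtime_error = str(exc)
```

The point of the sketch is the gap between the two worlds: the annotation is only as useful as the tooling that reads it.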

IDE autocompletion, suggestions and inspection help a lot when writing code, as they speed up the developer by reducing round-trips to the documentation. On complex codebases they help even more, because you don’t need to navigate through lots of files to determine the type of the thing you’re trying to access.

Without types, an IDE is almost unable to determine what a variable contains. It needs to guess, and the guesses are not good. Currently, I don’t know of any autocompletion for Python based solely on mypy.

If types were enforced by Python, then the compiler/interpreter could do some extra optimizations that aren’t possible now.

Also, there’s the problem of big Python codebases with contributions from non-senior Python programmers. A senior developer will try to assume a “contract” for functions and objects: which inputs are “valid” for them to work, and which outputs the caller must check. Having strict types is a good reminder for less experienced people to keep designs and checks consistent.

Just have a look at how TypeScript improved upon JavaScript by adding types. Taking a step further and making Python enforce a minimum, so that the developer has to state explicitly when they don’t want to type something, would make programs easier to maintain overall. Of course this needs a way to disable it, as forcing it in every scenario would kill a lot of good things in Python.

And this needs to be enforced down to libraries. The current problem is that a lot of libraries just don’t care, and if someone wants to enforce it, it gets painful as the number of dependencies increases.

Static analysis in Python exists, but it is weak. Having types enforced would allow better, faster, and more comprehensive static analysis tools to appear. This is a strong point of Rust, as the compiler itself already does a lot of static analysis. If you add other tools like Clippy (cargo clippy), it gets even better.

All of this is important to keep the codebase clean and neat, and to catch bugs before running the code.

Performance

The fact that Python is one of the slowest programming languages in use shouldn’t be news to anyone. But as I covered before in this blog, this is more nuanced than it seems at first.

Python makes heavy use of integration with C libraries, and that’s where its power comes from. C code called from Python still runs at C speed, and many C extensions release the GIL while they run, which allows a limited form of multithreading.

The slowness of Python comes from the amount of magic it can do: almost anything can be replaced, mocked, whatever you want. This makes Python especially good for designing complex logic, as it is able to hide it very nicely. And monkey-patching is very useful in several scenarios.
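A small sketch of the kind of “magic” meant here, using only the standard library (the `seconds_since` helper is a made-up example): a function from another module is swapped out at runtime without touching the code that calls it. This works because every attribute lookup happens dynamically, which is also part of why the interpreter can assume so little in advance.

```python
import time
from unittest import mock


def seconds_since(start: float) -> float:
    """Return the elapsed time since `start`, using the global time module."""
    return time.time() - start


# Replace time.time for the duration of the block; the function under test
# is unchanged and unaware of the substitution.
with mock.patch("time.time", return_value=100.0):
    patched = seconds_since(90.0)

# Outside the block the original function is back in place.
restored = seconds_since(time.time())
```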

Python works really well with Machine Learning tooling, as it is a good interface to design what the ML libraries should do. It might be slow, but a few lines of code that configure the underlying libraries take almost zero time, and those libraries do the hard work. So ML in Python is really fast and convenient.

Also, don’t forget that when such levels of introspection and “magic” are needed, the result is slow regardless of the language. This can be seen when comparing ORMs between Python and Go. As soon as the ORM is doing the magic for you, it becomes slow, in any language. To avoid this you need an ORM that is simple, and not that automatic and convenient.

The problem arises when we need to do something for which a library (that interfaces with C) doesn’t exist. We end up coding the actual thing manually, and this becomes painfully slow.

PyPy solves part of the problem. It is able to optimize some pure Python code and run it at speeds close to JavaScript and Go (note that JavaScript is really fast to run). There are two problems with this approach. The first is that the majority of Python code can’t be optimized enough to get good performance. The second is that PyPy is not compatible with all libraries, since C extension libraries need to be compiled against PyPy instead of CPython.

If Python were stricter by default, allowing wizardry only when the developer really needs it, and enforcing this via annotations (types and so on), I guess that both PyPy and CPython could optimize it further, as they could make better assumptions about how the code is supposed to run.

The ML libraries and similar ones are able to build C code on the fly, and that should be possible for CPython itself too. If Python included a sub-language for high-performance code, even if it took more time to start a program, it would allow programmers to optimize the critical parts of the code that are especially slow. But this needs to be included in the main language and bundled with every Python installation. That would also mean that some libraries could get away with being pure Python, without having to release binaries, which in turn would increase their compatibility with other interpreters like PyPy.

There are Cython and Pyrex, which I used in the past, but the problem with these is that they force you to build the code for the different CPU targets and Python versions, and that’s hard to maintain. Building anything on Windows is quite painful.

The GIL is another front here. Because it only allows Python to execute one instruction at a time, threads cannot be used to distribute pure Python CPU-intensive operations between cores. Better Python optimizations could in fact relieve this by determining that function A is totally independent of function B, and allowing them to run in parallel; or they could even compile them into non-Pythonic instructions if the code clearly is not making use of any Python magic. This could allow the GIL to be released and, hence, allow much better parallelization.

Python & Rust together via WASM

This could solve a great part of the problems if it turns out to be easy and simple to use. WebAssembly (WASM) was conceived as an alternative to JavaScript in browsers, but the neat thing is that it produces code that can be run from any programming language and is independent of the CPU target.

I haven’t explored this myself, but if it can deliver what it promises, it means that you only need to build Rust code once and bundle the WASM. This should work on all CPUs and Python interpreters.

The problem, I believe, is that the WASM loader for Python will need to be compiled for each combination of CPU, OS and Python interpreter. It’s far from perfect, but at least it’s easier to get one small common library to support everything, and then have other libraries or code build on top of it. So this could relieve some maintenance problems for other libraries by diverting that work onto the WASM maintainers.

Another possible problem is that WASM will have a hard time doing anything that is not strictly CPU computing, for example managing sockets and files or communicating with the OS. As WASM was designed to be run inside a browser, I expect that all OS communication would require a common API, and that will have some caveats for sure. While I expect the tasks I mentioned before to be usable from WASM, things like OpenGL and direct communication with a GPU will surely lack support for a long time.

What Rust Lacks

While most people will think that Rust needs to be easier to code in, that it is a complex language that requires a lot of human hours to get code working, let me heavily disagree.

Rust is one of the most pleasant languages to code in once you have expertise in the language. It is quite productive, almost on the level of Python, and very readable.

The problem is gaining this expertise. It takes way too much effort for newcomers, especially when they are already seasoned in dynamically typed languages.

An easier way to get started in Rust

And I know that this has been said a lot by novices, and it has been discussed ad infinitum: we need a RustScript language.

For the sake of simplicity, I gave this hypothetical language the name RustScript. To my knowledge, this name is not used and RustScript does not exist, even if I sound like it does.

If you have read others proposing this, please keep reading; I already know more or less what has been proposed and have followed some of those discussions.

The main problem with learning Rust is the borrow-checking rules; (almost) everyone knows that. A RustScript language must have a garbage collector built in.

But the other problem, which is not talked about as much, is the complexity of reading and properly understanding Rust code. Because people come in, try a few things, and the compiler keeps complaining everywhere, they don’t get to learn the basic stuff that would allow them to read code easily. These people will struggle even to remember whether the type was f32, float or numeric.

A RustScript language must serve as a bootstrap into Rust syntax and the features of the language, while keeping the hard and puzzling stuff away. In this way, once someone is able to use RustScript easily, they will be able to learn proper Rust with a gentler learning curve, feeling familiar already, and knowing what the code should look like.

So it should change the current learning curve into a gentler one.

[Figures: current Rust learning curve; learning curve with RustScript]

Here’s the problem: Rust takes months of learning to be minimally productive. Without properly knowing a lot of complex stuff, you can’t really do much with it, which turns into frustration.

Some companies require 6 months of training to get productive inside. Do we really expect them also to increase that by another 6 months?

What’s good about Python is that newcomers are productive from day zero. Rust doesn’t need to target this, but the current situation is way too bad and it’s hurting its success.

A lot of programming languages and changes have been proposed or even done but fail to solve this problem completely.

This hypothetical language must:

  • Include a Garbage Collector (GC) or any other solution that avoids requiring a borrow checker.
    Why? Removing this complexity is the main reason for RustScript to exist.
  • Have almost the same syntax as Rust, at least for the features they have in common.
    Why? Because if newcomers don’t learn the same syntax, then they aren’t making any progress towards learning Rust.
  • Be binary- and linker-compatible with Rust; all libraries and tooling must work inside RustScript.
    Why? Having a completely different set of libraries would be a headache and would require a completely different ecosystem. Newcomers should familiarize themselves with Rust libraries, not RustScript-specific ones.
  • Allow Rust sample code to be machine-translated into RustScript, like how Python 2 can be translated into Python 3 using the 2to3 tool. (Some things, like macro declarations, might not work as they might not have a replacement in RustScript.)
    Why? Documentation is key. Having a way to automatically translate your documentation into RustScript will make everyone’s life easier. I don’t want the API guessing game that happens in PyQt.
  • Be officially supported by the Rust team itself, and bundled with Rust when installing via rustup.
    Why? People will install Rust via rustup. Ideally, RustScript should be part of it, allowing for easy integration between both languages.

Almost any of these requirements alone is going to be hard to do. Getting a language that does everything needed with all the support… it’s not something I expect to happen, ever.

I mean, Python has it easier. What I would ask of Python is way more achievable than what I’m asking here, and yet in 10 years there have been just slight changes in the right direction. With that in mind, I don’t expect Rust to ever have a proper RustScript, but if it happens, well, I would love to see it.

What would be even better is if RustScript were almost a superset of Rust, making Rust programs mostly valid in RustScript, with few exceptions such as macro creation. This would allow developers to move to Rust incrementally as they see fit, and face the borrow checker in small amounts that are easy to digest. But anyway, having to declare a whole file or module as RustScript would still work, as it would allow devs to migrate file by file or module by module. That’s still better than having to choose between language X or Y for a full project.

Anyway, I’d better stop talking about this, as it’s not gonna happen, and it would require a full post (or several) anyways to describe such a language.

Proper REPL

Python is really good at its REPL, and a lot of tools make use of this. Rust REPLs exist, but they are not officially supported, and they’re far from perfect.

A REPL is useful when doing ML and when trying out small things. The fact that Rust needs to compile everything makes a REPL much less useful, as it needs boilerplate to work and every instruction takes time to get built interactively.

If Rust had a script language this would be simpler, as a REPL for scripting languages tends to be straightforward.

Simpler integration with C++ libraries

Given that both Rust and Python integrate only with C and not C++, anyone would think that they are on the same level here; but no. Because Python’s OOP is quite similar to C++ and its magic can make up for the missing parts (method overloading), in the end Python has way better integration with C++ than Rust.

There are a lot of ongoing efforts to make C++ integration easier in Rust, but I’m not sure they will ever arrive at something straightforward to use. There’s a lot of pressure on this and I expect it to get much, much better in the next few years.

But still, the fact that Rust has strict rules on borrowing and C++ doesn’t, and that C++ exceptions really don’t mix with anything in Rust, will make this hard to get right.

Maybe the solution is having a C++ compiler written in Rust, and make it part of the Cargo suite, so the sources can be copied inside the project and build the library for Rust, entirely using Rust. This might allow some extra insights and automation that makes things easier, but C++ is quite a beast nowadays, and having a compiler that supports the newest standards is a lot of work. This solution would also conflict with Linux distributions, as the same C++ library would need to be shipped twice in different versions, a standard one and a Rust-compatible one.

Lack of binary libraries and dynamic linking

All Rust dependencies currently rely on downloading and building the sources for each project. Because there are so many dependencies, building a project takes a long time. And distributing our build means installing a big binary that contains everything inside. Linux distributions don’t like this.

Having pre-built libraries for common targets would be nice, or, if not a full build, maybe some halfway format with the most complex part already done, just requiring the final optimization stages for targeting the specific CPU; similar to what WASM, *.pyc or the JVM do. This would reduce build times by a huge amount and would make development more pleasant.

Dynamic linking is another point commonly overlooked. I believe it can be done in Rust, but it’s not something explained in the regular books. It’s complex and tricky to do, whereas the regular approach (static linking) is quite straightforward. This means that any update to any of your libraries requires a full build and a full release of all your components.

If an automated way existed to do this in Cargo, even if it builds the libraries in some format that can’t be shared across different applications, it could already have some benefits from what we have. For example, the linking stage could take less time, as most of the time seems to be spent trying to glue everything together. Another possible benefit is that as it will produce N files instead of 1 (let’s say 10), if your application has a way to auto-update, it could update selectively the files needed, instead of re-downloading a full fat binary.

To get this to work across different applications, such as what Linux distributions do, the Rust compiler needs to have better standards and compatibility between builds, so if one library is built using rustc 1.50.0 and the application was built against 1.49.0, they need to work. I believe currently this doesn’t work well and there are no guarantees for binary compatibility across versions. (I might be wrong)

On devices where disk space and memory are constrained, having dynamic libraries shared across applications might help a lot in fitting the different projects onto such devices. Those might be microcontrollers or small computers. For our current desktop computers and phones, this isn’t a big deal.

The other reason why Linux distributions want these pieces separated is that when a library has a security patch, usually all it takes is to replace the library on the filesystem and you’re safe. With Rust applications you depend on the maintainers of each project to update and release new versions. Then a security patch for an OS, instead of being, say, 10MiB, could be 2GiB because of the number of projects that use the same library.

No officially supported libraries aside of std

In a past article, “Someone stop NodeJS package madness, please!!”, I talked about how bad the ecosystem in JavaScript is. Because everyone publishes packages and there’s no control, there’s a lot of cross-dependency hell.

This can happen to Rust as it has the same system. The difference is that Rust comes with “std”, which contains a lot of common tooling that prevents this from getting completely out of hand.

Python also has the same in PyPI, but turns out that the standard Python libraries cover a lot more functionality than “std”. So PyPI is quite saner than any other repository.

Rust has its reasons to have a thin std library, and it’s probably for the best. But something has to be done about the remaining common functionality that it doesn’t cover.

There are lots of solutions. For example, having a second standard library which bundles all remaining common stuff (call it “extra_std” or whatever), then everyone building libraries will tend to depend on that one, instead of a myriad of different dependencies.

Another option is to promote specific libraries as “semi-official”, to point people to use these over other options if possible.

The main problem with having everyone upload libraries that cross-depend on each other is that these libraries might have just one maintainer, and that maintainer might move on and forget about them forever; then you have a lot of programs and libraries depending on a library, unaware that it became obsolete long ago. Forking the library doesn’t solve the problem, because no one has access to the original repo to say “deprecated, please use X”.

Another problem is the security implications of doing this. You depend on a project that might have been audited in the past, or never, but the new version is surely not audited. What state is the code in? Is it sound, or does it abuse unsafe to worrying levels? We’d need to inspect it ourselves, and we all know that most of us would never do that.

So if I were to fix this, I would say that a Rust committee with security expertise should select and promote the libraries that are “common” and “sane enough”, then fork them under a slightly different name, audit them, and only ever upload audited code. Having a group looking after those forked libraries means that if a library is ever deprecated, they will correctly update its status and send people to the right replacement. If someone forks a library and that fork becomes the preferred one, the security fork should then migrate and follow it, so everyone depending on it is smoothly migrated.

In this way, “serde” would have a fork called something like “serde-audited” or “rust-audit-group/serde”. Yes, it will be always a few versions behind, but it will be safer to depend on it than depending on upstream.

No introspection tooling in std

Python is heavy on introspection, and it’s super nice for automating things. Even Go has some introspection capabilities for its interfaces. Rust, on the other hand, needs to make use of macros, and the sad part is that there aren’t any officially supported macros that make this more or less work. Even contributed packages are quite ugly to use.

Something that tends to be quite common in Python is iterating through the elements of an object/struct: their names and their values.
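For reference, this is roughly what that looks like on the Python side with the standard library. The `Item` class is a hypothetical example whose fields loosely mirror a typical record; nothing here is specific to any framework.

```python
from dataclasses import dataclass, fields


@dataclass
class Item:
    id: int
    name: str
    count: int


item = Item(id=0, name="My name", count=33)

# fields() yields one descriptor per declared field, in declaration order.
listing = [(f.name, f.type, getattr(item, f.name)) for f in fields(item)]

# For plain objects without dataclasses, vars() exposes the same data.
as_dict = vars(item)
```

No code generation step is involved: the names, declared types and values are all available at runtime.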

I would like to see a Derive macro in std to add methods that are able to list the names of the different fields, and standardize this for things like Serde. Because if using Serde is overkill for some program, then you have to cook these macros yourself.

The other problem is the lack of standard variadic types. So if I want to iterate through the values of each field, it becomes toilsome and inconvenient, because you need to know in advance which types you might receive and how, and you have to add boilerplate to support all of them.

The traits also lack some supertraits to be able to classify easily some variable types. So if you want a generic function that works against any integer, you need to figure out all the traits you need. When in reality, I would like to say that type T is “int-alike”.

Personal hate against f32 and f64 traits

This might be only me, but every time I add a float in Rust it makes my life hard. The fact that f32 and f64 implement only PartialOrd and PartialEq, and not Ord, Eq or Hash, makes them unusable as keys in lots of collection types (HashMap, etc.).

Yes, I know that these types don’t handle equality (due to imprecision) and comparing them is also tricky (due to NaN and friends). But, c’mon… can’t we have a “simple float”?

In some cases, like configs, decimal numbers are convenient. I wouldn’t mind using a type that is slower for those cases, one that more or less handles equality (by having a built-in epsilon) and handles comparison (by having a strict ordering between NaN and Inf, or by disallowing them altogether).
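The underlying behaviour comes from IEEE 754 rather than from Rust itself, so it can be shown in any language. The sketch below uses Python; the `SimpleFloat` class is a hypothetical illustration of the “simple float” idea described above (reject NaN, compare with an epsilon), not an existing type.

```python
import math

nan = float("nan")

# IEEE 754: NaN is not equal to anything, including itself, and every
# ordering comparison involving NaN is false.
nan_equals_itself = nan == nan          # False
nan_less_than_one = nan < 1.0           # False
nan_greater_than_one = nan > 1.0        # False

# Rounding error: two "obviously equal" decimals differ in binary.
exact_equal = (0.1 + 0.2) == 0.3        # False


class SimpleFloat:
    """Hypothetical 'simple float': rejects NaN, compares with an epsilon."""

    def __init__(self, value: float, eps: float = 1e-9) -> None:
        if math.isnan(value):
            raise ValueError("NaN is not allowed")
        self.value = value
        self.eps = eps

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SimpleFloat):
            return NotImplemented
        return math.isclose(self.value, other.value, abs_tol=self.eps)

    def __lt__(self, other: "SimpleFloat") -> bool:
        return self.value < other.value and self != other


approx_equal = SimpleFloat(0.1 + 0.2) == SimpleFloat(0.3)  # True
```

One caveat on the design: epsilon-based equality is not transitive (a can equal b and b equal c while a differs from c), so a type like this still cannot be a sound hash-map key. That is part of why the convenient version is hard to standardize.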

This is something that causes pain to me every time I use floats.

Why I think Rust will not replace Python

Take into account that I’m still learning Rust, so I might have missed or be wrong about some of the points above. One year of practising on my own is not enough to have full context for all of this, so take this article with a pinch of salt.

Rust is way too different from Python. I really would like Rust to replace my use of Python, but seeing that there are some irreconcilable differences makes me believe that this will never happen.

WASM might be able to bridge some gaps, and Diesel and other ORMs might make Rust a better replacement for Python for REST APIs in the future.

In general terms, I don’t see a lot of people migrating from Python to Rust. The learning curve is too steep, and for most of those replacements Go might be enough, so people would skip Rust altogether. And this is sad, because Rust has a lot of potential on lots of fronts; it just requires more attention than it gets.

I’m sad and angry because this isn’t the article I wanted to write. I would like to say that Rust will replace Python at some point, but if I’m realistic, that’s not going to happen. Ever.


Benchmarking Python vs PyPy vs Go vs Rust

Since I learned Go I have been wondering how well it performs compared to Python in an HTTP REST service. There are lots and lots of benchmarks already out there, but the main problem with those benchmarks is that they’re too synthetic: mostly a simple query, far from real-world scenarios.

Some frameworks like Japronto exploit this by making the connection and the plain response blazing fast, but of course, as soon as you have to do some calculation (and you have to; if not, what’s the point of having a server?) they fall apart pretty easily.

To set a baseline here, Python is 50 times slower than C++ on most benchmarks, while Go is 2-3 times slower than C++ on those, and Rust sometimes even beats C++.

But those benchmarks are purely CPU and memory bound for some particular problems. Also, the people who submitted the code applied a lot of tricks and optimizations that will not appear in the code we usually write, because safety and readability are more important.

Another common type of benchmark is the HTTP framework benchmark. In those, we can get a feel for which languages outperform others, but it’s hard to measure. For example, in JSON serialization Rust and C++ dominate the leaderboard, with Go being only 4.4% slower and Python 10.6% slower.

In the multiple queries benchmark, we can see that the tricks used by the frameworks to “appear fast” are no longer useful. Rust is on top here, C++ is 41% slower, and Go is 43.7% slower. Python is 66.6% slower. Some filtering can be done to put all of them under the same conditions.

That last test looks more realistic, and it is interesting to see that Python is 80% slower there, which means 5x slower than Rust. That’s far better than the 50x on most CPU benchmarks that I pointed out first. Go, on the other hand, does not have any benchmark including an ORM, so it’s difficult to compare the speed.
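The step from a percentage to an “N times” factor can be made explicit. This is a small sketch under one assumption: “X% slower” is read as “throughput is X% lower than the leader’s”, which is how such leaderboards usually present scores. The function name is made up for illustration.

```python
def slowdown_factor(percent_slower: float) -> float:
    """Convert 'X% slower' (throughput is X% lower) into an 'N times' factor."""
    if not 0 <= percent_slower < 100:
        raise ValueError("percent_slower must be in [0, 100)")
    return 1.0 / (1.0 - percent_slower / 100.0)


factor_80 = slowdown_factor(80.0)    # about 5
factor_66 = slowdown_factor(66.6)    # about 3
```

Under that reading, 80% slower corresponds to 5x and 66.6% slower to roughly 3x, so small changes in the percentage near the top of the range translate into large changes in the factor.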

The question I’m trying to answer here is: Should we drop Python for back-end HTTP REST servers? Is Go or Rust a solid alternative?

The reasoning is, a REST API usually does not contain complicated logic or big programs. They just reply to more or less simple queries with some logic. And then, this program can be written virtually with anything. With the container trend, it is even more appealing to deploy built binaries, as we no longer need to compile for the target machine in most cases.

Benchmark Setup

I want to try out a crafted example of something slightly more complicated, but so far I haven’t found the time to craft a proper one. For now I have to fall back into the category of “too synthetic benchmarks” and release my findings up to this point.

The idea is to implement the fastest possible version of each of the following tests:

  • HTTP “Welcome!\n” test: Just the raw minimum to get the actual overhead of parsing and creating HTTP messages.
  • Parse Message Pack: Grab 1000 pre-encoded strings, and decode them into an array of dicts or structs. Return just the number of strings decoded. Aims to get the speed of a library decoding cache data previously serialized into Redis.
  • Encode JSON: Having cached the previous step, now encode everything as a single JSON document. Return the number of characters in the final string. Most REST interfaces will have to output JSON, and I wanted to get a grasp of how fast this is compared to the other steps.
  • Transfer Data: Having cached the previous step, now send this data over HTTP (133622 bytes). Sometimes our REST API has to send big chunks over the wire and it contributes to the total time spent.
  • One million loop load: A simple loop over one million iterations doing two simple math operations with an IF condition, returning just a number. Interpreted languages like Python can take a huge hit here; if our REST endpoint has to do some work of its own, like ORMs do, it can be impacted by this.
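To give an idea of the shape of the last test, here is a sketch of such a loop in Python. The exact operations used in the benchmark are not given in the text, so the arithmetic and the condition below are assumptions chosen only to match the description (two math operations, one IF, a single number returned).

```python
def million_loop(n: int = 1_000_000) -> int:
    """Hypothetical CPU-bound loop: two arithmetic operations and a branch."""
    total = 0
    for i in range(n):
        value = (i * 3) % 7      # two simple math operations
        if value < 4:            # one IF condition
            total += value
    return total


result = million_loop()
```

In CPython every iteration goes through the full bytecode dispatch and object machinery, which is why a loop this small is a worst case for the interpreter.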

The data being parsed and encoded looks like this:

{"id":0,"name":"My name","description":"Some words on here so it looks full","type":"U","count":33,"created_at":1569882498.9117897}
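A rough sketch of the “Encode JSON” step using only the standard library. The record is the sample shown above; building 1000 records that differ only in `id` is an assumption made for illustration, as is the `encode_json` name.

```python
import json

# Sample record copied from the article; the list size matches the
# "1000 pre-encoded strings" of the parse step.
record = {
    "id": 0,
    "name": "My name",
    "description": "Some words on here so it looks full",
    "type": "U",
    "count": 33,
    "created_at": 1569882498.9117897,
}
records = [dict(record, id=i) for i in range(1000)]


def encode_json(items: list) -> int:
    """Serialize all records as one JSON document and return its length."""
    return len(json.dumps(items, separators=(",", ":")))


encoded_length = encode_json(records)
```

Returning only the length keeps the HTTP response tiny, so the measurement reflects serialization cost rather than transfer cost.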

The test was performed on my old i7-920 capped at 2.53GHz. It’s not really rigorous, because I had to have some applications open while testing, so assume a margin of error of 10%. The programs were written with the minimal effort possible in each language, selecting the libraries that seemed the fastest according to several published benchmarks.

Python and PyPy were run under uwsgi, sometimes behind NGINX, sometimes with the HTTP server included in uwsgi; whichever was faster for the test. (If anyone knows how to test them with less overhead, let me know)

The measurements were taken with wrk:

$ ./wrk -c 256 -d 15s -t 3 http://localhost:8080/transfer-data

For Python and PyPy the number of connections had to be lowered to 64 in order to perform the tests without error.

For Go and Rust, the web server in the executables was used directly without NGINX or similar. FastCGI was considered, but it seems to be slower than raw HTTP.

Python and PyPy were using Werkzeug directly with no url routing. I used the built-in json library and msgpack from pip. For PyPy msgpack turned out to be awfully slow so I switched to msgpack_pypy.

Go was using “github.com/buaazp/fasthttprouter” and “github.com/valyala/fasthttp” for serving HTTP with url routing. For JSON I used “encoding/json” and for MessagePack I used “github.com/tinylib/msgp/msgp”.

For Rust I went with “actix-web” for the HTTP server with url routing, “serde_json” for JSON and “rmp-serde” for MessagePack.

Benchmark Results

As expected, Rust won overall; but, surprisingly, not in every test, and not by much in others. Because of the big differences in the numbers, the only way of making them properly readable is with a logarithmic scale, so be careful when reading the following graph: each major tick means double the performance.

Here are the actual results in table format: (req/s)


          HTTP         parse msp    encode json    transfer data    1Mill load
Rust      128747.61    5485.43      5637.20        19551.83         1509.84
Go        116672.12    4257.06      3144.31        22738.92         852.26
PyPy      26507.69     1088.88      864.48         5502.14          791.68
Python    21095.92     1313.93      788.76         7041.16          20.94

Also, for the Transfer Data test, it can be translated into MiB/s:


          transfer speed
Rust      2,491.53 MiB/s
Go        2,897.66 MiB/s
PyPy      701.15 MiB/s
Python    897.27 MiB/s
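The translation is plain arithmetic: requests per second times the payload size, divided by 2^20. A short sketch, using the 133622-byte payload from the test description and the transfer figures from the results; the function name is made up, and HTTP header overhead is ignored.

```python
PAYLOAD_BYTES = 133_622          # response size stated in the test description
MIB = 1024 * 1024

# Requests per second from the "transfer data" column of the results table.
transfer_req_per_s = {
    "Rust": 19551.83,
    "Go": 22738.92,
    "PyPy": 5502.14,
    "Python": 7041.16,
}


def throughput_mib_s(req_per_s: float, payload: int = PAYLOAD_BYTES) -> float:
    """Payload throughput in MiB/s, ignoring HTTP header overhead."""
    return req_per_s * payload / MIB


throughput = {lang: round(throughput_mib_s(r), 2)
              for lang, r in transfer_req_per_s.items()}
```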

And, for the sake of completeness, requests/s can be translated into mean microseconds per request:


          HTTP     transfer data    parse msp    encode json    1Mill load
Rust      7.77     51.15            182.30       177.39         662.32
Go        8.57     43.98            234.90       318.03         1,173.35
PyPy      37.72    181.75           918.37       1,156.76       1,263.14
Python    47.40    142.02           761.08       1,267.81       47,755.49
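These figures are simply the inverse of the throughput. A minimal sketch of the conversion follows (the function name is made up for illustration). Note that with many concurrent connections this is the inverse of aggregate throughput, not the latency observed by a single request.

```python
def mean_microseconds(req_per_s: float) -> float:
    """Mean time per request in microseconds, as the inverse of throughput."""
    return 1_000_000 / req_per_s


# Two entries from the requests-per-second table.
rust_http_us = mean_microseconds(128747.61)
python_loop_us = mean_microseconds(20.94)
```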

As for memory footprint: (encoding json)

  • Rust: 41MB
  • Go: 132MB
  • PyPy: 85MB * 8proc = 680MB
  • Python: 20MB * 8proc = 160MB

Some tests impose more load than others. In fact, the HTTP-only test is very challenging to measure, as any slight change in the measurement setup yields a completely different result.

The most interesting result here is Python under the tight loop; for those who have expertise in this language it shouldn’t be surprising. Pure Python code is roughly 50x slower than raw native performance.

PyPy on the other hand managed under the same test to get really close to Go, which proves that PyPy JIT compiler actually can detect certain operations and optimize them close to C speeds.

As for the libraries, we can see that PyPy and Python perform roughly the same, and the gap to their Go counterparts is much smaller. That gap exists because Python objects have a certain cost to read and write, and Python cannot optimize for the type in advance. In Go and Rust I “cheated” a bit by using raw structs instead of creating the objects dynamically, so they got a huge advantage by knowing in advance the data they will receive. This implies that if they receive a JSON with less data than expected they may fail or silently misbehave, while Python will be just fine.
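
To illustrate the difference between decoding into a raw struct and decoding dynamically, here is a minimal sketch using Go’s standard encoding/json package. The User type and the payloads are made up for this example; they are not the ones used in the benchmark.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical payload shape, fixed at compile time.
type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// DecodeTyped parses JSON into the fixed struct.
func DecodeTyped(data []byte) (User, error) {
	var u User
	err := json.Unmarshal(data, &u)
	return u, err
}

// DecodeDynamic parses JSON into a generic map, closer to what a dynamic
// language does: every value is boxed and its type is checked at run time.
func DecodeDynamic(data []byte) (map[string]interface{}, error) {
	var m map[string]interface{}
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	full := []byte(`{"id": 7, "name": "ada"}`)
	partial := []byte(`{"id": 7}`)

	u, _ := DecodeTyped(full)
	fmt.Println(u.ID, u.Name)

	// With encoding/json a missing field is not an error: Name stays "".
	p, err := DecodeTyped(partial)
	fmt.Printf("%q %v\n", p.Name, err)

	// The dynamic version simply has no "name" key.
	m, _ := DecodeDynamic(partial)
	_, hasName := m["name"]
	fmt.Println(hasName)
}
```

With this particular decoder a field missing from the input is left at its zero value, so code that depends on the field being present has to check for it explicitly.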

Transferring data is quite fast in Python, and given that most APIs will not return huge amounts of it, this is not a concern. Strangely, Go outperformed Rust here by a slight margin. It seems that Actix does an extra copy of the data and a check to ensure UTF-8 compatibility. A lower-level HTTP server would probably be slightly faster. Anyway, even the slowest result, 700 MiB/s, should be fine for any API.

On the HTTP connection test, even if Rust is really fast here, Python only takes about 50 microseconds. For any REST API this should be more than enough, and I don’t think it matters at all.

On average, I would say that Rust is 2x faster than Go, and Go is 4x faster than PyPy. Python is from 4x to 50x slower than Go depending on the task at hand.

What matters most in a REST API is the library selection, followed by raw CPU performance. To get better results I will try to do another benchmark with an ORM, because those add a certain amount of CPU cycles to the equation.

A word on Rust

Before going all the way into developing everything in Rust because it is the fastest, be warned: it’s not that easy. Of all four languages tested here, Rust was by far the most complex, and it took me, untrained, several hours to get it working at the proper speed.

I had to fight for a while with lifetimes and borrowing values; I was lucky to have the Go version of the same test, so I could see clearly that something was wrong. If I hadn’t had it, I would have finished earlier and called it a day, leaving code that copies data many more times than needed and is slower than a regular Go program.

Rust has more opportunities and information to optimize than C++, so its binaries can be faster, and it’s even prepared to run in more unusual environments such as embedded, malloc-less systems. But it comes with a price to pay.

It requires several weeks of training to get some proficiency in it. You also need to benchmark different parts properly to make sure the compiler is optimizing as you expect. And there is almost no one on the market with Rust knowledge, so hiring people for Rust might cost a lot.

Also, build times are slow, and in these tests I always had to compile with “--release”; otherwise the timings were horribly bad, sometimes slower than Python itself. Release builds are even slower. It has a nice incremental build that cuts this time down a lot, but changing just one file still requires 15 seconds of build time.

Its speed is not far enough ahead of Go to justify all this complexity, so I don’t think it’s a good idea for REST. If someone is targeting nearly one million requests per second, cutting the CPU usage by half might make sense economically; but that’s about it.

Update on Rust (January 18 2020): This benchmark used actix-web as the web server, and it has recently been heavily criticized for its use of “unsafe” Rust. I had more benchmarks prepared with this web server, but now I’ll redo them with another one. Don’t use actix.

About PyPy

I have been pleased to see that PyPy JIT works so well for Pure Python, but it’s not an easy migration from Python.

I spent way more time than I wanted on making PyPy work properly for Python3 code under uWSGI. Also I found the problem with MsgPack being slow on it. Not all Python libraries perform well in PyPy, and some of them do not work.

PyPy also has a high load time, followed by a warm-up. The code needs to run a few times for PyPy to detect the parts that require optimization.

I am also worried that complex Python code cannot be optimized at all. The loop that was optimized was really straightforward. Under a complex library like SQLAlchemy the benefit could be slim.

If you have a big codebase in Python and you’re willing to spend several hours giving PyPy a try, it could be a good improvement.

But if you’re thinking of starting a new project on PyPy for performance, I would suggest looking into a different language.

Conclusion: Go with Go

I managed to craft the Go tests in no time with almost no experience in Go, as I had learned it several weeks earlier and had only written one other program. It takes a few hours to learn, so even if a particular team does not know it, it’s fairly easy to get them trained.

Go is an easy language to develop with, and really productive. Not as much as Python, but it gets close. Also, its quick build times and the fact that it builds statically linked binaries make code-test-code iterations very easy, and make it attractive for deployments as well.

With Go, you could even deploy the source code and have the server rebuild it each time it changes, if this makes your life easier or uses less bandwidth thanks to tools like rsync or git that only transfer changes.

What’s the point of using faster languages? Servers, virtual private servers, serverless or whatever technology you use incurs a yearly cost of operation. And this cost will have to scale linearly (in the best-case scenario) with user visits. Using a programming language, frameworks and libraries that use as few cycles and as little memory as possible keeps this yearly cost low, and allows your site to accept way more visits at the same price.

Go with Go. It’s simple and fast.

Why should you learn Go for your next project

I have been hearing about Go for a long time; along with Rust, it is one of the two new programming languages that seem to have been gaining attention in recent years.

After learning Go, it seems to me a good alternative to other programming languages because it is simple, beautiful, hassle-free, and fast compared to everything else that is not C, C++ or Rust. Its simplicity is really appealing because you can start small and grow as big as you want.

Go is ideal for containerized apps and websites. It runs faster than other popular alternatives for the web, with a small memory footprint, and its executables have no dependencies. It’s blazing fast to compile, so iterating on new versions and deploying is almost as fast as with interpreted languages.

Being so simple to understand, it requires very little training to become productive in Go.

Comparing Rust with Go

  • Go is developed by Google. Rust is developed by Mozilla.
  • Go’s use case is a more practical C or a more scalable Python. Rust is a high-performance alternative to C++.
  • Go has garbage collection. Rust does not; it manages memory at compile time through ownership and borrowing.
  • Both are compiled.
  • Go is productive. Rust is fast.

In short, Rust would be a better option where speed is key, as it is as fast as C++. It also has features that aim for zero-cost abstractions, so it looks promising for big, complex projects like browsers. It should be able to deliver the same speed as C++ with less complexity in the code.

Five reasons to use Go

  • Simple and beautiful: Go is easy to read and easy to write
  • Runs fast: Go is faster than JavaScript and Python, comparable to Java.
  • Compiles fast: As far as I know, Go has the fastest compile times of any language; this is one of its main design goals. It also includes the “go run” command, which builds and runs a program straight from source, so while developing it feels almost like using an interpreter.
  • Static typing: When a program grows large, having static types helps a lot, including for safety. Go is statically typed, so your programs can grow while you stay confident that they will run as you expect.
  • Explicit but terse: Go language is explicit, so the meaning is clearly conveyed in the programs. But at the same time is terse, so we don’t spend much time writing Go.

Is Go better than Python?

While Go does not have the flexibility and magic that Python has, it still has basic primitives for flexible arrays (slices) and maps (dictionaries), so with Go we don’t lose as much productivity as in other compiled languages.
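
As a rough illustration of those two primitives, here is a small sketch: a slice that grows with append, and a map used to count words. The WordCount helper is invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// WordCount returns how many times each whitespace-separated word appears.
// It is a hypothetical helper written for this sketch.
func WordCount(text string) map[string]int {
	counts := make(map[string]int)
	for _, w := range strings.Fields(text) {
		counts[w]++ // a missing key starts at the zero value, 0
	}
	return counts
}

func main() {
	// A slice grows on demand, much like a Python list.
	langs := []string{"python", "go"}
	langs = append(langs, "rust")
	fmt.Println(len(langs), langs[1:])

	// A map plays the role of a Python dict.
	fmt.Println(WordCount("go rust go python go"))
}
```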

But Go is way faster than Python: if our program has to do custom, complex calculations, Go can be 30 times faster. But beware: since many Python libraries are implemented in plain C, in some (very) specific use cases Python could beat Go or other languages.

There’s not much difference between the development cycles of Go and Python. Go can also run programs on the fly, so the code-try-code sequence is equally fast.

Go produces final, statically linked binaries for the platform. So, for distribution, you don’t need to ship sources (this can be good or bad depending on your point of view). But you also have to build different binaries for different platforms. Because the binary is statically linked, there’s no need to account for the different Linux distributions: the same binary should run on any of them, as long as the architecture is the same.

Distributing on Windows could be easier with Go, as it just produces an executable that runs across many Windows versions, while with Python you have to take care of packaging for Windows and test it properly, or tell the user to install the whole development stack, which is a hassle.

The main disadvantage of Go versus Python is that Go tries to compile everything statically, so the behavior of the code is set at compile time. Go interfaces can help create the kind of “magic” abstractions that change behavior depending on the scenario, but aside from that it’s a bit limited. In contrast, Python is much more flexible.

Is Go better than C++ or Java?

While C++ and Java are more feature-rich, Go is simpler and more productive. Also, Java tends to be memory-hungry, so Go is useful for running programs under constrained memory and disk.

Because Go statically links everything, its executables will be bigger than their C++ counterparts, but still way smaller than Java’s, since with Java you have to carry the JVM and libraries, which use a lot of disk space. This makes Go an excellent candidate for containerized applications.

The downsides of Go are its lack of abstractions and that it is slower than C++, being more or less as fast as Java (but still a bit slower in some scenarios).

Go is strongly opinionated

While learning Go for the first time I found a lot of things surprising. For me Go is the plain old C language with a new Pythonic style. I like both C and Python a lot so I see a lot of influence from both languages in Go.

When designing Go, its authors weren’t scared of breaking the rules; it is clear that they have strong opinions on how things should be done. In the same way that Python specifically wanted to ditch braces for blocks and the “switch” statement, they were clear that they didn’t want classes (in the usual OOP sense) and they didn’t want exceptions.

Unlike Python, it doesn’t use whitespace for blocks; it uses the classic braces, and it has an extended switch statement.

The braces don’t need any explanation, but the switch deserves a mention. The main problem with a switch statement in C-like languages is that, by default, execution falls through from one case to the next, causing unintended bugs.

In Go they solved it by going the other way around: by default each case is independent unless you add the keyword “fallthrough”. This makes the construct less bug-prone and terser in the common case.
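
A minimal sketch of that behavior follows. The Describe function is a made-up example: each case ends on its own, and only the explicit fallthrough makes case 1 continue into the body of case 2.

```go
package main

import "fmt"

// Describe classifies a small integer. No "break" is needed: each case ends
// on its own. Only the explicit fallthrough continues into the next case.
// The function is hypothetical, written for this sketch.
func Describe(n int) string {
	out := ""
	switch n {
	case 0:
		out += "zero"
	case 1:
		out += "one,"
		fallthrough
	case 2:
		out += "small"
	default:
		out += "other"
	}
	return out
}

func main() {
	for _, n := range []int{0, 1, 2, 3} {
		fmt.Println(n, Describe(n))
	}
}
```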

There are also special types of switch: with no condition so all conditions are in cases; for variable type detection; and finally for processing many asynchronous events at once.
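
The three forms can be sketched as follows, with hypothetical function names. Strictly speaking, the one for asynchronous events is a separate statement called select, which looks like a switch but works on channel operations.

```go
package main

import "fmt"

// Grade uses a switch with no condition: each case is a boolean expression.
func Grade(score int) string {
	switch {
	case score >= 90:
		return "high"
	case score >= 50:
		return "pass"
	default:
		return "fail"
	}
}

// Kind uses a type switch to branch on the dynamic type of a value.
func Kind(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return "string " + x
	default:
		return "unknown"
	}
}

// FirstReady uses select: it runs the case whose channel is ready first.
func FirstReady(a, b <-chan string) string {
	select {
	case msg := <-a:
		return "a: " + msg
	case msg := <-b:
		return "b: " + msg
	}
}

func main() {
	fmt.Println(Grade(72), Kind(3), Kind("hi"))

	a := make(chan string, 1)
	b := make(chan string, 1) // never written to in this sketch
	a <- "ready"
	fmt.Println(FirstReady(a, b))
}
```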

As said before, there are no exceptions in Go. You’re expected to return the error using a conventional return statement, so the common approach is to return a pair such as (value, error) or (value, ok). This comes from C, where we used to encode errors in the return value. But Go makes this way simpler by allowing a function to return multiple values.

It also has an error primitive that can be used easily to convey error messages as text, and it can be extended to your needs.

This means that your code should be checking for error codes explicitly. Failing to do so means that the code will continue running using a default value instead. It does not fail the execution.
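
A short sketch of both conventions, assuming a made-up ParsePositive function and ErrNegative error value. The caller has to check the returned error; if it ignores it, it simply carries on with the zero value.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ErrNegative is a hypothetical error value for this sketch.
var ErrNegative = errors.New("negative values are not allowed")

// ParsePositive returns (value, error); the caller must check the error.
func ParsePositive(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		// Wrap the underlying error with some context using %w.
		return 0, fmt.Errorf("parsing %q: %w", s, err)
	}
	if n < 0 {
		return 0, ErrNegative
	}
	return n, nil
}

func main() {
	for _, in := range []string{"42", "-1", "abc"} {
		n, err := ParsePositive(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ok:", n)
	}

	// The (value, ok) form: a missing map key gives the zero value and false.
	ages := map[string]int{"ana": 30}
	age, ok := ages["bob"]
	fmt.Println(age, ok)
}
```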

Go programs can also fail completely and stop execution. This is known as panicking. A panic can be started by an internal call or manually by the programmer using the “panic” function. So instead of returning errors, you can just panic. In this case, handling it is the caller’s responsibility.

Now functions have two ways of exiting: returning and panicking. So Go has a “defer” statement to run cleanups at the end of the function. This is useful because it runs regardless of how or when the function exits.

Panicking unwinds the stack, running all deferred calls on the way. It is still possible to keep the program from crashing by calling the built-in “recover” function inside a deferred function. This looks like a flavor of try..catch, but it is not recommended as a general error-handling mechanism in Go. Although less performant than error codes, it can be clearer or easier to reason about in some cases.
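
Here is a minimal sketch of defer, panic and recover working together. SafeDivide and Steps are hypothetical names; the first turns a run-time panic into an ordinary error, and the second records the order in which deferred calls run.

```go
package main

import "fmt"

// SafeDivide converts a run-time panic (integer division by zero) into an
// ordinary error by calling recover inside a deferred function.
func SafeDivide(a, b int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

// Steps records the order in which deferred calls run when a function exits:
// last deferred, first executed.
func Steps() (log []string) {
	defer func() { log = append(log, "cleanup 1") }()
	defer func() { log = append(log, "cleanup 2") }()
	log = append(log, "work")
	return log
}

func main() {
	fmt.Println(SafeDivide(10, 2))
	fmt.Println(SafeDivide(1, 0))
	fmt.Println(Steps())
}
```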

Going back to classes and object-oriented programming: Go does not have classes, but it implements some object-oriented ideas, again in a more C-flavored style.

It uses structs and interfaces. Structs are like regular C structs: no code, just data. They can be “inherited” in the same sense as in C, by stacking one on top of the other. This is called “struct embedding”, and Go adds syntactic sugar to help with it.
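
A small sketch of struct embedding, with hypothetical Animal and Dog types. The fields of the embedded struct can be reached directly on the outer one.

```go
package main

import "fmt"

// Animal and Dog are hypothetical types for this sketch.
type Animal struct {
	Name string
	Legs int
}

// Dog embeds Animal: the Animal fields live inside Dog and can be reached
// directly, without naming the embedded struct.
type Dog struct {
	Animal
	Breed string
}

func main() {
	d := Dog{Animal: Animal{Name: "Rex", Legs: 4}, Breed: "collie"}
	fmt.Println(d.Name, d.Legs, d.Breed) // promoted fields
	fmt.Println(d.Animal.Name)           // the explicit path still works
}
```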

Multiple embedding is possible; it just stacks the structs one on top of the other, much in the style of C. So there is no diamond problem: if a name appears twice, it will be stored twice.
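
For instance, in this sketch (Engine, Radio and Car are made-up types) both embedded structs have an ID field. Car stores both, and the ambiguous name has to be qualified with the struct it comes from.

```go
package main

import "fmt"

// Engine, Radio and Car are hypothetical types for this sketch.
type Engine struct {
	ID    int
	Power int
}

type Radio struct {
	ID     int
	Volume int
}

// Car embeds both structs, so it stores two separate ID fields.
type Car struct {
	Engine
	Radio
}

func main() {
	c := Car{
		Engine: Engine{ID: 1, Power: 90},
		Radio:  Radio{ID: 2, Volume: 5},
	}
	fmt.Println(c.Power, c.Volume) // unique names are promoted as usual
	// c.ID would not compile (ambiguous selector); name the struct instead.
	fmt.Println(c.Engine.ID, c.Radio.ID)
}
```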

The code for structs lives outside of them, in functions defined on types (methods). Much like in Python, self/this is declared explicitly.
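
A minimal sketch with a hypothetical Counter type. The receiver, written before the method name, plays the role of self; a pointer receiver is needed when the method has to modify the struct.

```go
package main

import "fmt"

// Counter is a hypothetical type for this sketch.
type Counter struct {
	Count int
}

// Value has a value receiver: c is a copy of the struct.
func (c Counter) Value() int {
	return c.Count
}

// Add has a pointer receiver so it can modify the original struct.
func (c *Counter) Add(n int) {
	c.Count += n
}

func main() {
	var c Counter
	c.Add(2)
	c.Add(3)
	fmt.Println(c.Value())
}
```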

And then there are interfaces, which let you write code that handles diverse types at once. They might resemble Python duck typing, Java interfaces, C++ virtual functions, etc. But they are a thing of their own.

Interfaces define a set of methods that must exist on a type. A type does not declare that it adheres to any interface; the fact that it has all the methods is enough to use it where that interface is expected. So in this sense, it resembles duck typing.
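
As a sketch of this, with made-up Shape, Rect and Circle types: neither struct mentions the interface, yet both can be passed wherever a Shape is expected because they have the required method.

```go
package main

import (
	"fmt"
	"math"
)

// Shape is satisfied by any type that has an Area() float64 method.
type Shape interface {
	Area() float64
}

// Rect and Circle never mention Shape; having the method is enough.
type Rect struct{ W, H float64 }
type Circle struct{ R float64 }

func (r Rect) Area() float64   { return r.W * r.H }
func (c Circle) Area() float64 { return math.Pi * c.R * c.R }

// TotalArea works with any mix of types that satisfy Shape.
func TotalArea(shapes []Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

func main() {
	fmt.Printf("%.2f\n", TotalArea([]Shape{Rect{2, 3}, Circle{1}}))
}
```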

And finally, Go is one of the few programming languages that I know of that is compiled and supports UTF-8 natively.
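
In practice this means that source files and string literals are UTF-8, and that ranging over a string decodes it into Unicode code points (runes). A small sketch, where RuneAndByteLen is a helper invented for this example:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// RuneAndByteLen returns the number of Unicode code points and the number of
// bytes in s. Go strings hold bytes, conventionally UTF-8 encoded.
// The function is hypothetical, written for this sketch.
func RuneAndByteLen(s string) (runes, bytes int) {
	return utf8.RuneCountInString(s), len(s)
}

func main() {
	s := "año ✓"
	fmt.Println(RuneAndByteLen(s))
	// Ranging over a string decodes UTF-8 and yields one rune per step.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}
}
```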

Why Java is faster than Python

And why C is faster than Java.

Among the things you’ll hear, there is the classic “there are no faster or slower languages; it depends on what you want to use them for, and some languages are a better fit than others”. While part of it is true (some languages are a better fit for some tasks than others), the other part is false. There are faster and slower languages. That is a fact.

The other thing I have heard is “Java is a compiled language and Python is interpreted, therefore Java is much faster”. Also false. Java is as interpreted as Python, or Python is as compiled as Java. Both languages compile to bytecode, and an interpreter then executes those instructions.

But Java has JIT! Well, and PyPy also does have JIT. So what?

And Java is not older than Python either; they are both more or less the same age.

Regardless of any of those typical comments and counter-arguments, the fact is that Java is 4x slower than C and Python is 40x slower than Java (more or less; it depends greatly on the benchmark). (And the old C does not have JIT, hah!)

So, why is Java faster than Python?

It’s just because Java puts way more work and responsibility on the developer than Python does. This is simply a trade-off between computer performance and developer performance. How is this possible?

First, we have to understand what compilers do and how optimizations work. The compiler serves basically one fundamental purpose, and it’s not generating an executable or bytecode. The purpose is to do as much work as possible beforehand. As a side effect, it has to write an executable or bytecode that can be read later to follow the instructions. Just to be clear, the compiler’s job is to remove complexity and uncertainty from the source code and write an output that is as dumb as possible so it can be followed blindly. The more stuff you remove, the faster it gets later.

What kind of stuff can we remove or simplify for later? Well, the first step is the parsing stage: parsing a source file takes a lot of time, so the most basic compiler would read the code and output a binary abstract syntax tree that can be loaded into memory really quickly. Next would be doing simple math ahead of time (constant folding), and removing dead code.

But from here it gets tricky. For the CPU to handle the instructions, we need to know ahead of time what we’re doing: which data types, sizes, expected result types and so on. Depending on the language this might be possible, but it is not the case for Python. It has so much abstraction and craziness built in that we can never know what to expect at a particular point of the program. Every variable, even if it looks simple, can be replaced by something different through mocking or other weird techniques. So the only way around it is to wrap Python values in a complex structure that can track all those changes. In doing so, we give up running performance in favour of a friendlier (and crazier) language.

Java has static typing and this helps a lot translating all instructions into real CPU instructions. But for that step to be done we need a Just In Time compilation (JIT); if not, we would still feed the instructions one by one using the interpreter.

But C is faster. Its trick is moving more of the burden from the language to the developer. In this case, we not only require the developer to type everything and define every behaviour ahead of time; we also make the developer responsible for memory access and for the program’s behaviour. So in C, if a program closes unexpectedly it is never C’s fault; it’s always the developer’s responsibility to check everything.

In C this is called “Undefined Behaviour”, and it describes those grey areas where the C compiler just doesn’t care: it will assume everything is fine and optimize as much as possible. The C compiler can even replace function calls with their expected result in the final executable. It can also decide to inline a function into its caller because it believes the result is the same, and faster.

So in short, the flexibility of a language and its “auto-magic” has huge trade-offs in performance. Writing a Java interpreter or compiler that is faster than Python should be easy. But writing a Python interpreter that could be fast enough to be compared to Java is almost impossible.

I want to add a special mention of JavaScript, probably the most hated language lately. Being a bit less auto-magic than Python, and having had huge efforts put into faster JS engines in browsers, JavaScript now has JIT and all sorts of optimizations, leaving us with one of the fastest interpreted languages that exists to date:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/javascript.html

JavaScript lacks thread support, because browsers don’t want scripts to mess with threads, for security and stability reasons. Still, if you take into account that in most of the benchmarks linked above Java was using all CPUs and JavaScript only one, JavaScript performance seems to be somewhat close to Java’s, which is impressive.

Still I would not recommend (yet) JavaScript for server-side applications. But nonetheless it is an interesting outcome of being the only standard scripting language over the web.

Web development infrastructure with Python

Happy new year! 2019 is starting and I wanted to rescue an article that had been pending for quite a while. More and more I see people thinking about doing web development with Python instead of PHP.

What is the advantage of Python over PHP? PHP was born outdated and, however much they try to improve it, the foundations it is built on are quicksand, as in JavaScript. Python is much more robust and more secure than PHP.

But let’s be clear: in terms of the speed of the language per se, Python is several times slower than PHP. And if it already was before, now that PHP 7 has improved its speed enormously, Python certainly falls far behind on this front.

As I have said plenty of times, though, Python ends up being fast because of the ecosystem itself, if you know how to use it correctly. All languages are slow if used badly; even against C++, I have seen Python programs do the same job in less time, only because they were better written. And this language makes it very easy to do things right.

PHP, on the other hand, tends to fail and leak memory along the way, and the platform it runs on for the web means that just launching the program has a significant cost. As a web programmer you may like it (to each their own), but as a DevOps engineer or system administrator you can only hate it. It is quite a pain to keep a system stable on a web server with several complex sites.

If this new year you are considering trying Python for the web, here is where to start:

uWSGI

Python connects to the web server (Apache, Nginx, etc.) in many ways. By default most frameworks start a mini web server that you can connect to through an HTTP proxy. But this is only recommended for local development setups.

On servers we have FastCGI and uWSGI. The latter is newer, easier to configure and faster. For example, in Apache you have mod_proxy_uwsgi and it is simple:

ProxyPass /foo uwsgi://127.0.0.1:3032/

Also, Python is not like PHP when it comes to how files are executed. Python programs are not left in the same folder as the images. In our case there is a clear separation between programs and data. If you come from PHP this may look like a nuisance compared to dropping x.php wherever you please, but in terms of security it is vastly better: there is no chance of the source code becoming visible from the web by accident, and files uploaded to the site can never be executed.

Compare this with PHP, where sometimes it is enough to upload an attachment with a “.php” extension and then visit its URL.

Flask and Django

If you are going to start with Python, the best thing is to use a framework where everything is already included. If you like frameworks and/or starting from a base that guides you, Django is ideal. If you prefer something closer to raw PHP, Flask is very minimalist and lets you work your own way. You can go even lower and use Werkzeug, but having tried all three I think Flask is the best recommendation.

Django is good if you want a structure and the ability to install third-party modules, of which there are plenty. But if you are organized and can build things quickly, Flask is flexible and faster than Django.

Also, all of these projects have documentation on how to connect them to web servers correctly, which security standards to follow, and so on.

PostgreSQL

As for databases, I have worked a lot with MySQL, about as much with PostgreSQL, and also with SQL Server.

I do not recommend SQL Server at all. MySQL is the typical solution for websites, but the only really good thing it has is a cache built into the database. PostgreSQL beats both, and not by a small margin. It is excellent, full of features and as robust as it gets. The only drawback for the web is that it has no built-in cache, so if you run the same query 40 times, it will be executed 40 times. What you have to do is cache whatever matters to you yourself to avoid this.

You do not even need to look at NoSQL databases. The few advantages they may have over a SQL database are either features you can have in PostgreSQL, or things you cannot benefit from because you do not have enough data to make it worthwhile. In my tests PostgreSQL was more than twice as fast as MongoDB (the fastest one) under the same conditions. On top of that, NoSQL databases are neither consistent nor reliable; part of their speed comes from the fact that they can lose data.

SQLAlchemy

When it comes to accessing databases, ORMs are highly recommended because their abstraction lets the same code work with the data without knowing where it comes from. They also simplify development a lot and improve readability.

The drawback of ORMs (all of them) is that they add extra load when processing the data. After reviewing and testing SQLAlchemy, I found that it gave good results. Compared with Django’s ORM, it has more features and is faster.

There are others (with fewer features), but since I have not been able to test them I cannot say whether they are faster.

Nginx

Avoid Apache, and if you do use it, at least do not run it in Prefork mode. Nginx is my favorite as a sysadmin because it causes no headaches, it works, it has what I need and it is extremely fast. Servers running Nginx have always worked extremely well for me.

Redis

If you want a NoSQL server, you have Redis. If you need a cache, Redis again. Redis is ideal for temporary data with a high frequency of reads and writes. The drawback is that it falls short on security, like all servers of this kind. So if you use it, make sure that only the programs that should have access actually have it.

Redis can be set up with almost every framework and programming language, and even with many CMSs. And it also supports clustering and sharding! What more could you ask for?

Ansible

Over the last year I have worked with Ansible, which is a tool for orchestrating servers. Although it takes some effort to get started with, the good thing is that it gives fairly reproducible results, and what works on one machine tends to work on the others. And the installation instructions stay stored in Git, so installing the next servers is easier.

Besides, being able to deploy an update to every machine with a single command is amazing. When you have 50 machines that need to be updated, or need a new program installed, Ansible makes it look trivial.

Docker

But if we want maximum reproducibility, being able to work exactly the same way locally as on the server, Docker is even better than Ansible. Both can also be used together and complement each other.

Docker also lets you keep different applications isolated from each other, so that a compromised application does not put the server or the other applications at risk.

Over the coming weeks, every Thursday, I will be publishing a series of articles on how to use Docker to build an ultra-secure LAMP server. Stay tuned!


Python, the ten-thousand barrier

I have been programming in Python for a long time and I love it. Long enough to have tried doing practically everything with it. But sooner or later I run into the same wretched barrier, so in the end I decided to give it a name.

Python is a language whose instructions (not counting optimizations) take, on average, 50 times as long as their equivalent in C. However, Python produces very fast applications because there are heavily optimized built-ins and functions that do the work for you, and that work is done with an efficiency close to that of C. In addition, many Python libraries use C to execute their operations, or use some kind of trick to avoid having to execute everything manually in Python.

You can ask a library to process a 5000×5000 image, tell it to render in OpenGL, or to apply a transformation to a matrix... but when the logic of the operation has to be expressed in Python, you have to iterate in Python. And here comes the ten-thousand barrier... ten thousand instructions per second, I mean.

And this is the fact: practically any triviality done in Python, however simple, usually costs around 10 µs (microseconds). When you add real logic to that, we go up to 30 µs, and that is talking about absolute minimums. This means that practically anything done in Python can run at most at 1/30 µs, which would be a bit over 30,000 operations per second; around 100,000 if it is a very direct and simple implementation, and around 400,000 per second if it is barely a single instruction, and one of the fast ones.

So, when your code is too slow for what you need, you start looking for the culprits and optimizing. Of course, while there is an operation that takes 5 ms, it is easy to locate and probably to fix as well. But as you keep correcting and improving, the time of the slowest function or operation shrinks... and gets diluted among the rest of the operations. Until you reach a point where all your operations seem necessary and an iteration takes approximately 0.1 ms. Welcome to the ten-thousand barrier. That makes ten thousand iterations per second.

And the ten-thousand barrier is like the sound barrier: it seems insurmountable. The harder you push, the harder it pushes back, exponentially.

Fortunately, very few applications require more than a thousand iterations per second. But if at some point the thought “I could do this in Python” crosses your mind and it is something where speed matters... stop. Think and calculate: how many iterations per second does your program require? More than five thousand? If so, you can start preparing a C interface and moving all the loops into C. And when I say C, I mean pure C, with no calls into Python of any kind.

You may think: I’ll use a computer twice as powerful, or I’ll compile the affected parts with Cython... Wrong. On one hand, more powerful computers just come with more cores, so you would have to write those parts with threads, but that does not help because of the GIL, and in practice you would only use one core. So you would have to write that part with multiprocessing (several processes running in parallel). On the other hand, compiling with Cython does not help as much as one would think. If the code keeps processing Python objects inside the loops, it is still slow. If there is any improvement, it is a poor one.

The trick here lies in knowing what you want to do and having the data ready so the whole task can be carried out without Python intervening in the process. The cancer of dynamic languages is the checks and the lookups. At every step the interpreter has to look up whatever it is asked for and check every operation being performed, to the point of changing its behavior depending on what it finds. This is slow, whether we like it or not. C is fast because it does not check: it assumes everything is correct and that there is only one way to do something. Therefore, it is faster to hand over a set of instructions and check their validity once than to execute the operation manually while the interpreter checks everything we do on every iteration.

In conclusion, I would say it is not worth getting into this kind of mess. You invest a lot of time and barely get any results.