ORM should be used in every project

When it comes to performing database queries in an application, I hear everything from “I stick with SQL-92 so I’m not tied to one database” to “Everyone should drop ORMs because they don’t use the full potential of the database and are slower”. I believe both positions are wrong. Every project should use an ORM: change my mind.

Sticking with SQL-92, a 25-year-old standard (even older, do the math), is not going to make your application run against any database. Things like character escaping, string quoting and name quoting have different requirements on different databases. Also, no database implements the SQL standard fully and properly; some products are stricter than others, but even then there are always corners where the standard has to be bent a bit to fit the database design. If you go this route, you’re going to get stuck with one database anyway, and your application is going to be slower because you would be refusing to use any powerful feature. To me, this sounds like someone who doesn’t know SQL well and doesn’t want to learn.

Dropping the ORM entirely and going only with raw SQL, using every feature your chosen database has, will give you the most performance you can get, but you’ll end up with a lot of very specific SQL everywhere that is hard to translate to other products. This is fine by me; what worries me is that your code will heavily depend on the specific query performed, so you’ll end up with a lot of code duplication.

I have seen this lots of times. In one part of the program we need to display users, so there’s a query and a template. In another part there are users by post, so there’s another query and another piece of glue code. And again and again there’s more glue code everywhere to do the same thing from different sources. Sometimes we even needed to emulate records that don’t exist in the database yet, so there’s even more glue code.

Proper approach to database querying

The best solution is to go with a fast ORM that allows the use of raw SQL when needed, either in parts of the query or for the whole query. If the performance is good enough with the default ORM implementation, go with it. Whenever it falls short, try adding some raw strings in the clause where you need them. If the ORM is still too slow for some case, then go for a fully raw query just for those cases.
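In Django, for example, injecting a raw fragment into an otherwise normal queryset could look like this (a sketch; the Product model, the search_vector column and the search_terms variable are illustrative, and ts_rank/plainto_tsquery assume PostgreSQL):

from django.db.models.expressions import RawSQL
from app.models import Product  # hypothetical model

# Plain ORM while it is fast enough.
products = Product.objects.filter(active=True)

# Raw SQL only in the clause that needs it; the rest stays ORM.
ranked = Product.objects.annotate(
    score=RawSQL("ts_rank(search_vector, plainto_tsquery(%s))", [search_terms])
).order_by("-score")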

As I said before, I’m not especially worried about getting stuck with one database product. When I design an application I carefully choose the best possible option, and usually I don’t need to change it later. The problem comes with unit testing.

Setting up a full-blown database like MySQL or PostgreSQL is hard and eats lots of resources. SQLite, on the other hand, offers really good SQL support and can be created in memory, so it’s really convenient.

When using an ORM, most of the logic can be tested against SQLite. I even found that my raw queries with CTEs are understood by SQLite, which is amazing. The remaining ones, because they’re just a few, can either get another raw query just for SQLite or we can mock the call entirely. We could even have another test folder for integration tests and use a full PostgreSQL server there.
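In Django, for instance, pointing the tests at an in-memory SQLite database is just a settings override (a sketch of a hypothetical settings_test.py; adapt the import to your project):

# settings_test.py -- hypothetical test settings module
from .settings import *  # reuse the real settings, then override the database

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": ":memory:",  # created fresh in memory for every test run
    }
}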

But this is not the main benefit of an ORM. And no one should expect to be able to use an ORM without knowing SQL properly.

The main benefit of an ORM is “Don’t Repeat Yourself”, or DRY: avoiding writing duplicated logic depending on the context.

ORMs are really powerful in this regard, as they create objects for us, and when we write functions for those objects we don’t need to know which data was actually queried, or under which column names. So functions can be reused everywhere to perform the same logic across the application. This is my main motivation for using an ORM.

Sometimes we can query partial data from a table, and the receiving function will never notice. Sometimes we can prefetch related data and the function will use it just fine without knowing it.
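A Django-flavoured sketch of both situations (User, Post and the render helpers are hypothetical names):

from app.models import Post, User  # hypothetical models

# Partial data: only two columns are fetched; the receiving code never notices.
users = User.objects.only("id", "name")
render_user_list(users)

# Prefetched relations: the related tags are loaded up front, and any function
# iterating post.tags.all() uses them without triggering extra queries.
posts = Post.objects.prefetch_related("tags")
render_post_list(posts)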

I believe there is a relation between the amount of code and the number of bugs: the more code you have, the more chances of bugs hiding in it. Keeping your code as concise as possible makes the intention clearer and makes it simpler to reason about, inspect, and prove correct.

In other cases, a bigger SQL query is simpler to prove correct, because SQL is closer to a declarative or functional language and expresses terms as mathematical relations, which are also a good tool for proving correctness. But if, once the results land in your code, you still have to do complex calculations on them, then either you’re doing something wrong or the problem is genuinely hard to solve while staying fast.

Some ORMs have the ability to map raw SQL to objects, and we should use this to avoid duplicating logic again and again.
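In Django that is Manager.raw(); a sketch, assuming a hypothetical Product model stored in an app_product table and a related app_sale table:

from app.models import Product  # hypothetical model

# The rows come back as regular Product instances; extra columns such as
# num_sales become plain attributes on them.
products = Product.objects.raw(
    "SELECT p.*, COUNT(s.id) AS num_sales "
    "FROM app_product p LEFT JOIN app_sale s ON s.product_id = p.id "
    "GROUP BY p.id"
)
for product in products:
    print(product.name, product.num_sales)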

The main problem with ORMs is performance. Everywhere I see developers saying that ORM performance does not matter because we should move the work into the database anyway. This is true and false at the same time. On one side, yes, we should move all possible calculations into the database unless there is a legitimate reason not to. But on the other side, ORM performance does matter.

Guess what happens when we use 10,000 rows from a complex database query in most ORMs: the database handles the load, with filters and joins, neatly and quickly, responding in 10ms or less. The ORM can add another 10ms on top in some cases, making the response twice as slow.

This is the main reason why so many people rail against ORMs: the different libraries fail to realize how important it is to be fast, and they become the main bottleneck. They fail to see that databases are not merely fast, they’re blazing fast: they can handle millions of records in seconds. Any slowdown in our application will hit us heavily.

A clear indicator of how overlooked performance is, is the lack of compiled Python ORMs (using C for the parts where speed is required). I also compared Django (Python) against GORM (Go), and while GORM issues queries slightly faster, Django seems to retrieve slightly more rows per second. How is this even possible if Python is 10x slower? Because most or all Go ORMs use reflection (like introspection in Python) instead of generating specific code for the types, so Go ends up doing much more work than it should.

Common Missing Features in ORMs

We have a lot of features that mimic more and more stuff from SQL, but as stated before, performance is missing. No matter the ORM, I always find the following features missing or incomplete: query plan caching, result caching, proper cache eviction tools and the ability to inject generated/mock results.

Query Plan Caching

There’s a cost associated with converting the ORM representation of a query to the database dialect’s SQL. When a query is complex enough and executed often, this becomes a small bottleneck.

But most of the query is usually already stated in code, and in fact, at compile time it could be partially generated for a particular dialect, speeding up later calls.

Some databases also support prepared statements. With those, you prepare the calls at the beginning and afterwards you only need to provide the required data, without any extra effort. This not only saves you from building the SQL each time, it also saves time on the database side, as it does not have to parse and plan the query each time.
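With PostgreSQL and psycopg2, for example, driving a prepared statement by hand looks roughly like this (a minimal sketch; the users table and the connection string are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=example")
cur = conn.cursor()

# Parsed and planned once by the server...
cur.execute("PREPARE user_by_id (bigint) AS SELECT id, name FROM users WHERE id = $1")

# ...then executed many times, sending only the parameter each time.
for user_id in (1, 2, 3):
    cur.execute("EXECUTE user_by_id (%s)", (user_id,))
    print(cur.fetchone())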

Result caching

Besides saving the cost of creating the SQL and unmarshalling the records, caching results also saves a database query and its round-trip. Most of the time I had to do it manually, without much help from the ORM, which is exhausting.
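Done by hand it usually ends up as a crude sketch like this one (standard-library functools, a hypothetical User model, and no eviction at all, which is precisely the next problem):

import functools

from app.models import User  # hypothetical Django model

@functools.lru_cache(maxsize=128)
def active_user_ids():
    # Evaluated against the database the first time only; later calls
    # return the cached tuple without any query or unmarshalling.
    return tuple(User.objects.filter(is_active=True).values_list("id", flat=True))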

Cache eviction tooling

So we cached the data, good. But now it’s stale: not so good. The ORM in fact knows, or has ways to know, whether the application itself invalidated the data. Because it knows which tables are touched by each query or update, it could easily implement a cache eviction mechanism à la MySQL, where whenever a table is modified, all caches using that table are evicted. It’s really simple and it removes a lot of useless queries.
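Outside the ORM the idea can only be approximated, because the application has to declare by hand which tables each cached query touched, which is exactly the information the ORM already has. A minimal sketch of the mechanism:

from collections import defaultdict

_cache = {}                        # cache key -> stored result
_keys_by_table = defaultdict(set)  # table name -> cache keys that read from it

def cache_put(key, result, tables):
    _cache[key] = result
    for table in tables:
        _keys_by_table[table].add(key)

def cache_get(key):
    return _cache.get(key)

def table_modified(table):
    # MySQL-style eviction: any write to the table drops every cached
    # result that was built from it.
    for key in _keys_by_table.pop(table, set()):
        _cache.pop(key, None)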

For external modifications outside the app there are also tricks, like leveraging columns such as last_modified, reading the internal statistics of the database, or having an external communication channel that lets us tell the app to evict the cache for a certain table.

How many ORMs implement this idea? Zero. Good luck implementing it yourself without touching the ORM.

Inject generated results

This pattern is a bit more convoluted. Imagine we have a function that works with data from a query. Now we want to use it without querying. In most cases replacing the query with a list of created objects is good enough.
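With Django-style models that can be as simple as building unsaved instances in memory and passing them in (Product and summarize_products are hypothetical names):

from app.models import Product  # hypothetical Django model; instances are never saved

# No database involved: these objects only live in memory.
fake_products = [
    Product(id=1, name="Widget", price=10),
    Product(id=2, name="Gadget", price=25),
]

# The function walks over the list the same way it would walk a queryset.
report = summarize_products(fake_products)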

But what if the function expects working relationships, for example in Django with select_related or prefetch_related? Those are nearly impossible to mock up properly.

And now the final boss: perform a query to the database and fill some many-to-many relationships without querying, from a cache or from generated data. This makes more sense than it might seem. The typical use case: we want to output a list of products, but alongside them a huge bunch of extra data from relationships that is very costly to fetch and process. On the website we want the products to be fresh, but for the extra data we really don’t care that much, as it rarely changes and the staleness is barely noticeable. So we want to join a fresh query with a potentially stale cache. If the previous example was nearly impossible, imagine how “easy” this one is.

ORMs are needed but a pain point

Some people might think that I’m obsessed with performance, but I believe those who have seen me work would agree that I only focus on performance when it actually matters. I don’t care unless I see potential problems.

We should be using an ORM to avoid code duplication and to keep the logic concentrated in a single place. But by doing that we lose a lot of performance. ORMs also lack the tooling to avoid calculating things twice, which makes matters even worse.

Still, I would recommend that everyone use one, because it’s better than raw queries in so many ways. They can provide correctness, slimmer code and fewer chances of bugs. I just hope that someday they’ll have features that make caching easier.

At the moment I’m trying to benchmark different ORMs across Python, Go and Rust, and I hope I can get it out in a few weeks.

Explain like I’m five – Databases

I have talked a lot in the past about PostgreSQL, NoSQL and a lot of technical stuff, but I forgot the basics. What are databases? Why use them? What different types are there? I’ll try to answer these questions in this post in an easy way, overlooking the formalities and most of the technical terms.

What is a database

Roughly speaking, anything that can hold data, search it and update it could be considered (more or less) a database. They’re most often used to persist a program state on disk.

Before databases became popular, any program that wanted to store anything had to create one or several files and design a format to be able to load the data back later.

This was fine for programs that worked with documents, for example text documents or drawings. But when the program was about accounting, things started to get weird; you could create some records for a sale, and you’d expect those to be saved as you go, without having to manually hit “File -> Save As…” each time.

In those cases, the file to be saved was too big to fit in memory (on the computers of the era), so you could no longer rely on editing the changes in the program and then dumping everything to disk in one go.

Updating the new values “on the go” wasn’t a choice but a requirement, but drives are really bad at finding data if you don’t know its position on disk. Basically they’re like tapes, but really fast: they move forward and backward to the desired position and write the bytes you want. If you don’t know in advance where the record you want is, you have to read from the start to the end, understanding each record and discarding it until you find the one you want. If the one you want turns out to be at the end, bad luck! You had to read the whole drive.

Updating the data turned out to be even worse: if the new data was shorter or longer than the old record, we would have had to move all the remaining data to make room or to close the gap, depending on the case.

This is not why old computers were so slow; no program that I know of was built that way, because engineers are clever, and before doing something that inefficient they came up with clever ideas to find and update records on the go while staying fast.

Those first ideas were the precursors of databases, and they can already be considered databases, even if at the time their creators probably didn’t think of them that way.

One of the earliest designs dates back to COBOL tables. By defining ahead of time the different pieces of data we want to store in a record, and their sizes, we can create something that looks like this:
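An illustrative layout (made up for this post, not real COBOL) could be:

NAME       20 characters
SURNAME    20 characters
BIRTHDATE   8 characters
BALANCE    10 digits

Every record then takes exactly 58 bytes, so record number N always starts at byte N × 58, and the BIRTHDATE of that record always starts 40 bytes after that.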

Now every record has a fixed size, and if we have a record number we can easily find the record on disk. The same goes for columns: as they have a fixed size, we can locate exactly what we wanted. This speeds up everything.

Updating values was also way easier, because values have a maximum width and can never overflow. The downside is that we can no longer store arbitrarily long values; it’s a small price to pay for the speed.

Other challenges came later, for example how to find all records with a particular date. This led to indexes and more complex databases. The innovation kept going, and nowadays we have all sorts of databases available for different purposes.

When should we use a database

If our program has some sort of structured data and we need to search it, as well as store small pieces on the go, the answer is to add a database.

Not all databases are equal; in fact there is a huge range of products that are optimal for different use cases. For instance, the filesystem, which allows us to organize data inside named files within hierarchical folders, is in fact a database itself. Without a filesystem we would have to search by byte number inside the disk to locate our data, which is not an easy task.

There are also databases that do not write to disk at all. Because memory (RAM) is also accessed by position (byte number), the same problem appears when we want to hold complex structures in memory, and there are solutions to ease locating and writing this data.

The line that separates what is a database and what is not is really blurry, and sometimes the same thing can be referred to as a database or as a simple algorithm or file format. When these start to add more and more features, it gets harder and harder to call them just that, so at some point we start calling them databases.

Knowing the different types is key to recognizing when we will benefit from a database, so I’ll explain next which ones exist.

Database types

There are lots of different ways of classifying databases, and most of them are completely orthogonal, so you can pick almost any combination of properties and some product will probably have it.

  • From where you can query:
    • Local: You have to be on the same machine to access the data. If it lives in memory, only the current program can access it. They’re convenient as no server setup is needed.
    • Network: Multiple users from different computers on the network can access the data remotely, even from the internet. They require setting up a computer as a server.
  • Where they store the data:
    • Disk: Data is saved on disk for persistence. Turn the computer off and on again, and the data is still there.
    • Memory: Data is volatile and disappears when the database stops or the computer reboots. They’re faster and commonly used for caching (remembering the last pieces of data seen for faster retrieval).
  • Number of concurrent users:
    • Single user: They cannot safely allow many users to write at the same time; some of them allow concurrent reads. Typically the case for locally stored data.
    • Multiple users: They have mechanisms that allow users to read and write safely at the same time. This is the case for most databases that are accessed over the network.
    • Multi-thread or multi-process: Whether the database can handle concurrent operations at the same time, from the local computer or from the network.
  • Consistency vs Scalability:
    • Highly Consistent: They guarantee that all users will see data that makes sense and is complete at all times. Sadly, they’re harder to scale.
    • Easily Scalable, Eventually Consistent: These trade off a bit of consistency in exchange for being able to better spread the load over several computers.
  • Durability of the data:
    • Highly durable: These guarantee that anything saved is properly persisted on disk before reporting success.
    • Non-durable: In exchange for faster write speeds, these can lose part of the recently saved data in the event of a crash or restart. Some products or configurations can corrupt the whole database on a failure; others will guarantee that data older than X minutes stays there no matter what.
  • Data structure:
    • Relational: Where you can have fields pointing to other tables, for example invoices pointing to their respective clients.
    • Non-relational: Where your records hold all the data needed; if you have invoices, clients would have some of their data copied on each invoice.
  • Write conflict resolution:
    • Pessimistic locking: When a user tries to write, it locks down the whole table and no one else can read or write to that part of the data during that time; parallel reads (while no one is writing) are allowed.
    • Optimistic locking: When users write, they write to a new record version, allowing other users to read the old data while the new one is being saved.
    • No locking: When a user writes, all changes are immediately visible to others, even if they are part of a bigger change. If two users try to write at the same time, the last write wins.

Examples

That was a lot to digest, right? Let’s focus on concrete examples of known products to see how these traits fit together in a database.

SQL Databases

Structured Query Language (SQL) databases are the most common type we will encounter. They’re so common that when we just say “database”, people will assume we are specifically referring to SQL.

They share a lot of features, the main one being the SQL language itself, which is used to search and update data. SQL dates from 1986 and has seen regular updates since then.

Most of them are fully ACID compliant, meaning they are really durable (never lose a record on failure), highly consistent (all users see the same data), atomic (users won’t see partially written records) and isolated (ongoing changes from one user aren’t visible to others until finished).

They work on relational data, having many tables and allowing complex searches that involve pieces from different tables in a single query. They’re really powerful and the de facto standard for managing data related to money (banks, accounting, warehouse management, etc.).

Some of them are local only, allowing just a single computer to access the data, but the majority can be accessed over the network, allowing lots of users.

Their use of relational data makes them inherently hard to scale across computers, which is called horizontal scaling or sharding.

The data always has a predefined structure (schema) that all records must follow.

Some examples are:

PostgreSQL is a free database that can hold terabytes of data and never loses a change. Users access it over the network, and it uses optimistic locking.

Maybe you know it already, but it’s my favorite database!

Oracle is a very high-end database product. It’s not free, but it’s used everywhere in enterprises. It resembles PostgreSQL a lot; in fact, features in Oracle usually serve as inspiration for others, so it’s common to see lots of new shiny things in Oracle first and, after some time, in other databases.

SQL Server is the main database product from Microsoft. It’s the big brother of MS Access; in fact, applications that were written in Access are easy to upgrade to SQL Server, allowing for faster operation and multiple users accessing it at the same time. It uses pessimistic locking, and its approach to SQL is slightly different from the previous two. It’s also used in enterprises; being cheaper than Oracle makes it attractive.

MySQL is also free, and its main market is websites. It has a reduced set of SQL features, but that’s more than enough for those. It’s a disk-based database like the others, but it leans much more on memory to be fast. Sometimes using MySQL avoids the need for a second, in-memory database. The downside is that MySQL is not as resilient as the others, as it trades a lot of durability, consistency and isolation for speed.

SQLite is a local-only database. Its main selling point is simplicity: its databases are just a single file on disk. It’s perfectly suited for mobile apps, temporary data, or just saving your app’s configuration. It also has a memory-only mode that is ideal for testing applications. It’s free and comes with built-in support in some programming languages. Don’t be fooled by its simplicity, as it comes with more SQL features than MySQL.

No-SQL Databases

While still being general-purpose databases, No-SQL databases fill the gap where SQL ones aren’t a great fit. If the data doesn’t have relations, doesn’t have a particular schema, or needs to be scaled horizontally, No-SQL tends to be the answer.

Nowadays the main reason to go with No-SQL is the need to spread the write load and the data across several servers, as SQL databases have caught up on other things like JSON support or fast writes.

Usually they are accessed over the network, allowing huge numbers of concurrent users to read and write at the same time. They trade off a lot to gain speed, so they have little to no locking.

The main problem with these is their inability to perform complex searches efficiently on non-key fields. Some of them have better support for this than others, but generally, careful design of the objects and their main keys is needed to get the promised performance.

Most in-memory databases that were created even before No-SQL was a thing now also fall into this category, even if they don’t advertise themselves as such. For example, key-value stores.

Let’s see some examples:

Cassandra is a database created by Facebook. As with many No-SQL solutions, it is intended to be deployed on several servers at the same time to share the load. It is eventually consistent and gets its durability from the fact that the data is replicated across computers. It has its own query language, CQL, which resembles SQL a lot.

MongoDB is a document database that saves its contents in JSON format, making it easy for developers to read and write to it. Much like Cassandra, it is intended to run on several computers at once, replicating and spreading the load between them.

Redis is an in-memory database with optional persistence to disk. Its main use is caching data, but as it can persist, it is sometimes used as a main database. It doesn’t feature proper durability, as several minutes of data can be lost in case of failure, but this can be mitigated by installing it on several machines for sharding. Also, it’s one of the few databases that currently doesn’t support multi-threading, so it only uses one CPU core. As it is really fast, this doesn’t seem to be a problem.

Specialized databases

So far, those databases handle the general-case scenario: saving data, finding data, and so on. But there are times when a specialized database can come in handy. These are crafted for a very specific use case, and under those conditions they are way better.

Full-Text search databases

Finding records by date or author is nice, but what if we want to implement a search in our application that works like Google? How do we do it? The user just inputs some text and we return anything that contains or resembles that text, even correcting the user’s input.

Some SQL databases have text search capabilities that are good enough for most uses, but if we want a truly powerful search we will have to resort at some point to a specialized database.

ElasticSearch is the best-known database in this category. It allows powerful searches by content, correcting user input or suggesting modifications to the search, and offers a wide range of indexing and scoring options.

Xapian is used by Debian and other distros to perform full-text search in their package managers. Its main benefit is a small footprint: it’s lightweight, but it doesn’t have as many features as the others.

Graph databases

There are cases where what we want to store aren’t documents or inter-related records; instead we want to store a collection of connected nodes to query later. These are really specialized, and if you didn’t understand what a graph of nodes means, you probably don’t need one. I have to admit I have never used one and never had a problem that really required one.

Neo4j is, for me, the name that comes up most often when I think of graph databases. It’s open source, written in Java, and of course built from the ground up to tackle this kind of problem.

Spatial databases

When our data is about positions in space, for example on a map, spatial databases include the necessary tooling to get nearby places, filter by distance or calculate overlaps. As they most often only require some extra tooling and not a full database redesign, we can find GIS capabilities in some SQL and NoSQL products.

Time series databases

Whenever we want to do a continuous recording of measurements, for example temperature over time or CPU usage over time, regular databases can be overwhelmed if we send several records per second. Time series databases offer much better performance in those scenarios, along with ways of querying samples at different resolutions to draw graphs.

File formats

Some file formats, like HDF5, are sort of databases in disguise. They can hold many values stored in an organized way that allows searching by key and sometimes even in-place updating.

Filesystems

As mentioned at the beginning, filesystems are indeed databases. They hold a hierarchy of data organized in files and folders. An application can use this to organize its data.

Old filesystems like FAT didn’t have protection against failure, so it was easy to corrupt data. Modern ones like NTFS and EXT4 do have some protection, so the data stays safe on a crash.

Still, filesystems are not the best solution for handling records, because they’re designed to hold big chunks of organized data. If that’s your use case, then just store the data as files and you’re good to go.

Conclusion

Databases come in all sorts of sizes and types. Choosing the right type and size is crucial for an application to avoid losing data while still being fast. Developers don’t know every single database out there; they take time to learn and we need to focus on what matters. Just knowing that they exist is usually enough to identify later a scenario where a specific database could be a better fit.

SQL databases are the most veteran and are usually the right choice. If you plan to use five servers or more for your data, NoSQL is best.

For really simple storage, files may be enough. SQLite offers a good middle ground between storing random files in a folder and going to a full-fledged SQL database.

Avoid using several databases at once, as it adds a lot of complexity to deploying your program as well as to performing and restoring backups. Databases are meant to store the state of your program, and if all your state is in a single place, everything becomes much easier.

Future of Rust and Go

Having used both languages for a bit, I’m starting to see where they fit and which markets they will take over. Where will we see Rust and Go in the next 10 years? Will they replace any language?

Rust

Being still quite new and quite difficult to start with, Rust will take its time to find its place, but I have no doubt that it will.

Rust directly competes with C++, and it’s better in every way, having proper protections against memory misuse. C++, on the other hand, has a 27-year head start, so the amount of code and libraries already written in C++ will be a huge thing to overcome. But with Rust being easier to work with, I believe it will take over at some point.

For C, Rust could be seen as a replacement depending on the program. If the application is big and complex, Rust can add abstraction without extra cost while staying fast. There is also Rust for embedded systems. Rust has a lot of potential in this area, but it looks to me like it will be much harder for it to gain ground against C.

For Java, it may or may not make sense. The Rust compiler is really picky, which can be seen as an advantage for projects that require some sanity; but Java has more ways of protecting against misuse of code than Rust. In cases where Java was chosen for speed, Rust might be compelling, especially because the memory footprint of Rust is really small. But nowadays Java is chosen for the availability of developers on the market, and that is a big problem for Rust.

For scripting languages like JavaScript, Python, PHP or Perl, Rust does not make sense, because it requires too much coding effort compared to those.

Scientifically oriented languages like F# also won’t see any benefit in Rust, as it is too technical for the current audiences of those languages.

Functional languages like Haskell might see some use for Rust, as it supports a lot of the functional approach while having blazing speed. Also, functional languages are usually picky about type coercion and similar things, so Rust won’t be that surprising to their users. But generally, Rust is too low level for those audiences.

Products that we might see using Rust

  • Browsers: Don’t forget Rust was created by Mozilla for Firefox, to speed up development while guarding against memory errors.
  • Virtual machines: Rust is an ideal candidate as it can control interrupts and it’s even prepared for embedded systems.
  • Databases: They need blazing speed, small footprints, abstraction and thread safety. Rust brings all of those.
  • Game development: Speed is crucial here, and Rust will enable modding in a fast, safe way.
  • Kernels and drivers: Did I mention that Rust supports embedded? It should also be able to power kernels. There is already some traction on allowing Rust drivers in Linux.
  • WebAssembly: Rust is pioneering Wasm support, and being one of the fastest languages it makes sense to use it. Not surprising, as Mozilla is interested in this. Will we see web frameworks like Angular written in Rust someday?
  • Basic OS tools like the GNU utilities: Those are currently written in C or C++ and Rust is easier to hack with. Sadly, this is an area that does not require any improvement, so I doubt anyone will try to rewrite them. But a Bash implemented in Rust could be interesting.

As stated before, Rust has a steep learning curve, so don’t expect to see any of these in the mid term. We will have to wait a long time before seeing the real potential of Rust.

Go

The biggest advantage of Go is how simple it is to read and write code in it. Go is already taking over and we will see a lot of applications based on it.

Its main competitors are scripting languages like Python, PHP, Perl and so on. Having more or less the same productivity while adding a nice type check will allow projects to grow in an organized way. Go is really opinionated, and this helps prevent a mess of code when the team is big and people come and go.

Applications using C and C++ will not see much benefit in moving to Go, as it is a bit slower, and compared to C++ it does not offer the same level of abstraction. Only those using C without squeezing out all its performance, and those using C++ with a really basic set of features, will see a benefit in moving to Go.

Java developers will probably not want to change to Go. It offers roughly the same speed with less abstraction and fewer checks. Not having proper OOP and exceptions will produce too much friction for a move. The only benefits are the smaller memory footprint and avoiding deploying the JVM on the final system, which makes sense for containers and serverless only.

For scientific and functional languages, Go doesn’t offer much, so I don’t expect anyone to adopt Go coming from those.

Products that we might see using Go

  • OS tooling: Currently Bash, Perl and Python cover the majority of the scripts used in Linux for the boot process as well as other tasks like maintenance. Go would make these easier to grow and faster, delivering some performance gains.
  • Docker: This one already uses Go!
  • REST APIs: Go seems the ideal language, as there are speed gains and ease of development.
  • Backend frameworks and CMS: I don’t see why not. Go can generate HTML templates like any other language.
  • Applications for containers and serverless: The ease of deploying Go, its threading support and its speed are ideal for these environments.
  • WebAssembly: While still experimental and behind Rust, if it gets better, the ease of use of Go will make it very appealing for people to write Wasm in Go.

The biggest issue I see with Go for scripting and the web is that we’re used to messing with the code as it runs. If operating systems start shipping binaries for these tasks, we might see a loss of freedom to hack on the code, as you would have to download the source separately and rebuild it.

As Go builds cheaply and the result can be cached, someone might come up with solutions to run it like we do with scripts, like the answers in this StackOverflow question.

As for me, I will keep learning both Go and Rust. Surely I will give Go some use in the near term, but with Rust I’ll have to keep learning for a while before giving it actual use.

What about you? Do you plan to give Rust or Go a try?

Benchmarking Python vs PyPy vs Go vs Rust

Since I learned Go I have been wondering how well it performs compared to Python in an HTTP REST service. There are lots and lots of benchmarks already out there, but the main problem with those benchmarks is that they’re too synthetic: mostly a simple query and far from real-world scenarios.

Some frameworks like Japronto exploit this by making the connection handling and the plain response blazing fast, but of course, as soon as you have to do some calculation (and you have to; otherwise, what’s the point of having a server?) they fall apart pretty easily.

To set a baseline: Python is 50 times slower than C++ in most benchmarks, while Go is 2-3 times slower than C++, and Rust sometimes even beats C++.

But those benchmarks are purely CPU and memory bound, for some particular problems. Also, the people who submitted the code used a lot of tricks and optimizations that will not happen in the code we usually write, because safety and readability are more important.

Another common type of benchmark is the HTTP framework benchmarks. In those, we can get a feel for which languages outperform others, but it’s hard to measure. For example, in JSON serialization, Rust and C++ dominate the leaderboard, with Go being only 4.4% slower and Python 10.6% slower.

In the multiple-queries benchmark, we can see that the tricks used by the frameworks to “appear fast” are no longer useful. Rust is on top here, C++ is 41% slower, and Go is 43.7% slower. Python is 66.6% slower. Some filtering can be done to put all of them under the same conditions.

In that last test, which looks more realistic, it is interesting to see that Python is 80% slower, which means 5x slower than Rust. That’s far better than the 50x of the CPU benchmarks I pointed out first. Go, on the other hand, does not have any benchmark entry that includes an ORM, so it’s difficult to compare the speed.

The question I’m trying to answer here is: should we drop Python for back-end HTTP REST servers? Is Go or Rust a solid alternative?

The reasoning is that a REST API usually does not contain complicated logic or big programs. It just replies to more or less simple queries with some logic. And such a program can be written in virtually anything. With the container trend, it is even more appealing to deploy built binaries, as in most cases we no longer need to compile for the target machine.

Benchmark Setup

I wanted to try a crafted example of something slightly more complicated, but for now I haven’t found the time to build a proper one. So I have to fall back into the “too synthetic benchmarks” category and release my findings up to this point.

The goal is to implement the fastest possible program for each of the following tests:

  • HTTP “Welcome!\n” test: Just the raw minimum, to get the actual overhead of parsing and creating HTTP messages.
  • Parse MessagePack: Grab 1000 pre-encoded strings and decode them into an array of dicts or structs. Return just the number of strings decoded. Aims to measure the speed of a library decoding cached data previously serialized into Redis.
  • Encode JSON: Having cached the previous step, now encode everything as a single JSON document. Return the number of characters in the final string. Most REST interfaces have to output JSON, and I wanted to get a grasp of how fast this is compared to the other steps.
  • Transfer data: Having cached the previous step, now send this data over HTTP (133622 bytes). Sometimes our REST API has to send big chunks over the wire, and it contributes to the total time spent.
  • One million loop load: A simple loop over one million iterations doing two simple math operations with an IF condition, returning just a number (sketched below). Interpreted languages like Python can take a huge hit here; if our REST endpoint has to do some work like an ORM does, it will be impacted by this.
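The loop in that last test was, in its Python flavour, roughly of this shape (an illustrative sketch, not the exact code that was benchmarked):

def million_loop():
    total = 0
    for i in range(1_000_000):
        # two simple math operations plus an IF, as described above
        if i % 2 == 0:
            total += i * 2
        else:
            total -= 1
    return total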

The data being parsed and encoded looks like this:

{"id":0,"name":"My name","description":"Some words on here so it looks full","type":"U","count":33,"created_at":1569882498.9117897}

The tests have been performed on my old i7-920 capped at 2.53GHz. It’s not really rigorous, because I had to have some applications open while testing, so assume a margin of error of 10%. The programs were written with the minimum possible effort in each language, selecting the libraries that seemed the fastest by looking at several published benchmarks.

Python and PyPy were run under uWSGI, sometimes behind NGINX, sometimes with the HTTP server included in uWSGI; whichever was faster for the test. (If anyone knows how to test them with less overhead, let me know.)

The measures have been taken with wrk:

$ ./wrk -c 256 -d 15s -t 3 http://localhost:8080/transfer-data

For Python and PyPy, the number of connections had to be lowered to 64 in order to perform the tests without errors.

For Go and Rust, the web server in the executables was used directly, without NGINX or similar. FastCGI was considered, but it seems to be slower than raw HTTP.

Python and PyPy used Werkzeug directly, with no URL routing. I used the built-in json library and msgpack from pip. For PyPy, msgpack turned out to be awfully slow, so I switched to msgpack_pypy.
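The Python entry point was roughly of this shape (a minimal sketch of the welcome endpoint only; the real code also handled the MessagePack/JSON/transfer tests and ran under uWSGI):

from werkzeug.wrappers import Request, Response

@Request.application
def application(request):
    # The "welcome" test: measure HTTP overhead only.
    return Response("Welcome!\n")

if __name__ == "__main__":
    from werkzeug.serving import run_simple
    run_simple("0.0.0.0", 8080, application)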

Go was using “github.com/buaazp/fasthttprouter” and “github.com/valyala/fasthttp” for serving HTTP with url routing. For JSON I used “encoding/json” and for MessagePack I used “github.com/tinylib/msgp/msgp”.

For Rust I went with “actix-web” for the HTTP server with url routing, “serde_json” for JSON and “rmp-serde” for MessagePack.

Benchmark Results

As expected, Rust won this test, but surprisingly not in all of the sub-tests, and not by much in some of them. Because of the big differences in the numbers, the only way of making them properly readable is with a logarithmic scale; so be careful when reading the following graph: each major tick means double the performance:

Here are the actual results in table format: (req/s)


          HTTP        parse msp   encode json   transfer data   1Mill load
Rust      128747.61   5485.43     5637.20       19551.83        1509.84
Go        116672.12   4257.06     3144.31       22738.92        852.26
PyPy      26507.69    1088.88     864.48        5502.14         791.68
Python    21095.92    1313.93     788.76        7041.16         20.94

Also, for the Transfer Data test, it can be translated into MiB/s:


          transfer speed
Rust      2,491.53 MiB/s
Go        2,897.66 MiB/s
PyPy      701.15 MiB/s
Python    897.27 MiB/s

And, for the sake of completeness, requests/s can be translated into mean microseconds per request:


          HTTP    transfer data   parse msp   encode json   1Mill load
Rust      7.77    51.15           182.30      177.39        662.32
Go        8.57    43.98           234.90      318.03        1,173.35
PyPy      37.72   181.75          918.37      1,156.76      1,263.14
Python    47.40   142.02          761.08      1,267.81      47,755.49

As for memory footprint (during the encode JSON test):

  • Rust: 41MB
  • Go: 132MB
  • PyPy: 85MB × 8 processes = 680MB
  • Python: 20MB × 8 processes = 160MB

Some tests impose more load than others. In fact, the HTTP-only test is very hard to measure, as any slight change in the setup produces a completely different result.

The most interesting result here is Python in the tight loop; for those who have expertise in this language it shouldn’t be surprising: pure Python code is around 50 times slower than native code.

PyPy, on the other hand, managed in the same test to get really close to Go, which proves that the PyPy JIT compiler can actually detect certain operations and optimize them to near C speeds.

As for the libraries, we can see that PyPy and Python perform roughly the same, with a much smaller gap to their Go counterparts. This difference is caused by the fact that Python objects have a certain cost to read and write, and Python cannot optimize for the type in advance. In Go and Rust I “cheated” a bit by using raw structs instead of dynamically creating the objects, so they got a huge advantage by knowing in advance the data they would receive. This implies that if they receive a JSON with less data than expected they will crash, while Python will be just fine.

Transferring data is quite fast in Python, and given that most APIs will not return huge amounts of it, this is not a concern. Strangely, Go outperformed Rust here by a slight margin. It seems that Actix does an extra copy of the data and a check to ensure UTF-8 compatibility. A lower-level HTTP server would probably be slightly faster. Anyway, even the slowest, at around 700MiB/s, should be fine for any API.

In the HTTP connection test, even if Rust is really fast, Python only takes around 50 microseconds per request. For any REST API this should be more than enough, and I don’t think it contributes much to the total.

On average, I would say that Rust is 2x faster than Go, and Go is 4x faster than PyPy. Python is from 4x to 50x slower than Go depending on the task at hand.

What matters most for a REST API is the library selection, followed by raw CPU performance. To get better results I will try to do another benchmark with an ORM, because those add a certain amount of CPU cycles into the equation.

A word on Rust

Before going all in on developing everything in Rust because it’s the fastest, be warned: it’s not that easy. Of the four languages tested here, Rust was by far the most complex, and it took me, untrained, several hours to get it working at the proper speed.

I had to fight for a while with lifetimes and borrowed values; I was lucky to have the Go test for the same task, so I could see clearly that something was wrong. If I hadn’t had that, I would have finished earlier and called it a day, leaving code that copies data many more times than needed and is slower than a regular Go program.

Rust has more opportunities and information to optimize than C++, so its binaries can be faster, and it’s even prepared to run in crazier environments like embedded, malloc-less systems. But this comes with a price.

It requires several weeks of training to get some proficiency in it. You also need to benchmark the different parts properly to make sure the compiler is optimizing as you expect. And there is almost no one on the market with Rust knowledge, so hiring people for Rust might cost a lot.

Also, build times are slow, and in these tests I always had to compile with “--release”; otherwise the timings were horribly bad, sometimes slower than Python itself. Release builds take even longer to compile. It has a nice incremental build that cuts this time down a lot, but changing just one file still requires around 15 seconds of build time.

Its speed is not far enough ahead of Go to justify all this complexity, so I don’t think it’s a good idea for REST. If someone is targeting near one million requests per second, cutting the CPU usage in half might make sense economically; but that’s about it.

Update on Rust (January 18, 2020): This benchmark used actix-web as the web server, and it has recently taken a huge roasting over its use of “unsafe” Rust. I had more benchmarks prepared with this web server, but now I’ll redo them with another one. Don’t use actix.

About PyPy

I have been pleased to see that the PyPy JIT works so well for pure Python, but it’s not an easy migration from Python.

I spent way more time than I wanted making PyPy work properly with Python 3 code under uWSGI. I also ran into the problem of msgpack being slow on it. Not all Python libraries perform well in PyPy, and some of them do not work at all.

PyPy also has a high load time, followed by a warm-up. The code needs to run a few times for PyPy to detect the parts that require optimization.

I am also worried that complex Python code cannot be optimized at all. The loop that was optimized was really straightforward. Under a complex library like SQLAlchemy the benefit could be slim.

If you have a big codebase in Python and you’re willing to spend several hours giving PyPy a try, it could be a good improvement.

But if you’re thinking of starting a new project in PyPy for performance, I would suggest looking into a different language.

Conclusion: Go with Go

I managed to craft the Go tests in no time with almost no experience with Go, as I learned it several weeks ago and had only written one other program in it. It takes a few hours to learn, so even if a particular team does not know it, it’s fairly easy to get them trained.

Go is a language that is easy to develop with and really productive. Not as much as Python, but it gets close. Also, its quick build times and the fact that it builds statically make it very easy to iterate code-test-code, and attractive for deployments as well.

With Go, you could even deploy source code if you want and make the server rebuild it each time it changes, if that makes your life easier or uses less bandwidth thanks to tools like rsync or git that only transfer changes.

What’s the point of using faster languages? Servers, virtual private servers, serverless or whatever technology you use incur a yearly cost of operation. And this cost will scale, linearly in the best-case scenario, with user visits. Using a programming language, frameworks and libraries that use as few cycles and as little memory as possible keeps this yearly cost low and allows your site to accept way more visits at the same price.

Go with Go. It’s simple and fast.

HTTP Pipelining is useless

…and please stop publishing benchmarks with Pipelining enabled. It’s just lying about real-world performance.

Today I found out that one of my favorite sources for HTTP framework benchmarks is indeed using pipelining to score the different programming languages and frameworks, and I’m mad about it:

https://www.techempower.com/benchmarks/

The first time I saw this was with Japronto, which claimed one freaking million requests per second, and of course this wasn’t reproducible unless you used a specific benchmarking method with pipelining enabled.

Before HTTP/2 I was in favor of pipelining, because we were so limited on parallel requests and TCP connections were so costly that it made sense. Now, with H2 supported by all major browsers and servers, pipelining should be banned from benchmarks.

What is HTTP pipelining?

In classic HTTP/1, we had to open a TCP connection for every single request. Open the socket, send the request, wait for the response and close the socket. TCP connections have a big cost to open, so this was a real problem back in the day.

With HTTP/1.1 we got keep-alive, where after a request is completed we can send another request over the same TCP socket. This alleviated the problem. But still, if your computer is far from the server (it usually is), the server will sit idle waiting for the last packet sent to arrive at your computer, and then waiting for your next request to come back. With most servers this means 80ms of delay from one request to the following one.

So here enters the scene the so-called HTTP pipelining, where we can send another request before the response has been received, effectively queuing the requests on the server and receiving them in order. Wikipedia has a nice graph on this:

This looks nice, but HTTP/1.1 never got pipelining working properly. It was there, not mandatory, with some clients and servers supporting it; but since most web servers at the time were failing to reply properly with pipelining, and there was no reliable way for the client to tell whether the server actually supported it, all major browsers ended up not adding support at all. What a shame!
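To make the idea concrete, this is what pipelining looks like from the client side: several requests written back to back before reading anything (a minimal sketch against a hypothetical local server that tolerates pipelined requests on a keep-alive connection):

import socket

sock = socket.create_connection(("localhost", 8080))

# Write two requests back to back, without waiting for the first response.
request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
sock.sendall(request + request)

# The server must answer them strictly in order on the same connection.
print(sock.recv(65536).decode(errors="replace"))
sock.close()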

It was a really good idea, but then HTTP/2 came with multiplexing and this problem vanished. There are still challenges in this area, but nothing that pipelining would solve. So now we’re happy with multiplexing.

HTTP/2 does not have Pipelining

This is a common misunderstanding. Yes, you can send several requests, even several thousand, without waiting to receive anything. This is really good, but it’s not pipelining. Why?

Pipelining, as its name implies, acts like a pipe: first in, first out. The requests are queued on the server in order and are replied to in order.

HTTP/2 instead has multiplexing, which seems similar but is better. Multiplexing means that you get several streams inside one at the same time, so you can receive data as it is produced. The requests are not queued and are not returned in the order they were made; they come back interleaved.

Why pipelining gives such good results

Because it’s equivalent to copying a file over the network, especially in synthetic benchmarks where localhost is the target, pipelining removes a lot of the effort of handling the different packets.

Instead of grabbing a packet for a request and processing it, you can let it buffer, then grab a big chunk in one go that might contain hundreds of requests, and reply back without caring at all whether the client is receiving the data or not.

Even more, as the benchmark is synthetic, servers might know more or less beforehand what to serve, reducing the time spent looking at what was requested and just replying with approximately the same data again and again.

The benchmark clients also do way less work, because they only need to fill a connection with the same string repeated millions of times.

If you think about it carefully, this is even faster than copying files over localhost: you don’t even need to read a file in the first place.

HTTP/2 multiplexing is slower

Compared to pipelining, of course. Because you’re not serving one clean stream of data but thousands of interleaved streams, your server has to do more work. This is obvious.

Of course, we could craft a cleartext HTTP/2 server that does multiplexing effectively as one single stream, replying in order. This would get closer to pipelining performance, because it is indeed pipelining.

But deploying that on a production site would be naive, for the same reasons that would apply if HTTP/1.1 pipelining were a thing. Proper HTTP/2 multiplexing is far superior in real-world scenarios.

And my question is: do you want your benchmark to return higher numbers, or do you want your users to have the best experience possible?

Because if you only care about benchmarks, maybe it’s just easier to change the benchmark so it returns better results for your servers, right?

Pipelining will not help serve more requests

I can hear some of you saying “If we enable pipelining in production, we will be able to serve millions of requests!”. And… surprise! You won’t.

Why? you might ask. Well, depending on the scenario the problem is different, but it always boils down to the same two things: you need to be able to reply out of order to avoid bottlenecks, and a single user will never issue thousands of pipelined requests the way your benchmark tool does.

First pitfall: Requests are heterogeneous, not homogeneous.

Requests will not have the same size, nor the same reply size. They will have different computing times or wait times before replying. Does your production site reply with a fortune cookie for every single request? Even for CSS and JPEG requests? No, I don’t think so.

Why does this matter? Well, say your client is asking for a CSS file and a JPEG for the same page and you’re replying back with pipelining. If the JPEG was requested first, the CSS will stall until the image has completed, leaving the page unrendered for some time.

Now imagine we have a REST API and we get thousands of requests from a client. One of the requests triggers an expensive search on the database. While that one is being processed, the channel will sit idle and your client will be frozen.

Second pitfall: Real users will never pipeline thousands of requests.

Unless your site is really badly designed, you’ll see that more than 50 parallel requests do not make much sense. I tried HTTP/2 myself with an Angular site aggressively sending requests for tiny resources, and the results were quite good, but with fewer than 100 requests in parallel. And the approach was pretty stupid. Aside from this, popular servers and browsers lack support for HTTP/1.1 pipelining, so enabling it in your product will not make any difference.

Let’s consider this for a second. Why do we want to pipeline in the first place? Because the client is far from the server and we want to reduce the impact of round-trip time. So, say our ping time to the server is 100ms (which is higher than usual), and we pipeline 100 requests at a time.

Effectively, in one round-trip we served 100 requests, so this equates to 1ms of RTT per HTTP response. What has a 1ms RTT? A local network! So when you reach this level of parallelism, the client works as fast as it would on your local network, given the same bandwidth. Try the same math for one thousand and ten thousand pipelined requests: 0.1ms and 0.01ms respectively.

So now the question is: are you trying to save 0.9ms per request for the client, or are you just trying to make your benchmark numbers look better?

Scenario 1: API behind reverse proxy

Assume we have our shiny Japronto on port 8001 on localhost, but we want to serve it along with the rest of the site on port 80. So we put it behind a reverse proxy; this might be Apache, Nginx or Varnish.

Here’s the problem: none of the popular web servers or reverse proxies support pipelining. In fact, even when serving static data, they will be slower than what your shiny pipelining framework claims it can do.

Even if they did, when they proxy the request they don’t pipeline to the proxied server either.

This approach renders pipelining useless.

Scenario 2: Main Web Server

So let’s put our framework directly facing the public internet, on another port, who cares? We can send the requests from Angular/React/Vue to whatever port and the user will not notice. Of course this adds a bit of complexity, as we need to add some headers here and there to tell the browser to trust our application running on a different port than the main page.

Nice! Does this work? Well, yes and no.

The main concern here is that we’re exposing a not-so-well-tested server to the internet, and this can be incredibly harmful. Bugs are most probably sitting there unnoticed, until someone actually notices and exploits them to gain access to our data.

If you seriously want to do that, please put it inside a Docker container with all permissions cut down and with most mount points read-only, including the initial Docker container image.

Did we enable HTTP/2 with encryption? If we’re lucky enough that our framework supports it, then it will consume extra CPU doing the encryption and the multiplexing.

HTTP/2 over clear text does not work in any browser, so if you try it, most users will just fall back to HTTP/1.1.

If we don’t use HTTP/2 at all, 99% of users have browsers that do not use pipelining at all.

For the cases where they do, the routers and hardware that make the internet itself work will sometimes mess up the data, because they see HTTP in the clear and they want to “manage” it because “they know the standard”. And they’re pretty old.

Scenario 3: Pipelining reverse proxy

I had an excellent idea: Let’s have our main web server to collect all requests from different users and pipeline them under a single stream! Then we can open several processes or threads to further use the CPU power and with pipelining, the amount of requests per second served will be astonishing!

Sounds great, and a patch to Nginx might do the trick. In practice this is going to be horrible: as before we will have bottlenecks, but now one user can freeze every other user just by asking for a bunch of costly operations (classic head-of-line blocking).

Conclusion 1

The only way this can work is if the framework supports encrypted HTTP/2 and is fast at it. In that case, you should have benchmarked the frameworks with HTTP/2 multiplexing instead.

If your framework does not multiplex properly and instead really pipelines the data, then users will see unexplainable delays under certain loads that are hard to reproduce.

Conclusion 2

In some scenarios the client is not a user's browser, for example RPC calls when we implement a microservices approach. In this case pipelining does work, provided the responses are homogeneous.

But it turns out that HTTP is not the best protocol for those applications. There are tons of RPC protocols and not all of them use HTTP. In fact, if you look at the fast ones, you'll see that HTTP is the first thing they drop.

In the past I wrote an RPC protocol myself, called bjsonrpc. I wanted speed, and dropping HTTP was my main motivation for creating it.

If you need HTTP for compatibility, just keep two ports open, one for each protocol. Clients that can't speak a specific protocol are unlikely to understand pipelining either. Having a port for each protocol gives you the best performance for the clients that support it while still allowing other software to connect.

Brief word on QUIC

The new-old QUIC protocol by Google is currently being standardized by the IETF as the base for the future HTTP/3 protocol. QUIC supports fast encryption setup (fewer round trips), tolerates packet loss well, and survives IP address changes.

This is indeed the best protocol possible for RPC calls, except for its heavy CPU usage compared to raw TCP. I really hope that someone standardizes a non-HTTP protocol on top of it aimed at application connections, and that browsers support it.

The takeaway: handling the protocol properly takes a lot of CPU, we have to do it in production, and letting some frameworks skip part of that work is unfair to the others. Please be considerate and disable pipelining when publishing benchmarks; otherwise a lot of people will make the wrong decision based on YOUR results.

PostgreSQL 12 Released!

Good news for fans of PostgreSQL: version 12 has just been released. As usual, this version adds a ton of optimizations, more security and more features.

Before diving in, I want to put special emphasis on the SCRAM authentication method. It was added in PostgreSQL 10, and if you're still using MD5 for password authentication, switching is strongly encouraged: MD5 is old and weak, and if an attacker gets access to the hashed passwords they can attack them offline and easily guess user passwords. In contrast, scram-sha-256 offers much stronger protection; even in the rare event of a hashed password leak, the attacker is forced into a brute-force attack, and because SCRAM makes each guess deliberately expensive, they have to spend thousands of times more CPU to guess even the simplest passwords. It's really easy to change, so go and fix your installation!
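Here is a minimal sketch of the switch, assuming a hypothetical role called app_user (the exact roles and pg_hba.conf entries depend on your setup):

    -- make new password hashes use SCRAM (or set this in postgresql.conf)
    SET password_encryption = 'scram-sha-256';
    -- re-setting the password stores it as a SCRAM verifier instead of an MD5 hash
    ALTER ROLE app_user WITH PASSWORD 'pick-a-strong-one';
    -- finally, change the relevant pg_hba.conf entries from "md5" to "scram-sha-256" and reload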

Partitioning enhancements

Foreign keys can now reference partitioned tables. This means that if you partitioned, for example, a table called “invoice_lines” into hundreds of partitions, your table “invoice” can now have a FK pointing to “invoice_lines” without specifying which partition it refers to; PostgreSQL manages that.
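A minimal sketch mirroring that example (all table and column names are made up):

    CREATE TABLE invoice_lines (
        line_id bigint PRIMARY KEY,
        amount  numeric
    ) PARTITION BY HASH (line_id);

    CREATE TABLE invoice_lines_p0 PARTITION OF invoice_lines
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ...plus the remaining partitions...

    -- since PostgreSQL 12 a foreign key may point at the partitioned parent:
    CREATE TABLE invoice (
        invoice_id    bigint PRIMARY KEY,
        first_line_id bigint REFERENCES invoice_lines (line_id)
    );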

Attaching new partitions no longer requires locking out queries on the partitioned table, so you can do it on the fly on a production system with no downtime.

Queries on partitioned tables that only touch a small subset of partitions, for example when you have 15,000 partitions and your query matches 10 of them, are now faster than in PostgreSQL 11.

INSERT and COPY commands into partitioned tables are also faster now.

Partition bounds can now be any expression, evaluated at creation time, instead of just simple constants.
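For instance, a hedged sketch with a made-up measurements table, where the upper bound is computed from the lower one:

    CREATE TABLE measurements (
        taken_at date NOT NULL,
        value    numeric
    ) PARTITION BY RANGE (taken_at);

    -- the bound expressions are evaluated once, when the partition is created
    CREATE TABLE measurements_2019q4 PARTITION OF measurements
        FOR VALUES FROM ('2019-10-01'::date) TO ('2019-10-01'::date + 92);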

Other performance improvements

Common Table Expressions (CTEs) are now inlined when certain requirements are met (the CTE is not recursive, has no side effects and is referenced only once), allowing the planner to optimize them just like regular subqueries and JOINs. Starting with this version, if you prefer to write your queries with CTEs because they are easier to read, you no longer pay a performance penalty for it. Before, CTEs in PostgreSQL were always an optimization barrier, which could incur bad performance penalties; that barrier is now gone unless you ask for it.
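A small sketch of the difference, using a made-up orders table:

    -- inlined automatically in PostgreSQL 12, so the filters can use indexes on orders
    WITH recent AS (
        SELECT * FROM orders WHERE created_at > now() - interval '7 days'
    )
    SELECT count(*) FROM recent WHERE status = 'paid';

    -- to keep the old optimization-barrier behaviour, request it explicitly:
    WITH recent AS MATERIALIZED (
        SELECT * FROM orders WHERE created_at > now() - interval '7 days'
    )
    SELECT count(*) FROM recent WHERE status = 'paid';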

B-Tree indexes were optimized, reducing the space they use and improving read and write performance. B-Tree is the default index type in PostgreSQL and has always been fast; being smaller improves CPU cache efficiency, and locking was optimized as well. Other speed and efficiency improvements were made on top of that, which makes B-Tree indexes faster than ever. Just upgrading to PostgreSQL 12 should improve performance on almost any database.

Index-only scans and the GIN and GiST index types have seen performance improvements as well.

Multi-column statistics, which you create manually with CREATE STATISTICS, have been extended with most-common-value lists. This is a huge improvement for complex tables where columns are correlated and PostgreSQL was not estimating row counts correctly before. For example, if we query “invoice” filtering by “country” and “VAT type”, PostgreSQL doesn't know that these two are related, so it multiplies the selectivity of a single “country” by the selectivity of the “VAT type”, underestimating the number of rows returned by several orders of magnitude. We can now create a statistics object that stores the combined statistics of several columns together, improving the planning of those queries.
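A minimal sketch on that hypothetical invoice table (vat_type is a made-up column name):

    CREATE STATISTICS invoice_country_vat (dependencies, mcv)
        ON country, vat_type FROM invoice;
    -- the extended statistics are populated by the next ANALYZE
    ANALYZE invoice;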

The new REINDEX CONCURRENTLY command rebuilds an index without blocking writes to the table, which helps a lot when we want to repack an index on a live production system without downtime.
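Usage is a one-liner (the index name is made up):

    REINDEX INDEX CONCURRENTLY invoice_country_idx;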

Partial indexes whose predicate uses IS NOT NULL can now be matched by more queries, for example those filtering with “column IN (1,2,…)”.
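A hedged sketch with made-up names:

    CREATE INDEX orders_customer_idx ON orders (customer_id)
        WHERE customer_id IS NOT NULL;

    -- the planner can now prove that this query satisfies the index predicate:
    SELECT * FROM orders WHERE customer_id IN (1, 2, 3);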

New features

Generated columns have arrived in PostgreSQL with this release. The SQL standard defines two variants, stored and virtual: virtual generated columns are computed on the fly, while stored ones are actually saved to disk; PostgreSQL 12 implements the stored kind. They are defined in CREATE TABLE and act as regular columns that cannot be written to directly, their value always being calculated from the other columns of the table. This is long-awaited functionality, as it can simplify the use of complex expressions and partial indexes, and it saves us from writing triggers for those cases, speeding up our databases.
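A minimal sketch with a made-up table:

    CREATE TABLE people (
        first_name text,
        last_name  text,
        -- computed from the other columns on every write and stored on disk
        full_name  text GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
    );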

Non-deterministic collations have been added too. They allow PostgreSQL to report equality when the underlying data is not equal byte for byte but is equal for humans. For example, think about case sensitivity (“hello” == “Hello”) or accents (“balón” == “balon”). Of course, to use this you'll need to declare one of these new collations for your table or column, so it always happens in a controlled way.
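For case insensitivity, the sketch below follows the example from the PostgreSQL documentation (it needs a server built with ICU support); the table is made up:

    CREATE COLLATION case_insensitive (
        provider = icu, locale = 'und-u-ks-level2', deterministic = false
    );

    CREATE TABLE users (name text COLLATE case_insensitive);
    -- with this collation, 'hello' and 'Hello' now compare as equal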

The SQL/JSON path language has been added, allowing us to query json or jsonb columns with a language similar to XPath, for example ‘$.track.segments[*].location’. It comes with a ton of features, like nested paths, filter conditions and regular expressions. It's a full little language of its own!
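Two small examples, assuming a made-up trips table with a jsonb column named doc (the path itself comes from the PostgreSQL documentation examples):

    SELECT jsonb_path_query(doc, '$.track.segments[*].location') FROM trips;

    -- path filters work too, e.g. only trips that have a segment with HR above 130:
    SELECT * FROM trips WHERE doc @? '$.track.segments[*] ? (@.HR > 130)';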

CREATE INDEX, REINDEX, CLUSTER, VACUUM FULL and pg_checksums now report progress. This is a small improvement but very welcome, because whenever we ran a VACUUM FULL on a big table we were left waiting for a long time without knowing whether it was making any progress at all. Now we can get a sense of how much is remaining and whether it is going to take way too long.
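The progress shows up in new system views; for instance, from another session you can watch a running index build (a hedged sketch, columns abbreviated):

    SELECT pid, phase, blocks_done, blocks_total
      FROM pg_stat_progress_create_index;

    -- CLUSTER and VACUUM FULL report through pg_stat_progress_cluster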

pg_checksums can now enable or disable data checksums on an existing database cluster, as long as the cluster is offline. Before, we had to create a brand-new cluster with the option enabled and then dump & restore the database to change it. Checksums let PostgreSQL detect data corruption and are recommended for critical systems. Enabling them, or just trying them out, is now way easier than before, so you can measure how much of a performance impact they have on your system.

COPY FROM now allows a WHERE clause. This is useful when importing data from a file, to filter on the fly what we want to import. For example, imagine we want to restore only some rows from a backup. Before, we had to import the full file into a temporary table and then SELECT…INSERT. Now we can specify which rows we want using the new WHERE clause. Sweet!
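A hedged sketch, with a made-up file path and table:

    COPY invoice FROM '/var/backups/invoice.csv'
        WITH (FORMAT csv, HEADER)
        WHERE country = 'ES' AND created_at >= '2019-01-01';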

COMMIT AND CHAIN and ROLLBACK AND CHAIN commands have been added. They close the current transaction and open a new one with the same characteristics in a single command.
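For example (audit_log is a made-up table), the chained transaction keeps the isolation level:

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO audit_log (msg) VALUES ('batch 1');
    COMMIT AND CHAIN;   -- commits, then immediately starts another SERIALIZABLE transaction
    INSERT INTO audit_log (msg) VALUES ('batch 2');
    COMMIT;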

The VACUUM and ANALYZE commands now have a SKIP_LOCKED option that skips tables that cannot be locked immediately. This is really useful for maintenance, especially when it is automated in a cron job, because waiting for locks could otherwise stall or deadlock the job. With this option, locked tables are simply skipped and the database keeps working.
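A minimal sketch (table names made up):

    -- tables currently locked by someone else are skipped instead of waited on
    VACUUM (ANALYZE, SKIP_LOCKED) invoice, invoice_lines;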

The psql command and other client tools can now produce colored output as an option.

pg_upgrade now has a --clone option that takes advantage of filesystems with copy-on-write (COW) capabilities, such as Btrfs or XFS with reflink support. Cloning is about as fast as the --link mode, but it leaves the old cluster's files untouched, so you keep a safe way back if something goes wrong. This can reduce the downtime of an upgrade even further.

postgres_fdw has seen improvements too: LIMIT and ORDER BY clauses are pushed down to the remote server in more scenarios, and the planner accounts better for postgres_fdw costs. This is useful when sharding PostgreSQL.

Conclusion

This new release of PostgreSQL contains a lot of new stuff, improving performance for existing databases while adding a new set of features for partitioning and JSON processing. It brings PostgreSQL even closer to NoSQL databases.

For me, the main feature PostgreSQL is still missing to put it on the level of NoSQL databases is automatic sharding. Currently we still have to declare the partitions per shard and per table manually, connecting them via postgres_fdw. I hope that at some point someone comes up with tools to automate this, or even native PostgreSQL tooling in a future version; for now, at least it seems to be on the roadmap in this wiki.