More than a year ago I moved to Dublin and started to work for another company. In this new company, instead of PostgreSQL, which I have been using for 10 years, they use Microsoft SQL Server. My first feeling was like stepping into a time machine and going back six years. Not much seems to have changed since I used SQL Server 15 years ago. Yeah, sure, there are some important new features, but the whole thing still has this smell from the Windows 2000 era.
So I miss a lot of features from my beloved PostgreSQL. I have to say that PostgreSQL is probably faster than SQL Server when it runs under Linux or any *nix system, but PostgreSQL probably doesn’t have the best performance under Windows. So I don’t recommend switching over unless you can move the DB to a *nix server.
Well, let’s get to it!
IF (NOT) EXISTS clause
When you’re used to PostgreSQL, one of the most used and least thought-about features is the IF EXISTS / IF NOT EXISTS clause that several commands support. It seems trivial, but it does not exist in SQL Server, where you have to fall back to IF statements that query the internal schemas. In PostgreSQL you can do:
- CREATE TABLE IF NOT EXISTS
- DROP TABLE IF EXISTS
- ALTER TABLE IF EXISTS
- ALTER TABLE … DROP COLUMN IF EXISTS
- CREATE OR REPLACE FUNCTION
- ALTER VIEW IF EXISTS
- ALTER INDEX IF EXISTS
The list is probably longer, but you get the idea. Simple SQL statements that do what you want. CREATE OR REPLACE FUNCTION is also handy, as it overwrites the previous function (or view, with CREATE OR REPLACE VIEW) provided the arguments and return type are compatible.
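A minimal sketch of what this idempotent style looks like (table and function names are made up):

```sql
-- No need to query the catalogs first; these just work, every time:
CREATE TABLE IF NOT EXISTS invoices (
    id    serial PRIMARY KEY,
    total numeric(12, 2)
);

DROP TABLE IF EXISTS old_invoices;

ALTER TABLE IF EXISTS invoices
    DROP COLUMN IF EXISTS legacy_code;

-- Overwrites any previous compatible version of the function:
CREATE OR REPLACE FUNCTION invoice_total(i integer) RETURNS numeric
    LANGUAGE sql AS $$ SELECT total FROM invoices WHERE id = i $$;
```

Every statement above can be run repeatedly in a migration script without erroring out, which is exactly the point.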
Column dependency tracking
Say you have a function or a view that refers to other tables and their columns. You rename those, or change their types. What happens to your views and functions?
In SQL Server they may fail when called, as the old name no longer exists. You have to remember to update them yourself to keep things consistent.
In PostgreSQL this is handled automatically, and if that is not possible, the ALTER command fails straight away. No inconsistency.
Of course, if you have a function written in Python, or a function that builds a query from a text string, those are not translated.
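A small sketch of the behaviour, with made-up names:

```sql
CREATE TABLE clients (id serial PRIMARY KEY, surname text);
CREATE VIEW client_names AS SELECT surname FROM clients;

-- PostgreSQL rewrites the view definition to use the new column name:
ALTER TABLE clients RENAME COLUMN surname TO last_name;
SELECT * FROM client_names;  -- still works

-- Dropping a column a view depends on fails straight away:
ALTER TABLE clients DROP COLUMN last_name;
-- ERROR: cannot drop column ... because view client_names depends on it
```

You can still force the drop with CASCADE, but then it is an explicit decision, not a silent inconsistency.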
Really renaming fields and tables
This may be surprising to some people: no, SQL Server cannot really rename fields or tables. It provides some special procedures, outside the SQL standard, that more or less do the job. But what do they do? They copy the data and drop the old object. This means ugly locks and bad performance.
In PostgreSQL, following the SQL standard, you have ALTER TABLE xyz RENAME TO ijk. Simple, fast (no rewrite) and almost lock-free.
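For example (hypothetical names):

```sql
-- Both are metadata-only changes: instant, no data rewrite
ALTER TABLE customers RENAME TO clients;
ALTER TABLE clients RENAME COLUMN surname TO last_name;
```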
One small thing I would like to see in PostgreSQL someday is reordering columns, in a way that does not affect the disk layout. Not very important, but it would sometimes be convenient to ensure that “SELECT * FROM x” always returns the same columns in the same order across all your servers, because after several ADD COLUMNs the logical layout can look confusingly different.
Data types
Oh, this is so 1999 in SQL Server. While it has newer types conforming to standards, it still somehow manages to favour the old ones. We can see things like NCHAR/NVARCHAR, which come straight from the Windows API wide chars (UTF-16). Seriously? And the regular CHAR is still 8-bit ASCII. Damn, are these people serious?
No, that’s not how it should be. The SQL standard talks about characters, not bytes. PostgreSQL gets this right: any char/string-like type is encoding-aware and supports any character in it. It also has support for many, many encodings and languages for collation. Its default behaviour just works for most languages, if not all. And you can set the COLLATION per server, per database, per table and per column. This means it is smart enough to treat a, á and à as sorting together given a Spanish collation.
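A sketch of per-column collation. The exact collation name depends on the locales installed on the server, so “es_ES.utf8” here is an assumption:

```sql
CREATE TABLE people (
    name text COLLATE "es_ES.utf8"  -- per-column collation
);

INSERT INTO people VALUES ('arbol'), ('árbol'), ('zeta');

-- Sorts á right next to a, instead of after z as a naive byte sort would:
SELECT name FROM people ORDER BY name;
```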
PostgreSQL also has a whole load of types, which can be expanded with extensions. They are super useful; here are a few examples:
- bytea: array of bytes. Store entire files in PostgreSQL without wasting a single extra byte on encodings or hex-encoding.
- interval: of time. Oh, I love this: “now() + '3 days 2 hours'::interval”. Flexible, with tons of useful operators. Datetime math is just easy in PostgreSQL.
- jsonb: JSON data, but stored in binary form. And with index support for subkeys and array items, so you can play NoSQL while retaining PostgreSQL speed. Plus tons, huge tons of functions to transform tables, fields and arrays from/to JSON… endless.
- numeric(x,y): tired of losing precision in your accounting apps? Try numeric! It is base 10, like our decimal numbers, and it has astonishing precision. The only downside is that it is mathematically more correct than some of us would like (if you call that a downside).
- box, line, point: there are a few geometric types in standard PostgreSQL. They can serve for range queries, but they are also handy when you are dealing with points in space, like GPS locations. Indexes work on all of them.
- Arrays: of almost anything. Instead of jumping to your many-to-many table, you can keep a list of primary keys on your main table, saving you extra queries. And guess what… they are indexable as well; you can ask things like “does it contain A”, “does it not contain A” or “does it contain exactly A, B, C”, etc.
- Range types: of numbers, dates and timestamps. Useful for schedules and calculating overlaps… in an indexed fashion.
- Enums: the best of both worlds. You ensure that a column only holds certain values, without a foreign key or an extra table, and it is as efficient as an integer column.
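A hypothetical table exercising a few of these types together:

```sql
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped');

CREATE TABLE orders (
    id        serial PRIMARY KEY,
    status    order_status,
    payload   jsonb,          -- free-form JSON, binary stored
    tag_ids   integer[],      -- array of related primary keys
    total     numeric(12, 2), -- exact base-10 money
    valid_for tsrange         -- timestamp range
);

SELECT *
FROM orders
WHERE tag_ids @> ARRAY[42]            -- array contains 42 (GIN-indexable)
  AND payload->>'channel' = 'web'     -- jsonb subkey access
  AND valid_for @> now()::timestamp;  -- range contains this instant
```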
Index Expressions and functions
I was surprised that SQL Server did not come with this much-needed feature. Instead, you create computed columns stored on disk (which wastes space) and then index them. PostgreSQL, on the other hand, has no notion of stored computed columns, but you can roll your own easily with triggers.
But as for indexing, in PostgreSQL you can index the result of a calculation or a function, provided the function is immutable (pure) and you only use columns of the table being indexed.
This is useful because sometimes you recurrently need to filter a query by an expression, for example stock_qty - reserved_qty <= 0. In PostgreSQL you just index “stock_qty - reserved_qty”.
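A sketch with a made-up stock table:

```sql
-- Index the expression itself (the extra parentheses are required):
CREATE INDEX idx_stock_available ON stock ((stock_qty - reserved_qty));

-- A query using the same expression can use the index:
SELECT * FROM stock WHERE stock_qty - reserved_qty <= 0;
```

The planner matches the query expression against the indexed one, so keep them written the same way.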
Speaking of which, let me add the WHERE part of an index, which creates a partial index. If you have billions of rows and your app cares about only a tiny fraction of them each time, a regular index still means an O(log n) descent over all records. But with a partial index that matches your WHERE expression, you simply skip all the uninteresting rows. Partitioning? I tried that, and up to 50 million rows it gives almost zero benefit compared to partial indexes. Partial indexes are easier to create and maintain, and you can have as many as you want. Can you partition a table by 6 dimensions at the same time? Does that even make sense?
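For example, assuming a hypothetical orders table where the app mostly looks at open orders:

```sql
-- Only index the rows the app actually queries:
CREATE INDEX idx_orders_open ON orders (created_at) WHERE status = 'open';

-- A matching query skips the millions of closed orders entirely:
SELECT * FROM orders WHERE status = 'open' ORDER BY created_at;
```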
Array related functions
Yeah, I wrote before about arrays, and they have functions, right? And they are indexable… fine. But a couple of them deserve a special mention. Or at least, there are just two: array_agg and array_to_string.
There is this common case: you are joining table X to A, B, C and D, all one-to-many relationships from X. You want just a single row in the output per row of X, with all (or some) columns of every other table. This is usually done to create nice CSV exports of complex data. How do you aggregate those tables without losing data? Just concatenate all values into a comma-separated string.
In PostgreSQL, just do “SELECT array_to_string(array_agg(a.col1), ', ') AS a_col1” and done!
Now try to do the same in SQL Server. It is possible, and it does work. Seriously, search for the solution, I’ll wait. Now multiply that code by however many tables and columns you want, and imagine the result. How many pages long is the resulting SQL?
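To make the PostgreSQL side concrete, here is a sketch with hypothetical one-to-many tables a and b hanging off x:

```sql
SELECT x.id,
       array_to_string(array_agg(DISTINCT a.col1), ', ') AS a_col1,
       array_to_string(array_agg(DISTINCT b.col1), ', ') AS b_col1
FROM x
LEFT JOIN a ON a.x_id = x.id
LEFT JOIN b ON b.x_id = x.id
GROUP BY x.id;
```

The DISTINCT inside array_agg is there because joining two one-to-many tables at once multiplies rows; without it you would see repeated values.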
And I don’t want to end this section without an honourable mention of “unnest”. This is the Swiss Army knife of arrays: it transforms an array into a set of rows, and in recent versions of PostgreSQL it can also number the records (WITH ORDINALITY). You can pack a table into an array or unpack an array into a table and do joins on them.
Why do we want this? Well, let me add another set-returning function: generate_series. Somewhat like Python’s range(x), it returns a set of integers counting from one number to another. Say we have a table that contains labels for printing, with a column holding the number of copies. We want a query that repeats each row N times depending on that column, and also gives us the record number (copy number). With generate_series (and unnest for arrays), the rows are created for us, numbered, and ready to consume by any reporting tool.
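The label example can be sketched like this, assuming a hypothetical labels table with a copies column:

```sql
-- Repeat each label row `copies` times, with a copy number per repetition:
SELECT l.*, copy_no
FROM labels AS l
CROSS JOIN LATERAL generate_series(1, l.copies) AS copy_no;
```

The LATERAL join lets generate_series reference the copies column of the current row, so a label with copies = 3 comes out three times, numbered 1 to 3.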
In which year do you live, 1999 or 2018? Get PostgreSQL and jump 19 years into the future like a boss!
And by the way, this list hasn’t ended yet. I just thought the post was long enough already.
Stay tuned for more!