A small shift on this blog! Advocating for a simpler Rust

Lately I’ve been writing about Rust because I like it a lot and I think it deserves more attention. But since the last post, things are going to change gears a bit. I shared the last post on Reddit and it got picked up by This Week In Rust too. This caused a spike in traffic that’s hard to believe: a week’s worth of visits in 12 hours.

I’m getting more visibility than I ever would have thought possible. And comments, both on the blog and on Reddit, that are positive, correcting me (because, heh, I’m not perfect!) and very constructive. I feel that my overall view (so far) is not rejected by the Rust community, and it’s possible that my opinions here can have some effect (even if minor) on Rust itself in the future.

Therefore I believe it’s worth trying to advocate for a simpler Rust for beginners. It’s not (only) about the visits; it’s about trying to convince other people that Rust can change to be perceived as an easy language, and to show them that Rust, as it is today, isn’t that hard either.

I currently work for Google as an SRE (see “Contact / About me” for my current status), and Google has recently been getting involved with Rust as well, so I need to get this out of the way: what I write in this blog has nothing to do with my employer; opinions are my own. At work I don’t write any Rust currently, and sadly I don’t expect this to change. Rust is only my hobby and I would like to keep it that way, as it makes the separation of hobby and work time easier. I love coding as a hobby; it’s fun for me. But when you do the same stuff at work as in your personal time, the line gets very blurry, so I’d prefer things to stay as they are.

Rust has a lot of untapped potential. The community is right now so focused on the performance aspect that it does not yet realize that Rust can take over other general-purpose programming languages. Rust is gaining adoption slowly but steadily, and I expect to see “Rust everywhere” roughly around 2030. If I can help in any way to make this happen, it’s worth it for me.

I realized this when coding zzping. Suddenly I saw that I was getting things done really fast, that the speed of development was near Python’s. And when I needed something quick and dirty and went back to Python, it was really unpleasant; it felt like I could have done it faster in Rust.

There are several pain points in Rust for me. Mutable statics and lifetimes are the parts that most commonly slow me down. But I’m used to them by now, so I usually remember how to work around them.

But the changes I would like to see are not only for me, but mainly for newcomers. It didn’t take me very long to learn Rust, and for the most part it makes sense to me how the language is designed. But not a lot of people have a good C++ background, and for them most of the concepts may feel fanciful, hard to understand and use, and might steer some of them away from Rust.

It doesn’t need to be that way. The main problem is that learning Rust currently means crashing against the borrow checker until you get it. It’s like playing Paperboy, but for programmers.

I hope I can contribute something to get more people into Rust and see wider usage beyond performance reasons. My Rust knowledge is still limited, but while I’m still somewhat of a newcomer is the moment to push for this. Once I get too comfortable with Rust, I will no longer see its problems.

Rust – What made it “click” for me (Ownership & memory internals)

This is aimed at people who are coming to Rust from garbage-collected languages, such as Python or JavaScript, and have trouble with the compiler throwing seemingly unreasonable errors. This guide assumes you already know some Rust.

For those like me who worked with non-GC languages (C, C++), the borrow checker still feels hard to understand at first, but the learning curve is much gentler, and the documentation and design make quite a lot of sense.

So I want to add some basic context for those who never manually requested and freed memory, so that the underlying design of Rust’s ownership system makes more sense and clicks. Because that’s all it takes: once it clicks, everything seems to fall together. (If you already know C or C++, you’ll already know most of the stuff I’m going to talk about, and will probably notice that I’m oversimplifying a lot; but it might still be worth the read, as it contains how Rust “clicked” for me.)

First of all I want you to consider a small snippet of code in Rust:

let a: String = "Hello world".to_string();
let b: String = a;

Here’s the question: what are the final values of the “a” and “b” variables?

Think about it.

Usually in GC languages assignment can do two things depending on the types involved. For basic types (i.e. numeric), the value inside “a” is copied into “b”, so you end up with two variables with the same content. For complex types, it is common to instead make “b” point to “a”, so internally they share the same data.

But this is not the case for Rust. In the snippet above “a” is moved into “b”, and “a” becomes invalid afterwards. It does not exist anymore and cannot be used. Puzzled? So was I. But don’t worry, this will make sense at the end.

By default Rust will move the data instead of copying it. Conceptually, the bytes are copied from the old place to the new one, and the original variable is then invalidated: the compiler forbids any further use of it and will not run its destructor.

I know. This feels stupid, but it comes from the ownership rules in Rust. It also has its special cases: for example, for simple types it will copy the contents, and the original variable remains valid. I will cover this later on.
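Here is the snippet again as a complete program; the commented-out line is the one the compiler rejects:

```rust
fn main() {
    let a: String = "Hello world".to_string();
    let b: String = a; // "a" is moved into "b" here
    println!("{}", b); // prints "Hello world"
    // println!("{}", a); // error[E0382]: borrow of moved value: `a`
}
```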

Basic overview of a program’s memory

As said earlier, C++ developers have an advantage in understanding Rust because they already have a mental model of what memory looks like, pointers and so on. We need to cover some ground here to get a proper understanding, but I don’t want to go too in-depth, so I’ll try to keep it as simple as possible.

But let’s be fair: this is going to be more in-depth than most people would like to be. I’m sorry, but I think this is needed for later.

Memory pointers

Have you thought about how values are actually stored in memory? How does it organize different things such as integers, floating-point numbers and strings? How can the program differentiate between two variables?

Memory can be thought of as a list of bytes of roughly 2⁶⁴ items:

let mem: Vec<u8> = vec![0; 18_000_000_000_000_000_000];

(Note: This Rust syntax just creates a Vector of unsigned numbers, 8 bit each, with a size of 18,000,000,000,000,000,000 elements, initialized with the value ‘0’ for all elements)

When you declare a variable, the program needs to decide where to put it in this list. The position of a variable in this list is its memory address. And of course you could store this index in another variable: this is called a pointer, and it’s represented by the ampersand (&). Memory addresses are usually written in hexadecimal.

Let’s say we want the variable “a” to be stored at index 0x100, so we could create a pointer to “a”:

let p_a: &i32 = 0x100;

Now we could use the dereferencing operator (*) to read the contents of the memory at that address:

let a: i32 = *p_a;

In our imaginary example, “*p_a” will equate to “mem[p_a]” with a catch. Because memory is bytes and the variable is 4 bytes long, it needs to read all 4 indices, and would do something like this:

let a: i32 = ((mem[p_a] as i32) << 24) + ((mem[p_a + 1] as i32) << 16)
           + ((mem[p_a + 2] as i32) << 8) + (mem[p_a + 3] as i32);

Notice that in this example we decided that the first byte maps to the highest part of the integer, while the last byte maps to the lowest part. This depends on the processor; here big-endian is assumed. Most consumer processors are little-endian and store the lowest part of the value first.
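The standard library exposes both byte orders, which makes the difference easy to see:

```rust
fn main() {
    let x: i32 = 0x01020304;
    // Big-endian: highest byte first.
    println!("{:02x?}", x.to_be_bytes()); // [01, 02, 03, 04]
    // Little-endian: lowest byte first (what most consumer CPUs use).
    println!("{:02x?}", x.to_le_bytes()); // [04, 03, 02, 01]
}
```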

All these nifty details are done under the hood by your programming language, including Rust, C and C++. Some of this is even done internally in your processor directly.

While we don’t need any of this to code Rust or C, a basic understanding does help to follow the design decisions of programming languages, especially those without a GC.

Now we have a variable of 4 bytes in memory. Where would we put another variable of 8 bytes? Consider this:

let p_a: &i32 = 0x100;  // original pointer
// ----
let p_b: &i64 = 0x102;
let p_c: &i64 = 0x0FA;
let p_d: &i64 = 0x106;

All three positions (b, c, d) have problems, as each is meant to hold an 8-byte variable (i64 is 64 bits long, which is 8 bytes). The previous variable actually spans from 0x100 to 0x103 (both included), so p_b overlaps it by two bytes. If this were done, changing “b” would change “a” and vice versa, in a very strange manner.

The reverse happens for “c”: because it needs 8 bytes, it spans from 0x0FA to 0x101, also overlapping two bytes with “a”.

The last one, “d”, does not cause any overlap, and it would work. But the problem here is that it leaves a gap of a few bytes between “d” and “a”, which will be hard to fill later. Usually the compiler and the operating system want the values in memory packed together so they use less memory and don’t leave gaps, as gaps are almost impossible to fill.
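Rust can report these sizes directly; a quick sanity check with std::mem (alignment is the closely related reason compilers avoid odd placements):

```rust
use std::mem::{align_of, size_of};

fn main() {
    // An i32 occupies 4 bytes, an i64 occupies 8.
    println!("i32: {} bytes, align {}", size_of::<i32>(), align_of::<i32>());
    println!("i64: {} bytes, align {}", size_of::<i64>(), align_of::<i64>());
}
```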

Key points to remember:

  • Memory is a flat list of bytes. The compiler and processor take this into account to be able to have anything bigger than a byte.
  • Pointers are used internally all over the place to make any program work.
  • Compilers must know how big the data behind a pointer is in order to read/write it correctly.
  • Endianness (big or little) matters when handling memory manually byte by byte.

Virtual memory and initialization

Going back to the memory example:

let mem: Vec<u8> = vec![0; 18_000_000_000_000_000_000];

You might wonder why I decided to give this imaginary vector roughly 2⁶⁴ elements, almost 16 exbibytes, which is clearly more RAM than any machine has.

(Note: I keep saying “roughly” 2⁶⁴ bytes because some parts might be reserved)

Let me ask a different question: do you think memory from other running programs lives in the same place? If it did, a program would need to avoid placing variables on the same memory addresses as other programs to avoid corruption.

The answer is usually no, but it depends. Your program’s memory is isolated from other programs, and you cannot see or touch their memory. In fact, a pointer to 0x100 in different programs maps to different physical memory locations. This is because the operating system provides a virtual memory layout. This virtual memory is usually 2⁶⁴ bytes in size on 64-bit platforms, so each program technically has its own 16 EiB of memory available even on computers with 1 GiB of RAM.

The “depends” part is because some OSes/platforms do not provide this abstraction, and also because you might be running a program without an operating system at all. But since my guess is that you’re currently using a GC language, most likely you’re not interested in doing that, so from now on we’ll assume that our program runs in virtual memory.

The next question is the initial value of memory. Do you think it comes initialized to zero, or filled with crap/past data? Actually it can be anything; this truly depends on the OS and its configuration. So never assume that the initial value is zero, nor that it is garbage. Zeroing may be used to avoid leaking private information to other programs.

Most GC languages (like Go) will initialize memory to zero for you. Most non-GC languages (like C++ or Rust) will instead try to prevent you from reading uninitialized memory, leaving initialization up to you.
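A small sketch of how Rust enforces this; the commented-out line does not compile:

```rust
fn main() {
    let x: i32; // declared, but not initialized yet
    // println!("{}", x); // error[E0381]: used binding `x` isn't initialized
    x = 42; // delayed initialization is allowed, once, before any read
    println!("{}", x); // now reading is fine
}
```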

Rust can allow reading uninitialized memory, but only within “unsafe” blocks. This can be used to avoid initializing memory twice in complex algorithms. C, C++ and Rust trust that the programmer knows what they’re doing, but the difference is that regular Rust has all the safeguards in, whereas in C++ you need to be careful at all times. (I almost never use unsafe, and if you come from GC languages you should also avoid it. Everything you’re used to writing can be done without unsafe.)

Key points to remember:

  • Programs typically run in virtual memory, which is way larger than the installed RAM size.
  • Memory usually comes uninitialized, with random data on it. But some OS might initialize it for security reasons.
  • Programming languages either zero it for you or try to prevent you from reading uninitialized memory.

Memory allocation and deallocation

Last thing on memory and we’ll move to less theoretical topics. Let’s talk about allocating memory.

Before, we were assuming you could do something like this:

let p_a: &i32 = 0x100;
let a: i32 = *p_a;

This does not work in Rust. But the counterpart in C kind of does:

int *p_a = (int *) 0x100;
int a = *p_a;

Rust will not let us manipulate memory directly unless we’re using unsafe code. I’m no expert on unsafe, so I’m not even going to try that. The point here is that the equivalent C code, even if it compiles, doesn’t work.

The reason is that “p_a” points to unallocated memory, which is different from uninitialized memory. A program cannot access any point in memory unless the OS has allocated it, so it needs to call the OS to request memory. In C this is done with the malloc() family of functions.

When a program requests memory, it doesn’t ask for a particular point in memory, but for a specific size instead:

int *p_a = (int *) malloc(sizeof(int));

So the memory address is chosen by the operating system (or the allocator), and the program has no influence over it.

Now we can use that memory, read and write into it. Just remember that until you write on it, the contents are undefined and platform dependent.

When we finish with that memory and no longer need it, we should free it; basically, we tell the OS that we’re done with it so it can reuse that chunk for other programs. If we keep allocating but never free, our program has a memory leak and will keep growing in size.

In Rust we don’t need to worry about allocating and freeing memory, as it’s managed for us. There’s no risk of a memory leak, except for reference cycles in recursive data structures (depending on implementation), for which Rust has the same risks as any GC language.

This directly contrasts with C where you need to manually allocate and free the memory properly.

In C freeing memory looks like this:

free(p_a);

And in Rust would be:

drop(a);

As said before, in Rust you don’t need to worry about this. But if you want to release memory early, you can. This is commonly used to invoke the destructor early rather than to actually free memory (it’s something useful when handling locks between threads).

Other common pitfalls in C with freeing memory are the use-after-free and double-free. The names are self explanatory: If you free something and then use it (read or write), it’s an error. If you free something twice, it’s an error.
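In Rust both the scope-based free and the early drop look like this; the commented-out line is one the compiler rejects, which is how use-after-free is prevented at compile time:

```rust
fn main() {
    {
        let s = String::from("scoped"); // buffer allocated here
        println!("{}", s);
    } // "s" goes out of scope: its buffer is freed automatically

    let t = String::from("temporary");
    drop(t); // optional: frees the buffer early, like free() but safe
    // println!("{}", t); // error[E0382]: borrow of moved value: `t`
}
```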

Key points to remember:

  • Internally, memory needs to be allocated and freed from/to the operating system.
  • A program cannot choose which address will be allocated.
  • Freeing is important to avoid memory leaks, but this is handled by Rust.
  • Forget that unsafe exists in Rust. Regular Rust is enough for anything you can imagine in a GC language. Leave unsafe for the experts (which is definitely not me).

Stack and Heap

Over this whole section I addressed only dynamic memory allocation with manual malloc and free, which is for heap memory. There’s also the stack, which is managed automatically even in C. For what I want to explain, I don’t think we really need to understand what the stack or heap is, or what the differences are.

Rust extends on that approach of automatic allocation and deallocation based on scopes to avoid having a GC.

If you’re confused about stack and heap, and which you should use, let me say that you don’t need to care about this at all. If you’re interested, it’s a great topic, but the same way you don’t care in Python or Go, you also don’t need to care in Rust.

Rust will place some things on the heap and the majority of variables on the stack. For example, Box<T> places its contents on the heap.

In simple terms, the stack refers to the variables that are tied to a specific code block (between some braces), while the heap refers to dynamically allocated memory.
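A minimal sketch of the difference, even though (as said) you rarely need to think about it:

```rust
fn main() {
    let on_stack: i64 = 7; // lives directly in main's stack frame
    let on_heap: Box<i64> = Box::new(7); // the 7 is allocated on the heap;
                                         // the pointer to it lives on the stack
    println!("{} {}", on_stack, *on_heap);
} // both are freed automatically here
```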

Key points to remember:

  • Stop worrying about stack or heap and move on.

Composite types in memory

Now that I’ve already given you a headache with all that stupid stuff about memory internals that no one cares about, we can begin to understand how objects are laid out internally. This will come in handy in the next part.

What objects actually are

In C++ we could do something like this to create an object:

class Card {
  public:
    int number;
    int suit;
};
int main() {
  Card aceOfSpades;
  aceOfSpades.number = 1;
  aceOfSpades.suit = 4;
}

In Rust this would be:

pub struct Card {
   pub number: i64,
   pub suit: i64,
}
fn main() {
    let ace_of_spades = Card {
        number: 1,
        suit: 4,
    };
}

If you’re wondering why I’m using C++ as a reference and not PHP, JavaScript or any other “simple” language, the reason is that C++ shares syntax with all of those, so you should be familiar enough to read it. But those languages don’t map onto memory as exactly as C++ or Rust do, so to be as correct as possible, I prefer to use C++ as an example. And you usually need types to get something that can be mapped onto memory.

So, if you come from Python and don’t have any other language to leverage, the best I can come up with is typed Python:

class Card:
    number: int
    suit: int

ace_of_spades = Card()
ace_of_spades.number = 1
ace_of_spades.suit = 4

Anyway, notice how most languages let you instantiate the object without specifying the contents, so you can write to them later. But not Rust, as it forces us to define the actual content values to instantiate the object.

Remember before, when we were talking about uninitialized data? What is happening here is that Python (and most languages) implicitly initialize the contents to zero (or a zero-like value) when the object is created; C++ only does so in certain cases, and reading a member of a local object that was never set is actually undefined behavior there. So defining a particular value means writing twice.

The worst problem in all these other languages is not performance but the lack of exhaustiveness: if you forget to define the value of a member, it is silently set to zero with no warning. When adding a new field down the line, it can be very hard to track all the places where the object is created, and bugs might appear because we forgot to set the new field. This simply can’t happen in Rust.

Back to the original topic: how is this laid out internally in memory? Simple: it uses 16 bytes, where the first 8 are used for “number” and the latter 8 for “suit”. We get something like this:
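A sketch of that 16-byte layout using to_be_bytes, matching the big-endian convention used in this section:

```rust
fn main() {
    let number: i64 = 1;
    let suit: i64 = 4;
    // Concatenate the two 8-byte fields, as the compiler conceptually does.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&number.to_be_bytes());
    bytes.extend_from_slice(&suit.to_be_bytes());
    println!("{:02x?}", bytes);
    // [00, 00, 00, 00, 00, 00, 00, 01, 00, 00, 00, 00, 00, 00, 00, 04]
}
```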

(Note: I’m using big-endian above. Most computers are little-endian and would write 0x0100000000000000 and 0x0400000000000000 instead. Unless you want to manipulate bytes manually, you don’t need to worry about endianness.)

So far so good, right? This object has a known size at compile time of 16 bytes. If the compiler has an address to a Card, it knows it’s 16 bytes long and knows that the suit is 8 bytes to the right.

This means that:

&ace_of_spades.suit == (&ace_of_spades + 8)

(Note: Be aware that some compilers, especially Rust, have the right to reorder the fields in memory, so it’s not guaranteed that the second field will appear after the first one)

Now let’s talk about strings. How big is a String type? If the compiler has a memory address of a string, how many bytes does it have to read?

The problem here is that strings can be of any size. From empty strings to full books inside a single variable. It’s not possible to always know the size of a string at compile time. How does it store them?

In C, strings were basically unspecified in length but terminated by 0x00 (or ‘\0’) so all functions had to keep reading until they found this character. This has an obvious downside: if your string contains the ASCII character ‘\0’ in the middle, you cannot read it completely.

In Rust, the String type is basically a pointer to somewhere else and a length:

pub struct String {
  buf: *mut u8,
  len: usize,
}

(Note: Actually, String in Rust is just a Vec<u8>, but Vec itself is using something similar as above; also here I left out the capacity which would add another 8 bytes to the space used)

This makes the String type sized, and in 64 bit platforms this will be 16 bytes long. Regardless of the string contents, the String object is always 16 bytes.

It doesn’t mean that all strings use only 16 bytes. Obviously the memory required to hold the text is still used, but it’s allocated elsewhere. This approach has none of the downsides of C strings; the trade-off is that instead of spending one extra byte per string, it spends 16.
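You can verify this with std::mem. Note that the real std String also carries the capacity field mentioned in the note above, so on 64-bit platforms it measures 24 bytes rather than 16:

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // pointer (8) + length (8) + capacity (8) on 64-bit platforms
    println!("String is {} bytes", size_of::<String>());
    let short = String::from("hi");
    let long = "x".repeat(10_000);
    // The String value itself has the same size no matter the contents.
    assert_eq!(size_of_val(&short), size_of_val(&long));
}
```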

What about the methods? Are they in memory as well? Yes, code is also in memory, but the code exists only once and is not copied per object instantiation. It doesn’t matter how many methods a type has: every object has the same size.

Key points to remember:

  • Objects are usually laid out in memory by just concatenating the fields.
  • Objects may contain memory addresses (pointers).
  • Methods do not use space in the object.

Ownership

The most important Rust concept is probably ownership. What does this mean?

Imagine we’re passing data through several functions. Where should the data be allocated, and where should it be freed? A garbage collector waits until no part of the code has access or pointers to that data, and then proceeds to free it. This is done while the program is running, and it costs cycles; Go is known for having “pauses” where the GC runs.

But C and C++ have a cool way of handling this by leveraging the stack and the scopes. For example, in this code:

int main() {
  Card aceOfSpades;
  aceOfSpades.number = 1;
  aceOfSpades.suit = 4;
}

The variable aceOfSpades is allocated by C++ automatically and when it exits the scope, it is freed also automatically. Cool, right? For such simple cases the memory is managed for us.

It would be nice if this system could be extended to all the other cases, where data is passed across functions and methods, because sometimes data needs to be dropped in an inner function. It would be awesome if a function could know beforehand whether it should free the data after using it. The problem is: if a function frees some data and some caller expects that data to still exist, we get a use-after-free error.

It is hard in C++ to follow these semantics properly across a project; some might use a naming scheme for functions to convey this, others might just work around it by copying/cloning the data instead of passing pointers, to avoid the risk.

In Rust, this is enforced by the compiler, which tracks “who owns the data”; the owner has the duty of freeing it. Obviously there can be only one owner at any point in time, because otherwise you’d get double-free errors.

When a function creates some data, it becomes the owner of this data:

fn main() {
    let ace_of_spades = Card {
        number: 1,
        suit: 4,
    };
}

Here, main owns ace_of_spades and is responsible for freeing this data at the end. Therefore, there’s an implicit “drop(ace_of_spades)” at the end of the code block.
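We can make the implicit drop visible by giving Card a destructor; the Drop impl here is purely illustrative, not something the real example needs:

```rust
struct Card {
    number: i64,
    suit: i64,
}

impl Drop for Card {
    fn drop(&mut self) {
        println!("dropping card {} of suit {}", self.number, self.suit);
    }
}

fn main() {
    let _ace_of_spades = Card { number: 1, suit: 4 };
    println!("end of main");
} // implicit drop runs here, so "dropping card 1 of suit 4" prints last
```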

So far this is the same. But here’s the cool part in Rust: Ownership can be transferred:

fn main() {
   let ace_of_spades = Card {
       number: 1,
       suit: 1,
   };
   print_card(ace_of_spades);
}
fn print_card(card: Card) {
   println!("Number: {} Suit: {}", card.number, card.suit);
}

Now in the above code print_card receives the ownership of Card and will drop its contents at the end of the function. This means that main() now does not free the memory for ace_of_spades anymore.

But wait, how does Rust know that print_card will drop the data at the end? Because it takes the type “Card” instead of “&Card” or “&mut Card”. Whenever you see a full type in Rust without the ampersand, the value is owned.

For &Card and &mut Card what your function owns is the pointer to the memory address, but not the contents. In the same fashion, for things like Rc<T> or Box<T> the function owns the outer Rc or Box, and the behavior on the inner value depends on the actual type used.

What if we don’t want the type to be dropped? There are two solutions: one is to change print_card to receive a borrowed object (similar to a pointer); the other is to copy the data before sending it. The latter is what C++ does under the hood:

   print_card(ace_of_spades.clone());

In Rust we would need to implement how cloning works, but we can also just add “#[derive(Clone)]” to our struct to get the default implementation, which simply clones every field.

So we can see that Rust by default “moves” the data, while C++ by default copies it.

If we wanted to change the print_card function to avoid freeing inside, it would be just adding an ampersand:

fn print_card(card: &Card) {

That tells the compiler that this function will not own that data and cannot free it. “card” is then treated as a pointer to a “Card” address and dropping the variable will drop only the pointer, not the underlying data.

The remaining function code doesn’t need any change. In C++ you’d need to replace the dot with an arrow to dereference the pointer, while Rust is smart enough to do this for you. Nice! You don’t need to think about dereferencing pointers in Rust.

(Note: “dereferencing” is to apply the asterisk operator “*ptr” where we signal that we want to operate on the contents of what the pointer is pointing to, instead of trying to work on the pointer itself as a variable)

But there is another line that needs to be changed, the call to print_card. As we want to retain ownership, we need to signal this to the compiler as well by adding an ampersand:

   print_card(&ace_of_spades);

With these simple rules, Rust can always make a clear-cut decision on how memory is allocated and freed, so we don’t have to worry about it. Instead, we need to worry about ownership and borrowing, but these are checked by the compiler and there’s no way to fool it. A program that compiles (without unsafe) is guaranteed to be memory safe.
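Putting the borrowed version together as a complete program, main keeps ownership and can keep using the value:

```rust
struct Card {
    number: i64,
    suit: i64,
}

fn print_card(card: &Card) { // borrows: cannot and will not free it
    println!("Number: {} Suit: {}", card.number, card.suit);
}

fn main() {
    let ace_of_spades = Card { number: 1, suit: 4 };
    print_card(&ace_of_spades);
    print_card(&ace_of_spades); // still valid: ownership never left main
} // ace_of_spades is freed here, by its owner
```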

Could we instead just return the Card object to avoid freeing it? Sure! Let’s see:

fn main() {
    let mut ace_of_spades = Card {
        number: 1,
        suit: 1,
    };
    ace_of_spades = print_card(ace_of_spades);
}

fn print_card(card: Card) -> Card {
    println!("Number: {} Suit: {}", card.number, card.suit);
    return card;
}

This does the trick: “card” is no longer freed inside the print_card function, and main retains ownership. But this is not a good idea. First, we depend on the compiler being smart enough to avoid moving the data twice. And second, this is almost an anti-pattern in Rust; it tends to cause more trouble than it solves. As a rule of thumb, if you don’t want the function to consume the data (to free it), don’t ask for ownership.

Ownership and borrowing can be thought as a permission system:

  • The owner has the highest permission. It can do everything it pleases with the data, because it’s their data. If I have my own car, I can do what I please with it, including disposing of it. There is always an owner, as the car needs to be registered to someone.
  • The &mut borrow comes next. It can change the contents of the data because the owner allowed it to do so. With a car, this is the mechanic: they can add/remove parts from it, but they definitely can’t dispose of it. The car can only be at one mechanic at a time, and while the car is at the mechanic, the owner cannot use it. In the same sense, while a &mut borrow exists, the owner must wait until it finishes to be able to use the data.
  • Finally the & borrow is shared read-only. It’s like when you allow your friends to see your car and take photos of it. There can be many people doing this at the same time, but while this happens you cannot send the car to the mechanic or dispose of it.

One important thing to remember is that others can make copies of borrowed data. For example, a function that takes a shared borrow (&Card) and needs to change the data could copy it and then make the pertinent modifications to its own copy.

For example, consider this function:

fn next_card(card: &Card) -> Card {
    let mut next = card.clone();
    next.number += 1;
    return next;
}

This function receives a borrowed card, and cannot change it. But we want to be able to add one to the number. What do we do? We clone it, then we change our copy. We can return the new copy afterwards.

We could have avoided the copy by having a “&mut Card” instead, then we could have mutated the same data in-place.
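That mutable version would look something like this (a sketch; the function name is made up for illustration):

```rust
struct Card {
    number: i64,
    suit: i64,
}

fn next_card_in_place(card: &mut Card) {
    card.number += 1; // mutates the caller's data directly, no copy
}

fn main() {
    let mut card = Card { number: 1, suit: 4 };
    next_card_in_place(&mut card);
    println!("{}", card.number); // 2
}
```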

The beauty of this system is that the developer knows right away if a function will change the contents of the data we’re passing or not. A function receiving “&Card” will never change its contents, and the caller can continue using it for other stuff assuming it never changed.

Key points to remember:

  • Regular types in Rust are always Owned unless “&” or “&mut” is written before the type.
  • Ownership means that memory will be freed when it exits the scope.
  • There is always one owner. Not less, not more. One.
  • Cloning or copying the data is what other languages implicitly do. Don’t be afraid to do it.
  • Borrowing is the right tool to share data between functions when we don’t want the data to be freed at the end.
  • When creating functions, try to use the least permissive borrow that works.
  • Even if you need to change the data, remember that you can always change a copy of the data. Your function might not require changing the original data.

Copyable types

In Rust, copying has a special meaning. You might have noticed that we talk about cloning and copying as if they were two different things.

Well, because they are. Copying in Rust strictly means an implicit byte-by-byte copy, while cloning is customizable and explicit.

Let’s forget about cloning for now and focus on just copying. Remember the byte representation of the Card struct we discussed before: 16 bytes, the first 8 for “number” and the last 8 for “suit”.

Copying this would mean that our program reads the bytes in memory and writes them elsewhere. For a second, it forgets that this is a “Card”. It just knows that it has 16 bytes of data so it does a copy-paste elsewhere.

The new data will have its own owner, which might be different from the old one. And as discussed before, this copy can be made from read-only borrows: you don’t need ownership to be able to copy data.

Now, a question: Would this copying always work? Will the resulting data be correct?

Think about this for a second.

For regular values such as numbers and characters, it does work, no problem. 

But what about memory addresses? What would happen if it contained a pointer to somewhere else?

If the pointer is a shared read-only borrow, this works out fine. Since there can be as many readers as we like, copying it works: the copied address still points to the same position.

For mutable borrows the only problem is that we would break the rules, as there would be more than one pointer to the same address. Other than that, it would work: the pointer is still valid. But Rust will not let you copy a struct containing a &mut pointer, to prevent you from breaking the rules.

Therefore there is data that can be copied and data that cannot be copied.

But wait! There is another possibility. The pointer might be something that is actually owned by the struct!

Remember the String implementation from before?

pub struct String {
  buf: *mut u8,
  len: usize,
}

This “buf” is actually owned by the String: when you create a new string, memory is also reserved for the buffer; and when the String is dropped, the buffer must be freed as well.

You might be asking now, is this some kind of trickery that only the Rust internals can do? Can we do the same in our Rust structs?

Yes! This is done with the type Box<T>. It stores a pointer to a value of another type (or the same type, if you want something recursive), and the pointed-to data is owned by the struct holding the Box. For example:

pub struct HandOfCards {
    card1: Box<Card>,
    card2: Box<Card>,
    card3: Box<Card>,
    card4: Box<Card>,
}

This would make HandOfCards contain 4 pointers to 4 different cards. (Be aware that this implementation makes no sense in real code; I wrote it just to show Box<T>. Here it only wastes memory for no benefit.)

In this case, the memory for those “Card” values needs to be allocated before creating HandOfCards, but it will be freed automatically when the struct goes out of scope, as usual.

If we wanted to store an indeterminate number of cards, we could use Vec<Card> instead. Vec is similar to Box in the sense that it stores a pointer to somewhere else, and when the Vec is dropped, the contents behind that pointer are dropped as well.

Back to copying. If we try to copy these structs byte by byte, the problem is that we copy the pointer but not the data behind it. That data would then have two owners, which not only breaks the rules but also sets up a double free: once all the copies are dropped, the Box<T> contents would be freed more than once.

And this is the real reason why not all types can be copied byte by byte: in some cases it would cause a double-free error.

Which types are copyable is defined by the Copy trait. If a struct implements Copy, it is copyable. It’s as easy as this:

impl Copy for Card {}

And now our struct Card is copyable. (Strictly speaking, Copy requires Clone as a supertrait, so Card must implement Clone too.) There’s no need to explain to Rust how to do the copy or to implement any method, as there is only one way to do it. Usually we implement Copy with the derive macro instead; this is just convenience, as the macro writes the code above for us:

#[derive(Copy, Clone)]

But as I said, not all types can be Copy. For example if we try the same for HandOfCards, this happens:

error[E0204]: the trait `Copy` may not be implemented for this type
   --> src/main.rs:234:6
    |
228 |     card1: Box<Card>,
    |     ---------------- this field does not implement `Copy`
...
234 | impl Copy for HandOfCards {}
    |      ^^^^

Because Box<T> does not implement Copy, a struct containing Box<T> can’t implement Copy either.

It turns out that it’s much easier to work with types that implement Copy than with types that don’t. For example, the following code only works if Card implements Copy:

fn main() {
    let ace_of_spades = Card {
        number: 1,
        suit: 1,
    };
    print_card(ace_of_spades);
    print_card(ace_of_spades);
}

If it doesn’t implement Copy, the first print_card consumes ace_of_spades, and it no longer exists when the second call is made. The program would not compile unless the Copy trait is implemented. When it is, ace_of_spades is copied for each of the calls, similar to what C++ does.

Key points to remember:

  • Copy in Rust means a byte-by-byte copy, without understanding the contents of the type.
  • Memory addresses can prevent a copy from being correct, so some uses of them in a struct forbid it from being copyable.
  • Shared read-only borrows (&var) are fine to copy, but mutable ones (&mut) are not.
  • Box<T> can be used to hold a pointer to owned data, but it also prevents the struct from implementing Copy.
  • Remember to implement Copy when possible. It will make your life easier.

Exceptions to copying and cloning everything

I know, I said to copy or clone everything and forget about it. But there are a few gotchas we should cover. I lied a bit to make things easier.

First and foremost, C++ does not always clone things. Cloning is a kind of deep copy. Simple types are copied by default, but for complex ones it depends on the implementation. So take that comparison with a pinch of salt.

Cloning too much has its drawbacks; obviously it wastes cycles. For small things it won’t make a difference, but cloning big values of course takes time. And it also depends on how many times your program performs the clone (i.e. doing 10 clones is not the same as doing a single clone inside a loop of 1 million iterations).

The same applies to implicit copies. It’s a bit harder to copy a lot of data than to clone it, but it’s definitely possible (for example with arrays [T]).

Passing borrowed values to functions (&T or &mut T) instead of the value (T) helps prevent unnecessary copies.

Always copying/cloning can also lead to disappearing changes: you might accidentally write to a different copy than the one you intended. Using a &mut reference or an Rc<T> can help.

Finally, implementing Copy on some types might be a bad idea, as it could lead to unintended behavior. For example, iterators are not expected to implement Copy: you could end up advancing different copies of an iterator instead of consistently using the first one created. Deliberately not implementing Copy is a way for the author to convey how the type is intended to be used.

The main reason I insist that cloning is “good” is that I had a hard time when I started with Rc<T> types (kind of like garbage-collected values). Once I got the hang of cloning, it was much easier, and it turned out that Rc<T> (and other similar types) is meant to be cloned. Cloning is not bad unless there’s a lot of data to clone, and even then it can still be fine in non-performance-critical parts of the program.

Move semantics

In Rust, all types are “move”, which means that copying the data to a new location must be valid as long as the original is destroyed. In this fashion, I should be able to take any value and change its memory address from 0x100 to 0x200 by copying the bytes and discarding the original, and it should still work.

This, of course, only works if there are no pointers to the original data, meaning no borrows, either shared or mutable. In the end, what this tells us is that ownership is required to move data.

With one exception: Rust has std::mem::swap, which accepts two &mut references and moves the data. Because the contents are exchanged with another instance of the same type, this must be valid as well.

Now, does this work for every data type? Can we put anything we want in a struct and this trick still works?

Almost. There is one case where this fails completely. (I kind of hate that there are so many exceptions to the rule in programming)

If you build a self-referential struct, this fails. Let me explain.

Imagine you want a struct that owns some data but exposes a read-only view of it through a member; the point could simply be to prevent others from modifying it without going through the controlled methods:

struct MyData {
    buf: Vec<u8>,
    pub buffer: &Vec<u8>, // This must always point to &buf
}

This struct would break when moved: “buffer” would still point to the old location whenever the struct moves and the memory address of MyData changes. Remember, the address of MyData.buf is just &MyData + 0, so it depends on where the struct sits in memory.

For this reason, Rust will not let you build self-referential structs in safe code. It is forbidden via the lifetime of the borrow: the borrow would need to live for the entire lifetime of the struct itself, and there is no way to tie the two together like that.

(The solution for the code above is to return the borrowed pointer from a method, so it never needs to be stored. But in real code this pattern shows up in very contrived scenarios that might not have an easy solution.)

While learning Rust, I tried several times to create a self-referential struct without realizing it, and I ended up fighting the borrow checker over lifetimes for hours until I gave up. It’s simply impossible: such a struct breaks move semantics, but Rust blames the lifetimes, because it doesn’t understand that the struct is self-referential. And it doesn’t understand that because it can’t support it without unsafe code.

Did I say unsafe? Can this be built with unsafe code? Oh yes, it can. But it will break in crazy ways every time it’s used: Rust moves values in memory often, implicitly and without warning. It would be really hard to use such a struct without making a mess.

So, are they truly impossible to do correctly in Rust? Not quite. There’s something called Pin<T> for exactly this purpose.

Let’s think a bit. Moving requires ownership or a &mut reference, so if we hide the value behind a type that gives out neither, the inner value is guaranteed not to move in memory. The outer type can expose its own address and be moved freely, but the underlying pointer stays fixed. This basically means that only &T borrows are allowed.

This is exactly what Pin<T> does. But using Pin<T> to actually build self-referential structs still requires unsafe, because, again, Rust lacks the tooling to explain this to the compiler.

(Note: the docs frame this slightly differently: creating a Pin<T> safely requires T to be Unpin; for !Unpin types you need unsafe or a pinning constructor such as Box::pin.)

Ideally, you don’t want to build a self-referential struct in Rust at all. Instead, leverage other types to accomplish the same thing; for example, Rc<T> can be used for things like linked lists or trees. Also, look for libraries that might do this work for you, as these structures are very easy to get wrong.

It’s possible to encounter Pin<T> in Rust, just unlikely. The most common place is async programming, where the Future trait requires pinning in order to implement poll. Pin<T> can be created without unsafe as long as the data follows normal move semantics (is Unpin); so for regular programming, Pin<T> is just something you might occasionally need to create or access. A bit of a burden, but that’s it.

As a side note, let me add that Rust does not guarantee that memory never leaks. Correct Rust programs will not leak, but it’s actually quite easy to write a Rust program that leaks memory without any unsafe. As with other reference-counting schemes, if you have Rc<T> values in a cyclic reference, they will never free each other. And there’s also std::mem::forget, which takes ownership of a value but never runs its destructor, so its resources are never freed: a leak.

Key points to remember:

  • All types in Rust are “Move”, meaning they can change their memory address without problems.
  • mem::swap can be used with two &mut references to swap the contents of two variables. This is also considered a move.
  • Self-referential data structures cannot be made in Rust without unsafe, because they can’t be moved safely.
  • Self-referential data with unsafe still must use Pin<T> to ensure the data does not move around.
  • The recommendation is to avoid them entirely and use Rc<T> instead. If possible, use libraries and don’t implement trees or linked lists yourself.
  • You might have to work with Pin<T> when doing async programming.

Back to the beginning

Remember we got puzzled by this simple code? Have a look again:

let a: String = "Hello world".to_string();
let b: String = a;

Now if I ask you what’s happening here, it should be easier to reason about. First, this creates a new String and stores it in “a”. Then, because String can’t be Copy (it contains an owned pointer to the text buffer), it can only be moved. Therefore Rust copies the contents of “a” into “b” and simply invalidates “a”: its destructor never runs, otherwise the buffer would be freed twice. There’s no other way to make this work, so now it does make sense.

(Note: this case is so simple that Rust won’t actually move the data. The compiler will just treat “b” as the same value and forget “a”. This is one of many optimizations in Rust that reduce the generated code to the absolute minimum.)

You wanted to have two different copies? Sure!

let b: String = a.clone();

You wanted to have two variables pointing to the same thing, so it doesn’t use twice the memory? Sure!

let b = &a;

Once we understand what’s happening under the hood, the behavior becomes self evident, right?

If instead of a String we had a number, which is copyable, Rust will copy instead of invalidating:

let a: i64 = 123;
let b: i64 = a;
dbg!(a,b); // This will now print both variables

Because “a” doesn’t need to be invalidated (the copy is fully independent and consistent), it remains valid after the assignment, and we end up with two variables with the same content.

I hope this helps you understand why Rust acts the way it does. I know lifetimes are still missing, but this was a bit too much already; maybe later I’ll write something about them. In the meantime, please let me know if this helped, and feel free to ask questions!

Rust vs Python: Rust will not replace Python

I love Python; I used it for 10+ years. I also love Rust; I have been learning it for the last year. I wanted a language to replace Python, looked into Go, and came away disappointed. I’m excited about Rust, but it’s clear to me that it’s not going to replace Python.

In some areas, yes. There are small niches where Rust can be better than Python and replace it. Games and microservices seem to be among the best candidates, but Rust will need a lot of time to get there. GUI programs also present a very good opportunity, but the fact that Rust’s model is so different from regular OOP makes it hard to integrate with existing toolkits, and a GUI toolkit is not something easy to build from scratch.

For CLI programs and utilities, Go will probably prevent Rust from gaining much ground. Go is clearly targeted at this particular scenario, is really simple to learn and code in, and does this really well.

What Python lacks

To understand what opportunities other languages have to replace Python, we should first look at Python’s shortfalls.

Static Typing

There are lots of things Python could improve, but lately I feel that typing is one of the top problems that needs fixing, and it actually looks fixable.

Python, like JavaScript, is dynamically typed. You can’t easily control the input and output types of functions, or the types of local variables.

There’s now the option to annotate your types and check them with tools like MyPy or pytype. This is good and a huge step forward, but insufficient.

When coding, IDE autocompletion, suggestions, and inspection help a lot, speeding up the developer by reducing round-trips to the documentation. On complex codebases they really matter, because you don’t need to navigate through lots of files to figure out the type you’re trying to access.

Without types, an IDE can hardly determine the contents of a variable; it has to guess, and it’s not good at it. Currently, I don’t know of any Python autocompletion based solely on MyPy.

If types were enforced by Python, then the compiler/interpreter could do some extra optimizations that aren’t possible now.

Also, there’s the problem of big Python codebases with contributions from non-senior Python programmers. A senior developer will try to assume a “contract” for functions and objects: which inputs are “valid” so that it works, and which outputs the caller must check. Strict types are a good reminder for less experienced people to keep designs and checks consistent.

Just look at how TypeScript improved upon JavaScript simply by requiring types. Taking a step further and making Python enforce a minimum, so that the developer has to explicitly opt out of typing something, would make programs easier to maintain overall. Of course this needs a way to disable it, as forcing it in every scenario would kill a lot of what’s good about Python.

And this needs to be enforced down to the libraries. The current problem is that a lot of libraries just don’t care, and for someone who wants to enforce typing, it gets painful as the number of dependencies increases.

Static analysis in Python exists, but it is weak. Enforced types would allow better, faster, and more comprehensive static analysis tools to appear. This is a strong point of Rust, as the compiler itself already does a lot of static analysis; add tools like Cargo Clippy and it gets even better.

All of this is important to keep the codebase clean and neat, and to catch bugs before running the code.

Performance

The fact that Python is one of the slowest programming languages in use shouldn’t be news to anyone. But as I covered before in this blog, this is more nuanced than it seems at first.

Python makes heavy use of integration with C libraries, and that’s where its power is unleashed. C code called from Python still runs at C speed, and while it is running, the GIL is released, allowing a degree of multithreading.

The slowness of Python comes from the amount of magic it can do: almost anything can be replaced, mocked, whatever you want. This makes Python especially good for designing complex logic, as it can hide it very nicely. And monkey-patching is very useful in several scenarios.

Python works really well with Machine Learning tooling, as it is a good interface to design what the ML libraries should do. It might be slow, but a few lines of code that configure the underlying libraries take almost zero time, and those libraries do the hard work. So ML in Python is really fast and convenient.

Also, don’t forget that when such levels of introspection and “magic” are needed, any language is slow. This can be seen when comparing ORMs between Python and Go: as soon as the ORM does the magic for you, it becomes slow, in any language. To avoid this, you need an ORM that is simple, not automatic and convenient.

The problem arises when we need to do something for which no C-backed library exists. We end up coding the actual thing manually, and that becomes painfully slow.

PyPy solves part of the problem. It can optimize some pure Python code and run it at speeds approaching JavaScript and Go (note that JavaScript is really fast). There are two problems with this approach: first, the majority of Python code can’t be optimized enough to get good performance; second, PyPy is not compatible with all libraries, since libraries need to be compiled against PyPy instead of CPython.

If Python were stricter by default, allowing wizardry only when the developer really needs it and marking it via annotations (types and the like), I’d guess both PyPy and CPython could optimize further, as they could make better assumptions about how the code is supposed to run.

The ML libraries and similar ones can build C code on the fly, and that should be possible for CPython itself too. If Python included a sub-language for high-performance work, even at the cost of slower program startup, it would let programmers optimize the critical parts of the code that are especially slow. But this needs to be part of the main language and bundled with every Python installation. It would also mean that some libraries could get away with pure Python, without having to release binaries, which in turn would increase their compatibility with other interpreters like PyPy.

There are Cython and Pyrex, which I used in the past, but the problem with these is that they force you to build the code for the different CPU targets and Python versions, and that’s hard to maintain. Building anything on Windows is quite painful.

The GIL is another front here. Because Python only executes one bytecode instruction at a time, threads cannot be used to spread pure-Python CPU-intensive work across cores. Better optimizations could in fact relieve this, by determining that function A is totally independent of function B and letting them run in parallel; or even by compiling them to non-Python instructions when the code clearly makes no use of Python magic. That would allow the GIL to be released and hence much better parallelism.

Python & Rust together via WASM

This could solve a great part of the problem, if it works easily and simply. WebAssembly (WASM) was conceived as a way to replace JavaScript in browsers, but the neat thing is that it produces code that can be run from any programming language and is independent of the CPU target.

I haven’t explored this myself, but if it delivers what it promises, it means you only need to build the Rust code once and bundle the WASM. That should work on all CPUs and Python interpreters.

The problem, I believe, is that the WASM loader for Python will need to be compiled for each combination of CPU, OS, and Python interpreter. It’s far from perfect, but at least it’s easier to get one small common library to support everything, with other libraries and code built on top of it. This could relieve some maintenance burden from other libraries by diverting that work to the WASM maintainers.

Another possible problem is that WASM will have a hard time doing anything that isn’t strictly CPU computation: managing sockets, files, communicating with the OS, and so on. As WASM was designed to run inside a browser, I expect all OS communication to go through a common API, and that will surely have some caveats. While I expect the tasks just mentioned to be usable from WASM, things like OpenGL and talking directly to a GPU will surely lack support for a long time.

What Rust Lacks

While most people think Rust needs to be easier to code in, that it’s a complex language requiring a lot of human hours to get code working, let me heavily disagree.

Rust is one of the most pleasant languages to code in once you have expertise in it. It is productive almost on the level of Python, and very readable.

The problem is gaining that expertise: it takes way too much effort for newcomers, especially those already seasoned in dynamically typed languages.

An easier way to get started in Rust

And I know this has been said a lot by newcomers and discussed ad infinitum: we need a RustScript language.

For the sake of simplicity, I’m calling this hypothetical language RustScript. To my knowledge, the name is not taken and RustScript does not exist, even if I talk as though it does.

Since others have proposed this before, please keep reading: I already know more or less what has been proposed and how those discussions went.

The main problem with learning Rust is the borrow-checking rules, (almost) everyone knows that. A RustScript language must have a garbage collector built in.

But the other problem, less talked about, is the difficulty of reading and understanding Rust code properly. Because people come in, try a few things, and the compiler keeps complaining everywhere, they never get to learn the basics that would let them read code easily. These people will struggle even to remember whether the type was f32, float, or numeric.

A RustScript language should serve as a bootstrap into Rust syntax and the features of the language, while keeping the hard, puzzling parts away. That way, once someone can use RustScript easily, they can learn proper Rust with a smaller learning curve, already feeling familiar and knowing what the code should look like.

So it should change this learning curve:

Into something like this:

Here’s the problem: Rust takes months of learning to become minimally productive. Without properly knowing a lot of complex stuff you can’t really do much with it, and that turns into frustration.

Some companies require 6 months of training before an engineer is productive internally. Do we really expect them to add another 6 months on top?

What’s good about Python is that newcomers are productive from day zero. Rust doesn’t need to match this, but the current situation is way too bad, and it’s hurting its success.

A lot of programming languages and changes have been proposed, or even implemented, but they fail to fully solve this problem.

This hypothetical language must:

  • Include a Garbage Collector (GC) or any other solution that avoids requiring a borrow checker.
    Why? Removing this complexity is the main reason for RustScript to exist.
  • Have almost the same syntax as Rust, at least for the features they have in common.
    Why? If newcomers don’t learn the same syntax, they aren’t making any progress towards learning Rust.
  • Be binary- and linker-compatible with Rust; all libraries and tooling must work from RustScript.
    Why? A completely different set of libraries would be a headache and would require a completely different ecosystem. Newcomers should familiarize themselves with Rust libraries, not RustScript-specific ones.
  • Allow Rust sample code to be machine-translated into RustScript, like how Python 2 can be translated into Python 3 using the 2to3 tool. (Some things like macro declarations might not work, as they might not have a replacement in RustScript.)
    Why? Documentation is key. Having a way to automatically translate documentation into RustScript will make everyone’s life easier. I don’t want the guessing-the-API game that happens in PyQt.
  • Be officially supported by the Rust team itself, and bundled with Rust when installing via rustup.
    Why? People will install Rust via rustup. Ideally, RustScript should be part of it, allowing for easy integration between both languages.

Almost any one of these requirements alone would be hard to achieve. Getting a language that does everything needed, with all the support… it’s not something I expect to happen, ever.

I mean, Python has it easier. What I would ask of Python is far more achievable than what I’m asking here, and yet in 10 years there have been only slight changes in the right direction. With that in mind, I don’t expect Rust to ever have a proper RustScript, but if it happens, well, I would love to see it.

What would be even better is if RustScript were almost a superset of Rust, making most Rust programs valid RustScript, with few exceptions such as macro definitions. This would let developers incrementally migrate to Rust as they see fit, facing the borrow checker in small, digestible amounts. But even having to declare a whole file or module as RustScript would still work, since it would allow devs to migrate file by file or module by module. That’s still better than having to choose between language X or Y for a full project.

Anyway, I’d better stop talking about this, as it’s not going to happen, and describing such a language would require a full post (or several) anyway.

Proper REPL

Python’s REPL is really good, and a lot of tools make use of it. Rust REPLs exist, but they’re not officially supported and are far from perfect.

A REPL is useful when doing ML and when trying out small things. The fact that Rust needs to compile everything makes this quite painful: it needs boilerplate to work, and every instruction takes time to build interactively.

If Rust had a scripting language, this would be simpler, as REPLs for scripting languages tend to be straightforward.

Simpler integration with C++ libraries

Given that both Rust and Python integrate only with C and not C++, you might think they’re on the same level here; but no. Because Python’s OOP is quite similar to C++’s, and its magic can make up for the missing parts (method overloading), in the end Python has much better C++ integration than Rust.

There are a lot of ongoing efforts to make C++ integration easier in Rust, but I’m not that sure if they will get at any point something straightforward to use. There’s a lot of pressure on this and I expect it to get much, much better in the next years.

But still, the facts that Rust has strict borrowing rules while C++ doesn’t, and that C++ exceptions really don’t mix with anything in Rust, will make this hard to get right.

Maybe the solution is having a C++ compiler written in Rust, and make it part of the Cargo suite, so the sources can be copied inside the project and build the library for Rust, entirely using Rust. This might allow some extra insights and automation that makes things easier, but C++ is quite a beast nowadays, and having a compiler that supports the newest standards is a lot of work. This solution would also conflict with Linux distributions, as the same C++ library would need to be shipped twice in different versions, a standard one and a Rust-compatible one.

Lack of binary libraries and dynamic linking

All Rust dependencies currently rely on downloading and building the sources for each project. Because there are so many dependencies, building a project takes a long time. And distributing our build means shipping a big binary that contains everything. Linux distributions don’t like this.

Having pre-built libraries for common targets would be nice; or, short of a full build, maybe some half-way format with the most complex parts already done, requiring only the final optimization stages for the specific target CPU, similar to WASM, *.pyc files, or JVM bytecode. This would cut build times enormously and make development more pleasant.

Dynamic linking is another commonly overlooked point. I believe it can be done in Rust, but it’s not something the regular books explain; it’s complex and tricky, whereas the regular approach is quite straightforward. This means any update to any of your libraries requires a full build and a full release of all your components.

If Cargo had an automated way to do this, even if it built the libraries in some format that can’t be shared across different applications, it could already bring some benefits. For example, the linking stage could take less time, as most of it seems to be spent gluing everything together. Another possible benefit: since it would produce N files instead of 1 (say, 10), an application with auto-update could selectively update only the files it needs, instead of re-downloading one full fat binary.

To make this work across different applications, as Linux distributions do, the Rust compiler would need better standards and compatibility between builds, so that a library built with rustc 1.50.0 works with an application built against 1.49.0. I believe this currently doesn’t work well, and there are no guarantees of binary compatibility across versions. (I might be wrong.)

On devices where disk space and memory are constrained, such as microcontrollers or small computers, dynamic libraries shared across applications might help a lot with fitting different projects onto the device. For our current desktop computers and phones, this isn’t a big deal.

The other reason Linux distributions want these pieces separated is that when a library gets a security patch, usually all it takes is replacing the library on the filesystem and you’re safe. With Rust applications, you depend on each maintainer of each project updating and releasing a new version. A security patch for an OS, instead of being, say, 10 MiB, could then be 2 GiB because of the number of projects bundling the same library.

No officially supported libraries besides std

In a past article, Someone stop NodeJS package madness, please!, I talked about how bad the JavaScript ecosystem is. Because everyone publishes packages and there’s no control, there’s a lot of cross-dependency hell.

This can happen to Rust too, as it has the same kind of system. The difference is that Rust comes with "std", which contains a lot of common tooling and prevents this from getting completely out of hand.

Python has the same with PyPI, but it turns out that the Python standard library covers a lot more functionality than "std" does. So PyPI is quite a bit saner than most other repositories.

Rust has its reasons for keeping std thin, and it's probably for the best. But something has to be done about the common functionality that std doesn't cover.

There are lots of possible solutions. For example, having a second standard library that bundles the remaining common stuff (call it "extra_std" or whatever); then everyone building libraries would tend to depend on that one instead of a myriad of different dependencies.

Another option is to promote specific libraries as "semi-official", pointing people to these over other options where possible.

The main problem with having everyone upload and cross-depend on each other is that these libraries might have a single maintainer, and that maintainer might move on and forget about them forever; then a lot of programs and libraries keep depending on something, unaware that it became obsolete long ago. Forking the library doesn't solve the problem, because no one has access to the original repo to say "deprecated, please use X".

Another problem is the security implications of doing this. You depend on a project that might have been audited in the past, or never, and the new version is surely not audited. In what state is the code? Is it sound, or does it abuse unsafe to worrying levels? We would need to inspect it ourselves, and we all know that most of us never would.

So if I were to fix this, I would say that a Rust committee with security expertise should select and promote the libraries that are "common" and "sane enough", fork them under a slightly different name, do an audit, and only ever upload audited code. Having a group watching over those forked libraries means that if a library is deprecated, they will update its status and point people to the right replacement. If someone forks a library and that fork becomes the preferred one, the security fork should migrate and follow it, so everyone depending on it is smoothly migrated.

In this way, "serde" would have a fork called something like "serde-audited" or "rust-audit-group/serde". Yes, it will always be a few versions behind, but it will be safer to depend on than upstream.

No introspection tooling in std

Python is heavy on introspection and it's super nice for automating stuff. Even Go has some introspection capabilities for its interfaces via reflection. Rust, on the other hand, needs macros for this, and the sad part is that there aren't any officially supported macros that make it work. Even the contributed packages are quite ugly to use.

Something quite common in Python is iterating through the fields of an object/struct: their names and their values.

I would like to see a derive macro in std that adds methods able to list the names of the different fields, and to standardize this for things like Serde. Because if Serde is overkill for some program, you currently have to cook these macros yourself.
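To illustrate, here is a hand-written sketch of what such a derive could generate. The trait and method names here are hypothetical, not an existing std API:

```rust
// Hypothetical trait that a std derive macro could implement for any
// struct; today you write this impl by hand or with your own proc macro.
trait FieldNames {
    fn field_names() -> &'static [&'static str];
}

#[allow(dead_code)]
struct Config {
    host: String,
    port: u16,
}

// What a hypothetical `#[derive(FieldNames)]` would expand to:
impl FieldNames for Config {
    fn field_names() -> &'static [&'static str] {
        &["host", "port"]
    }
}

fn main() {
    for name in Config::field_names() {
        println!("field: {}", name);
    }
}
```

The boring part is that this impl has to be repeated for every struct, which is exactly what a derive macro would automate.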

The other problem is the lack of standard variadic types. If I wanted to iterate through the values of each field, it becomes toilsome and inconvenient, because you need to know in advance which types you might receive, adding boilerplate to support all of them.

Traits also lack some supertraits for classifying variable types easily. If you want a generic function that works with any integer, you need to figure out all the trait bounds you need, when in reality I would just like to say that type T is "int-alike".
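For example, a minimal sketch of summing any numeric type today: each capability needs its own bound, and realistic code keeps accumulating more of them (the third-party `num-traits` crate bundles these, but nothing in std does):

```rust
use std::ops::Add;

// Without an "int-alike" supertrait, every generic numeric function
// spells out its own bound list: addition, copying, a zero value...
fn sum_all<T: Add<Output = T> + Copy + Default>(values: &[T]) -> T {
    // T::default() is 0 for all the integer types.
    values.iter().copied().fold(T::default(), |acc, v| acc + v)
}

fn main() {
    assert_eq!(sum_all(&[1u8, 2, 3]), 6);
    assert_eq!(sum_all(&[10i64, -4]), 6);
    println!("works for any type satisfying the bounds");
}
```

Add multiplication, comparisons or literals to the body and the bound list grows again; that's the toil the post is complaining about.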

Personal hate against f32 and f64 traits

This might be just me, but every time I use a float in Rust it makes my life hard. The fact that f32 and f64 don't implement total ordering (Ord) or proper equality (Eq) makes them unusable in lots of collection types (HashMap keys, etc.).

Yes, I know these types can't guarantee equality (due to imprecision) and comparing them is tricky (due to NaN and friends). But c'mon… can't we have a "simple float"?

In some cases, like configs, decimal numbers are convenient. I wouldn't mind using a slower type for those cases, one that more or less handles equality (with a built-in epsilon) and handles comparison (by defining a strict ordering for NaN and Inf, or by disallowing them entirely).
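For what it's worth, std later gained `f64::total_cmp` (stabilized in Rust 1.62), which gives floats the IEEE 754 total order so you can at least sort them, NaN included:

```rust
fn main() {
    let mut values = vec![3.5, f64::NAN, -1.0, 2.0];

    // values.sort() won't compile: f64 is only PartialOrd, not Ord.
    // total_cmp provides the IEEE 754 total order, where (positive)
    // NaN sorts after every number.
    values.sort_by(|a, b| a.total_cmp(b));

    assert_eq!(values[0], -1.0);
    assert_eq!(values[1], 2.0);
    assert_eq!(values[2], 3.5);
    assert!(values[3].is_nan());
    println!("{:?}", values);
}
```

It doesn't solve Eq or hashing, so HashMap keys still need a wrapper type, but it removes the most common sorting annoyance.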

This is something that causes pain to me every time I use floats.

Why I think Rust will not replace Python

Take into account that I'm still learning Rust, so I might have missed things or be wrong on some points above. One year of practising on my own doesn't give enough context for all of this, so take this article with a pinch of salt.

Rust is way too different from Python. I really would like Rust to replace my use of Python, but seeing that there are some irreconcilable differences makes me believe this will never happen.

WASM might bridge some gaps, and Diesel and other ORMs might make Rust a better replacement for Python in REST APIs in the future.

In general terms I don't see a lot of people migrating from Python to Rust. The learning curve is too steep, and for most of those migrations Go might be enough, so people would skip Rust altogether. And this is sad, because Rust has a lot of potential on lots of fronts; it just requires more attention than it gets.

I’m sad and angry because this isn’t the article I wanted to write. I would like to say that Rust will replace Python at some point, but if I’m realistic, that’s not going to happen. Ever.

References

https://blog.logrocket.com/rust-vs-python-why-rust-could-replace-python/

https://www.reddit.com/r/functionalprogramming/comments/kwgiof/why_do_you_think_data_scientists_prefer_python_to/glzce8e/?utm_source=share&utm_medium=web2x&context=3

Released new ping tool in Rust!

A lot of time has passed since my last post. To be honest, these quarantines have left me isolated and I haven't kept up with almost anything, as if hibernating, waiting for this thing to go away. After almost a year it seems I've got some energy back to start writing and doing other stuff.

I have been playing with Rust a lot: several exercises and different things to get comfortable with it. And now I'm reaching a point where I see that Rust can actually be almost as fast to code in as Python (there are still a lot of rough edges, though).

In the meantime, during this WFH period, I noticed that my home network is kind of strange. I get disconnections or weird behavior in anything that requires a real-time connection over the internet. For example, video calls tend to break up often, and online games show random lag spikes.

Because of this, I wanted to ping my router and diagnose the problem. But the thing is, regular ping tools show more or less normal behavior, and to catch the packet loss I need a really aggressive ping rate, whose output is really hard to read.

I searched for other ping tools that better suit this purpose, but what I found was basically paid stuff. It was hard to believe that there wasn’t any open source tool for this. So I thought that this is a good idea for a new Rust project.

And this is how zzping was born. It features a daemon pinger that pings a configured set of hosts at roughly 50 pings per second, stores the data on disk and also sends it via UDP to a GUI.

After 1-2 weeks of waiting for approval from my employer to release this, I pushed the changes into my github:

https://github.com/deavid/zzping/

(Just note that even if Google is in the license, this is just a result of the review process. The only relationship between this project and Google is the fact that I was working on it while being employed by Google)

I thought that Rust would not have mature enough GUI libraries, so I played a bit with Python+Qt5. My idea was that Python could handle the data size well enough and Qt would be better than any Rust GUI. But after some trial and error, I realized that the Qt charting libraries were mostly meant for office use, like 100 points or static viewing.

As I wanted something able to display more than 1,000 points changing in real time, Qt was out of the question, and with it, Python as well. So I went to the Rust Discord servers to ask for advice on a Rust GUI library for this.

It turns out that, obviously, there's no GUI library capable of graphing aside from FFI bindings to GTK. But, as they quickly pointed out, Iced can paint into a Canvas quite well, and that should do.

So I coded zzping-gui in Rust, and, receiving the UDP events from the daemon, I could paint the ping timings and packet loss in real time, up to 10,000 lines on screen. Still, it takes "too much" time to draw, to the point that I found it disappointing; I thought it would be faster. But after profiling, this seems to come from my own NVidia drivers, therefore on the Vulkan side of things.

It’s possible that Iced is not optimized enough for this kind of stuff, or maybe (surely) I’m missing optimizations and caching. But I saw that it was fast enough and I moved on.

This is what it looks like when displaying real-time data:

real-time

It can only show one host at a time, and if restarted it loses the history.

Up to this point, this is what I released as 0.1 in the main branch. I've continued working on 0.2 in a beta branch.

A bit of trivia

Most people I talked to about zzping assumed that I surely used threads for the pinger. Wrong! In fact, the first library I found was internally creating threads all over the place, so I looked at its sources and coded something similar myself, but single-threaded. I purposefully removed the threading and used a non-blocking approach instead.

Why? Because it uses less CPU and it's more efficient. "But threads are more performant!" Yes, but no. A threading model would allow me to push more pings per second, sure. But this misses the fact that a single thread on a 10-year-old CPU can send over 1,000 pings per second, maybe more; I haven't tested the limits.

And at those rates, one has to wonder whether the objective is to test the network or to mount a DoS attack and freeze whatever networking gear we're trying to ping. There is near-zero value in sending pings hundreds of microseconds apart.

In contrast, threading has a cost. Yes, it does. Programs using threads use more CPU per unit of work done. Threading means that the CPU and the OS scheduler have to do more task switches over time, and those switches aren't free. OS threads also have memory requirements, and some CPU cost to initialize.

Going all in on threads misses a big point here: zzping-daemon is a utility meant to run in the background all the time, as a service. The computer running it might not have a lot of CPU, or it might be a gaming machine. Every tiny bit of CPU consumed may mean fewer FPS while gaming, and might be a reason to shut the tool down.

Therefore, removing threads is a better strategy to keep the CPU as free as possible and do as much work as possible with the absolute minimum of CPU. Rust helps there too, by optimizing the binary heavily.
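The single-threaded, non-blocking pattern is roughly this (a hypothetical simplification using a UDP socket from std; the real zzping sends ICMP echo requests through a raw socket):

```rust
use std::net::UdpSocket;
use std::time::Duration;

// One thread, no blocking: poll the socket, and use the "nothing
// pending" case to do other work (send the next ping, flush stats)
// before yielding the CPU for a moment.
fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.set_nonblocking(true)?; // recv_from returns WouldBlock instead of blocking
    let addr = socket.local_addr()?;
    socket.send_to(b"ping", addr)?; // "ping" ourselves just for the demo

    let mut buf = [0u8; 1500];
    loop {
        match socket.recv_from(&mut buf) {
            Ok((n, src)) => {
                println!("received {} bytes from {}", n, src);
                break;
            }
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => {
                // Nothing pending: this is where the daemon would send
                // the next ping, then sleep briefly to free the CPU.
                std::thread::sleep(Duration::from_millis(1));
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

The short sleep in the WouldBlock branch is what keeps the CPU usage near zero while still reacting within a millisecond or so.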

On another topic, I went for UDP communication with the GUI because I wanted real time and I preferred to drop packets if the connection between zzping-gui and zzping-daemon was flaky. But now I see this as a problem: as UDP is connection-less and unreliable, preparing for a next step where a GUI can subscribe and fetch the last hour of data as a prefill is quite complicated. Therefore I'm thinking of moving to TCP instead.

TCP has other problems: it might buffer, and it doesn't surface connection problems. But maybe I'm overthinking it, as this tool is intended for local networks, and those should be more or less stable. In any case, if there's a problem, it should be solved when it appears, not before.

I had quite a hard time designing how to store the data on disk. Even after settling on storing statistics every 100ms instead of every single ping, it turns out this can still amount to 50 messages per second, depending on config. And over a year, that is quite easily a lot of gigabytes.

MessagePack has been quite helpful. It's one of my favourite formats: largely compatible with JSON, flexible, really fast, and small. Here I realized that following the specification closely reduced the messages to a really small size (maybe by half, just by not storing u32 values directly but letting MessagePack choose the smallest representation).
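A hand-rolled sketch of MessagePack's unsigned-integer encoding (following the MessagePack spec) shows why this matters: a raw u32 always costs 5 bytes on the wire, but small values fit in 1 to 3:

```rust
// Minimal MessagePack unsigned-int encoder, per the spec's int family:
// positive fixint (1 byte), uint8 (2), uint16 (3), uint32 (5).
fn encode_uint(v: u32, out: &mut Vec<u8>) {
    match v {
        0..=0x7f => out.push(v as u8), // positive fixint: value is the byte
        0x80..=0xff => {
            out.push(0xcc); // uint8 marker
            out.push(v as u8);
        }
        0x100..=0xffff => {
            out.push(0xcd); // uint16 marker
            out.extend_from_slice(&(v as u16).to_be_bytes());
        }
        _ => {
            out.push(0xce); // uint32 marker
            out.extend_from_slice(&v.to_be_bytes());
        }
    }
}

fn main() {
    for v in [5u32, 200, 40_000, 1_000_000] {
        let mut buf = Vec::new();
        encode_uint(v, &mut buf);
        println!("{} -> {} bytes", v, buf.len()); // 1, 2, 3 and 5 bytes
    }
}
```

Since ping counters and timings are usually small numbers, most of them collapse into one or two bytes instead of five.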

I played a lot with compression techniques, but nothing really helped. I settled on a log quantization that brings files from 20MB/hour to 12MB/hour with an acceptable precision loss. Other techniques like Huffman coding, delta encoding or FFT quantizing yielded negligible results while over-complicating the file format. I might go back to them at some point, as I probably overlooked a lot of things that can be done.
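The idea of log quantization, roughly (a hypothetical sketch; the real zzping format surely differs in the details): store round(ln(x)·k) as a small integer instead of the raw value, so the precision loss is relative (about 1/k) rather than absolute.

```rust
// Quantize a positive value (e.g. a ping time in microseconds) into a
// small integer with ~1% relative error for k = 100.
fn quantize(value: f64, k: f64) -> i64 {
    (value.ln() * k).round() as i64
}

fn dequantize(q: i64, k: f64) -> f64 {
    (q as f64 / k).exp()
}

fn main() {
    let original = 12_345.0; // microseconds
    let q = quantize(original, 100.0);
    let restored = dequantize(q, 100.0);
    let rel_err = (restored - original).abs() / original;
    assert!(rel_err < 0.01); // under 1% error
    println!("{} -> {} -> {:.1}", original, q, restored);
}
```

The small quantized integers then pair nicely with MessagePack's variable-length encoding, which is where the file-size win comes from.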

This produced a new data format. I named the old one FrameData, and the new one FrameDataQ (quite original, hah). zzping-daemon still saves the old format, and I wrote several utilities to read and transform it into the new one, which in turn is the one the GUI can read.

Ah, I forgot: zzping-gui in the beta branch can read a file passed via command-line options. This opens a completely new mode and a refurbished graph:

Three Windows Synced

In the image above, there are three zzping-gui instances, each opening a different file for a different host.

This allows zooming and panning. There is also another way of zooming into the Y axis, which I named "scale factor" (sf); it changes the axis into a semi-logarithmic one, depending on how you move the slider.

The tool also pre-aggregates data at different zoom levels and does a seamless zoom transition. It's quite interesting that it can navigate millions of points in real time.

And that’s it, for now. I have plans to make this better. But it’s taking time as the design is not quite clear yet.

https://github.com/deavid/zzping/

I love Rust language!

After a few months of learning Go and Rust, it seems I've come to my conclusion: I love Rust. Go isn't bad, and I'm using it at my work at Google; in fact, I would consider Go for lots of things as well. But Rust is on a whole different level. It's making me happy and amused while learning it; I've recovered the joy of programming that I had when I learnt Python.

What makes Rust special?

There are lots of things that get me inspired with Rust, so I don't know where to start. I guess this depends on your interests. For me, as Python is the king of the interpreted languages, Rust is the king of the compiled ones.

Rust was created by a Mozilla employee and later adopted in Firefox Quantum for a CSS renderer that replaced the previous C++ engine; this made Firefox faster and safer. Since then, I jumped back from Chrome to Firefox.

C++ is one of the fastest languages out there. It's hard to tell whether C is faster than C++; it depends a lot on what you want to do and how you do it. Rust was created as a replacement for C++ that should be equally fast, or even faster where possible. So nowadays Rust joins the club of top-performing languages. Every other language is going to be slower than these three by at least 2x to 4x, and that's the best-case scenario. For example, Haskell can in some scenarios reach 50% of C++ speed; Java, around 25%. But in reality these numbers are way worse, because as your programs grow, you stop heavily optimizing and add complex parts, and they get even slower.

This is not the case for Rust. In fact, it might at some point even have an advantage over C++. All three languages (C, C++ and Rust) share the mantra of "zero-cost abstractions", which means you can build more and more complex systems and pay almost no penalty for doing so. This makes them ideal for creating operating systems, browsers, game engines and so on.

The main difference between C++ and Rust is that Rust is memory safe. You have to purposely opt out of safety with "unsafe {}" blocks of code, and you can do 99% of your work without even thinking of using them. Well, probably 100%: as long as you're not trying to build the next-generation fastest library ever, Rust has lots of ways of implementing really fast data types safely.

Rust memory allocation is like C++'s, but a "borrow checker" ensures the memory is freed properly, in a guaranteed way. And trust me, I tried to mess with it, and I couldn't confuse the compiler enough to let me build something that would at any point leak memory or double-free. In contrast with Go or Java, which use a garbage collector, Rust is not garbage collected; it's "statically collected", if you like the term. This means Rust programs are as fast as possible without you having to worry about pointers and memory. Go, for example, suffers from micro-freezes when the garbage collector runs, halting code for a few milliseconds.
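A small sketch of what "statically collected" means: the compiler inserts the cleanup at the end of each owner's scope, deterministically, with no runtime collector involved:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Count destructor runs so we can observe exactly when cleanup happens.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Resource;
impl Drop for Resource {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let _a = Resource;
    {
        let _b = Resource;
        // _b is freed right here, at the closing brace: no GC pause,
        // no manual free, no reference counting.
    }
    assert_eq!(DROPS.load(Ordering::SeqCst), 1); // _b dropped, _a not yet
    println!("deterministic destruction, no garbage collector");
} // _a is dropped here, at the end of its scope
```

The same mechanism frees heap allocations like String and Vec, which is why there is nothing left for a collector to do at runtime.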

Another cool thing in Rust is the "zero-cost abstractions" thingy. There are so many ways of building crazy abstractions that, in fact, have almost no impact on execution time: generics and traits, for example. So you can build libraries that automate a lot of stuff for you without being any slower than doing the same thing manually every time (well, some penalty may appear in some cases, but it's tiny).

Rust also draws a lot of inspiration from functional languages. Rust isn't functional, but like Python it allows a lot of constructs that resemble functional languages, and they work really well. The compiler doesn't care about pure versus non-pure functions, but Rust also borrowed the "intelligence" of functional compilers for inferring data types. It's amazing that types can be inferred not just from the current and past lines, but also from future code. This greatly reduces the amount of code to be written, as you only have to define types in a few critical places. The remaining code is clutter-free most of the time.
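A small example of this backwards inference:

```rust
fn main() {
    // `nums` has no concrete type on this line; the compiler infers
    // Vec<u8> from the `u8` annotation a couple of lines further down.
    let nums: Vec<_> = "1 2 3"
        .split(' ')
        .map(|s| s.parse().unwrap()) // parse() target type also inferred
        .collect();

    let total: u8 = nums.iter().sum(); // this `u8` drives everything above
    assert_eq!(total, 6);
    println!("inferred Vec<u8>, total = {}", total);
}
```

Change the single `u8` to `i64` and every line above adapts with no other edits; that's the clutter-free effect the post describes.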

You should learn Rust!

I will admit it: Rust is not an easy language to start with. The learning curve is a bit steep, and either you'll have to learn a lot of new mechanics around memory ownership (if you don't know C or C++) or you'll have to unlearn a lot of bad pointer habits (if you do). Rust does things in a different way, and it's a breath of fresh air after the last 20 years of programming.

I really believe that Rust got the real way of programming ™ and eventually other languages will follow. So face it now or face it later: the concepts of ownership and borrowing will land in your favorite programming language at some point, or your language will remain garbage collected and slow forever. Yes, Java, I'm looking at you.

Even if you will not use Rust at work or in production, it will teach you how to be a better programmer. Because one of the best things in Rust is that the compiler does not blame you, it teaches you. You can translate that knowledge to your regular programming in another language, and you’ll get safer, better code.

Rust should be a good fit for REST API services that have lots of logic and/or manage lots of data. In my tests, the Rust ORM (Diesel) has shown itself to be 20x faster than the Go ORM I tried. On top of that, add your own abstractions and logic, and the difference could be even bigger. Millions of users and huge data? No problem!

It also has good support for all platforms. Developing from Windows, Linux or macOS shouldn't be a problem at all. It also works on embedded and other exotic platforms. They're working hard to increase the number of target platforms to cover everything C covers, but they already have a lot.

In short, Rust is an all-round language. Multipurpose, multiplatform, fast, safe… what else do you want? Give it a try!

Actix-web is dead (about unsafe Rust)

Update 2020-01-20: The official Actix web repository is back and the maintainer has stepped down. Actix will continue to be maintained.

Recently the maintainer of the Actix webserver took down the GitHub repository and left the code in his personal repository, deleting lots of issues and enraging a lot of people. He left a post-mortem:

https://github.com/actix/actix-web/blob/7f39beecc3efb1bfdd6a79ffef166c09bf982fb0/README.md

What happened? I did my own read of the postmortem, and from Reddit I also found this article which summarizes the situation pretty well:

https://words.steveklabnik.com/a-sad-day-for-rust

To summarize in a few words in case you don't feel like reading those: the Rust community is heavily focused on a safe use of Rust where proper memory handling can be proven. Unless you use the "unsafe" keyword, the compiler guarantees no memory errors in a provable way, so "unsafe" is usually reserved for those small parts where the compiler is unable to prove the code correct; such code should be kept small and easy to verify by hand.

Third parties found that Actix abused unsafe while they were auditing the most popular Rust libraries on the internet. When the unsafe code was examined, it turned out that, on misuse, it could lead to serious vulnerabilities. So they opened a bunch of issues and submitted a lot of patches and PRs on GitHub.

The maintainer's response was that he didn't care; he accepted almost none of the patches, deleted the issues, the conversation heated up a lot, and finally he removed the repository from the official organization and left it under his own username.

This is sad. Actix was known for its amazing speed in different benchmarks and was used by a lot of people. While it's bad that the community is sometimes too harsh and some people lack politeness (which makes a maintainer's life really hard), I'm going to be polemic here and say: it's good that this happened and Actix-web got deleted.

I had been using Actix-web, seduced by its speed, and I never thought I could be promoting a vulnerable webserver. I assumed that because the library was coded in Rust, the author was taking care to avoid unsafe where possible. I was so wrong. Luckily I had other things to do and never released the article where I was going to promote Actix-web. Now I'll have to redo the benchmarks before releasing anything.

The same happened to lots of other people, and with all those uses combined, Actix-web has increased the attack surface of a lot of deployments.

In other cases I would have argued that software prioritizing speed over security is fine for scenarios where the inputs or the environment are not exposed to the internet. But this is a webserver: its main job is to serve as a facade to the internet. And the project documentation never even mentioned that the goal was to be the fastest webserver at the cost of security.

There's no point in running Actix-web behind anything to reduce its exposure: it is several times faster than raw Nginx or Apache serving static content, so adding anything in front will slow it down a lot. There's also no reason to use it on internal networks: if it's just serving HTTP to internal users, any web server will do, as internal networks have much less traffic. And if it's used to pipe commands between machines, then HTTP is just a bad choice; use RPCs like gRPC instead.

To be completely fair, let me state that Actix-web never had a real security incident as far as I know. It's just that its correctness cannot be proven. Is this a problem? For me, yes, because if I wanted otherwise I would go with C or C++ instead. There are lots of really good, really fast web servers in raw C++. The point of using Rust in the first place is having memory guarantees: like using Java, but without paying the performance penalties.

I understand that the maintainer just wanted to have fun coding, and there's nothing wrong with that. But when your product starts getting recommended by others, you have to care. This can't be avoided: with great power comes great responsibility.

This is the good thing about the Rust community. They're fully committed, even inspecting the sources of every single library out there and helping to reduce the amount of unsafe code, patching it themselves.

It's sad that the repository has been "deleted", but this is good for Rust. Quality needs to be there, and they definitely need to prevent unsafe code from gaining ground. There's no point in having Rust if most libraries you can practically use are memory unsafe.

To conclude this: please be polite, everyone. It’s quite hard when you get a ton of people bashing at you. But also, keep the good job up!

Benchmarking Python vs PyPy vs Go vs Rust

Since I learned Go I've been wondering how well it performs compared to Python in an HTTP REST service. There are lots and lots of benchmarks already out there, but their main problem is that they're too synthetic: mostly a simple query and far from real-world scenarios.

Some frameworks like Japronto exploit this by making the connection and the plain response blazing fast. But of course, as soon as you have to do some computation (and you have to; otherwise, what's the point of having a server?) they fall apart pretty easily.

To put a baseline here: Python is 50 times slower than C++ in most benchmarks, while Go is 2-3 times slower than C++, and Rust sometimes even beats C++.

But those benchmarks are purely CPU- and memory-bound for particular problems. Also, the people who submitted the code used a lot of tricks and optimizations that won't happen in the code we usually write, because safety and readability are more important.

Another common type of benchmark is the HTTP framework benchmarks. In those, we can get a feel for which languages outperform others, but it's hard to measure. For example, in JSON serialization, Rust and C++ dominate the leaderboard, with Go only 4.4% slower and Python 10.6% slower.

In the multiple-queries benchmark, we can see that the tricks frameworks use to "appear fast" are no longer useful. Rust is on top here; C++ is 41% slower, Go is 43.7% slower, and Python is 66.6% slower. Some filtering can be done to put all of them under the same conditions.

In that last test, which looks more realistic, it's interesting to see that Python is 80% slower, which means 5x slower than Rust. That's far better than the 50x in the pure CPU benchmarks I pointed out first. Go, on the other hand, doesn't have any benchmark entry that includes an ORM, so it's difficult to compare its speed there.

The question I’m trying to answer here is: Should we drop Python for back-end HTTP REST servers? Is Go or Rust a solid alternative?

The reasoning is that a REST API usually does not contain complicated logic or big programs; it just replies to more or less simple queries with some logic. Such a program can be written in virtually anything. With the container trend, deploying pre-built binaries is even more appealing, as we no longer need to compile for the target machine in most cases.

Benchmark Setup

I want to try a crafted example of something slightly more complicated, but so far I haven't found the time to build a proper one. For now I have to fall back into the category of "too synthetic benchmarks" and release my findings up to this point.

The goal is the fastest possible implementation of the following tests:

  • HTTP “Welcome!\n” test: Just the raw minimum to get the actual overhead of parsing and creating HTTP messages.
  • Parse Message Pack: Grab 1000 pre-encoded strings, and decode them into an array of dicts or structs. Return just the number of strings decoded. Aims to get the speed of a library decoding cache data previously serialized into Redis.
  • Encode JSON: Having cached the previous step, now encode everything as a single JSON. Return the number of characters in the final string. Most REST interfaces have to output JSON, so I wanted to get a grasp of how fast this is compared to the other steps.
  • Transfer Data: Having cached the previous step, now send this data over HTTP (133622 bytes). Sometimes our REST API has to send big chunks over the wire and it contributes to the total time spent.
  • One million loop load: A simple loop over one million iterations doing two simple math operations with an IF condition, returning just a number. Interpreted languages like Python can take a huge hit here; if our REST endpoint has to do some work like an ORM does, it will be impacted by this.
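The exact operations of the one-million loop aren't shown in this post, but it is something along these lines (a hypothetical reconstruction matching the description: two math operations behind an IF):

```rust
// Tight loop micro-benchmark: one million iterations, a branch, and
// two cheap arithmetic operations. Wrapping ops keep the overflow
// behavior defined without affecting speed.
fn one_million_loop() -> u64 {
    let mut total: u64 = 0;
    for i in 0..1_000_000u64 {
        if i % 2 == 0 {
            total = total.wrapping_add(i * 2);
        } else {
            total = total.wrapping_sub(i);
        }
    }
    total
}

fn main() {
    println!("result = {}", one_million_loop());
}
```

Compiled languages reduce this to a handful of machine instructions per iteration; in CPython each iteration pays for bytecode dispatch and boxed integers, which is where the 50x gap comes from.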

The data being parsed and encoded looks like this:

{"id":0,"name":"My name","description":"Some words on here so it looks full","type":"U","count":33,"created_at":1569882498.9117897}

The test was performed on my old i7-920 capped at 2.53GHz. It's not really rigorous, because I had to have some applications open while testing, so assume a margin of error of 10%. The programs were written with the minimal effort possible in each language, selecting the libraries that seemed the fastest according to several published benchmarks.

Python and PyPy were run under uwsgi, sometimes behind NGINX, sometimes with the HTTP server included in uwsgi; whichever was faster for the test. (If anyone knows how to test them with less overhead, let me know)

The measures have been taken with wrk:

$ ./wrk -c 256 -d 15s -t 3 http://localhost:8080/transfer-data

For Python and PyPy the number of connections had to be lowered to 64 in order to perform the tests without error.

For Go and Rust, the webserver included in the executable was used directly, without NGINX or similar. FastCGI was considered, but it seems it's slower than raw HTTP.

Python and PyPy were using Werkzeug directly with no URL routing. I used the built-in json library and msgpack from pip. For PyPy, msgpack turned out to be awfully slow, so I switched to msgpack_pypy.

Go was using “github.com/buaazp/fasthttprouter” and “github.com/valyala/fasthttp” for serving HTTP with url routing. For JSON I used “encoding/json” and for MessagePack I used “github.com/tinylib/msgp/msgp”.

For Rust I went with “actix-web” for the HTTP server with url routing, “serde_json” for JSON and “rmp-serde” for MessagePack.

Benchmark Results

As expected, Rust won this test, but surprisingly not in all sub-tests, and in some with not much difference. Because of the big differences between the numbers, the only way to make them properly readable is a logarithmic scale; so be careful when reading the following graph: each major tick means double the performance.

Here are the actual results in table format: (req/s)


        HTTP       parse msp  encode json  transfer data  1Mill load
Rust    128747.61  5485.43    5637.20      19551.83       1509.84
Go      116672.12  4257.06    3144.31      22738.92       852.26
PyPy    26507.69   1088.88    864.48       5502.14        791.68
Python  21095.92   1313.93    788.76       7041.16        20.94

Also, for the Transfer Data test, it can be translated into MiB/s:


        transfer speed
Rust    2,491.53 MiB/s
Go      2,897.66 MiB/s
PyPy    701.15 MiB/s
Python  897.27 MiB/s

And, for the sake of completeness, requests/s can be translated into mean microseconds per request:


        HTTP   transfer data  parse msp  encode json  1Mill load
Rust    7.77   51.15          182.30     177.39       662.32
Go      8.57   43.98          234.90     318.03       1,173.35
PyPy    37.72  181.75         918.37     1,156.76     1,263.14
Python  47.40  142.02         761.08     1,267.81     47,755.49

As per memory footprint: (encoding json)

  • Rust: 41MB
  • Go: 132MB
  • PyPy: 85MB * 8proc = 680MB
  • Python: 20MB * 8proc = 160MB

Some tests impose more load than others. In fact, the HTTP-only test is very hard to measure, as any slight change in the setup produces a completely different result.

The most interesting result here is Python in the tight loop; for those with expertise in this language it shouldn't be surprising: pure Python code is 50 times slower than raw native performance.

PyPy, on the other hand, managed under the same test to get really close to Go, which shows that PyPy's JIT compiler can detect certain operations and optimize them to near-C speeds.

As for the libraries, we can see that PyPy and Python perform roughly the same, with a much smaller gap to their Go counterparts. This difference is caused by the fact that Python objects have a certain cost to read and write, and Python cannot optimize for the type in advance. In Go and Rust I "cheated" a bit by using raw structs instead of dynamically creating the objects, so they got a huge advantage by knowing in advance the shape of the data they will receive. This implies that if they receive JSON that doesn't match the expected shape they may fail to parse it, while Python will be just fine.

Transferring data is quite fast in Python, and given that most APIs will not return huge amounts of it, this is not a concern. Strangely, Go outperformed Rust here by a slight margin. It seems that Actix does an extra copy of the data plus a check to ensure UTF-8 validity; a lower-level HTTP server would probably be slightly faster. Anyway, even the slowest result of 700MiB/s should be fine for any API.

In the HTTP connection test, even if Rust is really fast, Python only takes around 50 microseconds per request. For any REST API this should be more than enough, and I don't think it matters much in practice.

On average, I would say that Rust is 2x faster than Go, and Go is 4x faster than PyPy. Python is from 4x to 50x slower than Go depending on the task at hand.

What matters most for a REST API is library selection, followed by raw CPU performance. To get better results I will try another benchmark with an ORM, because those add a certain number of CPU cycles into the equation.

A word on Rust

Before going all-in on developing everything in Rust because it's the fastest, be warned: it's not that easy. Of the four languages tested here, Rust was by far the most complex, and it took me, untrained, several hours to get it working at the proper speed.

I had to fight for a while with lifetimes and borrowed values; I was lucky to have the Go test for the same task, so I could see clearly that something was wrong. If I hadn't had it, I would have finished earlier and called it a day, leaving code that copies data far more often than needed and runs slower than regular Go programs.

Rust has more opportunities and information to optimize than C++, so its binaries can be faster, and it's even prepared to run in crazier environments like embedded, malloc-less systems. But this comes at a price.

It requires several weeks of training to gain some proficiency in it. You also need to benchmark the different parts properly to make sure the compiler is optimizing as you expect. And there is almost no one on the market with Rust knowledge, so hiring people for Rust might cost a lot.

Also, build times are slow, and in these tests I always had to compile with "--release"; otherwise the runtime numbers were horribly bad, sometimes slower than Python itself. Release builds take even longer to compile. Rust has nice incremental builds that cut this time down a lot, but changing just one file still required about 15 seconds of build time.

Its speed it’s not that far away from Go to justify all this complexity, so I don’t think it’s a good idea for REST. If someone is targeting near one million requests per second, cutting the CPU by half might make sense economically; but that’s about it.

Update on Rust (January 18 2020): This benchmark used actix-web as the webserver, and it has recently been heavily criticized for its use of "unsafe" Rust. I had more benchmarks prepared with this webserver, but now I'll redo them with a different one. Don't use actix.

About PyPy

I have been pleased to see that PyPy JIT works so well for Pure Python, but it’s not an easy migration from Python.

I spent way more time than I wanted making PyPy work properly for Python 3 code under uWSGI. I also found the problem of MsgPack being slow on it. Not all Python libraries perform well in PyPy, and some of them do not work at all.

PyPy also has a long load time, followed by a warm-up period: the code needs to run a few times for PyPy to detect the parts that require optimization.

I am also worried that complex Python code cannot be optimized at all. The loop that was optimized was really straightforward. Under a complex library like SQLAlchemy the benefit could be slim.

If you have a big codebase in Python and you're willing to spend several hours giving PyPy a try, it could be a good improvement.

But, if you’re thinking on starting a new project in PyPy for performance I would suggest looking into a different language.

Conclusion: Go with Go

I managed to craft the Go tests in no time with almost no experience with Go: I learned it several weeks ago and had only written one other program with it. It takes a few hours to learn, so even if a particular team does not know it, it's fairly easy to get them trained.

Go is an easy language to develop in and really productive. Not as much as Python, but it gets close. Also, its quick build times and the fact that it builds statically make iterating code-test-code very easy, and make it attractive for deployments as well.

With Go, you could even deploy source code if you want and make the server rebuild it on each change, if that makes your life easier or uses less bandwidth thanks to tools like rsync or git that only transfer changes.

What’s the point of using faster languages? Servers, virtual private servers, server-less or whatever technology incurs a yearly cost of operation. And this cost will have to scale linearly (in the best case scenario) with user visits. Using a programming language, frameworks and libraries that use as less cycles and as less memory as possible makes this year cost low, and allows your site to accept way more visits at the same price.

Go with Go. It’s simple and fast.

Why should you learn Go for your next project

I have been hearing about Go for a long time, and along with Rust it is one of the two new programming languages that seem to have been gaining attention in recent years.

After learning Go, it seems to me a good alternative to other programming languages because it is simple, beautiful, hassle-free, and fast compared to everything that is not C, C++ or Rust. Its simplicity is really appealing because you can start small and grow as big as you want.

Go is ideal for containerized apps and websites. It runs faster than other popular alternatives for the web, has a small memory footprint, and its executables have no dependencies. It's blazing fast to compile, so iterating on new versions and deploying is almost as fast as with interpreted languages.

Being so simple to understand, it requires very little training to become productive in Go.

Comparing Rust with Go

  • Go is developed by Google. Rust is developed by Mozilla.
  • Go's use case is a more practical C or a more scalable Python. Rust is a high-performance alternative to C++.
  • Go has garbage collection. Rust does not: it manages memory at compile time through ownership and borrowing.
  • Both are compiled.
  • Go is productive. Rust is fast.

In short, Rust would be a better option where speed is key, as it is as fast as C++. Also, it has features that aim for zero-cost abstractions, so it looks promising for big, complex projects like browsers: it should be able to deliver the same speed as C++ with less complexity in the code.

Five reasons to use Go

  • Simple and beautiful: Go is easy to read and easy to write.
  • Runs fast: Go is faster than JavaScript and Python, comparable to Java.
  • Compiles fast: As far as I know, Go has the fastest compile times around; it's one of its main design goals. It also includes a "go run" mode, so you can run Go programs straight from source while developing.
  • Static typing: When a program grows large, having static types helps a lot, also for safety. Go is statically typed, so your programs can grow while you stay confident that they will run as you expect.
  • Explicit but terse: The Go language is explicit, so the meaning is clearly conveyed in programs. But at the same time it is terse, so we don't spend much time writing Go.

Is Go better than Python?

While Go does not have the flexibility and magic that Python has, it still has basic primitives for flexible arrays (slices) and maps (dictionaries), so with Go we don't lose as much productivity as with other compiled languages.
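As a quick sketch of those primitives (the names here are made up for illustration):

```go
package main

import "fmt"

// inventory builds a slice and a map, Go's two flexible built-in containers.
func inventory() ([]string, map[string]int) {
	// Slices grow dynamically, much like Python lists.
	langs := []string{"Go", "Python"}
	langs = append(langs, "Rust")

	// Maps are Go's built-in dictionaries.
	year := map[string]int{"Go": 2009, "Python": 1991}
	year["Rust"] = 2010

	return langs, year
}

func main() {
	langs, year := inventory()
	fmt.Println(langs, year["Rust"])
}
```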

But Go is way faster than Python: if our program has to do custom, complex calculations, Go can be 30 times faster. Beware, though: as Python links its libraries to plain C, in some (very) specific use cases it could beat Go or other languages.

There’s no much difference on the development cycles from Go to Python. Go also has the ability to run the programs on the fly, so the sequence of code-try-code is equally fast.

Go produces final, statically linked binaries for each platform. So, for distributing, you don't need to ship sources (this can be good or bad depending on your point of view), but you do have to build different binaries for different platforms. Because the binaries are statically linked, there's no need to account for different Linux distributions: the same binary should run across all distributions on the same architecture.

For distributing on Windows, Go could be easier, as it just produces an executable that runs across many Windows versions, while in Python you have to take care of packaging for Windows and test it properly, or tell the user to install the whole development stack, which is a hassle.

The main disadvantage of Go versus Python is that Go tries to compile everything statically, so the behavior of code is set at compile time. Go interfaces can help create the kind of "magic" abstractions that change behavior depending on the scenario, but aside from that it's a bit limited. In contrast, Python is much more flexible.

Is Go better than C++ or Java?

While C++ and Java are more feature-rich, Go is simplified and more productive. Also, Java tends to be memory-hungry, so Go is useful for running programs under constrained memory and disk requirements.

Because Go statically links everything, its executables will be bigger than their C++ counterparts, but still way smaller than Java deployments, where you have to carry the JVM and libraries that use a lot of disk space. This makes Go an excellent candidate for containerized applications.

The downsides of Go are the lack of abstractions and the fact that it is slower than C++, being more or less as fast as Java (but still a bit slower in some scenarios).

Go is strongly opinionated

While learning Go for the first time I found a lot of things surprising. For me Go is the plain old C language with a new Pythonic style. I like both C and Python a lot so I see a lot of influence from both languages in Go.

When designing Go they weren't scared of breaking the rules; it is clear they have a strong opinion on how things should be done. In the same sense that Python specifically ditched braces for blocks and the "switch" statement, Go's designers were clear that they didn't want classes (in the usual OOP sense) and didn't want exceptions.

Unlike Python, it doesn't use whitespace for blocks: it keeps the classic braces, and it has an extended switch statement.

The braces don’t need any explanation, but the switch deserves a mention. The main problem of a switch statement is, by default, it follows from one case to the following, causing unintended bugs.

In Go they solved it by going the other way around: by default each case is independent unless you add the keyword "fallthrough". This makes the construct less bug-prone and terser in the common case:
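A minimal sketch of that behavior (the function is hypothetical):

```go
package main

import "fmt"

// cases shows the default non-fallthrough behavior, plus an explicit
// "fallthrough" that continues into the next case.
func cases(n int) []string {
	var hit []string
	switch n {
	case 1:
		hit = append(hit, "one")
		fallthrough // without this keyword, execution would stop here
	case 2:
		hit = append(hit, "two")
	case 3:
		hit = append(hit, "three")
	}
	return hit
}

func main() {
	fmt.Println(cases(1)) // case 1 falls through into case 2 only
	fmt.Println(cases(2)) // no implicit fallthrough into case 3
}
```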

There are also special kinds of switch: one with no condition, so all the conditions go in the cases; one for detecting a variable's type; and finally "select", for processing many asynchronous events at once.
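As a sketch of the type-detection variant (a type switch; the function name is made up):

```go
package main

import "fmt"

// describe branches on the dynamic type of v using a type switch.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int: %d", x)
	case string:
		return fmt.Sprintf("string: %s", x)
	default:
		return "other"
	}
}

func main() {
	fmt.Println(describe(42))
	fmt.Println(describe("hello"))
	fmt.Println(describe(3.14))
}
```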

As said before, there are no exceptions in Go. You're expected to return the error using a conventional return statement, so the common approach is to return a pair of (value, error) or (value, ok). This comes from C, where we used to encode errors in the return value; but Go makes it way simpler by allowing multiple values to be returned.
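A minimal sketch of the (value, error) convention (`div` is a made-up example):

```go
package main

import (
	"errors"
	"fmt"
)

// div returns a (value, error) pair instead of throwing an exception.
func div(a, b int) (int, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	q, err := div(10, 2)
	if err != nil { // the caller checks the error explicitly
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q)
}
```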

It also has an error primitive that can be used easily to convey error messages as text, and it can be extended to your needs.

This means that your code should check for errors explicitly. Failing to do so means the code will continue running with a default value instead; it does not abort execution.

Go programs can also fail completely and stop execution. This is known as panicking. A panic can be started by an internal call or manually by the programmer using the "panic" function. So instead of returning errors, you can just panic; in that case, checks are the caller's responsibility.

Now functions have two ways of exiting: returning and panicking. So Go added a "defer" statement to schedule cleanups at the end of the function. This is useful because it runs regardless of how or when the function exits.
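A small sketch of defer, using a named return value so the deferred cleanup is observable (the function is hypothetical):

```go
package main

import "fmt"

// steps records the order of operations: the deferred cleanup always
// runs last, regardless of how the function exits. The named return
// value "log" lets the deferred closure modify the result after return.
func steps() (log []string) {
	defer func() { log = append(log, "cleanup") }()
	log = append(log, "work")
	return
}

func main() {
	fmt.Println(steps())
}
```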

Panicking unwinds the stack, calling all deferred statements on the way. It is still possible to keep the program from crashing by using the "recover" function. This looks like a flavor of try..catch, but it's not the recommended style in Go; although less performant than error codes, it can be clearer or easier to reason about in some cases.
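A minimal sketch of recover turning a runtime panic back into an ordinary error value (`safeDiv` is a made-up name):

```go
package main

import "fmt"

// safeDiv converts a panic back into an ordinary error value.
// Dividing by zero panics at runtime; the deferred recover catches it.
func safeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	if _, err := safeDiv(1, 0); err != nil {
		fmt.Println(err)
	}
}
```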

Going back to classes and object-oriented programming: Go does not have classes, but it implements some of the object-oriented ideas, again in a flavored C style.

They use structs and interfaces. Structs are like regular C structs: no code, just data. They can be composed much as in C, stacking one on top of the other. This is called "struct embedding", and Go adds syntactic sugar to help with it:
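A minimal sketch of struct embedding (the types here are made up):

```go
package main

import "fmt"

type Animal struct {
	Name string
}

type Dog struct {
	Animal // embedded: Dog gains Animal's fields without declaring them
	Breed  string
}

func main() {
	d := Dog{Animal{"Rex"}, "Collie"}
	// Name is "promoted" from the embedded Animal, so d.Name works
	// without spelling out d.Animal.Name.
	fmt.Println(d.Name, d.Breed)
}
```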

Multiple embedding is possible: just stack the structs one on top of the other, much in the style of C. So there is no diamond problem; if a name appears twice, it is stored twice.

The code for these lives outside of the struct, by defining methods on types. Much like Python, self/this is declared explicitly:
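A minimal sketch of methods with explicit receivers (`Counter` is a made-up type):

```go
package main

import "fmt"

// Counter holds state; its methods take an explicit receiver,
// playing the same role as Python's explicit "self".
type Counter struct {
	n int
}

func (c *Counter) Inc()      { c.n++ }      // pointer receiver: mutates c
func (c Counter) Value() int { return c.n } // value receiver: read-only copy

func main() {
	c := Counter{}
	c.Inc()
	c.Inc()
	fmt.Println(c.Value())
}
```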

And then there are interfaces, to be able to write code that handles diverse types at once. They might resemble Python's duck typing, Java's interfaces, or C++ virtual functions, but they are a thing of their own.

Interfaces define a set of methods that must exist on a type. A type does not declare that it adheres to any interface; the fact that the type has all the methods is enough for it to be usable through that interface. In this sense, it resembles duck typing.
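A minimal sketch of implicit interface satisfaction (`Shape` and `Square` are made-up names):

```go
package main

import "fmt"

// Shape is satisfied implicitly: any type with an Area method qualifies.
type Shape interface {
	Area() float64
}

type Square struct{ Side float64 }

// No "implements" declaration anywhere: having the method is enough.
func (s Square) Area() float64 { return s.Side * s.Side }

// total works with any mix of types that satisfy Shape.
func total(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

func main() {
	fmt.Println(total([]Shape{Square{Side: 2}, Square{Side: 3}}))
}
```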

And finally, Go is one of the few compiled programming languages that I know of that supports UTF-8 natively.