A small shift on this blog! Advocating for a simpler Rust

Lately I’ve been writing about Rust because I like it a lot and I think it deserves more attention. But since the last post, things are going to change gears a bit. I shared the last post on Reddit and it got picked up by This Week In Rust too. This caused a spike in traffic that’s hard to believe: a week’s worth of visits in 12 hours.

I’m getting more visibility than I ever would have thought possible, and comments both on the blog and on Reddit that are positive, that correct me (because, heh, I’m not perfect!) and that are very constructive. I feel that my overall view (so far) is not rejected by the Rust community, and it’s possible that my opinions here can have some effect (even if minor) on Rust itself in the future.

Therefore I believe it’s worth trying to advocate for a simpler Rust for beginners. It’s not (only) about the visits; it’s about trying to convince other people that Rust can change to be perceived as an easy language, and trying to show them that Rust, as it is today, isn’t that hard either.

I currently work for Google as an SRE (see “Contact / About me” for my current status), and Google has recently been getting involved in Rust as well, so I need to get this out of the way: what I write in this blog has nothing to do with my employer; opinions are my own. At work I don’t write any Rust currently and sadly I don’t expect this to change. Rust is only my hobby and I would like to keep it that way, as it makes the separation between hobby and work time easier. I love coding as a hobby, it’s fun to me. But when you do the same stuff at work as in your personal time the line gets very blurry; so I’d prefer things to stay as they are.

Rust has a lot of untapped potential. The community is right now so focused on the performance aspect that it does not yet realize that Rust can take over other general-purpose programming languages. Rust is gaining adoption slowly but steadily, and I expect to see “Rust everywhere” roughly around 2030. If I can help in any way to make this happen, it’s worth it for me.

I realized this when coding zzping. Suddenly I saw that I was getting things done really fast, that the speed of development was near Python. And when I needed something quick and dirty and went back to Python it was really unpleasant, it felt that in Rust I could have done it faster.

There are several pain points in Rust for me. Mutable statics and lifetimes are the parts that slow me down most often. But I’m used to them now, so I usually remember how to work around them.

But the changes I would like to see are not only for me, but mainly for newcomers. It didn’t take me very long to learn Rust, and for the most part the way the language is designed makes sense to me. But not a lot of people have a good C++ background, and for them most of the concepts may feel fanciful, hard to understand and use, and might steer some of them away from Rust.

It doesn’t need to be that way. The main problem is that learning Rust currently means crashing against the borrow checker until you get it. It’s like playing Paperboy, but for programmers.

I hope I can contribute something to bring more people to Rust and to see wider usage beyond performance reasons. My Rust knowledge is still limited, but while I’m still somewhat of a newcomer is the moment to push for this. Once I get too comfortable with Rust, I will no longer see its problems.

Rust – What made it “click” for me (Ownership & memory internals)

This is aimed at people who are coming to Rust from garbage-collected languages, such as Python or JavaScript, and have trouble with the compiler throwing errors at them seemingly without reason. This guide assumes you already know some Rust.

For those like me who have worked with non-GC languages (C, C++), the borrow checker still feels hard to understand at first, but the learning curve is much gentler, and the documentation and design make quite a lot of sense.

So I want to add some basic context for those who never managed memory manually, requesting and freeing it by hand, so that the underlying design of Rust’s ownership system makes more sense and clicks. Because that’s all it takes: once it clicks, everything seems to fall together. (If you already know C or C++, you’ll already know most of the stuff I’m going to talk about, and will probably notice that I’m oversimplifying a lot; but it might still be worth the read, as it describes how Rust “clicked” for me.)

First of all I want you to consider a small snippet of code in Rust:

let a: String = "Hello world".to_string();
let b: String = a;

Here’s the question: what are the final values of the “a” and “b” variables?

Think about it.

Usually in GC languages assignment can do two things depending on the types involved. For basic types (i.e. numeric) the value inside “a” is usually copied into “b”, so you end up with two variables with the same content. For complex types it is common, instead of copying the data, to just make “b” point to “a”, so internally they share the same data.

But this is not the case for Rust. In the snippet above “a” is moved into “b”, and “a” is freed afterwards: it does not exist anymore and does not hold any value. Puzzled? So was I. But don’t worry, this will make sense by the end.
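
If you try it (a minimal sketch of the situation above), the compiler rejects any later use of “a”:

fn main() {
    let a: String = "Hello world".to_string();
    let b: String = a;     // the String is moved out of "a" here
    // println!("{}", a);  // error[E0382]: borrow of moved value: `a`
    println!("{}", b);
}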

By default Rust will move the data instead of copying it. This means that it will copy byte by byte from one place to the new one and then it will remove the original copy, returning the memory to the operating system.

I know. This feels stupid, but it comes from the ownership rules in Rust. It also has its special cases, for example, for simple types it will copy the contents but the original variable will not be freed. I will cover this later on.

Basic overview of a program’s memory

As said earlier, C++ developers have an advantage in understanding Rust because they already have a mental model of what memory looks like, pointers and so on. We need to cover some ground here to get a proper understanding, but I don’t want to go too in-depth, so I’ll try to keep it as simple as possible.

But let’s be fair: this is going to be more in-depth than most people would like. I’m sorry, but I think it’s needed for what comes later.

Memory pointers

Have you thought about how values are actually stored in memory? How does it organize different things such as integers, floating-point numbers and strings? How can the program tell two variables apart?

Memory can be thought of as a list of bytes with roughly 2^64 items:

let mem: Vec<u8> = vec![0; 18_000_000_000_000_000_000];

(Note: This Rust syntax just creates a Vector of unsigned numbers, 8 bit each, with a size of 18,000,000,000,000,000,000 elements, initialized with the value ‘0’ for all elements)

When you declare a variable, the program needs to decide where to put it in this list. The position of a variable in this list represents its memory address. And of course you could store this index in another variable; this is called a pointer, and it’s represented by the ampersand (&). Memory addresses are usually written in hexadecimal.

Let’s say we want the variable “a” to be stored at index 0x100, so we could create a pointer to “a”:

let p_a: &i32 = 0x100;

Now we could use the dereferencing operator (*) to read the contents of the memory at that address:

let a: i32 = *p_a;

In our imaginary example, “*p_a” would equate to “mem[p_a]”, with a catch. Because memory is bytes and the variable is 4 bytes long, it needs to read all 4 indices, doing something like this:

let a: i32 = ((mem[p_a] as i32) << 24) + ((mem[p_a + 1] as i32) << 16)
           + ((mem[p_a + 2] as i32) << 8) + ((mem[p_a + 3] as i32) << 0);

Notice that in this example we decided that the first byte maps to the most significant part of the integer while the last byte is the least significant. This depends on the processor; here big-endian is assumed. Most consumer processors are little-endian and store the least significant part of the value first.

All these nifty details are done under the hood by your programming language, including Rust, C and C++. Some of this is even done internally in your processor directly.

While we don’t need any of this to write Rust or C, a basic understanding here does help to understand the design decisions of programming languages, especially those without GC.
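
For contrast, here is roughly what those two operations look like in real, safe Rust, where the compiler picks the address for us:

fn main() {
    let a: i32 = 42;
    let p_a: &i32 = &a; // a reference: the compiler-chosen address of "a"
    let b: i32 = *p_a;  // dereference: read the value stored at that address
    println!("{}", b);
}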

Now we have a variable of 4 bytes in memory. Where would we put another variable of 8 bytes? Consider this:

let p_a: &i32 = 0x100;  // original pointer
// ----
let p_b: &i64 = 0x102;
let p_c: &i64 = 0x0FA;
let p_d: &i64 = 0x106;

These three positions (b, c, d) all have problems, as they’re meant to hold an 8-byte variable (i64 is 64 bits long, which is 8 bytes). The previous variable actually spans from 0x100 to 0x103 (both included), so p_b overlaps it by two bytes. If this were done, changing “b” would change “a” and vice versa, in a very strange manner.

The reverse happens for “c”: because it needs 8 bytes, it will span from 0x0FA to 0x101, also overlapping 2 bytes with “a”.

The last one, “d”, does not cause any overlap, and it would work. But the problem here is that it leaves a gap of a few bytes between “d” and “a”, which will be hard to fill later. Usually the compiler and the operating system want the values in memory packed together so they use less memory and don’t leave gaps, as gaps are almost impossible to fill.

Key points to remember:

  • Memory is a flat list of bytes. The compiler and processor take this into account to be able to handle anything bigger than a byte.
  • Pointers are used internally all over the place to make any program work.
  • Compilers must know how big the data behind a pointer is in order to read/write it correctly.
  • Endianness (big or little) matters when handling memory manually byte by byte.

Virtual memory and initialization

Going back to the memory example:

let mem: Vec<u8> = vec![0; 18_000_000_000_000_000_000];

You might wonder why I decided to give this imaginary vector roughly 2^64 elements, almost 16 exbibytes, which is clearly more than any amount of RAM physically possible.

(Note: I keep saying “roughly” 2^64 bytes because some parts might be reserved)

Let me ask a different question. Do you think the memory of other running programs lives in the same place? If so, a program would need to avoid placing its variables on the same memory as other programs to avoid corruption.

The answer is usually no, but it depends. Your program’s memory is isolated from other programs and you cannot see or touch their memory. In fact, a pointer to 0x100 in different programs maps to different physical memory locations. This is because the operating system provides a virtual memory layout. This virtual memory is usually 2^64 bytes in size on 64-bit platforms, so each program technically has its own 16 EiB of memory available even on computers with 1 GiB of RAM.

The “depends” part is because some OS/platforms do not provide this abstraction, and also because you might be running a program without an operating system at all. But since my guess is that you’re currently using a GC language, most likely you’re not interested in doing that, so from now on we’ll assume that our program runs in virtual memory.

The next question to ask is about the initial value of memory. Do you think it comes initialized at zero, or with crap/past data? Actually it can be anything; this truly depends on the OS and its configuration. So never assume that the initial value is zero, or that it is crap data. Zeroing may be used to avoid leaking private information to other programs.

Most GC languages (like Go) will initialize memory to zero for you. And most non-GC languages (like C++ or Rust) will try to prevent you from reading uninitialized memory, which you have to initialize manually.
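
In safe Rust this shows up as a compile error; a minimal sketch:

fn main() {
    let x: i32;           // declared, but not initialized yet
    // println!("{}", x); // error[E0381]: "x" is used before being initialized
    x = 5;                // deferred initialization is fine...
    println!("{}", x);    // ...and reading it afterwards is allowed
}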

Rust can allow reading uninitialized memory, but only within “unsafe” blocks. This can be used to avoid initializing memory twice in complex algorithms. C, C++ and Rust trust that the programmer knows what they’re doing, but the difference with Rust is that regular Rust has all the safeguards in, whereas in C++ you need to be careful at all times. (I almost never use unsafe, and if you come from GC languages you should also avoid it. All the programs you’re used to writing can be written without unsafe.)

Key points to remember:

  • Programs typically run in virtual memory, which is way larger than the installed RAM size.
  • Memory usually comes uninitialized, with random data on it. But some OS might initialize it for security reasons.
  • Programming languages either zero it for you or try to prevent you from reading uninitialized memory.

Memory allocation and deallocation

Last thing on memory and we’ll move to less theoretical topics. Let’s talk about allocating memory.

Before, we were assuming you could do something like this:

let p_a: &i32 = 0x100;
let a: i32 = *p_a;

This does not work in Rust. But the counterpart in C kind of does:

int *p_a = (int *) 0x100;
int a = *p_a;

Rust will not let us manipulate memory directly unless we’re using unsafe code. I’m no expert on unsafe, so I’m not even going to try that. The point here is that the equivalent C code, even if it compiles, doesn’t work.

The reason is that “p_a” points to unallocated memory, which is different from uninitialized memory. A program cannot access any point in memory unless it has been allocated by the OS. To get memory, the program needs to ask the OS for it. In C this is done by the alloc() family of functions, the best known being malloc().

When a program requests memory, it doesn’t ask for a particular point in memory, but for a specific size instead:

int *p_a = (int *) malloc(sizeof(int));

So the memory address is chosen by the operating system (or the allocator), and the program has no influence over it.

Now we can use that memory, read and write into it. Just remember that until you write on it, the contents are undefined and platform dependent.

When we finish with that memory and no longer need it, we should free it; basically we tell the OS that we’re done with it so it can reuse that chunk for other programs. If we keep allocating but never free, our program has a memory leak and will keep growing in size.

In Rust we don’t need to worry about allocating and freeing memory as it’s managed for us. There’s no risk of a memory leak, except for recursive data structures (depending on implementation) for which Rust has the same risks as any GC language.

This directly contrasts with C where you need to manually allocate and free the memory properly.

In C freeing memory looks like this:

free(p_a);

And in Rust would be:

drop(a);

As said before, in Rust you don’t need to worry about this. But if you want to release memory early, you can. This is commonly used to invoke the destructor early rather than to actually free memory (it’s something useful when handling locks between threads).
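
A minimal sketch of that lock use case (the variable names are mine):

use std::sync::Mutex;

fn main() {
    let counter = Mutex::new(0);

    let mut guard = counter.lock().unwrap();
    *guard += 1;
    drop(guard); // runs the destructor now, releasing the lock early

    // Without the drop above, locking again on the same thread would
    // block forever or panic, because the first guard would still be alive.
    println!("{}", counter.lock().unwrap());
}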

Other common pitfalls in C with freeing memory are the use-after-free and double-free. The names are self explanatory: If you free something and then use it (read or write), it’s an error. If you free something twice, it’s an error.

Key points to remember:

  • Internally, memory needs to be allocated and freed from/to the operating system.
  • A program cannot choose which address will be allocated.
  • Freeing is important to avoid memory leaks, but this is handled by Rust.
  • Forget that unsafe exists in Rust. Regular Rust is enough for anything you can imagine in a GC language. Leave unsafe for the experts (which is definitely not me).

Stack and Heap

Throughout this whole section I addressed only dynamic memory allocation with manual malloc and free, which is for heap memory. There’s also the stack, which is managed automatically even in C. For what I want to explain, I don’t think we really need to understand what the stack or the heap is, or what the differences are.

Rust extends on that approach of automatic allocation and deallocation based on scopes to avoid having a GC.

If you’re confused about the stack and the heap, and which you should use, let me say that you don’t need to care about this at all. If you’re interested, it’s a great topic, but the same way you don’t care in Python or Go, you don’t need to care in Rust either.

Rust will place some stuff on the heap and the majority of variables into the stack. For example, Box<T> places stuff into the heap.

In simple terms, the stack refers to the variables that are tied to a specific code block (between some braces), while the heap refers to dynamically allocated memory.
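
If you really want to see the difference, here is a minimal sketch (the variable names are mine):

fn main() {
    let on_stack = [0u8; 32];          // a fixed-size array: lives on the stack
    let on_heap = Box::new([0u8; 32]); // the same data, allocated on the heap
    println!("{} {}", on_stack.len(), on_heap.len());
}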

Key points to remember:

  • Stop worrying about stack or heap and move on.

Composite types in memory

Now that I’ve already given you a headache with all that stupid stuff about memory internals that no one cares about, we can begin to understand how objects are laid out internally. This will come in handy in the next part.

What objects actually are

In C++ we could do something like this to create an object:

class Card {
  public:
    int number;
    int suit;
};
int main() {
  Card aceOfSpades;
  aceOfSpades.number = 1;
  aceOfSpades.suit = 4;
}

In Rust this would be:

pub struct Card {
   pub number: i64,
   pub suit: i64,
}
fn main() {
    let ace_of_spades = Card {
        number: 1,
        suit: 4,
    };
}

If you’re wondering why I’m using C++ as a reference and not PHP, JavaScript or any other “simple” language, the reason is that C++ shares syntax with all of those, so you should be familiar enough to read it. But those languages don’t map onto memory as exactly as C++ or Rust do, so to be as correct as possible, I prefer to use C++ as the example. And you usually need types to get something that can be mapped onto memory.

So, if you come from Python and don’t have any other language to lean on, the best I can come up with is typed Python:

class Card:
    number: int
    suit: int

ace_of_spades = Card()
ace_of_spades.number = 1
ace_of_spades.suit = 4

Anyway, notice how most languages let you instantiate the object without specifying the contents, so you can write to them later. But not Rust, as it forces us to define the actual content values to instantiate the object.

Remember before when we were talking about uninitialized data? What is happening here is that C++, Python and most other languages let you create the object first and fill in the values later; the fields either start at zero (or some default) or, as in the C++ example above, may even contain garbage until you write to them. Either way, we end up writing twice in order to set a particular value.

The worst problem in all other languages is not performance but the lack of exhaustiveness: if you forget to set the value of a member, it will be silently left at its default, with no warning. When adding a new field down the line it can be very hard to track down all the places where the object is created, and bugs might appear because we forgot to set the new field to something. This simply can’t happen in Rust.

Back to the original topic. How is this laid out internally in memory? Simple: it uses 16 bytes, where the first 8 are used for “number” and the latter 8 are used for “suit”. We get something like this:
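
A rough sketch of that layout (addresses and byte order are illustrative, shown big-endian for readability), plus a runnable check of the total size:

// number = 1  ->  00 00 00 00 00 00 00 01   (first 8 bytes)
// suit   = 4  ->  00 00 00 00 00 00 00 04   (next 8 bytes)

pub struct Card {
    pub number: i64,
    pub suit: i64,
}

fn main() {
    // Two i64 fields concatenated: 16 bytes in total.
    assert_eq!(std::mem::size_of::<Card>(), 16);
}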

(Note: I’m using big-endian above. Most computers are little-endian and would write 0x0100000000000000 and 0x0400000000000000 instead. Unless you want to manipulate bytes manually, you don’t need to worry about endianness.)

So far so good, right? This object has a known size at compile time of 16 bytes. If the compiler has an address to a Card, it knows it’s 16 bytes long and knows that the suit is 8 bytes to the right.

This means that:

&ace_of_spades.suit == (&ace_of_spades + 8)

(Note: Be aware that some compilers, especially Rust, have the right to reorder the fields in memory, so it’s not guaranteed that the second field will appear after the first one)

Now let’s talk about strings. How big is a String type? If the compiler has a memory address to a string, how many bytes it has to read?

The problem here is that strings can be of any size, from empty strings to full books inside a single variable. It’s not possible to always know the size of a string at compile time. How are they stored, then?

In C, strings were basically unspecified in length but terminated by 0x00 (or ‘\0’) so all functions had to keep reading until they found this character. This has an obvious downside: if your string contains the ASCII character ‘\0’ in the middle, you cannot read it completely.

In Rust, the String type is basically a pointer to somewhere else and a length:

pub struct String {
  buf: *mut u8,
  len: usize,
}

(Note: Actually, String in Rust is just a Vec<u8>, but Vec itself is using something similar as above; also here I left out the capacity which would add another 8 bytes to the space used)

This makes the String type sized, and in 64 bit platforms this will be 16 bytes long. Regardless of the string contents, the String object is always 16 bytes.

That doesn’t mean all strings use only 16 bytes. Obviously the memory required to hold the text is still used, but it’s allocated elsewhere. Compared to C strings, this approach has none of their downsides, but instead of spending 1 extra byte per string, it spends 16.
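
A minimal sketch of this, keeping in mind that the real String is 24 bytes on 64-bit because of the extra capacity field mentioned in the note above:

fn main() {
    let short = String::from("hi");
    let long = "a".repeat(100_000);
    // The handle itself has a fixed size no matter how long the text is.
    println!("{}", std::mem::size_of_val(&short)); // 24 on 64-bit
    println!("{}", std::mem::size_of_val(&long));  // also 24
}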

What about the methods? Are they in memory as well? Yes, code is also in memory, but that code exists only once and is not copied per object instantiation. It doesn’t matter how many methods a type has; the object stays the same size.

Key points to remember:

  • Objects are usually laid out in memory by just concatenating the fields.
  • Objects may contain memory addresses (pointers).
  • Methods do not use space in the object.

Ownership

The most important Rust concept is probably ownership. What does it mean?

Imagine we’re passing data through several functions. Where should the data be allocated and where should it be freed? A garbage collector will wait until no part of the code has access or pointers to that data, and then proceed to free it. This is done while the program is running and costs cycles. Go is known for having “pauses” when the GC runs.

But C and C++ have a cool way of handling this by leveraging the stack and the scopes. For example, in this code:

int main() {
  Card aceOfSpades;
  aceOfSpades.number = 1;
  aceOfSpades.suit = 4;
}

The variable aceOfSpades is allocated by C++ automatically and when it exits the scope, it is freed also automatically. Cool, right? For such simple cases the memory is managed for us.

It would be nice if this system could be extended to all other cases, when data is passed through other functions and methods, because sometimes data needs to be dropped in an inner function. It would be awesome if a function could know beforehand whether it should free the data after using it; but the problem is, if a function frees some data and some caller expects that data to still exist, we get a use-after-free error.

It is hard in C++ to follow this convention properly across a project; some might use a naming scheme for functions to convey this, others might just work around it by copying/cloning the data instead of passing pointers, to avoid the risk.

In Rust, this is enforced by the compiler, which tracks “who owns the data”; the owner has the duty of freeing it. Obviously there can be only one owner at any point in time, because otherwise you’d get double-free errors.

When a function creates some data, it becomes the owner of this data:

fn main() {
    let ace_of_spades = Card {
        number: 1,
        suit: 4,
    };
}

In here, main owns ace_of_spades and it’s responsible for freeing this data at the end. Therefore, there’s an implicit “drop(ace_of_spades)” at the end of the code block.

So far this is the same. But here’s the cool part in Rust: Ownership can be transferred:

fn main() {
   let ace_of_spades = Card {
       number: 1,
       suit: 1,
   };
   print_card(ace_of_spades);
}
fn print_card(card: Card) {
   println!("Number: {} Suit: {}", card.number, card.suit);
}

Now, in the code above, print_card receives ownership of the Card and will drop its contents at the end of the function. This means that main() no longer frees the memory for ace_of_spades.

But wait, how does Rust know that print_card will drop the data at the end? Because it takes the type “Card” instead of “&Card” or “&mut Card”. Whenever you see a full type in Rust without the ampersand, the value is owned.

For &Card and &mut Card what your function owns is the pointer to the memory address, but not the contents. In the same fashion, for things like Rc<T> or Box<T> the function owns the outer Rc or Box, and the behavior on the inner value depends on the actual type used.

What if we don’t want the value to be dropped? There are two solutions: one is to change print_card to receive a borrowed object (similar to a pointer), the other is to copy the data before sending it. The latter is what C++ does under the hood:

   print_card(ace_of_spades.clone());

In Rust we would need to implement how cloning works, but we could also just use “#[derive(Clone)]” on our struct to add the default implementation that just copies everything as-is.

So we can see that Rust by default “moves” the data, while C++ by default copies it.

If we wanted to change the print_card function to avoid freeing inside, it would be just adding an ampersand:

fn print_card(card: &Card) {

That tells the compiler that this function will not own that data and cannot free it. “card” is then treated as a pointer to a “Card” address and dropping the variable will drop only the pointer, not the underlying data.

The remaining function code doesn’t need any change. In C++ (when passing a pointer) you’d need to change the dot to an arrow to dereference it, while Rust is smart enough to do this for you. Nice! You don’t need to think about dereferencing pointers in Rust.

(Note: “dereferencing” is to apply the asterisk operator “*ptr” where we signal that we want to operate on the contents of what the pointer is pointing to, instead of trying to work on the pointer itself as a variable)

But there is another line that needs to be changed, the call to print_card. As we want to retain ownership, we need to signal this to the compiler as well by adding an ampersand:

   print_card(&ace_of_spades);

With these simple rules Rust can always make a clear-cut decision about how memory is allocated and freed, so we don’t have to worry about it. Instead, we need to worry about ownership and borrowing, but these are checked by the compiler and there’s no way to fool it. A program that compiles is guaranteed to be memory safe and sound.

Could we instead just return the Card object to avoid freeing it? Sure! Let’s see:

fn main() {
    let mut ace_of_spades = Card {
        number: 1,
        suit: 1,
    };
    ace_of_spades = print_card(ace_of_spades);
}

fn print_card(card: Card) -> Card {
    println!("Number: {} Suit: {}", card.number, card.suit);
    return card;
}

This does the trick: “card” is no longer freed inside print_card, and main retains ownership. But this is not a good idea. First, we depend on the compiler being smart enough to avoid moving the data twice. And second, this is almost an anti-pattern in Rust; it tends to cause more trouble than it solves. As a rule of thumb, if you don’t want the function to consume the data (to free it), don’t ask for ownership.

Ownership and borrowing can be thought as a permission system:

  • The owner has the highest permission. It can do whatever it pleases with the data, because it’s their data. If I own a car, I can do what I please with it, including disposing of it. There is always an owner, as the car needs to be registered to someone.
  • The &mut borrow comes next. It can change the contents of the data because the owner allowed it to. For a car, this is the mechanic: they can add/remove stuff from it, but they definitely can’t dispose of it. The car can only be at one mechanic at a time, and while the car is at the mechanic the owner cannot use it. In the same sense, while a &mut borrow exists, the owner must wait until it finishes to be able to use the data.
  • Finally, the & borrow is shared and read-only. It’s like allowing your friends to look at your car and take photos of it. There can be many people doing this at the same time, but while this happens you cannot send the car to the mechanic or dispose of it.

One important thing to remember is that others can make copies of the borrowed data. For example, if a function uses a shared borrow (&Card) and needs to change the data, it can copy it and then make the pertinent modifications to the copy.

For example, consider this function:

fn next_card(card: &Card) -> Card {
    let mut next = card.clone();
    next.number += 1;
    return next;
}

This function receives a borrowed card, and cannot change it. But we want to be able to add one to the number. What do we do? We clone it, then we change our copy. We can return the new copy afterwards.

We could have avoided the copy by having a “&mut Card” instead, then we could have mutated the same data in-place.
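
A minimal sketch of that in-place variant, reusing the Card struct from before (the function name is mine):

fn advance_card(card: &mut Card) {
    card.number += 1; // mutate through the exclusive borrow; no copy is made
}

fn main() {
    let mut ace_of_spades = Card { number: 1, suit: 1 };
    advance_card(&mut ace_of_spades);
    println!("Number: {}", ace_of_spades.number); // prints "Number: 2"
}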

The beauty of this system is that the developer knows right away if a function will change the contents of the data we’re passing or not. A function receiving “&Card” will never change its contents, and the caller can continue using it for other stuff assuming it never changed.

Key points to remember:

  • Regular types in Rust are always Owned unless “&” or “&mut” is written before the type.
  • Ownership means that memory will be freed when it exits the scope.
  • There is always one owner. Not less, not more. One.
  • Cloning or copying the data is what other languages implicitly do. Don’t be afraid to do it.
  • Borrowing is the right tool to share data between functions when we don’t want the data to be freed at the end.
  • When creating functions, try to use the least permissive borrow that works.
  • Even if you need to change the data, remember that you can always change a copy of the data. Your function might not require changing the original data.

Copyable types

In Rust, copying has a special meaning. You might have noticed that we talk about cloning and copying as if they were two different things.

Well, because they are. Copying in Rust strictly means implicit byte by byte copying, while cloning is customizable and explicit.

Let’s forget about cloning for now and focus on just copying. Remember the byte representation of a Card struct we discussed before:

Copying this would mean that our program reads the bytes in memory and writes them elsewhere. For a second, it forgets that this is a “Card”. It just knows that it has 16 bytes of data so it does a copy-paste elsewhere.

The new data will have its owner which might be different from the old one. And as discussed before, this copy might happen from read only borrows. You don’t need ownership to be able to copy data.

Now, a question: Would this copying always work? Will the resulting data be correct?

Think about this for a second.

For regular values such as numbers and characters, it does work, no problem. 

But what about memory addresses? What would happen if it contained a pointer to somewhere else?

If the pointer is a shared read-only borrow, this works out with no problem. As there can be as many readers as we like, copying it is fine. The copied pointer still points to the same position, so it keeps working.

For mutable borrows the problem is that we would break the rules, as there would be more than one mutable pointer to the same address. Other than that it would work: the pointer is still valid. But Rust will not let you copy a struct containing a &mut reference, precisely to prevent you from breaking the rules.

Therefore there is data that can be copied and data that cannot be copied.

But wait! There is another possibility. The pointer might be something that is actually owned by the struct!

Remember the String implementation from before?

pub struct String {
  buf: *mut u8,
  len: usize,
}

This “buf” is actually owned by the String: when you create a new string, memory is also reserved for the buffer; and when it’s dropped, the contents buffer must be freed as well.

You might be asking now, is this some kind of trickery that only the Rust internals can do? Can we do the same in our Rust structs?

Yes! This is done by the type Box<T>. This stores a pointer to another datatype (or the same if you want to do something recursive) and the underlying data is owned by the same struct. For example:

pub struct HandOfCards {
    card1: Box<Card>,
    card2: Box<Card>,
    card3: Box<Card>,
    card4: Box<Card>,
}

This would make HandOfCards contain 4 pointers to 4 different cards. (Be aware that this implementation does not make sense in real code. I wrote this just to show Box<T>, but in this case it just wastes memory with no benefit.)

In this case, the memory of those “Card” needs to be allocated before creating HandOfCards, but it will be freed automatically when it exits the scope, as usual.

If we wanted to store an indeterminate amount of cards we could use Vec<Card> instead. Vec is similar to Box in the sense that it stores a pointer to somewhere else, and when the Vec is freed, the contents behind that pointer are dropped as well.
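
A minimal sketch of that alternative (the field name is mine):

pub struct HandOfCards {
    // The Vec owns its heap buffer; when HandOfCards is dropped,
    // the buffer and every Card inside it are freed as well.
    cards: Vec<Card>,
}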

Back to copying. If we try to copy those structs byte by byte, the problem is that we will copy the pointer but not the internal data; that data would then have two owners, and not only does this break the rules, it will also create a double free at some point, because once all the copies are dropped, the inner Box<T> will be freed more than once.

And this is the real reason why not all types can be copied byte by byte. In some cases it will create a double free error.

Which types are copyable is something defined by the Copy Trait. If the struct implements Copy, it is copyable. As easy as this:

impl Copy for Card {}

And now our struct Card is copyable. There’s no need to explain to Rust how to do this or implement any method, as there is only one way of doing it (the only requirement is that Copy also needs Clone to be implemented). Usually we implement Copy using the derive macro instead; this is just convenience, as the macro writes the code above for us:

#[derive(Copy, Clone)]

But as I said, not all types can be Copy. For example if we try the same for HandOfCards, this happens:

error[E0204]: the trait `Copy` may not be implemented for this type
   --> src/main.rs:234:6
    |
228 |     card1: Box<Card>,
    |     ---------------- this field does not implement `Copy`
...
234 | impl Copy for HandOfCards {}
    |      ^^^^

Because Box<T> does not implement Copy, a struct containing Box<T> can’t implement Copy either.

Turns out that it’s much easier to work with types that implement Copy than with types that don’t. For example, the following code does work only if Card implements Copy:

fn main() {
    let ace_of_spades = Card {
        number: 1,
        suit: 1,
    };
    print_card(ace_of_spades);
    print_card(ace_of_spades);
}

If it doesn’t implement Copy, the first print_card consumes ace_of_spades, so it no longer exists when the second call is made. This program would not compile unless the Copy trait is implemented. When it is, ace_of_spades is copied for each of the calls, similar to what C++ does.

Key points to remember:

  • Copy in Rust means a byte by byte copying, without understanding the contents of the type.
  • Memory addresses can prevent Copy from working correctly, therefore some of their uses in a struct will forbid it from being copyable.
  • Shared read-only borrows (&var) are fine for copy but mutable ones (&mut) are not.
  • Box<T> can be used to have a pointer to owned data, but this also prevents the struct from implementing Copy.
  • Remember to implement Copy if possible. This will make your life easier.

Exceptions on copying and cloning everything

I know, I said to copy or clone everything and forget about it. But there are a few gotchas we should cover. I lied a bit to make things easier.

First and foremost, C++ does not always clone stuff. Cloning is kind of a deep copy. For simple types it will copy them by default, but for complex ones it depends on the implementation. So take that with a pinch of salt.

Cloning too much has its drawbacks; obviously it wastes cycles. For small things it will not make a difference, but cloning big values of course takes time. And it also depends on how many times your program does the clone (i.e. doing 10 clones is not the same as doing a single clone inside a loop of 1 million iterations).

Same applies to implicit copies. It’s a bit harder to get a lot of data copied than cloned, but it’s definitely possible (for example with arrays [T]).

Passing borrowed values to functions (&T or &mut T) instead of the value (T) will help prevent unnecessary copies.

Always copying/cloning can also lead to disappearing changes: you might accidentally write to a different copy than the one you intended. Using a &mut reference, or a shared handle like Rc<T> (usually combined with RefCell<T> when mutation is needed), can help.

Finally, implementing Copy on some types might be a bad idea as it could end with unintended behavior. For example, Iterators are not expected to implement Copy, as you could end with different copies of the iterators instead of consistently using the first one created. Deliberately not implementing Copy is a way for the author to convey how the type is intended to be used.

The main reason I insist on cloning being “good” is that I had a hard time when I started with Rc<T> types (kind of like garbage collection). Once I got the hang of cloning, it was much easier, and it turned out that Rc<T> (and other similar types) is meant to be cloned. Cloning is not bad unless there’s a lot of data to clone, and even then it can still be fine in non performance-critical parts of the program.
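
For example, cloning an Rc<T> only copies the small handle and bumps a reference count; the data itself is shared. A minimal sketch:

use std::rc::Rc;

fn main() {
    let shared = Rc::new(String::from("some big piece of data"));
    let another_handle = Rc::clone(&shared); // copies the handle, not the String

    println!("{} / {}", shared, another_handle);
    println!("{}", Rc::strong_count(&shared)); // 2 handles alive
}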

Move semantics

In Rust, all types are “move”, which means that making a copy of the data must be valid as long as the original is destroyed. In this fashion, I should be able to take any value and change its memory address from 0x100 to 0x200 by copying the data and removing the original, and it should keep working.

This, of course, only works if there are no pointers to the initial data; this means that there are no borrows, either immutable or mutable. In the end, what this tells us is that ownership is required to move the data.

With one exception: Rust has std::mem::swap, which accepts two &mut references and moves the data. Because the contents are exchanged with another instance of the same type, this must be valid as well.
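
A minimal sketch of that:

fn main() {
    let mut a = String::from("hello");
    let mut b = String::from("world");

    // Both values move: each one ends up living at the other's address.
    std::mem::swap(&mut a, &mut b);

    println!("{} {}", a, b); // prints "world hello"
}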

Now, does this work for every data type? Can we put anything we want in a struct and this trick still works?

Almost. There is one case where this fails completely. (I kind of hate that there are so many exceptions to the rule in programming)

If you build a self-referential struct, this fails. Let me explain.

Imagine you want a struct where it owns some data, but exposes a read-only buffer to it via a member; the point of this could be just preventing others from modifying it without going through the controlled methods:

struct MyData {
    buf: Vec<u8>,
    pub buffer: &Vec<u8>, // This must always point to &buf
}

This struct breaks when moved, because “buffer” would still point to the old place whenever the struct moves and changes its memory address. Remember, the address of “MyData.buf” is just “&MyData + 0”, so it depends on where the struct is placed in memory.

For this reason, Rust will not let you build self-referential structs in safe code. It is forbidden via the lifetime of the borrow: you’d need it to live for the lifetime of the struct itself, and it’s not possible to tie those two together this way.

(The solution for the above code is to return the borrowed pointer in a method, in this way you don’t need it to be stored; But in real code this happens in very contrived scenarios that might not have an easy solution)

While learning Rust I tried several times to create a self-referential struct without noticing, and I ended up fighting lifetimes with the borrow checker for hours until I gave up. It’s simply impossible, because it breaks move semantics, but Rust blames the lifetimes because it doesn’t understand that the struct is self-referential. And it doesn’t understand that because it can’t support it without unsafe code.

Did I say unsafe? Can this be built with unsafe code? Oh yes, it can. But it will break in crazy ways every time it is used: Rust moves contents around in memory often, without warning, as moves are implicit. It would be really hard to use such a struct without making a mess.

So, are they really impossible to do in Rust in a correct way? Not quite. There’s something called Pin<T> for these purposes.

Let’s think a bit. Moving is only possible with ownership or a &mut reference, so if we hide the variable behind a type that doesn’t allow either, the inner value is guaranteed not to move in memory. The outer type can expose its own memory address and be moved freely, but the underlying pointer stays fixed in memory. This basically means that only &T borrows are allowed.

This is exactly what Pin<T> does. But to use Pin<T> to effectively make self-referential structs still requires unsafe. This is because, again, Rust does not have tooling to explain this to the compiler.

(Note: Docs might also point to a different reason, Pin<T> will not allow creation of Pin<T> where T is Unpin)

Ideally, you don’t want to make a self-referential struct in Rust. Instead you should leverage other types to accomplish the same thing; for example, Rc<T> can be used for things like linked lists or trees. Also, look for libraries that might do this work for you, as these structures are very easy to get wrong.

It’s possible to encounter Pin<T> in Rust, just unlikely. The most common place is async programming, where the Future trait requires pinning. Pin<T> can be created without unsafe as long as the data follows move semantics; so in regular programming Pin<T> is just something you might need to create or access. A bit of a burden, but that’s it.

As a side note, let me add that Rust does not guarantee that memory never leaks. Correct Rust programs will not leak, but it’s actually quite easy to write a Rust program that leaks memory without unsafe. As with GC languages, if you have Rc<T> values in a cyclic reference, they will fail to free memory. And there’s also std::mem::forget, which removes a variable from scope, making it unreachable, without calling its destructor or freeing it; therefore causing a leak.
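
A minimal sketch of the mem::forget case:

fn main() {
    let s = String::from("this buffer is never freed");
    std::mem::forget(s); // "s" is gone from scope, but its destructor never runs
    // The heap allocation behind "s" has now leaked, with no unsafe involved.
}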

Key points to remember:

  • All types in Rust are “Move”, meaning they can change their memory address without problems.
  • mem::swap can be used with two &mut references to swap the contents of two variables. This is also considered a move.
  • Self-referential data structures cannot be made in Rust without unsafe, because they can’t be moved safely.
  • Self-referential data with unsafe still must use Pin<T> to ensure the data does not move around.
  • The recommendation is to avoid them entirely and use Rc<T> instead. If possible, use libraries and don’t implement trees or linked lists yourself.
  • You might have to work with Pin<T> when doing async programming.

Back to the beginning

Remember we got puzzled by this simple code? Have a look again:

let a: String = "Hello world".to_string();
let b: String = a;

Now if I ask you what’s happening here, it should be easier to reason about. First, this creates a new String and stores it in “a”. Then, because String can’t be Copy (it contains an owned pointer to a buffer for the text), it can only be moved. Therefore Rust moves the contents of “a” into “b” and drops “a”. There’s no other way to make this work, so now it makes sense.

(Note: This case is too simplistic and it doesn’t make sense for Rust to move the data. It will instead point “b” to the address of “a”, and forget “a”. This is one of many optimizations inside Rust that will come into play to reduce the code produced to the absolute minimum)

You wanted to have two different copies? Sure!

let b: String = a.clone();

You wanted to have two variables pointing to the same thing, so it doesn’t use twice the memory? Sure!

let b = &a;

Once we understand what’s happening under the hood, the behavior becomes self evident, right?

If instead of a string we had a number, these are copyable, so in this case Rust will copy and not drop:

let a: i64 = 123;
let b: i64 = a;
dbg!(a,b); // This will now print both variables

Because “a” doesn’t need to be dropped (it’s still consistent), it remains valid after the assignment, and we end up with two variables with the same content.

I hope this helps in understanding why Rust behaves the way it does. I know lifetimes are still missing, but this was a bit too much already; maybe later I’ll write something about them. In the meantime, please let me know if this helped, and feel free to ask any questions!

When to think about using Cloud for your service

Three years ago I wrote “The Cloud is overrated!“. Since then I joined Google as an SRE, and I’ve been asking myself whether Cloud makes sense for me or not. Even before joining Google, GCP was my first option for Cloud; it seems quite good, and of the three major providers (along with AWS and Azure) it’s the cheapest option. And let’s be fair, my main complaint about Cloud is price. Vendor lock-in is my second concern, and Google again seems the fairest of the three. Anyway, this isn’t about which provider is better, but about when Cloud is a good idea at all.

Proper Cloud deployments are pricey and also require a lot of developer resources; if it has to be done right, it’s not about deploying a WordPress on a VPS-style service in the cloud.

What is Cloud about?

Cloud is about having easy-to-use tools to deploy scalable and reliable applications without having to worry about how the details are implemented.

We need to think about scaling and zero downtime. These are the only two factors that will determine if you should pay the extra cost or not.

Everything else is extra services that they provide for you, such as Machine Learning. If you want to use those services, you could always set up the minimum on the given Cloud to make them work and call them from the outside, no problem. So these are out of my analysis here.

Vertical Scaling

When you deploy an application on a server and later need more resources, you’ll need to migrate it to a different, beefier server if it no longer fits. On a VPS you usually have the option to upgrade it to get more compute resources as well.

In Cloud, the range of machines you can run code on is quite big. From tiny (1/2 CPU, 1 GiB RAM) to enormous (32 CPU, 512 GiB RAM). This gives quite the flexibility to keep growing any service as needed.

The other thing is that they allow fast upgrades and downgrades, and can even automate them. This can be used to reduce cost overnight when there is less load. But be aware that even with this, it’s highly unlikely that you’ll end up cheaper than a bare-metal server.

Same as a VPS, Cloud services usually guarantee data consistency; there is no need to do maintenance or migrations because disks fail. This is the downside of bare-metal servers: you need to handle the maintenance and migrate to a new server if the disks start to fail, risking data loss.

Horizontal Scaling

This kind of scaling refers to splitting the service into different partial copies that work together, in parallel. This is especially needed when the service no longer fits on a single machine.

The problem here is that most of the time applications are stateful, and this means that the state needs to be split or replicated across the different instances.

Cloud helps here by providing database services and file-sharing services that can do this for you, so your service can be stateless and the complexity is left to the Cloud provider.

In Cloud, you can also spawn dynamically more instances of your services to handle the load.

Reducing downtime to zero

This is basically done by replicating data and services across different data centers. If one goes down, your service will be still up somewhere else.

This is the most important part I believe, so I’ll leave the details for later.

When should we think about using Cloud?

This is an important decision, as it’s hard to convert a typical service (monolithic, a single thing that does it all) into something that makes good use of the Cloud benefits. It’s better to do this in the design phase if possible.

Sharding

In recent years there has been a boom around “Big Data” and Cloud, and everyone talks about NoSQL, sharding (horizontal scaling), etc. But a lot of this has been just buzzwords, a way of looking cool. Is it really that cool for everyone?

All these things are meant for horizontal scaling (sharding), which means that we expect to use more than one machine for one of the services (i.e. database).

It sounds really cool, but it’s not really worth it for the majority of cases. Unless you have a big project on hand, chances are that it fits in an average server.

Why not use sharding anyway? Well, it’s usually more expensive to have 5 machines running than a single one with all that power together. Sharding imposes a lot of design restrictions that are quite hard to handle, so it will substantially increase the time to develop the application. Unexpected requirements along the way will sometimes require a full redesign, because sharding requires certain premises to hold (how to split the service), and those cannot be changed along the way without a lot of effort.

The other problem with sharding is that it’s always less efficient to use X machines than X threads, and X threads are less efficient than a single-threaded CPU X times more powerful. Parallelizing does not scale linearly; there’s a trade-off, so always think about this.

Cloud is not (only) sharding, and sharding is not Cloud. If your service will never need to span more than one computer, there’s no point of adding the complexity.

I would recommend plotting a forecast of growth for your service over 5-10 years. Also plot the forecast for server growth; it usually increases 2x every two years (see Moore’s law). If your growth seems to be close to that, you definitely need to consider sharding from the start. Also consider that there are periods of stagnation, where there are no improvements in certain areas for years.

If you go for sharding, the databases provided by the Cloud provider will make your life much easier, but they will be your vendor lock-in. Once the application is coded with a particular Cloud DB in mind, it will be quite hard to move away from that provider later. If this is a concern, look on how to make it generic enough, there are usually projects that let you change the DB or offer a plugin to connect to these DB, so you can swap later with less effort.

If in doubt, go for sharding. If you already need >25% of the biggest machine available, go for sharding. Better safe than sorry.

Replication

For me, this is what applies to most applications and companies: how much is your downtime worth? How much is your data worth?

A server can fail; an entire data center can be struck by lightning or engulfed in flames. Assuming you have your backups off-site, how much data is lost in this scenario? Hours, a day, a week? How much time will be needed to get everything back up and running on a new server?

For example, on a server I use for a personal project I do an on-site database backup every two days and an off-site full-disk backup every day. This means I can lose one or two days of data. But if it happens, it will take me five days to get it up and running again (because it’s a weird setup and I can only use my spare time). In this case the downtime and the data are worth almost zero, as the project generates no revenue for me while it costs money. Still, the amount of time that would be needed to set it back up is something I need to fix.

To minimize these scenarios we use replication, and it should always be off-site replication. Sharding must stay within the same site (same DC), while replication is better off-site.

If you use sharding while managing the database, you can choose to have a fraction of the servers for redundancy. In this case, N+2 is always recommended. If you need 5 servers to handle the load, have 7 so at least 2 servers can fail. When using RAID yourself, I would recommend RAID 6. In most cases this will not apply.

Regardless, you need a full working copy elsewhere. Here you can go N+1 or N+2. Having another set of servers far away that are running the software in parallel avoids having an outage that can last weeks.

When using Cloud you can take advantage of the huge network between the different data centers. That is, they usually have another network, separate from the internet, that is blazing fast with small ping times, which you can use to communicate between them, making real-time replication across servers possible. Anyway, don’t go crazy and don’t set up the different servers very far apart; as fast as those networks can be, they still have to obey physics and are tied to the speed-of-light limit (no kidding here, light travels at roughly 50% of c in fiber, and this can be used to estimate ping times).

If you want to use a regular ISP with VPS services, check if they also have an internal network interconnecting the data centers; this is starting to become the norm lately.

The problem with replication is that the cost for running the service is now 2x or 3x, as you need way more space and servers than before.

If cost is a problem, I would recommend doing only a primary plus a “warm” read-only secondary. This means all writes go to the primary, and the secondary only writes back those changes in real time. In an incident, you might lose a few seconds of data that have not been written to the secondary yet. If this is a problem, check whether the database allows waiting until the secondary confirms the data is there; this comes with a huge penalty on write speed and latency.

The secondary could be smaller than the primary, or be used for other stuff. Only writing back data uses a very small amount of resources (but the same amount of disk space). In this case, if the secondary needs to be promoted to primary, it is possible that it chokes on the amount of load, and the application would be almost unavailable until a new server is brought up. So it’s best to avoid small secondaries if possible, as this approach only serves to back up data with a resolution of seconds; it will not be good enough for taking over.

On Cloud, they can also automate this replication for your database and files, and even automate the change from secondary replica to primary when things fail. Sharded databases do this best.

My final thoughts

I find Cloud products prohibitively expensive for my personal projects, and adding proper replication puts them even more out of reach.

But on the other hand, I find it extremely difficult to properly prepare automation for replication and takeover. These things are difficult to do and to test well enough to ensure they will not hurt instead of helping.

So it seems that either there is not much money involved and the risk of data loss or downtime is not a big deal, or there is, and then Cloud has a price that seems quite justified.

In the end this is about whether you want to take the risks yourself or pay extra so someone else deals with them. Generally I would go with the latter and rest easy.

What if cryptocurrencies were used to perform useful work?

With BitCoin using more than 140 TWh per year, or 15 GW, and growing, we must ask ourselves: is it really worth that much? Is it providing any useful work?

15 GW is not that much globally speaking, but to put it into perspective, a nuclear power plant produces on average 1 GW, so we need 15 nuclear plants just to keep mining BitCoin.

I have never been a believer in BitCoin and the like; per se, they have a lot of costs and don’t provide that much usefulness. The idea surely is interesting, and I really like the concept of decentralizing money away from banks, entities and governments, but the cost is currently just too high.

We also need to keep in mind that money is anything we assign a value to and desire to exchange for goods; with that, almost anything can be used as long as it is not perishable, easily obtained or easily duplicated.

“Almost anything” is certainly not 15 nuclear power plants in cost. Also, if people don’t switch to using the currency, it is of no use. The amount of goods that can be purchased with cryptocurrencies is certainly slim.

A currency should also retain its value over time, and the volatility of the crypto market is so high that holding onto crypto can be either extremely profitable or completely wet paper from one day to the next. Product pricing having to change every day or hour is not something that anyone wants to do.

Chia is another cryptocurrency that has been getting famous in the last months. The idea of being way more "eco-friendly" by not wasting so much energy and instead requiring disk space is somewhat encouraging. This, of course, has led retailers to increase the prices of HDDs as they saw a surge in demand. And it's still not without cost, as it still consumes a lot of power, just way less than Bitcoin or others using proof of work.

I feel that cryptocurrencies get most of their value from their features, such as smart contracts and similar. Ethereum is the one most cited for these, and Chia also has its own set of features.

The energy and monetary cost of running crypto should be justified by the useful work it provides. Regular paper money provides useful work by removing the burden of trading goods for goods; this is also true for crypto, but it's not enough, by several orders of magnitude.

Some features could help with legal matters and reduce human effort in a lot of areas, but governments would need to use or accept them for that to be of any use. And governments are usually decades behind on tech, so I don't see this happening in the near term. Also, the fact that they would have to put their trust in something they don't manage sounds like quite a blocker to me.

In short, the amount of money saved by doing something using crypto has to overcome the energy cost by a good margin. If not, it’s not a good solution. It’s that simple.

To give an example, computers for accounting purposes had to become cheaper than doing the same thing manually; if that weren't the case we would still be doing it with pen and paper. It's not because "it's convenient" or "faster", it's because having a human do the same tasks costs a lot more than purchasing and owning a computer. As for speed, that too translates back into money: having the right information faster and flawlessly has a value, and you can put a price on it.

So I think of blockchain systems as something still very cool but also very immature. They will get there, but unless something revolutionary happens in the middle, it will still take a lot of years to see wide usage. They came ahead of their time and probably we're not ready yet to profit from them.

(At the current moment https://chiacalculator.com/ reports that 1 PiB of space would earn $62,000 per month, investing less than $20,000 in a server; this is so ridiculous that I expect it to be corrected by supply in the next months. In fact, users in r/Chia already report no gains from it; the number of people entering because of the investment prospects is probably saturating the network and making it really hard to earn anything.)

An idea came to my mind recently

…and most probably it is either stupid or unfeasible. I don't have much background in blockchain and not enough maths to pursue it. But in case it inspires someone, here it is.

The Chia network basically seems to make servers store "trash data" to prove they actually allocated the space, hence the proof of space (yes, I know it's much more complicated, but I love oversimplifying).

I was thinking… what if instead of storing crap data they actually stored customer data?

Chia has recently reached 1 exabyte of storage. Storing someone's data has value. And selling that capacity can be worth millions, especially in Cloud scenarios.

Decentralised storage run by users already has a name, and it's called P2P; one implementation being BitTorrent.

But those networks relied on the willingness of users to serve files for free, and nowadays they are mostly used to combine the bandwidth of several servers so a download gets the fastest transfer possible.

Instead, what I'm talking about is more in line with this famous Linus Torvalds quote:

Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it 😉

Torvalds, Linus (1996-07-20). Message. linux-kernel mailing list. Source: Wikiquote

Can you upload your backups to BitTorrent and rely on others mirroring them? No. Heck, even FTP is no longer an option, as no one uses it anymore.

Imagine a network where we could send data for retention and pay for that storage in cryptocurrency. Anybody could set up a server for it and join the pool to store anybody else's data for money.

The most basic usage of this network would be backups. Upload a backup, put a price on it, and people would start replicating it to cash in that money. The more money you put on it, the more replicas will be worth doing. You don't need that old backup anymore? Stop paying for it. It will be gone in days as hosts find more profitable data to store.

Of course, anything that you upload would be public. So if you don't want it to be public, you need to encrypt all the data with a secret key. The particular encryption used would be the uploader's choice; they might want to use symmetric or asymmetric encryption (although asymmetric is riskier because it has inherently more attack vectors).

The price for storage would fluctuate like the stock market does. As more people jump into the pool, the price would fall until it no longer makes economic sense to join. And as more people upload data into it, the price per replica would rise.

You don't want to pay for the space you're using elsewhere? That's fine! Just join the network with that same server, offering the extra space you have, and you'll be earning crypto for it at the current value, which you can exchange to get your own data stored elsewhere. If the price of storage goes up, so do your profits from storing other people's data. Now you don't need to pay for different servers in different regions to guarantee that the data will be recoverable if your only server fails. You could also use your home computer for this exchange if you like, or tell your computer to prioritize your own data.

This idea could be expanded to a lot of interesting use cases, but at first glance it has several problems:

  • You don't know what you're storing, or from whom. This could mean that your server might contain illegal material without you knowing it. But hopefully the payment is traceable.
    • ISPs and others also offer storage and can't really check what's on it, especially if it's encrypted. I guess the law could track the payment and pursue the uploader / cut off the payment if that were a problem.
  • A single machine/location could try to claim that it holds >1 copies of the same data, which in reality is pointless.
    • Filtering based on IP might not work, as a machine can have >1 IP.
    • Ping time analysis to check that replicas are far apart could be tricked by having lots of small servers that actually fetch the data from the same place.
    • Encrypting the same data with different keys could ensure that the data is effectively copied. But it's burdensome, and anyway the network requires at least two plain copies to be able to verify that the other end actually contains the data.

I guess that I’m missing other risks and problems. And it’s also possible that they have some form of workaround; As I said, I’m not any kind of expert on these kind of systems to be able to outline a solution myself.

Nonetheless this seems an idea worth exploring. It’s possible that the usage could be extended outside of backups, for data that it’s modified often.

If all the state of an application could be stored in a network like this, then everything that requires to be deployed is basically stateless and can leverage Cloud very easily and cheaply. This, of course, would mean that a database can be run and modified quickly in this way, which is no easy feat.

But circling back to the beginning, a network like this would deliver actual work with actual value that would overcome the cost of running it. Therefore it will give use to the coin, and create a market based on supply-demand, not on speculation.

So as I said, it’s just an idea that crossed my mind. What are your thoughts? Seems interesting to you?

OVH lost a datacenter to a fire, what do we learn?

We all make mistakes, but it's easy to point fingers at others. Twitter and other social media platforms have been full of hate against OVH for the outage, with little empathy for the people working on the problem. I'm afraid we will not learn anything, and this will repeat in the future.

The problem is not OVH, but us. We think we're smart because we're saving pennies, but in the end we fail at calculating the risks and costs of such events; we deem them impossible, because anything with a probability lower than 0.001% is just 0% for the human brain.

It is also not about choosing a different provider; there's nothing special about OVH that makes it riskier than other solutions when purchasing the same type of products. Cloud services carry the same risks when you do the same (naive) stuff.

And we should stop pretending that we all do things in the right way. Because we don’t, and we know it. Basically we’re afraid to admit it.

So yes, I am also at fault. I have more things without proper backups than I would like to admit. (I am talking at a personal level here; at my workplace it's handled quite rigorously.)

A short story – I also got impacted by OVH

A few weeks ago (or maybe a month, I don't recall anymore) I started playing Valheim with friends. It's a nice game to pass the time with; ideal for these days when we don't have much to do outside our homes. Because we had to rely on a single player being online to be able to join, I thought: let's create a proper Valheim server.

So I went and got a small VPS at OVH, followed the process to set Valheim up on it, configured a DNS name so it would be easy to type in, and we began gaming together on the server.

And a week or so later I saw the tweets from several people about an OVH data center being on fire. I checked my VPS and, surprise surprise… it didn't work.

I checked and it seems the VPS is in SBG3, so the data is still there. I decided to just patiently wait until they manage to put things back in service.

I didn't make a single backup, and we could have lost the 10+ hours of gaming between two people there. Surely, this is almost nothing. I could just have started another server from scratch, not much of a problem. We would have lost some items and progress, but considering that I already have something like 170 hours in-game, 10 hours doesn't seem that much.

But however unlucky I might have been to have the incident happen so soon after starting, I also see this as a blessing; other people and companies lost a lot of critical data with no means of ever recovering it, because even if they shipped you the remains of the drives, no amount of money would let a data recovery company restore anything after such a fire.

The moral of the story for me is that off-site backups are cheap. Don't wait until tomorrow to take care of them.

I even recall seeing the "backup" option on OVH when purchasing, and dismissing it. And now I'm like… why? Why did I do that?

Backups are cheap and easy

We have lots of complex problems in IT, and backups are not one of them. Still, we lack the diligence to perform them, automate them and verify them. I don't understand why, but it happens everywhere, to everyone.

Maybe this is something that should be managed by default, by a third party. It's money well spent. And when you want to point fingers, you have every right to do it.

A backup is not going to save us from downtime; in fact, in such an event it could take from hours to days for a company to get the service back up. But this scenario is still definitely way better than losing all the data; a single event like this could put a company out of business if it weren't for backups.

I see a lot of people blaming others for having the backups "in-server". I know this seems obvious, but anyway: the problem is not that they have backups on-site, the problem is the lack of off-site ones.

This subtle distinction is important, as on-site backups should be done as well. The reason is that the same server is more reliable than another server; you don't know if the other server will be up when the backup is scheduled, or whether the credentials have expired. Off-site backups can fail for more reasons than on-site ones. Also, on-site ones are faster to retrieve and manage. Overall this leads to a lower MTTR (mean time to recovery) on typical failures.

When you have the on-site backup done, then you just copy it over to another server, service, tape, or whatever you like.

A very cheap way to do this is to just have a small server at the office and copy the backups from your server to it. If the server you want to back up is itself at the office, then the backup needs to be sent elsewhere: you can take a disk home, or use the internet connection to upload it to a shared drive.
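If you want to automate that copy, here is a minimal sketch (in Rust, since that's the language of this blog) that just shells out to tar and rsync; the paths and the office hostname are made up, so adapt them to your setup and run it from cron or a systemd timer:

```rust
use std::process::Command;

fn main() {
    // Hypothetical paths and host; adjust to your own setup.
    let archive = "/var/backups/myapp-latest.tar.gz";

    // 1. Make the on-site backup first (a plain tar of the data folder).
    let tar_ok = Command::new("tar")
        .args(["-czf", archive, "/var/lib/myapp"])
        .status()
        .expect("failed to run tar")
        .success();
    assert!(tar_ok, "on-site backup failed");

    // 2. Copy it off-site, e.g. to a small machine at the office.
    let rsync_ok = Command::new("rsync")
        .args(["-az", archive, "backup@office.example.com:/backups/"])
        .status()
        .expect("failed to run rsync")
        .success();
    assert!(rsync_ok, "off-site copy failed");
}
```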

As should be clear from the tone of this post, I'm targeting people who don't have a proper Disaster Recovery strategy. I hope this helps someone avoid this kind of scenario in the future.

Just to be clear, this post is not about what the proper practice should be. It's about the bare minimum. If I ranted here about proper practices as an SRE, I would scare a lot of people off. And that's not what I want.

So let’s keep things simple: If you’re not doing an off-site backup, start ASAP. Just add something. If you don’t know or don’t want to mess with this, ask your provider to do it for you for a price.

My opinion on how OVH handled this incident

These things happen. A major incident in a datacenter is not unheard of to me. Fires, lightning, wildlife… they happen, just not frequently enough for most people to remember.

OVH handled this really well publicly. They were transparent, shared a lot of details and tried to help all customers. It sounds like they also went case by case when customers asked for support. That's very good.

The speed at which they're solving, fixing and rebuilding is also outstanding. Congrats to all the teams involved; that's really hard work.

And again, the outage and data loss are not OVH's fault. They're the customers' fault. That is clear as day to me.

What I think was not that good is that the customer portal (dashboards, etc.) wasn't working properly until several days later. That suggests that OVH's own systems aren't completely prepared to handle an event of this scale.


Also, the photos of the site show that the different buildings are really crammed together. What's the point of splitting the data center into separate buildings if they're so close that they will surely be impacted at the same time?

The building that caught fire looks like it's made from shipping containers. I have zero insight into what they're like on the inside, but it suggests that OVH cheaped out when building their sites. How this might have contributed to the event, no idea. They have an investigation ongoing, and I hope they make some details public.

Something I'm quite sure of is that this event will hit OVH's reputation, and they will have to do things way better than the competition to regain it. Because of this, in the next 5 years OVH will probably be better prepared than most ISPs to prevent and handle these incidents.

They also stated that they're going to do customer backups for free, without the customer having to ask. This is a great move; I hope it gains traction and pushes other companies in the same direction.

This incident made me realize that not only is a single datacenter susceptible to major events, the whole site is too. And a major event might not only take the site off the grid, it might also destroy all the data inside.

Think for example of a scenario where a violent lightning bolt strikes the site. The power lines are usually connected to every building, so it's not hard to imagine a UPS catching fire because of it. If the buildings are built the same way and the strike is bad enough, it could set fire to several UPSes in every building. By the time the firefighters arrive, the whole site is already engulfed in flames. Say bye-bye to any data you had in there. (This is a far-fetched scenario from someone with zero idea of how DCs are built.)

What this made me realize is that backups that don’t leave the area are risky, and replication across buildings is, at least, not enough. Surely OVH took note as well.

How to keep these events from having an impact

Backups are great, but when something like this happens, you’ll have a hard time bringing stuff back up. For some cases this is an acceptable outcome, for others it isn’t.

I’m sure everyone heard by now about Docker and maybe Kubernetes too. This is one of the scenarios that they come in handy. Twitter also sometimes has some hate against Kubernetes, but those that were using it properly while this happened probably you had zero downtime.

First, we need to understand that any running application has installation/dependencies, code/binaries, configs, assets, and state. Maybe your application doesn’t have all of those, that’s ok. But it probably has state, for example a database. Or a folder with user uploaded data.

State is anything that changes upon user interaction, that records the current state of the application. We need the state to be isolated from anything else. That means that our application code should not be in the database, and the user uploaded content is not inside the application code folder.

We should be able to say: this folder is code, this other folder is user-data (state), this database is state, etc. Without mixing them up. And being able to back-up them separately.

There are lots of reasons to do this, but the one I want to highlight is that the state changes much more frequently than anything else and it’s outside of our control. The other pieces can usually be restored in an easier way.

For example, let’s say our application code in the server gets deleted. Is it a problem? Not much, I bet you have a copy in the office, in a Git repo, or in a co-worker workstation. Did you lost 10-20 commits? Well, that’s bad, but I bet you can remember more or less what were the last things you did and more or less code them again.

On the other hand, say some user photos got corrupted. Can you get them back? without a proper backup, you can’t. You aren’t who crafted that data, so without a backup all you can do is ask your users to re-upload. This is bad.

For backups you can get away with backing up everything together, but if we want to go further, to replication, this is no longer an option. The state needs to be treated differently.

With Docker you can bundle the dependencies of your application. Then deploying it is just a matter of running the code in the docker container.

The server installation can also be automated with Ansible, if you need it. But if you aren't installing several servers, doing it manually can be less toilsome. A bash script can be a good middle ground as well.

These two parts are very easy to spread across a fleet of servers, as they're usually constant. And when you want to update the app, you either change the Docker image or replace the code (depending on your approach). The problem, as you will see, is the state.

Files or user-uploaded content can sometimes just be rsync'ed over. The problem is that even if you synchronize them every minute, there will be a time window where one server does not have the files and might fail requests.

Databases can run in their own Docker container, but the state needs to be stored outside (e.g. in a volume), because Docker will remove it upon restart and/or limit its max size. Then the problem becomes how to replicate this data. Having two servers with different state can be very harmful, and this can't just be copied over.

For databases, primary-secondary replication can be useful for nearby replication. For a truly off-site secondary, in a different region or site, it becomes a problem because routing write queries to the primary adds latency and a possible network bottleneck. Another option for this case is having the secondary as a hot standby: it doesn't process any queries, it's just there waiting until an incident happens; in case of a failure, it can be quickly reconfigured as the primary and take over.

In this setup, the uploaded content can also use a simple rsync as stated before. As both servers are not serving at the same time, there's simply no consistency problem. And in the event of a total failure, yes, you might lose a few seconds of database changes and a minute or two of user-uploaded content. While this is bad, as long as you're not handling user payments or similar, it should be fine.

This setup assumes that someone will manually reconfigure the secondary when an event happens. As with everything, it can be automated. But compared to backups alone, this, even as manual as it is, is way better: a change in configuration may take 5-15 minutes, while restoring a backup and potentially installing a new server takes many hours, and the backup is going to be older than 2-3 minutes, probably a day old.

And then we have Kubernetes and similar tools. These can handle application updates, load balancing and such. Investing in them will make these events just invisible; not only to your users, but also to your own developers. If something happens during a weekend, automation will take care of it. No need to wake anyone up in the middle of the night.

Ideally, we would use more sophisticated approaches such as master-master replication and special filesystems in order to have true replication working. NoSQL databases can also help simplify master-master replication.

If we deploy on a cloud service such as AWS or Google Cloud, they also have their own products for doing a lot of this; it's easier to just use their database and storage offerings, which have what's needed to do all of this seamlessly, without having to worry.

The benefit of Cloud products is not that they're more resilient than other ISPs; as I said at the beginning, this kind of issue happens to everyone. The true difference is that you get access to extra services that make your life much easier when creating a true zero-downtime application.

The Disaster Recovery Server

Sure, backup is easy but has drawbacks. Replication is nice but hard to set up. Cloud is expensive. Is there anything in the middle?

Yes! Disaster Recovery is quite an easy concept. You just have another server, in another location. On this server we basically repeat the whole installation of our application. It should receive constant backups (or be the secondary in a replica if we want to do it better) and be more or less ready to take over.

This is quite straightforward to set up: if you were able to set up one server, you can set up two. The DR server just sits there, getting updated every so often, more or less ready to take over.

When an event strikes, we just update the DNS to point to the DR server and make sure everything is up. This might be around 30 minutes of work / downtime.

Because this server is only used for short periods of time, if you want to save even more money you could get something smaller than the regular server. All you need is for it to be able to run the application without crashing or freezing, so a slower hard drive and processor might be acceptable. Just be careful: when the load shifts there it might overload the server, and that would defeat the point of having it.

If you have several applications, you could host half of them on one server and the other half on the other. If one fails, just enable them all on the remaining server.

Then the only thing remaining is to have a schedule to test the DR server quarterly and simulate this scenario: shift the load to the DR server for a few hours and inspect how it performs.

If it does worse than the main server, that's okay, as long as it can deliver. After testing, just consider: could we hold on this server for a month if we needed to, or would it be a problem? If it's fine for a month, it's fine as a DR server.

I think this approach is the easiest and cheapest of them all. If you have any system that is below this bar, you should seriously consider at least doing this.

Conclusions

  • If you lost data, it's your fault, not your provider's.
  • If you don't have automated off-site backups, start doing them now. Maybe you weren't impacted by this incident, but you might be by the next one, on this provider or any other.
  • If you don't have a Disaster Recovery server (or better), think seriously about setting one up, and schedule quarterly tests.
  • The next time someone makes fun of containers, Kubernetes or similar, I'll ask them about their DR plan.

Bonus: If you ask me how it really should be done…

  • Follow 3-2-1 strategy or better: https://www.backblaze.com/blog/the-3-2-1-backup-strategy/
  • Test your backups regularly (monthly)
  • Set up monitoring for backups, and test that you get an email if a backup fails or fails to copy off-site.
  • N+2 Replication. If you need just one server, get three and set up replication and load balancing.
  • Neither RAID nor replication is a backup. Do backups, always.

Rust vs Python: Rust will not replace Python

I love Python; I've used it for 10+ years. I also love Rust; I have been learning it for the last year. I wanted a language to replace Python, I looked into Go and was disappointed. I'm excited about Rust, but it's clear to me that it's not going to replace Python.

In some areas, yes. There are small niches where Rust can be better than Python and replace it. Games and microservices seem to be among the best candidates, but Rust will need a lot of time to get there. GUI programs also have a very good opportunity, but the fact that Rust's model is so different from regular OOP makes it hard to integrate with existing toolkits, and a GUI toolkit is not something easy to do from scratch.

For CLI programs and utilities, Go will probably prevent Rust from gaining much ground. Go is clearly targeted at this particular scenario, it's really simple to learn and code in, and it does this really well.

What Python lacks

To understand what opportunities other languages have to replace Python, we should first look at Python's shortfalls.

Static Typing

There are lots of things that Python could improve, but lately I feel that types are one of the top problems that need to be fixed, and it actually looks fixable.

Python, like JavaScript, has no static typing. You can't easily control what the input and output types of functions are, or what the types of local variables are.

There's the option now to annotate your variables with types and check them with programs like MyPy or PyType. This is good and a huge step forward, but insufficient.

Having IDE autocompletion, suggestions and inspection helps a lot when writing code, as it speeds up the developer by reducing round-trips to the documentation. On complex codebases it really helps because you don't need to navigate through lots of files to determine the type you're trying to access.

Without types, an IDE is almost unable to determine the contents of a variable. It has to guess, and it doesn't guess well. Currently, I don't know of any autocompletion in Python based solely on MyPy.

If types were enforced by Python, then the compiler/interpreter could do some extra optimizations that aren’t possible now.

Also, there’s the problem of big codebases in Python with contributions of non-senior Python programmers. A senior developer will try to assume a “contract” for functions and objects, like, what are the “valid” inputs for that it works, what are valid outputs that must be checked from the caller. Having strict types is a good reminder for not so experienced people to have consistent designs and checks.

Just have a look on how Typescript improved upon JavaScript by just requiring types. Taking a step further and making Python enforce a minimum, so the developer needs to specify that doesn’t want to type something it will make programs easier to maintain overall. Of course this needs a way to disable it, as forcing it on every scenario would kill a lot of good things on python.

And this needs to be enforced down to libraries. The current problem is that a lot of libraries just don’t care, and if someone wants to enforce it, it gets painful as the number of dependencies increase.

Static analysis in Python exists, but it is weak. Having types enforced would allow to better, faster, and more comprehensive static analysis tools to appear. This is a strong point in Rust, as the compiler itself is doing already a lot of static analysis. If you add other tools like Cargo Clippy, it gets even better.

All of this is important to keep the codebase clean and neat, and to catch bugs before running the code.

Performance

The fact that Python is one of the slowest programming languages in use shouldn’t be news to anyone. But as I covered before in this blog, this is more nuanced than it seems at first.

Python makes heavy use of integration with C libraries, and that's where its power is unleashed. C code called from Python still runs at C speed, and while it is running the GIL can be released, allowing for some degree of multithreading.

The slowness of Python comes from the amount of magic it can do: the fact that almost anything can be replaced, mocked, whatever you want. This makes Python especially good for designing complex logic, as it is able to hide it very nicely. And monkey-patching is very useful in several scenarios.

Python works really well with Machine Learning tooling, as it is a good interface to design what the ML libraries should do. It might be slow, but a few lines of code that configure the underlying libraries take almost zero time, and those libraries do the hard work. So ML in Python is really fast and convenient.

Also, don’t forget that when such levels of introspection and “magic” are needed, regardless of the language, it is slow. This can be seen when comparing ORMs between Python and Go. As soon as the ORM is doing the magic for you, it becomes slow, in any language. To avoid this from happening you need an ORM that it’s simple, and not that automatic and convenient.

The problem arises when we need to do something where a library (that interfaces C) doesn’t exist. We end coding the actual thing manually and this becomes painfully slow.

PyPy solves part of the problem. It is able to optimize some pure python code and run it to speeds near to Javascript and Go (Note that Javascript is really fast to run). There are two problems with this approach, the first one is that the majority of python code can’t be optimized enough to get good performance. The second problem is that PyPy is not compatible with all libraries, since the libraries need to be compiled against PyPy instead of CPython.

If Python were stricter by default, allowing for wizardry stuff only when the developer really needs it, and enforcing this via annotations (types and so), I guess that both PyPy and CPython could optimize it further as it can do better assumptions on how the code is supposed to run.

The ML libraries and similar ones are able to build C code on the fly, and that should be possible for CPython itself too. If Python included a sub-language to do high-performance stuff, even if it takes more time to start a program, it would allow programmers to optimize the critical parts of the code that are specially slow. But this needs to be included on the main language and bundled on every Python installation. That would also mean that some libraries could get away with pure-python, without having to release binaries, which in turn, will increase the compatibility of these with other interpreters like PyPy.

There’s Cython and Pyrex, which I used on the past, but the problem on these is that it will force you to build the code for the different CPU targets and python versions, and that’s hard to maintain. Building anything on Windows is quite painful.

The GIL is another front here. By only allowing Python to execute a instruction at once, threads cannot be used to distribute pure python CPU intensive operations between cores. Better Python optimizations could in fact relief this by determining that function A is totally independent of function B, and allowing them to run in parallel; or even, they could build them into non-pythonic instructions if the code clearly is not making use of any Python magic. This could allow for the GIL to be released, and hence, parallelize much better.

Python & Rust together via WASM

This could solve a great part of the problems if it were easy and simple to use. WebAssembly (WASM) was conceived as a way to replace JavaScript in browsers, but the neat thing is that it produces code that can be run from any programming language and is independent of the CPU target.

I haven’t explored this myself, but if it can deliver what it promises, it means that you only need to build Rust code once and bundle the WASM. This should work on all CPUs and Python interpreters.

The problem, I believe, is that the WASM loader for Python will need to be compiled for each combination of CPU, OS and Python interpreter. It's far from perfect, but at least it's easier to get a small common library to support everything, and then have other libraries or code build on top of it. So this could relieve some maintenance burden from other libraries by diverting that work onto the WASM maintainers.

Another possible problem is that WASM will have a hard time doing anything that is not strictly CPU computation: for example managing sockets or files, or communicating with the OS. As WASM was designed to run inside a browser, I expect that all OS communication will require a common API, and that will have some caveats for sure. While I expect the tasks mentioned before to be usable from WASM, things like OpenGL and talking directly to a GPU will surely lack support for a long time.

What Rust Lacks

While most people will say that Rust needs to be easier to code in, that it is a complex language that requires a lot of human hours to get code working, let me heavily disagree.

Rust is one of the most pleasant languages to code in once you have expertise in it. It is quite productive, almost at the level of Python, and very readable.

The problem is gaining this expertise. It takes way too much effort for newcomers, especially when they are already seasoned in dynamically typed languages.

An easier way to get started in Rust

I know this has been said a lot by novices, and it has been discussed ad infinitum: we need a RustScript language.

For the sake of simplicity, I'm calling this hypothetical language RustScript. To my knowledge the name is not in use and RustScript does not exist, even if I make it sound like it does.

I have read others proposing this, so please keep reading: I already know more or less what has been proposed and how those discussions went.

The main problem with learning Rust is the borrow-checking rules; (almost) everyone knows that. A RustScript language must have a garbage collector built in.

But the other problem that is not talked about as much is the complexity of properly reading and understanding Rust code. Because people come in, try a few things, and the compiler keeps complaining everywhere, they never get to learn the basic stuff that would allow them to read code easily. These people will struggle even to remember whether the type was f32, float or numeric.

A RustScript language must serve as a bootstrap into Rust syntax and features, while keeping the hard/puzzling stuff away. That way, once someone is able to use RustScript easily, they will be able to learn proper Rust with a smaller learning curve, already feeling familiar with it and knowing what the code should look like.

So it should change this learning curve:

Into something like this:

Here’s the problem: Rust takes months of learning to be minimally productive. Without knowing properly a lot of complex stuff, you can’t really do much with it, which becomes into frustration.

Some companies require 6 months of training to get productive inside. Do we really expect them also to increase that by another 6 months?

What it’s good about Python it’s that newcomers are productive from day zero. Rust doesn’t need to target this, but the current situation is way too bad and it’s hurting its success.

A lot of programming languages and changes have been proposed or even done but fail to solve this problem completely.

This hypothetical language must:

  • Include a Garbage Collector (GC) or any other solution that avoids requiring a borrow checker.
    Why? Removing this complexity is the main reason for RustScript to exist.
  • Have almost the same syntax as Rust, at least for the features they have in common.
    Why? Because if newcomers don't learn the same syntax, then they aren't making any progress towards learning Rust.
  • Be binary- and linker-compatible with Rust; all libraries and tooling must work inside RustScript.
    Why? Having a completely different set of libraries would be a headache and would require a completely different ecosystem. Newcomers should familiarize themselves with Rust libraries, not RustScript-specific ones.
  • Rust sample code must be machine-translatable into RustScript, like how Python 2 can be translated into Python 3 using the 2to3 tool. (Some things like macro declarations might not work, as they might not have a replacement in RustScript.)
    Why? Documentation is key. Having a way to automatically translate documentation into RustScript will make everyone's life easier. I don't want the API-guessing game that happens in PyQt.
  • Be officially supported by the Rust team itself, and bundled with Rust when installing via rustup.
    Why? People will install Rust via rustup. Ideally, RustScript should be part of it, allowing for easy integration between both languages.

Almost any of these requirements alone is going to be hard to achieve. Getting a language that does everything needed, with all the support… it's not something I expect to happen, ever.

I mean, Python has it easier. What I would ask of Python is way more achievable than what I'm asking here, and yet in 10 years there have only been slight changes in the right direction. With that in mind, I don't expect Rust to ever have a proper RustScript, but if it happens, well, I would love to see it.

What would be even better is if RustScript were almost a superset of Rust, making Rust programs mostly valid in RustScript, with a few exceptions such as macro creation. This would allow developers to incrementally switch to Rust as they see fit, and face the borrow checker in small, easy-to-digest amounts. But even having to declare a whole file or module as RustScript would still work, as it would allow devs to migrate file by file or module by module. That's still better than having to choose between language X or Y for a whole project.

Anyway, I'd better stop talking about this, as it's not gonna happen, and it would require a full post (or several) to describe such a language anyway.

Proper REPL

Python's REPL is really good, and a lot of tools make use of it. Rust REPLs exist, but they're not officially supported, and they're far from perfect.

A REPL is useful when doing ML and when trying out small things. The fact that Rust needs to compile everything makes this rather impractical, as it needs boilerplate to work and every instruction takes time to build interactively.

If Rust had a script language this would be simpler, as a REPL for scripting languages tends to be straightforward.

Simpler integration with C++ libraries

The fact that both Rust and Python integrate only with C and not C++ would make anyone think they are on the same level here; but no. Because Python's OOP is quite similar to C++'s and its magic can make up for the missing parts (method overloading), in the end Python has way better integration with C++ than Rust does.

There are a lot of ongoing efforts to make C++ integration easier in Rust, but I'm not sure they will ever arrive at something straightforward to use. There's a lot of pressure on this and I expect it to get much, much better in the coming years.

But still, the fact that Rust has strict rules on borrowing and C++ doesn't, and that C++ exceptions really don't mix with anything in Rust, will make this hard to get right.

Maybe the solution is having a C++ compiler written in Rust, made part of the Cargo suite, so the sources can be copied inside the project and the library built for Rust, entirely using Rust. This might allow some extra insights and automation that make things easier, but C++ is quite a beast nowadays, and having a compiler that supports the newest standards is a lot of work. This solution would also conflict with Linux distributions, as the same C++ library would need to be shipped twice in different versions: a standard one and a Rust-compatible one.

Lack of binary libraries and dynamic linking

All Rust dependencies currently rely on downloading and building the sources for each project. Because there are so many dependencies, building a project takes a long time. And distributing our build means shipping a big binary that contains everything inside. Linux distributions don't like this.

Having pre-built libraries for common targets would be nice; or, if not a full build, maybe some sort of halfway artifact that has the most complex parts done and just needs the final optimization stages for the specific CPU target, similar to WASM, *.pyc files or JVM bytecode. This would reduce build times by a huge amount and would make development more pleasant.

Dynamic linking is another commonly overlooked point. I believe it can be done in Rust, but it's not something the regular books explain. It's complex and tricky to do, whereas the static approach is quite straightforward. This means that any update to any of your libraries requires a full build and a full release of all your components.

If an automated way existed to do this in Cargo, even if it built the libraries in some format that can't be shared across different applications, it would already bring some benefits over what we have. For example, the linking stage could take less time, as most of the build time seems to be spent trying to glue everything together. Another possible benefit is that, as it would produce N files instead of 1 (let's say 10), if your application has a way to auto-update, it could selectively update only the files needed, instead of re-downloading one full fat binary.

To get this to work across different applications, as Linux distributions do, the Rust compiler needs better standards and compatibility between builds, so that if one library is built using rustc 1.50.0 and the application was built against 1.49.0, they still work together. I believe currently this doesn't work well and there are no guarantees of binary compatibility across versions. (I might be wrong.)

On devices where disk space and memory are constrained, having dynamic libraries shared across applications might help a lot to fit the different projects on the device. Those might be microcontrollers or small computers. For our current desktop computers and phones, this isn't a big deal.

The other reason why Linux distributions want these pieces separated is that when a library gets a security patch, usually all it takes is to replace the library on the filesystem and you're safe. With Rust applications you depend on the maintainers of each project to rebuild and release updated versions. A security patch for an OS, instead of being, say, 10 MiB, could then be 2 GiB because of the number of projects that use the same library.

No officially supported libraries aside from std

In a past article, Someone stop NodeJS package madness, please!!, I talked about how bad the JavaScript ecosystem is. Because everyone publishes packages and there's no control, there's a lot of cross-dependency hell.

This can happen to Rust as it has the same system. The difference is that Rust comes with “std”, which contains a lot of common tooling that prevents this from getting completely out of hand.

Python also has the same with PyPI, but it turns out that the Python standard library covers a lot more functionality than Rust's "std". So PyPI is quite a bit saner than other repositories.

Rust has its reasons for having a thin std library, and it's probably for the best. But something has to be done about the common functionality it doesn't cover.

There are lots of possible solutions. For example, having a second standard library which bundles the remaining common stuff (call it "extra_std" or whatever); then everyone building libraries would tend to depend on that one instead of a myriad of different dependencies.

Another option is to promote specific libraries as “semi-official”, to point people to use these over other options if possible.

The main problem with having everyone upload and cross-depend on each other is that these libraries might have just one maintainer, and that maintainer might move on and forget about them forever; then you have a lot of programs and libraries depending on something, unaware that it has long been obsolete. Forking the library doesn't solve the problem, because no one has access to the original repo to say "deprecated, please use X".

Another problem is the security implications. You depend on a project that might have been audited in the past, or never, but the new version is surely not audited. What state is the code in? Is it sound, or does it abuse unsafe to worrying levels? We would need to inspect it ourselves, and we all know that most of us will never do that.

So if I were to fix this, I would say that a Rust committee with security expertise should select and promote which libraries are "common" and "sane enough", fork them under a slightly different name, audit them, and only ever upload audited code. Having a group looking after those forked libraries means that if a library gets deprecated, they will correctly update its status and send people to the right replacement. If someone forks a library and that fork becomes the preferred one, the security fork should migrate and follow it, so everyone depending on it is smoothly migrated.

In this way, "serde" would have a fork called something like "serde-audited" or "rust-audit-group/serde". Yes, it will always be a few versions behind, but it will be safer to depend on than upstream.

No introspection tooling in std

Python is heavy on introspection and it's super nice for automating things. Even Go has some introspection capabilities through its interfaces. Rust, on the other hand, needs to make use of macros, and the sad part is that there aren't any officially supported macros that make this more or less work. Even third-party packages are quite ugly to use.

Something that tends to be quite common in Python is iterating through the elements of an object/struct: their names and their values.

I would like to see a derive macro in std that adds methods for listing the names of the different fields, and to standardize this for things like Serde. Because if Serde is overkill for some program, you have to cook up these macros yourself.

The other problem is the lack of standard variadic types. If I wanted to iterate through the values/content of each field, it becomes toilsome and inconvenient, because you need to know in advance which types you might receive and how, and add boilerplate to support all of them.
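To make this concrete, here's the kind of thing I mean, hand-written today for a single struct. The trait name and the string-based rendering of the values are just my own sketch of what a std derive could standardize, not anything that exists:

```rust
// A hand-written version of what a std derive could generate:
// list the field names and a string rendering of each value.
trait Introspect {
    fn field_names() -> &'static [&'static str];
    fn field_values(&self) -> Vec<String>;
}

struct Config {
    host: String,
    port: u16,
    verbose: bool,
}

impl Introspect for Config {
    fn field_names() -> &'static [&'static str] {
        &["host", "port", "verbose"]
    }
    fn field_values(&self) -> Vec<String> {
        // Everything is converted to String because there is no standard
        // variadic/any type to return heterogeneous field values with.
        vec![
            self.host.clone(),
            self.port.to_string(),
            self.verbose.to_string(),
        ]
    }
}

fn main() {
    let cfg = Config { host: "localhost".into(), port: 8080, verbose: true };
    for (name, value) in Config::field_names().iter().zip(cfg.field_values()) {
        println!("{name} = {value}");
    }
}
```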

The standard traits also lack some supertraits to easily classify variable types. So if you want a generic function that works with any integer, you need to figure out all the traits you need, when in reality I would just like to say that type T is "int-alike".
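Today it looks something like the sketch below: you have to spell out every bound yourself instead of just saying "T is int-alike" (the bounds here are only an example; real code often needs more of them):

```rust
use std::ops::Add;

// There is no built-in "int-alike" supertrait in std, so the bounds
// have to be listed one by one.
fn sum_all<T>(items: &[T]) -> T
where
    T: Copy + Default + Add<Output = T>,
{
    items.iter().copied().fold(T::default(), |acc, x| acc + x)
}

fn main() {
    let a: u32 = sum_all(&[1, 2, 3]);
    let b: i64 = sum_all(&[10, -5, 7]);
    println!("{a} {b}");
}
```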

Personal hate against f32 and f64 traits

This might be just me, but every time I add a float in Rust it makes my life hard. The fact that floats don't support total ordering and proper equality makes them unusable in lots of collection types (HashMaps, etc.).

Yes, I know that these types don't handle equality well (due to imprecision) and comparing them is also tricky (due to NaN and friends). But, c'mon… can't we have a "simple float"?

In some cases, like configs, decimal numbers are convenient. I wouldn't mind using a slower type for those cases, one that more or less handles equality (by having an epsilon built in) and handles comparison (by defining a strict ordering for NaN and Inf, or by disallowing them altogether).

This is something that causes pain to me every time I use floats.
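For reference, this is the kind of dance I mean. f32::total_cmp (stable since Rust 1.62) helps with sorting, but you still can't drop an f32 straight into a HashMap key without converting it to its bit pattern or pulling in a crate like ordered-float:

```rust
use std::collections::HashMap;

fn main() {
    let mut values = vec![2.5_f32, f32::NAN, 0.1, -1.0];

    // sort() is not available for f32; you need an explicit comparator.
    // total_cmp gives a total order, NaN included.
    values.sort_by(|a, b| a.total_cmp(b));
    println!("{values:?}");

    // f32 is not Eq/Hash, so to use it as a HashMap key you have to go
    // through its bit pattern (or a wrapper type from a crate).
    let mut prices: HashMap<u32, &str> = HashMap::new();
    prices.insert(2.5_f32.to_bits(), "two and a half");
    println!("{:?}", prices.get(&2.5_f32.to_bits()));
}
```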

Why I think Rust will not replace Python

Take into account that I'm still learning Rust; I might have missed things or be wrong about some of the stuff above. One year of practising on my own is not enough to have full context for all of this, so take this article with a pinch of salt.

Rust is way too different from Python. I really would like Rust to replace my use of Python, but seeing that there are some irreconcilable differences makes me believe that this will never happen.

WASM might be able to bridge some gaps, and Diesel and other ORMs might make Rust a better replacement for Python for REST APIs in the future.

In general terms I don't see a lot of people migrating from Python to Rust. The learning curve is too steep, and for most of those use cases Go might be enough, so people will skip Rust altogether. And this is sad, because Rust has a lot of potential on lots of fronts; it just requires more attention than it gets.

I’m sad and angry because this isn’t the article I wanted to write. I would like to say that Rust will replace Python at some point, but if I’m realistic, that’s not going to happen. Ever.

References

https://blog.logrocket.com/rust-vs-python-why-rust-could-replace-python/

https://www.reddit.com/r/functionalprogramming/comments/kwgiof/why_do_you_think_data_scientists_prefer_python_to/glzce8e/?utm_source=share&utm_medium=web2x&context=3

Threading is not a magic wand for performance

I have strong opinions on threading inside applications. Most of the time threads are understood as a way of getting 100% out of your CPU, but… things aren't that simple.

From the end user's perspective

…the more cores in a CPU the better. Well, the problem, as I stated in other posts, is that most applications will use 1-16 threads. Once you go above 8 cores with SMT/Hyper-Threading, the number of applications making use of every one of them gets quite low. And those applications aren't going to use them all the time, probably only for some operations.

You could go overboard and get a 32-logical-core CPU; that would allow you to game, stream, encode video, compress files, run a database, browse, and compile, all at once. But seriously? (Yes, some video encoding will use all 32 threads, but it's not as effective as you might think.) More cores allow more applications to run in parallel at full speed, that's for sure; but at some point the benefits diminish because it makes less and less sense, and most of the time you're not going to be doing all those things at once, so your CPU sits unused.

Another thing to take into account is the TDP limit and cooling limits. Running expensive instructions like SIMD on all cores is likely to go over the TDP design of the CPU, which will make it throttle itself (Windows or other tools will not report this as throttling because it's not thermal throttling; it's a TDP limit). In overclocking scenarios we could raise this TDP limit, sure. But this moves us to a different realm: can we cool that?

My new NH-D15, which is oversized for the 5800X, can keep it at 90ºC with the Noctua low-noise adapters fitted, at low noise. I have quite a low tolerance for noise. But now imagine a 5950X, which has double the cores. Running that at full capacity would require removing the adapters and running the fans at full speed to keep things under control. Still, that chip will go over its TDP if SIMD instructions run simultaneously on all cores. Moving up from there and increasing the TDP of the chip would mean that custom water cooling is the only way to keep things under control.

I’m getting a bit off-topic, but the point is that using too much CPU power will lead to tons of heat that needs to be dissipated or the CPU will start underperforming. TDP limit is a thing too. Keep that in mind.

Now we might think… well, over time applications will use more threads as these CPUs become widespread. Heh, yes, but… no. Yes, because some applications (games) are traditionally single-threaded, and they will unlock a huge amount of performance on almost every computer by adding just a few threads. But also no, because not all operations can be parallelized across threads (or processes, etc.). And no, because threading is hard to do right. So I would expect a delay of years until most applications and games can make use of these CPUs. They will most probably target the low end of CPU core counts. If AMD keeps popularizing cheap 4-core CPUs and forces Intel to follow, we might see optimizations for 4-8 threads in most places in 5-10 years.

Why would they optimize for the lower end? Because threads are not free. For starters, take an application that does a CPU-intensive job in a single thread versus the same work split across four threads: run on a single-core CPU (no HT/SMT), the single-threaded version is going to finish faster. Unless you have a clever way of enabling/disabling threads and changing their number, which is difficult in some cases, you're better off targeting the lower end (unless your application really benefits from and requires 100% of the CPU).
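As a toy illustration of "threads are not free" (not a rigorous benchmark): summing a big vector once in a single thread and once split across four scoped threads (scoped threads are available since Rust 1.63). On a single core the threaded version only adds spawn and merge overhead; even with four cores the speedup is below 4x:

```rust
use std::thread;
use std::time::Instant;

fn main() {
    let data: Vec<u64> = (0..20_000_000).collect();

    // Single-threaded sum.
    let t0 = Instant::now();
    let single: u64 = data.iter().sum();
    println!("1 thread : {single} in {:?}", t0.elapsed());

    // Four threads, each summing a chunk; results merged at the end.
    let t0 = Instant::now();
    let chunk = data.len() / 4;
    let total: u64 = thread::scope(|s| {
        data.chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<u64>()))
            .collect::<Vec<_>>()
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    });
    println!("4 threads: {total} in {:?}", t0.elapsed());
}
```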

Why can’t we parallelize everything?

It depends on the task that the application is trying to perform.

Imagine your boss says you need to build a new PC from parts, and because this is time-critical they assign you a team of 20 people to help. With this team, you are expected to have the computer built 20 times faster than a single person would.

Would that work out? I'd imagine that 20 people would create more hassle than improvement, and hence the computer would be built even slower than if you were assigned to do it alone. Just having to talk to everyone takes a lot of time that is not spent building the thing.

Now imagine that you're asked to build not one but 40 computers from parts, and you're given a team of 5 people. Will this work? Surely, and you can even assign roles, so that for example one person mounts the CPU and RAM while another prepares the case. That will get the work finished sooner than doing it alone.

A similar thing happens in programs. For some types of problems there's not much room for threads to work cooperatively on a single request. If what you ask for is the same task 10 times at once, it is simple to spread the load. But as we usually ask for a single task, it depends on the internals of that task.

Any kind of compression is usually a hard problem to solve in parallel. This includes ZIP/RAR/GZIP compression (lossless archiving) as audio/video encoding (lossy). Compression basically works by avoiding repeating the same thing twice, so if we said that the previous image was black, and the new one is black, that should be avoided. But the problem is, the program doesn’t know that the image was black until it gets there. It’s hard for a program to send threads ahead and split this work. Still, it’s something that they do, but there are limitations.

Same applies in games. In order to calculate the next image or what the game would do next, it requires the current state. We can’t calculate future states based on a past state, it needs to be done one by one. Still, they can send threads to manage different aspects of the game; for example one thread might manage physics, another might manage enemies, other for lighting, etc. But there’s a limit on how many aspects you can split the work into.

There’s a non-stop research on those areas to improve parallelism. Physics engines might start learning how to split the computation into different chunks and then merge the simulation back. Video encoders are able to calculate ahead some of the work, for example motion estimation, or split the video in blocks to make them in parallel.

Still, with all of this there are losses from parallelizing. These tasks are not perfectly isolated; they need to be directed and merged back, and that's a cost that simply isn't there if we use a single thread.

Because of these losses, some developers might choose not to parallelize further, as it would hurt performance on lower-end CPUs. If the application targets CPUs with few cores, they might be forced to hold back in order to keep the minimum requirements under control.

Diving into the details

Threads mostly require shared memory to communicate; otherwise, how do we expect to give them work and retrieve the results? Sure, there are exceptions, for example if the input comes from the network or the output goes directly to disk. But that's atypical.

If two threads access the same piece of data, it needs to be guarded to prevent concurrent accesses. This is not only about data consistency (as in databases), but mainly because actual data corruption can occur. The data can be mid-write when the read happens, or it can cause divergence, where each thread sees different data for short periods.

These guards are usually mutexes, which take a lock at both the software and hardware level to prevent concurrent access. If a thread tries to acquire a locked mutex it will usually block and wait.

They are expensive in CPU terms; we're burning several CPU cycles on each of them. Some libraries like Qt (at least in older versions) have a compile flag to enable or disable thread-safety support (mutexes), because enabling it without need incurs a performance penalty.
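To make that guarding concrete, here's a minimal sketch (not from any real project) of several threads sharing a counter behind an `Arc<Mutex<_>>`; every single increment pays for taking and releasing the lock:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared counter; the Mutex guards against concurrent, possibly corrupting writes.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..100_000 {
                    // Each iteration locks, writes, and unlocks: correctness is
                    // guaranteed, but every lock/unlock costs CPU cycles.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    println!("total = {}", counter.lock().unwrap());
}
```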

Threads themselves aren't free either. Creating them requires memory and CPU. And having them poses a challenge for the OS scheduler: as the number of threads/processes increases, the number of context switches increases too. A context switch is expensive for a CPU to perform. It also increases pressure on the CPU caches, pushing data out of the registers and the L1 and L2 caches.

You might think that this is "too technical", "it doesn't matter that much", "there's not that much difference in performance". Well, it turns out that CPU registers are the fastest thing by a wide margin, followed by the L1 cache. A program running entirely out of L1 cache will go 100x faster than a program that needs to pull data from RAM for every single instruction. Yeah, RAM is super fast, but the L1 cache is like a warp drive. CPU registers are accessible within the same CPU clock cycle, while a RAM access can take around 240 cycles (depending on the CPU). So if the requested data produces a cache miss, your program stalls for 240 cycles. If that happens on every access, you get one useful cycle out of roughly 241, i.e. you're waiting about 99.5% of the time. (This usually doesn't happen that often, as CPUs and compilers are smart, but going all out on threads can cause similar scenarios if the access pattern looks random.)

Don’t use too many threads!

Ideally, you want the same number of threads as logical cores. Once those already saturate 100% of all cores, adding more isn't going to be faster.

Of course there are exceptions, and a good rule could be N+1 or N+2 threads. Sometimes threads need to wait, or are blocked (by mutexes, for example), so a few extra can help fill the gaps here and there, but the gains are minor.
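As a small illustration in Rust, the standard library can report the number of logical cores, which can serve as a baseline for sizing a thread pool (the +1 below is just the rule of thumb from above, not a hard rule):

```rust
use std::thread;

fn main() {
    // Number of logical cores (threads) the OS exposes; falls back to 1 if unknown.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    // Rule of thumb from the text: N or N+1/N+2 worker threads, rarely more.
    let workers = cores + 1;
    println!("spawning {workers} worker threads for {cores} logical cores");
}
```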

Having a collection of 500 threads, each one waiting on a network request, isn't exactly efficient. First of all, we might face mutex contention: with so many threads, the probability of two of them trying to access the same data at the same time rises sharply. This is related to the birthday problem.

The second problem is OS scheduler overhead. The OS doesn't know which thread is ready to do work right now, so it will wake up threads almost at random. A woken thread will probably just check the mutex and go back to sleep. That wastes two context switches and a mutex check. This situation repeats very quickly, generating CPU heat without doing any useful work.

We might be fooled into thinking "I'm using 100% of the CPU, so it's as fast as it gets". Wrong. It's using 100% of the CPU, but mostly hogging resources and wasting CPU cycles… cycles that could have been put to better use by other applications. We're starving the system, mounting a DoS-like attack on the other programs that want to run on it.

When optimizing a program we need to think about CPU efficiency: How much work we can get done per CPU core cycle. And also get the right metrics: How much actual work was done per unit of time.

Threading is actually a trade-off. We’re paying with extra CPU cycles to be able to get the work parallelized and hopefully get it done faster. Don’t forget that! With every thread and every mutex, we’re wasting CPU resources.

Use non-blocking calls if possible

One of the reasons to spawn hundreds of threads is when the application is doing some I/O operation that blocks and makes the thread wait. For example, reading or writing to disk, network requests or maybe waiting for the GPU.

In case you're not aware, there are blocking and non-blocking variants of most of these calls. A blocking call is the traditional way: you ask to read a file and the function blocks until the data is read, returning it when finished. With a non-blocking call you ask to read X amount of data and the function returns immediately, without retrieving anything. In the background, the OS fills the buffers with the data being read. Then, with another function (or a callback), you fetch the data that has arrived. This lets your thread do other things in the meantime.
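As an illustration with the standard library only (not the exact calls zzping uses), a UDP socket can be switched to non-blocking mode; a read with no data available returns `WouldBlock` instead of stalling the thread:

```rust
use std::io::ErrorKind;
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.set_nonblocking(true)?; // recv_from() now returns immediately instead of waiting

    // Send a datagram to ourselves so there is eventually something to read.
    socket.send_to(b"ping", socket.local_addr()?)?;

    let mut buf = [0u8; 1500];
    loop {
        match socket.recv_from(&mut buf) {
            Ok((n, _addr)) => {
                println!("got {n} bytes");
                return Ok(());
            }
            // No data ready yet: the thread is free to do other work here.
            Err(e) if e.kind() == ErrorKind::WouldBlock => continue,
            Err(e) => return Err(e),
        }
    }
}
```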

The downside of non-blocking calls is that in most cases the flow of the program gets broken into pieces scattered across different places and no longer reads naturally. To solve this there's async programming, which makes your program look like it's doing things serially while under the hood it switches to other tasks. This is also known as cooperative threading or green threads.

I haven't used async programming much myself, just a bit in Python. It's one of the pending things I have to try in Rust. But it's quite neat.

In zzping I just used non-blocking calls, because the design there is quite optimal for that type of approach and the code looks really neat.

You'd be surprised by the amount of work a single thread can do using cooperative/async programming.

The key thing about async programming is that task switching is done inside the application instead of by the OS and hardware. This might sound worse, but it's actually better. Because the application knows which tasks are ready to run, switching is efficient: it never switches to an idle task, and there are no OS context switches at all, because it's a single thread.

On a single-core CPU, async programming gets a lot more work done than threads in these scenarios.

The final trick is to spawn one OS thread per CPU thread (logical core), each using async programming to queue its tasks. This outperforms plain threads by a wide margin and keeps the computer responsive.
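One way to get that "one OS thread per core, async on top" shape in Rust is a multi-threaded async runtime. The post doesn't name one, so take this tokio-based snippet as an assumption and a sketch only:

```rust
// Cargo.toml (assumption): tokio = { version = "1", features = ["full"] }
use std::thread;

fn main() {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    // One OS worker thread per logical core; tasks are scheduled cooperatively on top.
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(cores)
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async {
        // Thousands of cheap tasks instead of thousands of OS threads.
        let handles: Vec<_> = (0..1000)
            .map(|i| tokio::spawn(async move { i * 2 }))
            .collect();
        for h in handles {
            h.await.unwrap();
        }
    });
}
```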

I was wrong on zzping rejecting threads

I initially designed zzping to avoid threads entirely and use only non-blocking calls. Since I didn't need the extra CPU performance, non-blocking calls should suffice to send pings at quite high rates. I was both right and wrong at the same time.

On one hand, it was correct: I could perform hundreds of pings per second with a single thread doing all the required tasks. But when I tried to perfect it, it became clear that something was wrong.

At least in the Rust library I was using for sockets, a non-blocking recv incurred random waits that could not be properly predicted or accounted for. So the ping rate was inconsistent and annoying.

It seems that if something is mid-way through the network card it might block, or maybe it happens when there's nothing left to read. I'm not sure. But the point is that this non-blocking call wasn't exactly non-blocking.

So what I did in the end was spawn a single OS thread to take care of receiving data from the network. The main thread still takes care of sending data and computing everything else. Now the metrics are really accurate and the program does exactly what is asked of it.

I don’t recall this happening to me in other programs, so it might be the Rust library that I’m using, or maybe it’s because this manages ICMP which is a bit special.

Avoid sharing memory at all costs!

A thread runs optimally when it works completely independently of others and knows almost nothing about other parts of the application. It has all the data it requires from the start, and it can spend a long time working alone.

It's quite easy to fall into the habit of touching shared memory from a thread whenever we like, locking it as we go. The problem is that, as I said, mutexes are expensive; lots of threads accessing the same memory will create contention, and things will slow down a lot.

Threads require careful design. We should envision them in a producer-consumer fashion: a queue of "tasks to do" from which the threads pick work, and a queue of "tasks done" into which they push their results. This reduces contention to a very low value. If communication is needed mid-way, think about caching some of the data in thread-local memory for a while. Most of the time you don't strictly need the latest value; a fairly recent one is enough. A cache of just a few milliseconds (even 1ms) can deliver real benefits while avoiding completely stale data.
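Here's a minimal sketch of that producer-consumer shape using standard library channels (one queue of tasks in, one queue of results out); the task type and the work done are made up for illustration:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (task_tx, task_rx) = mpsc::channel::<u64>();
    let (result_tx, result_rx) = mpsc::channel::<u64>();

    // Single worker for simplicity; with several workers the task queue would need
    // to be shared (e.g. behind a Mutex or with a multi-consumer channel crate).
    let worker = thread::spawn(move || {
        // Each task is fully independent: no shared state is touched while working.
        for task in task_rx {
            let result = task * task; // stand-in for the real work
            result_tx.send(result).unwrap();
        }
    });

    for task in 0..10u64 {
        task_tx.send(task).unwrap();
    }
    drop(task_tx); // closing the channel lets the worker finish

    worker.join().unwrap();
    for result in result_rx {
        println!("done: {result}");
    }
}
```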

Even with these ideas, there might be contention. For example, if there are hundreds of threads and the tasks are quick, they may end up locking the queues very often, and the probability of contention rises. In these cases, consider whether a thread can work in batches: retrieve N tasks at once into local memory, accumulate the output in local memory too, and write it back to shared memory in batches.

There are other possible tricks too; for example, there might be specialized queue libraries that lock only part of the structure. The queues could also be sharded to reduce the number of threads that access the same data, with one thread moving data between the main queue and the sharded ones.

Closing thoughts

Threads are hard to do right. I did not cover the problems of debugging threads and unexpected behaviors that arise from them, as it’s a bit too much for this article.

I think it's quite clear that not all tasks fit this producer-consumer design, and some of them are hard to parallelize efficiently, or at all. Expecting that all applications will eventually use all cores is naive. Threading is hard to do right, and most programmers would rather avoid it if possible.

Threads also waste precious CPU resources. 2x the threads don't give 2x the performance in return. While in some cases they can reduce latency, they usually tend to increase it when abused.

I hope this was useful for understanding why applications don't use all CPU cores and the common challenges of doing it right. Let me know if I should cover anything else in more detail.

Finally upgraded my computer to Ryzen 7 5800X

I waited far too long, and the performance jump is awesome. Up to last week I was using an i7-920, that is, the very first generation of i7. The leap is so big that it feels like I don't deserve it, like I'm not making good use of it, because most of the time it's idling.

The good thing about running Linux is that I can manage quite efficiently how the CPU is used and get the most out of it. Running a 12-year-old CPU wasn't a problem for most tasks at all, and the system performed properly in most cases. I basically had problems with video compression scenarios (streaming, capturing, editing) and when compiling a full Rust project from scratch. Everything else ran more or less smoothly.

After the stock problems with the 5000 series processors, I had to wait even longer. I was expecting the 5900X to get cheaper, but it didn't happen in time, so I settled on the 5800X, which should be good enough. And it is. No regrets.

The processor arrived on Saturday morning. I spent most of the day building everything neatly and then, on the first boot attempt, I ran into my first problem: no POST, with the motherboard stuck on the '0d' code. Kind of expected; these motherboards require a BIOS update before running a 5000 series processor. Luckily, I had been careful to buy a motherboard that can update the BIOS without a CPU. So I followed the procedure and, after a few mistakes, it finally booted. Then I had a problem with some fans ramping up too fast. It turns out I had connected them to a header meant for a pump (for water cooling), but the way I routed that cable made it almost impossible to change; I would have needed to remove the motherboard entirely. I decided to disconnect the fans from the case's fan hub and plug them in directly, which fixed it. Then a CPU fan wasn't spinning: while trying to hide the cables, I had placed them too close to the fan and they were interfering. Another thing fixed.

Finally I got to the last problem: Grub2 (the bootloader) was freezing on start. I did not buy any SSD/HDD for this PC; I wanted to migrate all the drives from the old computer to the new one. Because Grub did nothing and didn't let me do anything, it was impossible to debug. It took me hours of trial and error, reinstalling Grub, and nothing worked. Since a USB stick with Ubuntu launched Grub successfully, I thought that maybe installing Ubuntu would fix the Grub issue. And it did. From the new bootloader I can launch my old Debian installation without issue. The problem will come when I upgrade this Debian and it tries to write Grub again; I bet it will break.

The underlying problem is that this installation was already borked: the SSD lacks a UEFI partition, which lives on my HDD instead. This strange setup makes booting quite hard, as it needs to jump around. At some point I'll buy a new NVMe SSD and do a clean install on it. For now I want to hold off, because I don't need it (yet) and SSDs are getting cheaper, so when the time comes I'll get more speed and capacity for the same money. It works now, and I don't care much about fixing it. Maximum laziness.

Configuration of this new machine

PCPartPicker Part List: https://ie.pcpartpicker.com/list/p8pfGq

  • CPU: AMD Ryzen 7 5800X 3.8 GHz 8-Core Processor
  • CPU Cooler: Noctua NH-D15 CHROMAX.BLACK 82.52 CFM CPU Cooler
  • Motherboard: Gigabyte X570 AORUS MASTER ATX AM4 Motherboard
  • Memory: Corsair Vengeance LPX 64 GB (2 x 32 GB) DDR4-3600 CL18 Memory
  • Storage: Samsung 850 EVO-Series 1 TB 2.5″ Solid State Drive (migrated)
  • Storage: Western Digital Caviar Green 2 TB 3.5″ 5400RPM (migrated)
  • Video Card: EVGA GeForce GTX 1060 6GB GAMING Video Card (migrated)
  • Case: Fractal Design Meshify S2 ATX Mid Tower Case
  • Power Supply: Corsair HX Platinum 850W 80+ Platinum Fully Modular ATX

There's no RGB at all in this build. While it looks cool, lights are a problem overnight, and running exclusively Linux it might be hard to control some of the RGB. I don't need the hassle. Dark is good, dark is simple.

On the cooler, I considered water cooling, but seeing that the old machine lasted 12 years with zero maintenance, and water cooling may require some, I went with air cooling. I like big fat coolers, as I want to keep noise as low as possible. I used the cables supplied by Noctua to lower the fan speed. This gives me around 32ºC idling, 40ºC browsing, 70ºC on video encoding, and some peaks at 90ºC when it goes all out. All of this while making almost no sound.

For the case, I wanted something slightly big that would keep things cool. After watching Gamers Nexus, I settled on the Meshify S2, as it is one of the best cases for airflow. I'm really happy with it: not only does it keep things really cool and look really good, it's also a pleasure to build in.

For the motherboard, I started looking at B550, but I didn't like the connectivity. The case has a front USB-C port that I wanted to use, and most B550 boards lack support for it. So I went for something high-end on X570. It's still mostly unused, but I can keep upgrading around it over the years.

I'm aware that this generation might be the last one on the AM4 socket, but at some point I should still be able to get a cheap deal on a better chip if needed. I also expect AMD to release a refresh of these CPUs as they did with the 3000 series. But the CPU is not something I really expect to upgrade here; I'm more interested in the PCI Express stuff that might pop up in the next few years. PCIe 4.0 opens the door to other cool things, like blazing fast drives; currently it makes almost no sense because SSDs aren't fast enough to make proper use of it, but in a few years… we'll see. (Yes, I'm aware NVMe drives can reach those speeds, but on random reads they're just too slow to make it worthwhile.)

The power supply is a bit oversized as well. Not only because it's 850W (I wanted a good margin in case I fit a stupidly big GPU later), but mainly because it's 80+ Platinum certified. I found out later that this actually removes heat from the system and keeps things cooler. Hah, who would have thought: a more efficient PSU makes the whole system produce less heat.

On memory, I was using 16 GB before and having problems because I run too many services. As I like to tinker, I end up with a lot of stuff over time (MySQL, PostgreSQL, Docker, …), and then I had to go back and shut some of it down… only to spend time a few weeks later debugging why service X isn't running, where it was, and so on. So I thought, let's put in so much memory that I don't need to shut anything down! Well… so far it's using less than 48 GB even counting disk caching.

The graphics card definitely needs an upgrade, but in the current situation… I'll have to wait. What can I do? I refuse to pay a thousand euros for a GPU.

How fast is this 5800X

The first thing I tried was building zzping-gui from scratch. This is a Rust program I'm working on. Because it has a GUI, it has way too many dependencies, and it took around 15 minutes to build on the old computer. And now? 48 seconds! I also have sccache set up, and with it it's barely 26 seconds (down from 5 minutes). Building a small change takes 6 seconds (down from 21). Coding in Rust is quite a pleasure now.

The next thing I tried was video capturing. Before, I had to limit it to Full HD (1920×1080) at 30fps or it would skip a lot of frames. Now I can do 1440p at 60fps, and even use slower ffmpeg presets. Roughly 20% of the CPU is used.

Kerbal Space Program used to give me problems with big craft and with aerodynamic FX. I had to mostly disable aerodynamic FX or the game would drop to 10fps when moving fast through the atmosphere. Not anymore. I ramped up all settings and get a smooth 60fps no matter the craft. (Well, I haven't tried building anything too crazy yet, but I did try craft that were problematic before and it's smooth.)

Finally, I tried some video editing. It's quite pleasant now, and for some encodes the CPU can encode at 1x speed (taking as long as the video takes to play). At 1440p 60fps it's more like a 1:2 ratio (taking twice as long). I still need to figure out why most of these encoders use only 70% of the CPU. Not entirely sure if it's fixable.

One thing I noticed is that YouTube is smoother now at 1440p and 4K. Before, it was quite good, but a bit of stutter appeared that I couldn't explain. Now it's buttery smooth, nice!

The bad thing is that now I notice the GPU bottlenecks. If I run KSP fullscreen with a video playing on top (picture-in-picture), everything stutters a lot unless I drop the video to 720p or lower. Not a big issue, as the picture-in-picture frame is small and looks fine even at 480p. Basically the GPU can't keep up, and it also ramps up the fans a lot, making noise (which I hate). So by upgrading the CPU I made my GPU's life harder.

I will need a new GPU, as I also plan to change the desk and add another monitor. It will be even harder for the 1060 to keep up with all that. Hopefully by August things will settle down and I'll be able to get a good deal on a Radeon 6800.

On single-thread performance

Now I realize that 99% of the time the machine is under-utilized. Compiling, video editing… it doesn't matter. Most of the time is spent waiting for a few threads to complete. Yes, more cores would shave these times down, but not by much.

Because there are 16 logical cores (threads), when they're all used they're so fast that they finish in record time, and what's left is a handful of tasks that cannot be parallelized further.

This means that to get better performance, CPUs need to get better at single-thread work. That's why I returned the 3800X I bought by mistake and waited for the 5800X to be available. The ~20% jump in single-thread performance actually matters a lot.

It’s not about the performance of a single thread in a single core, don’t get me wrong. It’s about how fast 4-6 threads can go.

A 10% increase in single-thread performance affects 100% of the workloads we run on the computer (unless the combined load exceeds the TDP). A 50% increase in multi-thread performance from extra cores seems to affect only around 5% of the waiting time (if the CPU it's compared against already has at least 8 cores). Unless you do a lot of video encoding with fully optimized programs, multi-core performance is becoming less and less important given the number of cores most people already have. Adding more isn't going to provide that much benefit.

It seems to me that the 5800X is the sweet spot for intensive use by hobbyists like me. Enough cores to encode video and compile relatively fast, with quite fast single-thread performance for day-to-day use.

Sure, the 5950X would encode way faster. But am I uploading 4K videos? No. Am I uploading hour-long ones? Nope. So would it benefit me? Nope.

In contrast, if a new generation of CPUs gains another 20% in single-thread performance, that makes all programs run 20% faster. That alone won't be worth upgrading from a 5800X, but over 3 such generations it compounds to roughly 73% faster (1.2³ ≈ 1.73), which definitely is. At a rate of 2 years per generation, that's in 6 years.

If you have an old computer but can hold out 2 more years with it (unlike me, holding onto a 12-year-old computer), it might be worth waiting to see if DDR5 and the AM5 socket appear. That would give you an upgrade path for a long time. AM4 has already lived for 4 years; if AM5 has the same run, you could potentially upgrade the CPU one or two generations later.

For those who want to upgrade "now", let me recommend the 5600X again. It's an awesome CPU, with the main difference being roughly 30% slower video encoding. For streaming, there shouldn't be any difference. And waiting 13 minutes instead of 10 for a video to encode isn't going to matter at all.

And that’s it… for now. I’ll be back with more adventures on this new computer.

Actix and Rust unsafe

Here I go again writing about unsafe Rust. It turns out that today I received a pingback to my article "Actix-web is dead (about unsafe Rust)" from the blog post "Rust is a hard way to make a web API". It was also featured in the r/rust Reddit community! (Strangely, the Reddit post links to the original author's blog: "Rust is a hard way to make a web API".)

There, Tom wrote the following, referring to my post:

Heck, if you ask some people, Rust is less secure than a GC’ed language for web apps if you use any crates that have unsafe code – which includes Actix, the most popular web framework, because unsafe code allows things like deferencing raw pointers.

https://macwright.com/2021/01/15/rust.html

It seems there was a bit of a misunderstanding, because I don't agree with this wording at all. If I were to fix it to better match what I intended to say, I would write instead: "Rust is less secure than a GC'ed language for web apps if you use any crates that abuse unsafe code".

But this is still overly simplistic, and it's hard to put it in a few words.

Let me try to summarize. About unsafe:

  • All Rust programs depend on unsafe code at some level. It's nearly impossible to get rid of it, as it is one of the basic building blocks of Rust. The standard library uses a lot of unsafe code (in small quantities, but in lots of places).
  • There are algorithms that require unsafe code to work efficiently or to be practical (or both). For example, implementing a linked list well is quite a nightmare without unsafe. (See Learn Rust with Entirely Too Many Linked Lists.)
  • The point is that the unsafe portions of a crate should be as small as possible and easy to prove correct. If unsafe can be avoided, it should be, unless there's a strong reason not to.
  • Unsafe code blocks are not really "unsafe". Most of the compiler guarantees still apply (see the small sketch after this list). In terms of safety it is close to regular C or C++ code, and most of us feel quite safe writing and running C++ programs.
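For reference, a tiny sketch of what an unsafe block actually allows: dereferencing a raw pointer is one of the few extra operations, while the usual checks still apply inside the block:

```rust
fn main() {
    let value = 42u32;
    let ptr = &value as *const u32; // creating a raw pointer is perfectly safe

    // Dereferencing it is not: the compiler can't prove the pointer is valid,
    // so we take that responsibility with an unsafe block.
    let read_back = unsafe { *ptr };
    assert_eq!(read_back, 42);

    // Borrow checking, type checking, etc. still apply inside unsafe blocks;
    // only a handful of extra operations (like this dereference) are allowed.
}
```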

Regarding Actix and why it was a problem: first we need to understand that Actix is a web server, and it is intended to be exposed to the internet.

The HTTP protocol is really hard to implement correctly, completely, and error-free. It's not a simple protocol. An attacker could find a vulnerability leading to anything from DoS to remote code execution.

In general we shouldn't trust ANY web server. Anything you put facing the internet has to be proven to work properly. And here the issue is not distrust of unsafe code (or C++ code); the issue is that a web server that is not widely used and scrutinized is potentially vulnerable to unknown attacks.

So, for example, if you run NodeJS web servers, Python web servers, or anything similar, please consider removing them if they are reachable from the internet directly or indirectly (that is, proxying through Apache/Nginx/etc. still exposes a handful of vulnerabilities). Instead, think about using another protocol for this, like FastCGI, WSGI, or similar. Those protocols are simpler, and it's harder to exploit vulnerabilities through them.

Actix, in this case, turns out to be so fast that it even outperforms Nginx! So there's no point in proxying through anything; you would lose so much performance along the way that it would make no sense. Actix works best as a web server directly exposed to the internet.

Also, Actix is new. There hasn’t been much time yet to carefully search for vulnerabilities. So we can expect that there might be something still hiding over there. It is still rapidly changing, so new bugs might appear at any point.

The problem with Actix was that the original author loved to use unsafe a lot. They quite enjoyed playing with Rust and unsafe. I'm happy for them, but it's a recipe for disaster. Having more unsafe code than the bare minimum opens the door to unforeseen consequences.

The teams developing browsers like Firefox or Chromium are quite seasoned in C++, they really know what they are doing, and they take every possible measure to reduce memory-related bugs. Even so, it seems that around 70% of the bugs in those C++ applications are memory-related. (Microsoft found this too.)

I think this clearly shows why unsafe code should be minimized. But does this mean that a Rust program with a tiny bit of unsafe code is less secure than a Python or NodeJS one? Nope.

Rust places so many restrictions on the code that the program is almost proven correct, much in the style of Haskell and other functional languages.

Having Actix fixed now, with unsafe code blocks reduced to the minimum, makes me more confident running it exposed to the internet than any Python/NodeJS server.

Rust has a lot of guarantees that just don’t exist in Python or Node. Also, the threading and async models impose proper restrictions to avoid programmers shooting themselves in the foot.

In case it wasn't clear from the previous paragraphs, I wouldn't hold every part required to build a web application to such high standards. Unsafe still needs to be minimized, but if a component doesn't receive user input directly, its bugs are harder to exploit.

I hope this explains my point of view on Actix and unsafe. Also, I'm still learning Rust, and this is just my humble opinion on the matter.

Thanks a lot to Tom MacWright for referencing my article; it really helps to see that my opinion is being read and taken into account.

Released new ping tool in Rust!

A lot of time has passed since my last post. To be honest, these quarantines have kept me isolated and I haven't kept up with much of anything, as if hibernating and waiting for this thing to go away. After almost a year, it seems I've got some energy back to start writing and doing other stuff.

I have been playing with Rust a lot: several exercises and different things to get comfortable with it. And now I'm reaching the point where I see that Rust can actually be almost as fast to code in as Python (there are still a lot of rough edges, though).

In the meantime, during this WFH period, I noticed that my home network is kind of strange. I get disconnections or weird behavior in anything that requires a real-time connection over the internet. For example, video calls tend to break up often, and online games show random spikes of lag.

Because of this, I've been trying to ping my router and diagnose the problem. But the thing is, regular ping tools show more or less normal behavior, and to catch any packet loss I need a really aggressive ping rate, whose output is really hard to follow.

I searched for other ping tools better suited to this purpose, but what I found was basically paid stuff. It was hard to believe there wasn't any open source tool for this. So I thought it would be a good idea for a new Rust project.

And this is how zzping was born. It's a tool featuring a pinger daemon that pings a configured set of hosts at roughly 50 pings per second, stores the data on disk, and also sends it via UDP to a GUI.

After 1-2 weeks of waiting for approval from my employer to release this, I pushed the changes to my GitHub:

https://github.com/deavid/zzping/

(Just note that even though Google appears in the license, this is just a result of the review process. The only relationship between this project and Google is that I was working on it while employed by Google.)

I thought that Rust wouldn't have mature enough GUI libraries, so I played a bit with Python+Qt5. My idea was that Python could handle the data size well enough and Qt would be better than any Rust GUI. But after some trial and error, I realized that Qt charting libraries are mostly meant for office use, on the order of 100 points or static viewing.

As I wanted something able to display more than 1,000 points changing in real time, Qt was out of the question, and with it, Python as well. So I went to the Rust Discord servers to ask for advice on a Rust GUI library for this.

It turns out that, obviously, there's no GUI library aside from FFI to GTK that is capable of graphing. But, as they quickly pointed out, Iced can paint into a Canvas quite well, and that should do.

So I coded zzping-gui in Rust and, receiving the UDP events from the daemon, I could paint the ping timings and packet loss in real time, up to 10,000 lines on screen. Still, it takes "too much" time to draw, to the point that I found it disappointing; I thought it would be faster. But after profiling, this seems to come from my NVidia drivers doing the drawing, therefore on the Vulkan side of things.

It’s possible that Iced is not optimized enough for this kind of stuff, or maybe (surely) I’m missing optimizations and caching. But I saw that it was fast enough and I moved on.

This is what it looks like when displaying real-time data:

[Screenshot: real-time view]

It can only show one host at a time, and if restarted it loses the history.

Up to this point, that's what I released as 0.1 in the main branch. I've continued working on 0.2 in a beta branch.

A bit of trivia

Most people I talked to about zzping assumed that I surely used threads for the pinger. Wrong! In fact, the first library I found was internally creating threads all over the place, so I looked at the sources and coded something similar myself, but single-threaded. I purposefully removed the threading and used a non-blocking approach instead.

Why? Because it uses less CPU and it's more efficient. But threads are more performant! Yes, but no. A threaded model would allow me to push more pings per second, sure. But this misses the fact that a single thread on a 10-year-old CPU can send over 1,000 pings per second, maybe more; I haven't tested the limits.

And at those rates, one would wonder whether our objective is to test the network or to mount a DoS attack and freeze whatever networking gear we're trying to ping. There's near-zero value in sending pings hundreds of microseconds apart.

In contrast, threading has a cost. Yes, it does. Programs using threads use more CPU per unit of work done. Threading means that the CPU and OS scheduler have to do more task switches over time, and those switches aren’t exactly free. OS threads also have memory requirements, and have some CPU cost to initialize.

Going all in on threads misses a big point here: zzping-daemon is a utility meant to run in the background all the time, as a service. The computer running it might not have a lot of CPU, or it might be a gaming machine. Every tiny bit of CPU consumed may mean fewer FPS while gaming and could be a reason to shut it down.

Therefore, removing threads is a better strategy to keep the CPU as free as possible and do as much work as possible with the absolute minimum CPU required. Rust also helps there, by optimizing the binary to the maximum.

On another topic, I went for UDP communication with the GUI because I wanted real-time updates and preferred to drop packets if the connection between zzping-gui and zzping-daemon was flaky. But now I see this as a problem: since UDP is connection-less and unreliable, preparing for the next step, where a GUI can subscribe and fetch the last hour of data as a prefill, becomes quite complicated. So I'm thinking of moving to TCP instead.

TCP has other problems: it might buffer, and it doesn't surface connection problems. But maybe I'm overthinking it, as this tool is meant for local networks, which should be more or less stable. In any case, if there's a problem, it should be solved when it appears, not before.

I had quite a hard time designing how to store the data on disk. Even after settling on storing statistics every 100ms instead of every single ping, it turns out this can still amount to 50 messages per second, depending on config. And over a year, that easily adds up to a lot of gigabytes.

MessagePack has been quite helpful. It's one of my favourite formats: largely compatible with JSON, flexible, really fast, and small. Here I realized that actually using this specification reduced the messages to a really small size (maybe by half, just by not storing u32 values directly but letting MessagePack choose the smallest encoding).
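As a hedged illustration of why it gets so small (the crate and the struct below are assumptions for the example, not zzping's actual code), serde plus rmp-serde lets MessagePack pick the smallest integer encoding per field:

```rust
// Cargo.toml (assumption): serde = { version = "1", features = ["derive"] }, rmp-serde = "1"
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct PingSample {
    // Hypothetical fields, just for illustration.
    timestamp_ms: u64,
    latency_us: u32,
    lost: bool,
}

fn main() {
    let sample = PingSample { timestamp_ms: 1_614_000_000_000, latency_us: 850, lost: false };

    // MessagePack stores small integers in as few bytes as possible,
    // instead of always spending 4 or 8 bytes per field.
    let bytes = rmp_serde::to_vec(&sample).unwrap();
    println!("encoded size: {} bytes", bytes.len());

    let decoded: PingSample = rmp_serde::from_slice(&bytes).unwrap();
    println!("{decoded:?}");
}
```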

I played a lot with compression techniques, but nothing really helped. I settled on a log quantization that brings files from 20MB/hour down to 12MB/hour with an acceptable precision loss. Other techniques like Huffman coding, delta encoding, or FFT quantization yielded negligible gains while over-complicating the file format. I might go back to them at some point, as I'm probably overlooking a lot of what can be done.
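The log quantization idea can be sketched in a few lines (the step size and types are made up here, not zzping's actual format): store the rounded logarithm of the latency, so large values lose absolute precision while the relative error stays bounded:

```rust
// Quantize a latency (in microseconds) on a logarithmic scale.
// STEP controls precision: a smaller step means less loss but bigger files.
const STEP: f64 = 0.05;

fn quantize(latency_us: f64) -> u16 {
    (latency_us.max(1.0).ln() / STEP).round() as u16
}

fn dequantize(q: u16) -> f64 {
    (q as f64 * STEP).exp()
}

fn main() {
    for latency in [120.0_f64, 850.0, 25_000.0] {
        let q = quantize(latency);
        let back = dequantize(q);
        // Relative error stays within a few percent regardless of magnitude.
        println!("{latency:>8.0} us -> {q:>4} -> {back:>9.1} us");
    }
}
```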

This produced a new data format. I named the old one FrameData and the new one FrameDataQ (quite original, hah). zzping-daemon still saves the old format, and I wrote several utilities to read it and transform it into the new one, which in turn is the one the GUI can read.

Oh, I forgot: zzping-gui in the beta branch can read a file passed via command line options. This opens a completely new mode with a refurbished graph:

[Screenshot: three windows synced]

In the image above, there are three zzping-gui instances, each opening a different file for a different host.

This allows zooming and panning. There is also another way of zooming on the Y axis, which I named "scale factor" (sf); it turns the axis semi-logarithmic depending on how you move the slider.

The tool also does some pre-aggregation at different zoom levels and transitions seamlessly between them when zooming. It's quite interesting that it's able to navigate millions of points in real time.

And that’s it, for now. I have plans to make this better. But it’s taking time as the design is not quite clear yet.

https://github.com/deavid/zzping/