RAID is not a backup

Repeat after me: RAID is not a backup.

Maybe you are one of those people who say: "I don't trust my drives, so I use RAID. Sure, a fire could destroy the whole array, but then I'd have bigger problems anyway."

That reasoning is simply wrong. RAID is redundancy (of a sort), and it has nothing to do with backups.

First of all, what is RAID for? RAID is meant to add redundancy for when a disk reports that a read or (more often) a write has failed. It also improves read and write performance, because there are several disks available holding potentially the same data. A proper RAID controller has a battery-backed memory cache, which gives a big performance boost: all writes go to the cache first, so they complete almost instantly. If the power fails, the battery preserves the pending writes in the cache until they can be flushed to the disks. The other benefit of RAID is being able to replace disks without even rebooting the computer.

Consumer hard drives, when they detect that a read or write has failed, try hard to recover: re-reading, re-writing, or even reallocating sectors. They go to great lengths because the manufacturer assumes that if the operation fails, the data is completely lost. So your computer hangs until the drive either solves the problem or gives up.

Server hard drives, in contrast, just give up quickly and report the failure to the computer, which is expected to handle it. This keeps the server from hanging and gives predictable performance. Those drives are meant to be used with RAID.

RAID usually does not hold enough data to repair damaged blocks if it does not know which disk failed (unless you use RAID 6, or RAID 1 with three or more active disks).
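
To make that limitation concrete, here is a toy sketch in TypeScript (not real RAID code, just the arithmetic): single XOR parity can rebuild a disk that is known to have failed, but a silent bit flip only produces a mismatch with no indication of which disk is lying. The block values are made up for the example.

```typescript
// raid5-parity.ts — toy illustration of single-parity behavior (not a real RAID implementation).
const disk1 = 0b10110010;          // hypothetical data block on disk 1
const disk2 = 0b01101100;          // hypothetical data block on disk 2
const parity = disk1 ^ disk2;      // RAID 5 stores the XOR of the data blocks

// If we KNOW disk1 failed, XOR with the survivors recovers it:
const rebuilt = parity ^ disk2;
console.log(rebuilt === disk1);    // true

// But if a bit silently flips on one disk, parity only says "something is wrong":
const corrupted = disk1 ^ 0b00000100;          // one flipped bit, never reported by the drive
console.log((corrupted ^ disk2) !== parity);   // true: mismatch detected...
// ...yet nothing tells you whether disk1, disk2 or the parity block is the bad one.
```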

Also, RAID replicates every change to all disks instantly, so a mistake is propagated immediately and you can never get the old data back.

If you read the above carefully, a big alarm should be going off in your head: there are lots of scenarios where data is actually lost!

Backups, in contrast, can handle almost every data-loss scenario. The main caveats are that the backup strategy needs to be correct and the backups need to be tested from time to time.

Scenarios where RAID loses data (from the common to the crazy):

  • A coworker deletes the wrong folder by mistake. The information is gone from every disk.
  • A power surge fries two thirds of the hard disks in the computer. The data is irrecoverable.
  • No one checked the RAID status for 5 years and now the computer doesn't boot. You probably have one dead disk and the rest have 15% data loss or more. You'll need a data-recovery company (starting at 1000 EUR), and you'll get 90% of the data back if you're lucky.
  • A thief steals the server. The data is stolen and lost.
  • A faulty hard drive silently corrupts every file it touches. Most data is lost.
  • A solar storm or magnetic discharge flips some bits on the disk. All the affected blocks are lost and inaccessible.

But the funny thing is that the most common RAID setup has almost no advantages. I'm referring to RAID 5 with three consumer-grade disks. Because they are consumer disks, they hang on failures, and most of those failures are repaired internally by the drive itself; RAID never comes into play, and all you get is a computer that hangs from time to time. Because the three disks receive exactly the same load, all three can be expected to fail in the same week (or even the same day). And since people with this kind of setup don't check it, they just wait until the computer fails, by which point all the disks are nearly dead. Also, with three disks you are roughly three times as likely to end up with a faulty disk or data loss. Other people do swap disks when the RAID reports them as dead, only to have the remaining disks die during the rebuild, before all the data can be copied over.

In short, it is expensive, it adds almost nothing, and as commonly used it tends to be more dangerous than a single disk.

If you really want RAID, you need a proper plan for it: a schedule for health checks, replacing disks as soon as they start reporting anything strange, and replacing a few of them during the first years so that not all the disks carry the same wear.

If you want to survive a complete hard disk failure, you need RAID 6, which can recover even if one disk has failed completely and the others have some problems. It's not 100% safe, though: there is still a chance of data loss, but it's small. For that you will need at least five drives.

If five drives and all that housekeeping sound like overkill to you, forget RAID and just plan a backup. If you don't need the server back immediately, restoring the important data from a backup can have it running again in less than an hour. That way you can also reuse the same backup drives to back up several servers and computers.

A cheap backup strategy, though not one I'd recommend unreservedly, is to leave a second drive in the computer and use it just for storing daily backups. Then grab two external drives (USB, for example) and always keep one at home and one at the office. The one at the office should copy the data from the internal backup disk, say, twice a week. Every week, swap the office disk with the home disk. If something goes really wrong, you will still have a backup at home that is at most a week old.
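
As an illustration of the "second internal drive" part, here is a minimal sketch of a daily backup script in Node/TypeScript. The paths, the retention period, and the plain recursive copy are assumptions for the example, not a prescription; a real setup would more likely use rsync or a dedicated backup tool, and this needs a recent Node for fs.cpSync.

```typescript
// daily-backup.ts — minimal sketch of copying the data to a dedicated backup drive once a day.
import { cpSync, mkdirSync, readdirSync, rmSync } from "node:fs";
import { join } from "node:path";

const DATA_DIR = "/srv/data";       // hypothetical: what you want to protect
const BACKUP_ROOT = "/mnt/backup";  // hypothetical: the dedicated backup drive
const KEEP_DAYS = 14;               // keep two weeks of daily copies

// One dated folder per day, e.g. /mnt/backup/2017-06-05
const today = new Date().toISOString().slice(0, 10);
const target = join(BACKUP_ROOT, today);

mkdirSync(target, { recursive: true });
cpSync(DATA_DIR, target, { recursive: true });

// Prune the oldest copies so the backup drive does not fill up.
const copies = readdirSync(BACKUP_ROOT).sort(); // ISO dates sort chronologically
for (const old of copies.slice(0, Math.max(0, copies.length - KEEP_DAYS))) {
  rmSync(join(BACKUP_ROOT, old), { recursive: true, force: true });
}
```

Run it from cron (or a systemd timer) once a day; the external USB drives then only need to copy from BACKUP_ROOT.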

To make this follow the 3-2-1 rule, the external backups need to be on a different medium, not another hard disk. Use tapes; they are cheap and highly reliable.

Another way to strengthen this is to add a cloud backup of the data as an extra layer. Depending on the volume you want to back up and your office connection, this may or may not be feasible.

I strongly recommend keeping a backup on the same server, on a dedicated disk. Since hard drives fail from wear, a dedicated backup disk usually outlives the main disk by a lot. This may seem silly, but this kind of backup is reliable in the sense that you can't forget to do it: it is automatic and local. Other backups may be forgotten, or the Internet connection may have been flaky for weeks. A local hard drive almost always just works.

Also, add checksums to the backups and validate them from time to time. This can reveal problems in your backup devices before it is too late.
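
For instance, a small Node/TypeScript script along these lines can write a checksum manifest next to each backup and verify it later. SHA-256, the manifest file name, and the flat directory layout are assumptions for the sketch; a real tree would need recursion.

```typescript
// checksum-backups.ts — sketch: record SHA-256 sums of a backup folder, then validate them later.
import { createHash } from "node:crypto";
import { existsSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const BACKUP_DIR = "/mnt/backup/2017-06-05";        // hypothetical backup folder
const MANIFEST = join(BACKUP_DIR, "SHA256SUMS.json");

// Hash every file in the backup (flat example; recurse for nested folders).
function hashDir(dir: string): Record<string, string> {
  const sums: Record<string, string> = {};
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile() || entry.name === "SHA256SUMS.json") continue;
    sums[entry.name] = createHash("sha256")
      .update(readFileSync(join(dir, entry.name)))
      .digest("hex");
  }
  return sums;
}

const current = hashDir(BACKUP_DIR);

if (!existsSync(MANIFEST)) {
  // First run: record the checksums next to the backup itself.
  writeFileSync(MANIFEST, JSON.stringify(current, null, 2));
  console.log("Manifest written.");
} else {
  // Later runs: compare against the recorded checksums.
  const recorded = JSON.parse(readFileSync(MANIFEST, "utf8")) as Record<string, string>;
  for (const [file, sum] of Object.entries(recorded)) {
    if (current[file] !== sum) {
      console.error(`Checksum mismatch (or missing file): ${file}`);
    }
  }
  console.log("Validation finished.");
}
```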

If you need short downtime when the server fails, buy a small server for disaster recovery and keep the two in sync every few months. Whenever you need to, you can swap them around. The spare doesn't need to be as powerful or expensive; it just needs to run the services at a reasonable speed. Downtime then stays minimal and people can keep working while the main server is repaired and reinstalled, which can take days depending on the setup and the actual problem. This is especially useful when, say, the motherboard dies.

And if you feel you need almost zero downtime, go for clustering: master-slave replication with an automatic fail-over setup. When the primary fails, the secondary takes over in seconds.

These strategies will always handle failures better than RAID, and the basic ones are far less expensive.

Do you want to enforce data integrity? Use filesystems or databases that support block checksums, and don't forget to enable them.

Do you expect to keep the server running on the same installation for more than 8 years? Then go RAID: buy an expensive, battery-backed hardware RAID card that allows hot-swapping. Remember your housekeeping duties and don't forget… you still need to do backups as well!!

Someone stop the NodeJS package madness, please!!

This has gone way too far. Download the AngularJS 2 example, try to build it, and you'll see 300 MB in node_modules; after that it still does not run, because of some complicated conflict between the packages it requires and the packages already available on the machine. Only God knows.

For those of you who have been using NodeJS for many years, this will not sound troubling at all; in fact, I know you'll think it's not that bad and not that much. Nothing scary, after all.

I have not been working with NodeJS for that long, only several months, but I'm used to it by now. This is not my first language; I have more than 15 years of programming experience in several languages, JavaScript being one of them. And to me, something in the NodeJS package ecosystem feels very wrong. I'm starting to believe that people who use it and think it's okay to have 700 MB of node_modules in a project (and maybe gigabytes inside tmp/ folders) are suffering from something like Stockholm syndrome.

NodeJS is a great tool, powerful and blazing fast. Package managers are a must here, and, well, nothing would have been possible without them.

But what I would like to discuss here is what we expect from the packages we install and what we expect from the package managers. Let me start with my favorite one, apt-get. Let's say I want to install a game, for example Warzone2100. Its dependencies in Debian Stretch are:

  • libc6, libfontconfig1, libfreetype6, libfribidi0, libgcc1, libgl1-mesa-glx, libglc0, libglew2.0, libglu1-mesa, libminiupnpc10, libogg0, libopenal1, libphysfs1, libpng16-16, libqt5core5a, libqt5gui5, libqt5script5, libqt5widgets5, libsdl2-2.0-0, libssl1.1, libstdc++6, libtheora0, libvorbisfile3, libx11-6, libxrandr2, warzone2100-data, zlib1g. Plus warzone2100-music as a recommendation.

That's a bunch of packages!! But well, this is a complex game: it has music, videos, 3D, multiplayer, etc. I can understand that amount. It is a final application; it is not meant to be required by other packages as a library. And anyway, as you can see for yourself here, every dependency either has no version requirement or carries a ">=" constraint, meaning any newer version is fine. So if I try to install this package on my Debian Sid, it will probably work.

Another thought: these are the dependencies of the compiled binary, so of course, due to the nature of dynamic linking in C, the libraries need to match. Even so, there is not much restriction here.

Let's see what happens with one of the required libraries. To be fair I'll choose libqt5core5a, because Qt5 also has tons of features. Its dependencies (here) are:

  • libc6, libdouble-conversion1, libgcc1, libglib2.0-0, libicu57, libpcre16-3, libstdc++6, zlib1g. Plus the translations module as a recommendation.

Of those dependencies, only four are actually new, as the rest were already pulled in by the game. That's because those libraries are needed by almost any C++ application. And again, only ">=" version constraints are used, so there is not much risk of a dependency problem here.

What do I expect from APT packages? I expect them to be small, to have a reasonable dependency list, and to avoid exotic dependencies or complex version requirements (which do happen in Debian sometimes).

From the package manager, I expect it to work the first time and I expect it to be fast. APT still has a lot of room for improvement on speed, but it gets the job done without any issues.

Now let's take just an Angular example, the Angular Hero app. It has 18 dependencies and 35 dev dependencies, most of them with relaxed version constraints. (Warzone has 27 and Qt5Core just 7.) Doesn't look that bad, right? Well, no. Do an npm install (twice, because the first time it usually forgets something) and you get… 857 packages! Does it look bad now? As those 53 packages were installed, they kept requiring more and more sub-libraries to get their work done.

Did I say there were 857 packages? I was lying. That's just the first layer of node_modules. Some packages have version requirements that conflict with others', so npm installs the same package several times. For example, the package "async" is required at versions 1.5.2, ^2.1.5 and ^0.9.0, to name a few. The result is 1,420 packages installed in node_modules for a silly Angular example, using 337 MB of space.
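
If you want to check the numbers on your own project, a rough Node/TypeScript sketch like the following counts every package copy under node_modules (nested duplicates included) and the space they take. The counting rule, any directory sitting directly inside a node_modules folder plus scoped packages, is an approximation for illustration, not npm's official definition.

```typescript
// count-node-modules.ts — rough count of installed package copies and disk usage. Run from the project root.
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

let packages = 0;
let bytes = 0;

function walk(dir: string): void {
  // A directory counts as a package copy if its parent is node_modules (or an @scope folder inside it).
  const parentIsNodeModules =
    dir.endsWith("node_modules") || /node_modules[\\/]@[^\\/]+$/.test(dir);
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      if (parentIsNodeModules && !entry.name.startsWith("@") && !entry.name.startsWith(".")) {
        packages++; // one installed package copy (nested duplicates included)
      }
      walk(full);
    } else if (entry.isFile()) {
      bytes += statSync(full).size;
    }
  }
}

walk("node_modules");
console.log(`${packages} package copies, ${(bytes / 1024 / 1024).toFixed(0)} MB on disk`);
```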

So, coming back to what I expect: I expect to get the dependencies installed, not a whole operating system in a folder. And I'm not exaggerating here; go and try to find an official Docker image that is more than 100 MB. Most operating system images are below that size (usually in the 40-80 MB range).

Does this Angular Hero app handle processes? Services? 3D? Graphics? Font rendering? Well, no. Moreover, most of the fancy things that any JS library does are actually done by the browser, so the libraries (thankfully) don't need to rotate a font themselves or ship OpenGL drivers, for example.

If you have read up to this point, I hope you understand that something is really wrong. This is madness, and someone has to do something to stop it.

The root cause of the NodeJS package madness is that NodeJS allows it. Let me rephrase that: NodeJS is so powerful that it can easily cope with a package madness. That is a good thing in itself, but over time it has a bad impact.

This has already happened in the real world several times. Switch trains from steam engines to electric ones and the pollution per trip drops to 1% or less (I don't know the exact figure), but it also makes tickets cheaper, so you get 200 times more passengers per year, and the long-run effect is that you pollute more; a classic rebound effect. Simply because people can afford to travel, they do.

The same thing happens here: just because NodeJS makes a dependency hell easy to manage, developers don't care enough.

Ask a C developer (the people who build the libraries and your operating system) about breaking an API (or an ABI, in their case), or about pulling in a bunch of fancy packages just because it's cool and "I can finish earlier", or about leaving a project's dependencies unattended for a while. They will probably tell you that such a project is dead before it starts. That's because their tools don't allow that kind of behavior; they have to be careful with every decision or everything breaks.

But then, if NodeJS handles it fine, why care at all? That's exactly the problem. We think we can keep behaving like this forever, and that's false. The NodeJS package ecosystem is already near the limits of this dependency madness.

Want proof? Click here:
https://trends.google.com/trends/explore?date=all&q=npm%20error,apt-get%20error

Also compare the APT and npm searches to get a sense of how popular each is:
https://trends.google.com/trends/explore?date=all&q=npm,apt

The "npm error" search term is skyrocketing; people are already having four times more problems with npm than with all the Debian distros combined. The average NodeJS developer has searched for the answer to some kind of error many times. This means that errors are not the exception but the norm.

OK, what can we do about this? Sadly, as regular developers, not much. I could ask package maintainers to care more, but that is not going to change anything. We need drastic changes from NodeJS itself, and some big company (Google, Facebook, Microsoft,…) to play a role here.

Maybe NodeJS/npm could add warnings about this situation, so developers get tired of seeing them and start maintaining their dependency lists properly. Maybe the website could publish sanity-check statistics that work like eBay reputation: if your package doesn't have enough reputation, other developers stop using it.

Some big company could build a well-maintained package suite that keeps people from picking tons of dependencies for trivial things; instead they would get one (or five) well-maintained packages that do most of what any project needs, so the same dependency and version sanity would propagate through all packages. It would be extra nice if those base packages were officially supported by NodeJS and included in the main NodeJS installation.

npm could be more deterministic, but for me this issue already has an answer, and it's called Yarn. It has been working so well for me that I'm starting to forget the problems I had with npm itself. With it, NodeJS looks much better.

In conclusion, NodeJS is growing exponentially, and most of the problems come from that growth. I hope it starts maturing soon so we can get the robustness we all deserve. It is an awesome tool, and this dependency madness is keeping people from using it in their projects.