Repeat after me: RAID is not a backup.
Maybe you are one of those who say: "I don't trust my drives. Sure, a fire may break the RAID, but then I have bigger issues."
That statement is simply false. RAID is redundancy (of a sort) and has nothing to do with backups.
First of all, what is RAID for? RAID is meant to add redundancy for when a disk reports that a read or (mostly) a write has failed. It also improves read and write performance, because there are several disks available, potentially holding the same data. Proper RAID has a battery-backed memory cache, which gives a big performance boost: all writes go to the cache first, so they appear to complete instantly. If power fails, the battery keeps the cache contents alive so the pending writes can be flushed to the disks once power returns, as long as that happens before the battery runs out of charge. The other benefit of RAID is the possibility of replacing disks without even rebooting the computer.
Consumer hard drives, when they notice that a read or write has failed, try hard to re-read, re-write or even reallocate sectors. They go mad with that because the manufacturer assumes that if the operation fails, the data will be completely lost. So your computer will hang until the hard drive manages to solve the problem or gives up.
Server hard drives, in contrast, just give up and report the failure to the computer, which needs to handle it. This avoids the server hanging and gives predictable performance. Those hard drives are meant to be used with RAID.
RAID usually does not hold enough information to repair damaged blocks if it does not know which disk failed (unless you use RAID 6, or RAID 1 with 3 or more active disks).
Also, RAID replicates all changes instantaneously to all disks, meaning that a mistake is propagated everywhere and you have no access to any old data, ever.
If you read what I wrote up to here carefully, a big alarm should be sounding inside your head. There are lots of scenarios where actual data is lost!
Backups, in contrast, can handle almost all data failure scenarios. The main problem is that the backup strategy needs to be correct, and backups need to be tested from time to time.
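To see why knowing which disk failed matters, here is a minimal sketch of the XOR parity idea behind RAID 5. The block contents and names (`d0`, `d1`, `xor_blocks`) are made up for illustration, and a real controller works on much larger stripes, but the arithmetic is the same.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks byte by byte (the RAID 5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: two data blocks plus their parity block.
d0 = b"hello wo"
d1 = b"rld 1234"
parity = xor_blocks(d0, d1)

# Case 1: disk 0 reports a failure. We know which block is missing,
# so it can be rebuilt from the surviving block and the parity.
rebuilt_d0 = xor_blocks(d1, parity)

# Case 2: disk 1 silently returns a corrupted block and reports no error.
bad_d1 = b"rld 1235"
mismatch = xor_blocks(d0, bad_d1) != parity  # the stripe is inconsistent...

# ...but parity alone cannot tell which block is the wrong one.
# "Repairing" d0 from the bad d1 just produces garbage.
wrong_d0 = xor_blocks(bad_d1, parity)
```

In the first case the drive tells RAID where the hole is, so the data comes back intact. In the second case the array can only notice that something is off, and any automatic "repair" has a good chance of overwriting the healthy block.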
Scenarios where RAID loses data (from common ones, to crazy ones):
- Coworker deletes the wrong folder by mistake. Information is lost on all disks.
- Failure in the power line fries 2/3 of the hard disks in the computer. Irrecoverable data.
- No one checked RAID status for 5 years and now the computer doesn’t boot. Probably you have 1 hard disk that died and the rest have 15% data loss or more. You’ll need a company to retrieve the data (starting at 1000 EUR) and you’ll get 90% back if lucky.
- Thief stole the server. Data is stolen & lost.
- Faulty hard drive wasn’t aware that it was destroying every file it touched. Most data is lost.
- Solar storm or magnetic discharges flipped some bits on the disk. All related blocks are lost and inaccessible.
But the funny thing here is that the most common RAID setup has almost no advantages. I’m referring to RAID 5 with 3 regular disks. As they are “regular disks”, they will hang on failures, and most of the time those failures will be repaired automatically by the drive itself. RAID doesn’t come into play here, and you get the computer hanging sometimes. As the three disks receive exactly the same load, all three can be expected to fail in the same week (or even the same day). As the people with this kind of setup don’t care, they just wait until the computer fails, and by then all the disks are almost dead. Also, as there are 3 disks, there is roughly three times the chance of having a faulty disk or data loss. Other people replace disks when RAID says they’re dead, and during the rebuild the other disks die before all the data has been copied.
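The "more disks, more chances of a failure" argument can be put into numbers. A probability can never go above 100%, but for small per-disk failure rates the chance that at least one of three disks fails is close to three times the chance for a single disk. The sketch below assumes each disk fails independently with the same probability, which is a simplification (disks under identical load tend to fail together), and the 5% figure is an invented example, not a measured rate.

```python
def p_any_failure(p_single: float, n_disks: int) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1.0 - (1.0 - p_single) ** n_disks


# Assumed yearly failure probability of one disk (made-up example value).
p_disk = 0.05

one_disk = p_any_failure(p_disk, 1)     # 5% for a single disk
three_disks = p_any_failure(p_disk, 3)  # about 14%, a bit under 3 x 5%
ratio = three_disks / one_disk          # a little below 3
```

So the array is not "300% likely" to have a faulty disk, but it needs attention almost three times as often as a single drive would.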
In short, it is expensive, it adds nothing, and common usage tends to be more dangerous than a single disk.
If you really want RAID, you need a proper plan for it. You need a schedule for health checks, you need to replace disks as soon as they start reporting anything strange, and during the first years you should replace some of them to avoid having all the disks with the same wear.
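A scheduled health check can be very small. As a sketch, and assuming Linux software RAID (md), which is my assumption and not something hardware controllers share (those come with their own vendor tools), the kernel exposes array status in `/proc/mdstat`. The function name `degraded_arrays` and the sample text are illustrative only.

```python
import re

# Matches the status part of an array entry, e.g. "[3/2] [_UU]":
# expected devices / active devices, then one flag per device (U = up, _ = missing).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return the names of md arrays that are missing at least one device."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        if line.startswith("md"):
            current = line.split()[0]  # e.g. "md0"
            continue
        match = STATUS_RE.search(line)
        if match and current:
            expected, active = int(match.group(1)), int(match.group(2))
            if active < expected or "_" in match.group(3):
                degraded.append(current)
    return degraded


# Illustrative sample in the usual /proc/mdstat layout (not taken from a real machine).
SAMPLE = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 sdc1[2] sdb2[1] sda2[0](F)
      2096896 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""

problems = degraded_arrays(SAMPLE)
```

Run something like this from a daily scheduled job against the real file and send yourself a mail whenever the list is not empty. The point is not the script, it is that somebody gets told.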
If you want to survive a complete hard disk failure, you need RAID 6, which can recover even if one disk has failed completely and others have some problems. It’s not 100% safe, though: there is still a chance of data loss, but it’s small. RAID 6 technically starts at 4 drives, but for it to make sense you will want at least 5.
If 5 drives and all that housekeeping sound like overkill to you, forget RAID and just plan a backup. If you don’t need to have the server running again in less than an hour after a failure, then you can just back up the important data. That way you can reuse the same backup drives to back up several servers and computers.
A cheap backup strategy, though not 100% recommended, is to leave a 2nd drive in the computer and use it just for storing daily backups. Then grab two external drives (USB, for example) and always keep one at home and one at the office. The one at the office should back up the data from the backup disk, say, twice a week. Every week, swap the office disk with the home disk. If something goes really wrong, you will have a week-old backup at home.
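The daily backup to the second drive can be as simple as a dated archive plus some pruning. This is only a sketch under my own assumptions: the function name `daily_backup`, the `backup-YYYY-MM-DD.tar.gz` naming and the 14-day retention are all invented for illustration, and a real setup would also want logging and error handling.

```python
import tarfile
from datetime import date
from pathlib import Path


def daily_backup(source: Path, backup_dir: Path, today: date, keep: int = 14) -> Path:
    """Write a dated .tar.gz of `source` into `backup_dir`, keeping the newest `keep` archives.

    `keep` must be at least 1.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"backup-{today.isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(source), arcname=source.name)

    # ISO dates sort chronologically, so the oldest archives come first.
    archives = sorted(backup_dir.glob("backup-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive


# Hypothetical usage, with made-up paths for the data and the dedicated backup disk:
# daily_backup(Path("/srv/data"), Path("/mnt/backup"), date.today())
```

The external drives then only need to copy the contents of the backup directory, so the main disk is not touched again during office hours.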
To make this backup follow the 3-2-1 rule, the external backups need to use a different medium, not a hard disk. Use tapes; they are cheap and highly reliable.
Another way to strengthen this is to use a cloud backup of that data as another layer. Depending on the volume you want to back up and your office connection, this may be feasible.
I strongly recommend having a backup in the same server on a dedicated disk. As hard drives suffer from wear, a dedicated disk usually survives much longer than the main disk. This may seem stupid, but this kind of backup is reliable in the sense that you can’t forget to do it, because it is automatic and local. Other backups may be forgotten, or the Internet connection may have been bad for weeks. Local hard drives almost always work properly.
Also, add checksums to the backups and validate them from time to time. This may reveal problems in your backup devices before it is too late.
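Checksumming backups does not need special tools. The sketch below writes a SHA-256 manifest next to the backup files and checks it later; the manifest layout (hash, two spaces, file name) and the function names are my own choices for illustration, not a standard you have to follow.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> None:
    """Record one 'hash  name' line per regular file in backup_dir."""
    lines = [
        f"{sha256_of(p)}  {p.name}"
        for p in sorted(backup_dir.iterdir())
        if p.is_file() and p != manifest
    ]
    manifest.write_text("\n".join(lines) + "\n")


def verify_manifest(backup_dir: Path, manifest: Path) -> list[str]:
    """Return the names of files that are missing or whose checksum changed."""
    bad = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split("  ", 1)
        target = backup_dir / name
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(name)
    return bad
```

Write the manifest right after each backup and run the verification on a schedule, ideally against the external copies too. An empty list means every file still matches what was written.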
If you need a short downtime on server failure, buy a small server for disaster recovery and sync the two every few months. Whenever you need to, you can swap them around. The small server doesn’t need to be as powerful or as expensive; it just needs to be able to run the services at a reasonable speed. Then the downtime is minimal and people can keep working until the main server is repaired and reinstalled, which can take days depending on the setup or the actual problem. This can be useful when, for example, the motherboard stops working.
And if you feel you need almost zero downtime, just go for clustering: set up Master-Slave replication with automatic fail-over. When the primary fails, the secondary takes over in seconds.
These strategies will always work better than RAID for failures. And the basic ones are way less expensive.
Do you want to enforce data integrity? Use filesystems that allow block checksums or databases that do, and don’t forget to enable them.
Do you feel you will need to have the server running with the same installation for more than 8 years? Then go RAID: buy an expensive battery-backed hardware RAID card that allows hot-swap. Remember your housekeeping duties and don’t forget… you still need to do backups as well!!