RAID - Arrays of Hard Disks

by Jimmy Martin

RAID stands for Redundant Array of Inexpensive Disks, or Redundant Array of Independent Disks, depending on who you ask. It means combining several hard disk drives to gain speed, capacity, reliability or a combination of these benefits.

There are several types of RAID array. The simplest are RAID 0 and RAID 1.

Figure 1: Basic RAID Hard Disk Arrays

RAID 0 (stripe) - The data is striped across two or more hard disk drives. In the two-drive case, chunks of data are written alternately to the two drives. One drive holds chunks A,C,E, and the second drive holds chunks B,D,F. This speeds up disk access, as you can store and retrieve two chunks at once by accessing both drives at the same time. Unfortunately it also decreases reliability. If one drive fails, the other drive is worthless, as it only contains half of the chunks of data.
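As a rough illustration of the striping described above, here is a minimal Python sketch. The function names stripe and unstripe are made up for this example, and each "drive" is just a Python list. A real controller works on fixed-size blocks, not on letters.

```python
def stripe(chunks, num_drives=2):
    """Hand out chunks to the drives in rotation (round-robin)."""
    drives = [[] for _ in range(num_drives)]
    for i, chunk in enumerate(chunks):
        drives[i % num_drives].append(chunk)
    return drives


def unstripe(drives):
    """Rebuild the original chunk order by reading the drives in rotation."""
    total = sum(len(drive) for drive in drives)
    n = len(drives)
    return [drives[i % n][i // n] for i in range(total)]


drives = stripe(["A", "B", "C", "D", "E", "F"])
print(drives)            # [['A', 'C', 'E'], ['B', 'D', 'F']]
print(unstripe(drives))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Note that unstripe needs every drive. With one list missing, there is no way to recover the chunks it held, which is why a single failure destroys a RAID 0 array.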

RAID 1 (mirror) - The data is mirrored on two hard disks. In other words, the drives each have an identical copy of the data. Both drives hold data chunks A,B,C,D,E,F. This drastically improves reliability. If one drive fails, you can keep using the identical copy of the data on the second drive. You can then replace the faulty drive and copy all of the data onto it. However, this halves the storage capacity of the array, since you have two copies of everything.
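The mirror behaviour can be sketched in a few lines of Python. The Mirror class and its method names are hypothetical, invented for this illustration, and a failed drive is simply modelled as None.

```python
class Mirror:
    """Toy two-drive mirror: every chunk is written to both drives."""

    def __init__(self):
        self.drives = [[], []]  # None marks a failed drive

    def write(self, chunk):
        for drive in self.drives:
            if drive is not None:
                drive.append(chunk)

    def read_all(self):
        # Any working drive holds a complete copy of the data
        for drive in self.drives:
            if drive is not None:
                return list(drive)
        raise RuntimeError("both drives have failed")

    def fail(self, index):
        self.drives[index] = None

    def replace(self, index):
        # Rebuild: copy everything from the surviving drive to the new one
        self.drives[index] = None
        self.drives[index] = self.read_all()


array = Mirror()
for chunk in "ABCDEF":
    array.write(chunk)
array.fail(0)
print(array.read_all())  # ['A', 'B', 'C', 'D', 'E', 'F']
array.replace(0)
```

The array stores six chunks but uses twelve chunks' worth of disk space, which is the halved capacity mentioned above.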

RAID 5 (stripe + parity) - The data is striped across at least three disks, with mathematical "parity" information mixed in with the data. The parity takes up one disk's worth of space in total. If a drive fails, the system can mathematically reconstruct the missing data using the remaining data and the parity information. This keeps the system running until the faulty drive can be replaced. RAID 5 is a good combination of speed, reliability and storage capacity. However, it typically requires more expensive RAID controller hardware, as the parity information must be recalculated every time data is written.
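Parity in RAID 5 is normally computed with the XOR operation, and the sketch below shows why that lets a lost chunk be rebuilt. The helper name xor_blocks and the sample data are assumptions for this example. A real array also spreads the parity chunks across all of the drives, which this sketch does not attempt to model.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-by-byte XOR of several equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data chunks and the parity chunk, each stored on a different drive
chunk_a = b"HELLO"
chunk_b = b"WORLD"
parity = xor_blocks([chunk_a, chunk_b])

# The drive holding chunk_b fails. XOR the survivors to get it back.
rebuilt_b = xor_blocks([chunk_a, parity])
print(rebuilt_b)  # b'WORLD'
```

The same trick works with any number of data chunks: XOR all of the surviving chunks together with the parity, and the result is the missing chunk. It only works for one missing chunk per stripe, so a second drive failure before the rebuild is finished loses the data.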

RAID 1+0 or 10 (striped mirrors) - This uses at least four hard disks. The data is striped, and each stripe is mirrored on two hard disks, for a total of four disks. There is therefore a pair of RAID 1 arrays, each storing half of the data stripes. RAID 1+0 is more reliable than RAID 0+1 (mirrored stripes, where two RAID 0 arrays mirror each other), as there are more scenarios where RAID 1+0 can survive two simultaneous disk failures.
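The reliability claim can be checked by counting. The Python sketch below tries every possible pair of failed disks in a four-disk array. It assumes disks 0 and 1 form one group and disks 2 and 3 the other, and that a RAID 0+1 controller treats a whole stripe set as unusable once any one of its disks fails. The function names are invented for this example.

```python
from itertools import combinations

GROUPS = [{0, 1}, {2, 3}]  # assumed disk layout for both array types


def raid10_survives(failed):
    # Groups are mirror pairs: data is lost only if BOTH disks in a pair fail
    return all(not pair <= failed for pair in GROUPS)


def raid01_survives(failed):
    # Groups are stripe sets: at least one set must have NO failed disks
    return any(not (stripe_set & failed) for stripe_set in GROUPS)


def count_survivable(survives, num_failures=2):
    """Count the failure combinations that the array survives."""
    return sum(survives(set(combo)) for combo in combinations(range(4), num_failures))


print(count_survivable(raid10_survives))  # 4
print(count_survivable(raid01_survives))  # 2
```

Of the six possible two-disk failures, RAID 1+0 survives four under these assumptions, while RAID 0+1 survives only two.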

There are more types of RAID, including 2, 3, 4 and 6, but they are not usually seen in web servers. As reliability is usually paramount, the most common types of RAID used in web servers are RAID 1, 5 and 1+0. Which is chosen depends largely on the importance of reading speed, writing speed, cost and capacity in that particular server's role. RAID 1 is usually used in two-disk systems, and is sufficient for most low to medium performance web servers. RAID 5 requires three or more disks, while RAID 1+0 requires at least four.
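To compare capacity when choosing between these types, the usable space can be worked out from the number of disks. The function below is a simple sketch, assuming all disks are the same size and that RAID 1 means a two-disk mirror as described earlier. The function name and the string labels for the RAID types are made up for this example.

```python
def usable_capacity(level, num_disks, disk_size_gb):
    """Return the usable capacity in GB for an array of identical disks."""
    if level == "0":
        minimum, usable_disks = 2, num_disks        # no redundancy at all
    elif level == "1":
        minimum, usable_disks = 2, 1                # second disk is a copy
    elif level == "5":
        minimum, usable_disks = 3, num_disks - 1    # one disk's worth of parity
    elif level == "10":
        minimum, usable_disks = 4, num_disks // 2   # every disk has a mirror
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    return usable_disks * disk_size_gb


# Four 500 GB disks arranged three different ways
print(usable_capacity("0", 4, 500))   # 2000
print(usable_capacity("5", 4, 500))   # 1500
print(usable_capacity("10", 4, 500))  # 1000
```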

In a server with hot-swappable hard disks, a faulty disk can be removed, a new one inserted and the required data copied over to it, all while the server is still running. This is obviously highly desirable.

Remember that RAID is no substitute for making backups. There are many scenarios which would involve all drives failing, for example a fire. Alternatively, all drives could function correctly but the data might be destroyed by hackers or a virus, so a backup would still be needed.