For most of us, files live on a hard drive, and if that drive fails, the data can be lost with it. For a company, this must be avoided as far as possible: losing data can cost the trust of customers or even threaten the survival of the company. In this article, we’ll take a quick look at RAID, a family of techniques for improving fault tolerance and storage reliability.

I. Introduction to RAID

To prevent data loss when a disk fails and to improve storage performance, we can combine two or more hard disks into a disk group, also known as a disk array or RAID. To the person using the computer, the array looks like a single hard drive. With a redundant RAID level, files can be recovered or rebuilt from the rest of the array if one hard disk fails.

RAID stands for Redundant Array of Independent Disks. It is a storage technology that combines multiple disk drive components (usually multiple hard disks or partitions) into a single logical unit to provide higher storage performance and better data redundancy than a single hard disk.

II. Introduction to Common RAID Levels

The different types of disk arrays are called RAID levels, and each has its own characteristics. As shown in the following figure, common RAID levels include RAID0, RAID1, RAID5, and RAID10.

2.1 RAID0

Let’s start with RAID0. Suppose we want to store files A and B on a RAID0 disk group. RAID0 splits each file: for example, file A is split into A1 and A2, and the two parts are written to the two disks in the group. Ideally, if writing file A in full to a single disk takes time T, then the best-case time for the two-disk RAID0 group in the figure is T/2, because both parts are written in parallel.

Therefore, RAID0 improves storage performance and generally has the highest read/write performance among the RAID levels discussed here. However, if any one disk in the group fails, the data is lost and cannot be recovered or rebuilt; in other words, RAID0 provides no data redundancy. For this reason, it is rarely used on its own in a production environment.
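To make the splitting concrete, here is a minimal sketch of RAID0-style striping in Python. It is only an illustration: the function names `stripe` and `read_back` and the 4-byte chunk size are assumptions made for this example, and real arrays work on disk blocks rather than in-memory lists.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int = 4) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        disks[index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the chunks back in round-robin order."""
    chunks = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


file_a = b"AAAABBBBCCCCDDDD"
disks = stripe(file_a, num_disks=2)
print(disks)  # [[b'AAAA', b'CCCC'], [b'BBBB', b'DDDD']]
print(read_back(disks) == file_a)  # True
```

Each disk holds only half of the chunks, which is why the two disks can work in parallel, and also why losing either one makes the file unreadable.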

2.2 RAID1

RAID1 was developed to provide data redundancy and keep stored data safe. As shown in the figure, when we store a file in a RAID1 disk group, it is written not to one disk only: a complete copy is also written to the other disk. That is, when we store file A, every disk in the disk group holds a copy of file A.

So if the disk group has two disks with a total capacity of N, the capacity we can actually use is N/2. RAID1 improves storage safety through data mirroring, but this safety comes at a high cost in capacity.
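The mirroring idea can be sketched in a few lines of Python. The `Raid1` class below is a hypothetical toy model written for this article, with each disk represented as a dictionary; it is not how a real controller is implemented.

```python
class Raid1:
    """Toy mirror: every write goes to all healthy disks."""

    def __init__(self, num_disks: int = 2):
        # A disk is a dict of file name -> contents; None marks a failed disk.
        self.disks = [dict() for _ in range(num_disks)]

    def write(self, name: str, data: bytes) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def read(self, name: str) -> bytes:
        # Any surviving copy is enough to serve the read.
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        raise OSError(f"no surviving copy of {name!r}")


array = Raid1(num_disks=2)
array.write("A", b"file A contents")
array.fail(0)           # one disk dies
print(array.read("A"))  # b'file A contents'
```

The file is still readable after one disk fails, but both disks stored the same bytes, so only half of the raw capacity held unique data.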

2.3 RAID5

RAID0 and RAID1 focus on storage performance and data redundancy respectively. Is there a solution that balances storage safety, storage performance, and storage capacity? That solution is RAID5.

A RAID5 disk group requires at least three disks. Like RAID0, data is split and stored across different disks. Unlike RAID0, data can be recovered or rebuilt after a failure: if one of the hard drives fails, the parity information is used to recover the data. For this reason, RAID5 is also described as striping with distributed parity.

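As shown in the figure, the parity block can be thought of as Ap = A1 + A2 + A3, where + stands for the parity operation (a bitwise XOR in practice). If the disk holding A1 fails, A1 can be restored from Ap, A2, and A3 (informally, Ap - A2 - A3). A RAID5 group tolerates the failure of only one disk; if more than one disk fails at the same time, the data cannot be recovered.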
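The recovery step is easier to see with real bytes. The sketch below uses bitwise XOR as the parity operation, which is the usual choice for RAID5; the helper name `xor_blocks` and the sample block values are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


a1 = b"\x01\x02\x03"
a2 = b"\x10\x20\x30"
a3 = b"\xaa\xbb\xcc"

# Parity block written alongside the data blocks.
ap = xor_blocks([a1, a2, a3])

# The disk holding A1 fails: rebuild it from the parity and the survivors.
recovered = xor_blocks([ap, a2, a3])
print(recovered == a1)  # True
```

Because XOR is its own inverse, the same function both computes the parity and undoes it. With two blocks missing there is only one equation for two unknowns, which is why a second failure is fatal.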

RAID5 does not have a dedicated parity disk. The parity information is distributed across all disks in the group and, in total, occupies the capacity of one disk. Therefore, for a RAID5 group of N disks, the usable fraction of the capacity is (N-1)/N.
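One way to picture distributed parity is to print which disk holds the parity block of each stripe. The rotation rule used below is just one possible layout, chosen for illustration; actual implementations differ in how they rotate parity, and the names `parity_disk` and `layout` are hypothetical.

```python
def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Assumed rotation: parity starts on the last disk and moves left each stripe."""
    return (num_disks - 1 - stripe_index) % num_disks


def layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Label every block as data (D<stripe>.<n>) or parity (P<stripe>)."""
    rows = []
    for stripe in range(num_stripes):
        parity_at = parity_disk(stripe, num_disks)
        row, data_index = [], 0
        for disk in range(num_disks):
            if disk == parity_at:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{stripe}.{data_index}")
                data_index += 1
        rows.append(row)
    return rows


for row in layout(num_disks=3, num_stripes=3):
    print(row)
# ['D0.0', 'D0.1', 'P0']
# ['D1.0', 'P1', 'D1.1']
# ['P2', 'D2.0', 'D2.1']
```

Every stripe spends exactly one of its N blocks on parity, which gives the (N-1)/N usable fraction: 2/3 with three disks, 3/4 with four.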

2.4 RAID10

RAID10 has an interesting name: it is simply RAID1 plus RAID0, so it is more intuitive to read it as RAID 1+0. Often described as a stripe of mirrors, it requires 4 + 2*N disk drives (N >= 0) and can use at most half the capacity of the disk group.

Its obvious disadvantage is that at least 50% of the capacity is lost. Its advantages are just as clear: in the ideal four-disk case it offers about twice the speed of single-disk storage, and when a single disk fails, the entire disk group keeps working while the data is synchronized from the mirror onto the replacement disk.
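As a rough sketch of these trade-offs, the Python below models a RAID10 group as a list of two-disk mirror pairs. Pairing neighbouring disks (0 with 1, 2 with 3, and so on) is an assumption made for this example, as are the function names.

```python
def mirror_pairs(num_disks: int) -> list[tuple[int, int]]:
    """Group disks into two-way mirrors; data is then striped across the pairs."""
    if num_disks < 4 or num_disks % 2 != 0:
        raise ValueError("RAID10 needs an even number of disks, at least 4")
    return [(i, i + 1) for i in range(0, num_disks, 2)]


def survives(num_disks: int, failed: set[int]) -> bool:
    """The array survives as long as no mirror pair has lost both disks."""
    return all(not (a in failed and b in failed) for a, b in mirror_pairs(num_disks))


def usable_capacity(num_disks: int, disk_size_tb: float) -> float:
    """Each pair contributes the capacity of a single disk."""
    return len(mirror_pairs(num_disks)) * disk_size_tb


print(mirror_pairs(4))           # [(0, 1), (2, 3)]
print(survives(4, {0}))          # True: any single failure is tolerated
print(survives(4, {0, 2}))       # True: one disk lost from each pair
print(survives(4, {0, 1}))       # False: a whole pair is gone
print(usable_capacity(4, 2.0))   # 4.0 (out of 8.0 TB raw)
```

Under this model a single disk failure is always survivable, and a second failure is survivable only if it hits a different mirror pair.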

III. Summary

Besides the RAID levels mentioned above, there are other schemes such as RAID3 and RAID6; interested readers can look them up on the wiki.

In general, RAID technology meets, to some extent, four major requirements that an operating system places on disk I/O:

  • Increase storage speed
  • Provide redundancy to improve storage safety
  • Improve disk utilization effectively
  • Balance the speed differences between the CPU, main memory, and disk reads/writes to improve the overall performance of the host