The sheer amount of information that is constantly being generated and processed has created a growing demand for more storage and for faster reading and writing.

But have you ever wondered how storage can be increased dramatically while maintaining fast information exchange speeds?

Introducing RAID. In this tutorial you are going to learn about the fundamental ideas of the technology along with its types and implementations.

What is RAID?

RAID, a Redundant Array of Independent Disks, combines multiple storage drives into a single logical unit in order to increase storage space, improve performance, add redundancy, or achieve some combination of the three. Despite the name, redundancy is not guaranteed: some levels, such as RAID 0, provide none at all.

In general, this allows a larger storage volume to be built from a number of ordinary disk drives (HDDs or SSDs).

How does RAID work?

Many methods are used in practice, but the three best known are striping, mirroring and parity. Striping splits data across different drives. Mirroring, on the other hand, copies the same information to multiple drives so that it survives the failure of one of them.

The final technique is parity, sometimes loosely called a checksum: a calculated value, typically the XOR of the data blocks in a stripe, that can be used to rebuild lost data. These three methods each serve a distinct function for the stored information: separation, protection and recovery.
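To make the recovery idea concrete, here is a minimal sketch of XOR-based parity, the form of parity commonly used by the single-parity RAID levels. The function names `xor_parity` and `rebuild_missing` are made up for this illustration, and the blocks are assumed to be of equal length.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing block from the survivors plus the parity block."""
    # XOR-ing the parity with every surviving block cancels them out,
    # leaving only the block that was lost.
    return xor_parity(surviving + [parity])


data = [b"RAID", b"disk", b"demo"]   # three data blocks on three drives
parity = xor_parity(data)            # stored on a fourth drive

# Pretend the drive holding b"disk" has failed.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered)  # b'disk'
```

A single parity block can only recover one missing block per stripe, which is why single-parity levels tolerate exactly one drive failure.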

Types of RAID configurations

There are many RAID levels and variations. However, we will be looking at levels 0 to 6 as well as 10 in more detail.

RAID 0 - Low Reliability & High Performance

RAID 0 employs striping to split data evenly across two or more drives. This level stores no parity and offers no redundancy or fault tolerance, meaning that a single drive failing will bring down the entire array.

Moreover, because every file is divided between multiple disks, such a failure results in total data loss. This level is therefore used mainly to increase read and write speeds. Even so, there are some instances where using it would actually result in slower speeds.

Ultimately, because it has no data redundancy, this level is primarily used in applications that can tolerate low reliability but require high performance.
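The striping described above can be sketched in a few lines. This is only an illustration under simple assumptions: the helper names `stripe` and `unstripe` are hypothetical, chunks are handed out round-robin, and each "drive" is just a Python list.

```python
def stripe(data: bytes, num_drives: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into chunks and deal them out to the drives round-robin."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading the drives row by row."""
    chunks = []
    depth = max(len(drive) for drive in drives)
    for row in range(depth):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


drives = stripe(b"ABCDEFGHIJ", num_drives=2, chunk_size=2)
print(drives)            # [[b'AB', b'EF', b'IJ'], [b'CD', b'GH']]
print(unstripe(drives))  # b'ABCDEFGHIJ'
```

Note that each drive holds only every other chunk, so losing either one leaves gaps throughout the data with nothing to fill them from.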

RAID 1 - High Price, Double Protection

RAID 1 utilizes mirroring: it keeps a full copy of the data on each of two or more disks. In contrast to the previous type, this array continues to function as long as at least one drive is working. Mirroring tends to improve read speeds rather than write speeds, because a read can be served by any copy, whereas every write must reach all copies and is therefore only as fast as the slowest drive in the array.

Additionally, RAID 1 has a very high level of data redundancy, given that the same information is stored on every drive. Consequently, the price you pay for the added security is quite high: you need to purchase at least twice as many disks for the same usable capacity.
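As a rough illustration of mirroring, the toy class below writes every block to all healthy drives and reads from the first one that is still working. `MirrorSet` and its methods are invented for this sketch and do not correspond to any real RAID API; each drive is modelled as a plain dictionary.

```python
class MirrorSet:
    """Toy model of a RAID 1 array: every drive holds a full copy."""

    def __init__(self, copies: int = 2) -> None:
        self.drives: list[dict[int, bytes]] = [{} for _ in range(copies)]
        self.failed: set[int] = set()

    def write(self, block_no: int, payload: bytes) -> None:
        # A write must land on every healthy drive.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block_no] = payload

    def fail(self, index: int) -> None:
        # Simulate a drive failure: its contents are gone.
        self.failed.add(index)
        self.drives[index].clear()

    def read(self, block_no: int) -> bytes:
        # Any healthy copy can serve the read.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[block_no]
        raise OSError("all mirrors have failed")


mirror = MirrorSet(copies=2)
mirror.write(0, b"important")
mirror.fail(0)
print(mirror.read(0))  # b'important'
```

Only when the last remaining copy fails does the read raise an error.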

RAID 2 - High Cost, Disk Synchronization & Error Correction

While rarely used in practice, RAID 2 is still worth a look. It uses a Hamming code to perform error correction. The drives' spindles are synchronized so that all disks rotate in the same angular position, and data is striped at the bit level rather than the block level, which allows very high transfer rates.

It also requires a large number of drives, since several of them are dedicated to the error-correcting code, making it unfavorable to use in most situations.
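To give a feel for how a Hamming code corrects errors, here is a small sketch of the classic Hamming(7,4) code, which protects 4 data bits with 3 check bits. Treat it purely as an illustration: the code width used by an actual RAID 2 array depends on its drive count, and the function names are hypothetical. In RAID 2 terms, each of the 7 bit positions would live on its own drive.

```python
def hamming74_encode(data_bits: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword: list[int]) -> list[int]:
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome spells out the 1-based position of the bad bit (0 = no error).
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                      # one bit is read back incorrectly
print(hamming74_correct(word))    # [1, 0, 1, 1]
```

The overhead is visible here: 3 of every 7 bits are check bits, which matches the point above about the large number of drives required.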

RAID 3 & 4 - Long Sequences, No Concurrency

Another rarely used variation is RAID 3, which combines byte-level striping with a dedicated parity drive, making it well suited to long sequential reads and writes. However, this type cannot service multiple requests simultaneously, because every block of data is spread across all drives and sits at the same location on each of them, so any single request keeps the whole array busy.

The parity drive can be used to rebuild lost information in case one of the disks fails. RAID 4 is quite similar, but it stripes data in larger blocks rather than bytes and offers better random-read performance.

RAID 5 - Distributed Parity, Most Common Option

The next level, which is also the most commonly employed, is RAID 5. It is often described as striping with distributed parity, because the parity information is spread across all the drives rather than kept on a dedicated one. This level keeps operating even if one drive fails, since the data and parity stored on the others can be used to reconstruct what was lost.

Moreover, because no single drive holds all the parity, every drive in the array can take part in write requests, which avoids a write bottleneck on a dedicated parity drive. This level requires a minimum of 3 drives, with additional drives adding usable capacity and throughput.
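The way parity moves from drive to drive can be visualised with a short sketch. Real implementations offer several rotation schemes; the one below simply shifts the parity block one drive to the left on each stripe, and `raid5_layout` is a name chosen for this example only.

```python
def raid5_layout(num_drives: int, num_stripes: int) -> list[list[str]]:
    """Return a grid (stripes x drives) showing data blocks and parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    layout = []
    for stripe in range(num_stripes):
        # Parity starts on the last drive and rotates left each stripe.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        next_block = stripe * (num_drives - 1)
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{next_block}")
                next_block += 1
        layout.append(row)
    return layout


for row in raid5_layout(num_drives=3, num_stripes=3):
    print(" ".join(row))
# D0 D1 P
# D2 P D3
# P D4 D5
```

Each stripe gives up one block to parity, so an array of n drives offers the usable capacity of n - 1 drives.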

RAID 6 - Dual Parity, Two-Drive Fault Tolerance

RAID 6 builds upon the previous level by storing a second, independent set of parity information, which costs the capacity of one additional drive. Thanks to this extra parity, the array keeps working even if 2 drives fail at the same time. However, the second parity calculation slows down writes compared with RAID 5, given the additional computation needed on every write. The minimum number of drives is 4.

RAID 10 - Mirror Each Drive

The final level we want to discuss is RAID 10, a combined, or ‘nested’, variant that implements both levels 1 and 0. Here, the information is striped across several drives and every one of those drives is mirrored. While this does require the purchase of more drives, performance is improved and, if a disk fails, the array can keep functioning from its mirrored copy, so no data is lost. The minimum number of disks for this setup is 4.
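Whether a RAID 10 array survives a set of failures depends on which drives fail, not just on how many. The sketch below assumes two-way mirrors built from adjacent drives (0 and 1, 2 and 3, and so on); that pairing and the function name `raid10_survives` are assumptions made for illustration.

```python
def raid10_survives(num_drives: int, failed: set[int]) -> bool:
    """Return True if every mirrored pair still has a working drive."""
    if num_drives < 4 or num_drives % 2:
        raise ValueError("this sketch expects an even number of drives, at least 4")
    pairs = [(first, first + 1) for first in range(0, num_drives, 2)]
    # The stripe is intact only while no pair has lost both of its members.
    return all(not (a in failed and b in failed) for a, b in pairs)


print(raid10_survives(4, {0, 2}))  # True: one drive lost from each pair
print(raid10_survives(4, {0, 1}))  # False: both copies of one pair are gone
```

In other words, the array is only guaranteed to survive a single failure, although it may survive more if the failures fall in different pairs.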

Which level is the best?

While no definitive answer can be given to this question, RAID 5 is the level most commonly used in practice. Each level has its own advantages and drawbacks, mostly related to read/write speeds, fault tolerance and overall cost. The right choice depends entirely on the tasks that the given system will need to perform on a regular basis.
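One way to compare the levels side by side is to compute usable capacity and guaranteed fault tolerance for a given set of drives. The helper below is a simplified sketch: `raid_summary` is a hypothetical name, all drives are assumed to be the same size, RAID 1 is treated as one big mirror of all drives, and RAID 10 is assumed to use two-way mirrors.

```python
def raid_summary(level: int, num_drives: int, drive_tb: float) -> dict:
    """Return usable capacity and guaranteed drive-failure tolerance."""
    # level: (minimum drives, drives' worth of usable space, failures tolerated)
    rules = {
        0: (2, lambda n: n, lambda n: 0),
        1: (2, lambda n: 1, lambda n: n - 1),
        5: (3, lambda n: n - 1, lambda n: 1),
        6: (4, lambda n: n - 2, lambda n: 2),
        10: (4, lambda n: n // 2, lambda n: 1),
    }
    if level not in rules:
        raise ValueError(f"level {level} is not covered by this sketch")
    min_drives, data_drives, tolerated = rules[level]
    if num_drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level == 10 and num_drives % 2:
        raise ValueError("RAID 10 with two-way mirrors needs an even drive count")
    return {
        "usable_tb": data_drives(num_drives) * drive_tb,
        "failures_tolerated": tolerated(num_drives),
    }


for level in (0, 1, 5, 6, 10):
    print(level, raid_summary(level, num_drives=4, drive_tb=2.0))
```

With four 2 TB drives, for instance, RAID 5 yields 6 TB of usable space while RAID 6 and RAID 10 both yield 4 TB, which shows how capacity is traded for fault tolerance.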