- Linux Raid
- Contents
- Introduction
- Mailing list
- Help wanted
- Overview
- When Things Go Wrogn
- Areas Of Interest
- Hardware RAID
- Kernel Programming
- Archaeology
- External links
- What is RAID in Linux?
- RAID in Linux can be used to create logical volumes to ensure recovery from disk failures, backups, etc. RAID uses techniques such as mirroring and striping.
- Working of RAID in Linux
- Benefits of RAID
- Standards required to set up RAID in Linux
- Scope of RAID
- RAID Configurations
- Hardware RAID
- Software RAID:
- RAID levels
- RAID 0:
- RAID 1:
- RAID 5:
- RAID 6:
- RAID 10:
- Conclusion:
Linux Raid
Contents
Introduction
This site is the Linux-raid kernel list community-managed reference for Linux software RAID as implemented in recent version 4 kernels and earlier. It should replace many of the unmaintained and out-of-date documents out there such as the Software RAID HOWTO and the Linux RAID FAQ.
Where possible, information should be tagged with the minimum kernel/software version required to use the feature. Some of the information on these pages is unfortunately quite old, but we are in the process of updating it (aren’t we always?).
Do NOT use post-2019 WD Red drives in an array
Equally, do not use post-2020 desktop drives in an array
For the reason, read the section on Timeout Mismatch
Pretty much all these drives are shingled and are completely unsuitable for raid
Be warned that array creation and use may work fine.
The problem is that replacing a drive may be impossible when one fails!
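A quick, hedged way to check whether a drive is even a candidate is to query its error-recovery timeout with smartmontools; see the Timeout Mismatch section for the full story. In this sketch /dev/sdX is a placeholder for the drive in question:

    # Show the drive's SCT Error Recovery Control (ERC) setting;
    # drives that do not support it are risky in an array
    smartctl -l scterc /dev/sdX

    # If supported, cap recovery time at 7 seconds (the value is in tenths of a second)
    smartctl -l scterc,70,70 /dev/sdX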
Mailing list
Linux RAID issues are discussed in the linux-raid mailing list to be found at http://vger.kernel.org/vger-lists.html#linux-raid
This follows kernel.org conventions. You should use «reply to all» unless explicitly asked otherwise. Extraneous material should be trimmed. Replies should be in-line or at the bottom. (Please read posters’ .sig’s, as these may well say «I read the list — please do not cc me».)
And please use an email client that threads correctly!
Help wanted
This site was created by David Greaves and Nick Yeates. But life moved on, and despite efforts to provide up-to-date info, the info became out of date again. Keld Simonsen updated a lot of the information and earned the site good Google rankings.
As of September 2016 Wol is updating it to mdadm 3.3 and the 4.x kernels (mdadm 4.0 was released in January 2017). Please contact Wol, Keld or Nick if you want to help. Please read the editing guidelines.
As of June 2018, the main site is pretty up-to-date. There are, however, a lot of holes in the information available, and any help you can give to the editor(s) would be gratefully received: offers of articles, case studies of raid implementations that aren’t covered at present, hardware to practice on to avoid corrupting live systems, and so on.
Overview
[TODO: discuss layering things on top of raid, ie partitioning an array, LVM, or a btrfs filesystem]
The 2016 rewrite is not covering LVM (at the moment) so for LVM you will find all the old stuff in the archaeology section. Also all the performance data is 2011 vintage, so that has been relegated to the archaeology section too.
When Things Go Wrogn
Don’t panic, Mister Mainwaring!
RAID is very good at protecting your data. In fact, NEARLY ALL data loss reported to the raid mailing list is down to user error while attempting to recover a failed array.
In particular, NEVER NEVER NEVER use «mdadm --create» on an already-existing array unless you are being guided by an expert. It is the single most effective way of turning a simple recovery exercise into a major forensic problem: it may not be quite as effective as «dd if=/dev/random of=/dev/sda», but it’s pretty close.
The simplest things are sometimes the best. If an array fails to start after a crash or reboot and you can’t get it to assemble, always try an «mdadm /dev/mdN --stop», and then try to assemble it again. Problems at boot often leave you with a partially assembled array that then refuses to do anything. A «stop» followed by an «assemble» can never do any harm, and may well fix the problem. Be very careful with «--force», though, as it may trigger a resync which could destroy the contents of a drive and make recovery difficult or impossible.
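For illustration, a minimal sketch of that stop-and-reassemble sequence, assuming the array is /dev/md0 and its members are /dev/sda1 and /dev/sdb1 (adjust to your own devices):

    # Stop the partially assembled array
    mdadm --stop /dev/md0

    # Try a normal reassembly from the named members...
    mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1

    # ...or let mdadm find the members itself
    mdadm --assemble --scan

    # Check the result
    cat /proc/mdstat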
Also, make sure you are using the latest mdadm (see A guide to mdadm).
In addition to reading this, it is probably worthwhile going to the software archaeology section and reading «RAID Recovery» and «Recovering a failed software RAID». Just be aware that these are old pages and things may have changed, and that everything still relevant in 2016 should already have been copied into the pages above.
Areas Of Interest
Hardware RAID
Proper hardware RAID systems are presented to Linux as a block device, and there’s no coverage of them (yet) in this wiki.
BIOS / firmware RAID aka fake raid cards:
- offer few (if any) of the performance benefits of true hardware RAID (like CPU, bus and RAM offloading), and may often be much slower than SW raid (link?)
- if the ‘raid’ card or motherboard dies then you often have to find an exact replacement and this can be tricky for older cards
- if drives move to other machines the data can’t easily be read
- there is usually no monitoring or reporting on the array — if a problem occurs then it may not show up unless the machine is rebooted *and* someone is actually watching the BIOS boot screen (or until multiple errors occur and your data is lost)
- you are entrusting your data to unpatchable software written into a BIOS that has probably not been tested, has no support mechanism and almost no community.
- having seen how many bugs the kernel works around in various BIOSes it would be optimistic to think that the BIOS RAID has no bugs.
Given that the point of RAID is usually to reduce risk, it is fair to say that using fakeraid is a terrible idea, and it’s better to focus your energy on either true HW raid or in-kernel SW raid... but there is nothing stopping you 🙂
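By contrast, in-kernel software RAID can be monitored continuously. A rough sketch (the mail address and array name are only illustrative):

    # Run mdadm in monitor mode as a daemon and mail an alert when a drive fails
    mdadm --monitor --scan --daemonise --mail=root@localhost

    # Ad-hoc health checks
    cat /proc/mdstat
    mdadm --detail /dev/md0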
Kernel Programming
This section is meant to be the home for a variety of things. With Neil Brown stepping down as maintainer (early 2016), the development process doesn’t seem to be quite so «robust». Not a surprise: the new maintainers need to gain the experience Neil had of the subsystem. So this section will house documentation about how the internals of the raid subsystem work.
RIP Shaohua Li (see the LWN obituary). Shaohua took over from Neil, but sadly died over Christmas 2018. Jens Axboe has stepped up as temporary maintainer.
But documentation without specification isn’t much use. There was a philosophy (famously espoused by Microsoft, especially in their Office Open XML Specification) that «the code is the documentation» or «the code is the specification». This is great for coders — one of its features is that it eliminates all bugs at a stroke! If the code is the specification, then the system has to behave as specified. So this section will also house documentation about how the internals of the raid subsystem are supposed to work.
Then, of course, we want as many people helping with the system as possible. So this section will also contain a list of projects that people can do, and some advice on where to start. These needn’t be work on the kernel itself or mdadm; there are utilities already out there (Phil’s lsdrv, Brad’s timeout script), and plenty more would be appreciated.
Archaeology
This section is where all the old pages have been moved. Some of them may have been edited before being moved but the information here is mostly out-of-date, such as lilo, raidtools, etc. It may well be of interest to people running old systems, but shouldn’t be in the main section where it may confuse people.
External links
- Editing pages
- Wikipedia RAID including description of specific Linux RAID types
- Common RAID Disk Drive Format (DDF) standard specifying standard RAID levels from SNIA
- Kernel Newbies basic information about working on kernel
- The mathematics of RAID6
- FAQ about hardware/fake raid cards
- HW RAID support in Linux
- linux-raid mailing list archives
See Spam Blocks for the spam restrictions on this site.
What is RAID in Linux?
RAID in Linux can be used to create logical volumes to ensure recovery from disk failures, backups, etc. RAID uses techniques such as mirroring and stripping.
RAID stands for ‘Redundant Array of Inexpensive Disks.’ It is more commonly known as ‘Redundant Array of Independent Disks.’ It’s a pool of disks that are used to create a logical volume. It’s a method of storing the same data across several hard disks to keep our data safe. This helps in situations such as disk failures.
RAID is a technique of combining multiple partitions on separate disks into a single large device or virtual storage unit. These units are called RAID arrays. Disk mirroring (RAID level 1), disk striping (RAID level 0), and parity are some examples of RAID techniques.
A RAID setup provides benefits such as redundancy, improved bandwidth, lower latency, and data recovery.
Several other heavy technologies are dependent on the Linux framework. For example, Docker is one such containerization technology that was originally built for the Linux platform. When you deploy applications on Docker at the production level and your application starts getting traction, you might want to adopt RAID architecture for the underlying host. Even for persistent storage using volumes in Docker, you can mount drives in your RAID architecture as Docker volumes. There are several such use-cases for RAID in Linux.
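As an illustration of that use-case, once an md array such as /dev/md0 exists it can be formatted, mounted, and bind-mounted into a container. A hedged sketch; the device, mount point, and image below are only placeholders:

    # Put a filesystem on the array and mount it
    mkfs.ext4 /dev/md0
    mkdir -p /mnt/raid
    mount /dev/md0 /mnt/raid

    # Use a directory on the RAID-backed filesystem as persistent container storage
    docker run -d --name db \
      -v /mnt/raid/pgdata:/var/lib/postgresql/data \
      postgres:16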
Working of RAID in Linux
RAID is made up of a series of arrays (sets of disks). A RAID array is a collection of two or more disks joined to a RAID controller, forming a logical disk. Depending upon the configuration, called the RAID level, the fault tolerance and availability of the array vary.
We can store and manage our data in a number of ways using RAID in Linux. It allows us to keep our data safe, accurate, and quickly accessible in a replicated manner. Hence, even if one or more drives become corrupted or crash (within the limits of the chosen RAID level), the replicated data allows the system to continue functioning without interruption or loss of data.
RAID works by storing the data on multiple disks, allowing input/output (I/O) operations to be balanced across them to boost performance. Because RAID can store data redundantly across multiple disks, it improves fault tolerance and increases the mean time between failures (MTBF).
A RAID array appears to the operating system (OS) as a single logical hard disk. RAID generally uses disk mirroring or disk striping techniques. Mirroring works by copying identical data to several drives. Striping partitions each drive’s storage space into units ranging from 512 bytes to many megabytes. The stripes of all the disks are interleaved and addressed in order.
For example, consider a stand-alone system where large records are kept, such as medical or other scientific data in the form of images. In such a case, the stripes are normally set as small as possible (e.g. 512 bytes), so that a single record spans all the disks and can be retrieved quickly by reading all the disks at the same time.
We can improve RAID performance in a multiuser system by creating stripes large enough to hold a complete record, allowing overlapped I/O across all the drives.
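In md terms the stripe unit is the chunk size: it is fixed when the array is created (mdadm's --chunk option, given in kibibytes) and can be inspected afterwards. A small sketch, assuming an existing array /dev/md0:

    # Show the chunk size chosen at creation time
    mdadm --detail /dev/md0 | grep -i chunk

    # /proc/mdstat also reports the chunk size for striped arrays
    cat /proc/mdstat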
Benefits of RAID
There are several benefits of implementing RAID in Linux at various levels. The system administrator can choose and implement the appropriate RAID level based on the requirements of the ITBM framework. The following are the primary benefits:
- Redundancy: If one disk crashes, the data is duplicated on other disks, preventing data loss.
- Performance: By writing data to several disks, the overall data transfer rate can be increased.
- Convenience: Setting up RAID is simple, and storage from several physical disks can be managed as one even though they are separate devices.
Standards required to set up RAID in Linux
Let’s discuss the fundamental skills required to set up a RAID array in Linux. Since RAID is implemented at the server level, the system administrator or RAID implementer must have a thorough understanding of the server and related concepts, such as:
- Controlling hard drive partitions in various RAID levels or logical volume management (LVM).
- ifconfig, ip, routing, and several other networking configuration concepts.
- Netstat, traceroute, and other network debugging tools.
- ps, top, lsof, and other process management tools.
- Other services such as Apache, MySQL, DNS, DHCP, LDAP, IMAP, SMTP, FTP, etc.
Scope of RAID
Using RAID levels in our system, we can:
- Improve the efficiency of a single drive.
- Increase the speed and reliability of the system (in case of failure), depending on the RAID configuration.
Even though nested RAID levels are more costly to implement than traditional levels (due to the higher number of disks and higher cost per GB), nested RAID is becoming more common, since it helps to solve some of the reliability issues that occur when we use standard RAID levels.
RAID Configurations
As a system administrator, you can set up and use two categories of RAID. These are:
Hardware RAID
Hardware RAID is implemented independently of the host. This means that you’ll have to spend extra money on hardware to get it up and running. It is, of course, fast, and it has its own dedicated RAID controller, typically supplied as a PCI Express card.
The hardware does not consume host resources as such, and its NVRAM cache allows faster read and write access.
In case of failure, the controller preserves the cache using its backup capacity and rebuilds from it. Overall, hardware RAID ties you to a limited group of controllers and needs a major initial investment.
The following are some of the benefits of hardware RAID:
- Genuine performance: Since the dedicated hardware does not use the host’s CPU cycles, it increases overall performance. The controller works at its peak level with no overhead, provided enough cache is available to support it.
- RAID controllers: When it comes to the underlying disk structure, the RAID controller is used to provide abstraction. The operating system treats the entire range of hard disks as if it were a single storage unit. Since the OS deals with the RAID as a single hard disk drive, the OS doesn’t have to put much effort into handling it.
There are some limitations of using hardware RAID. These are —
- Vendor lock-in becomes a threat. If you want to switch hardware manufacturers, you might not be able to access your previous RAID control parameters.
- Another limitation is the expense of the initial setup.
Software RAID:
- Software RAID depends on the host’s resources. This implies that it is slower than its hardware counterpart, which is understandable given that it does not have its own dedicated set of resources.
- In the case of software RAIDs, the operating system is responsible for everything.
The following are the main benefits of using software RAID:
- Open Source: The RAID software is open-source. This ensures that you can switch between Linux systems and be confident that your array will still continue to operate; you can export a RAID configuration created on Ubuntu and use it later on another system (see the sketch after this list).
- Flexibility: You have full control over how RAID works and how it is configured, because it is implemented in the operating system. As a result, you can make adjustments without modifying any hardware.
- Limited cost: You won’t have to spend a lot of money since no special hardware is needed.
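As mentioned above, an md RAID configuration travels with the disks themselves. A hedged sketch of moving an array between systems (device names are placeholders; the config path follows Debian/Ubuntu conventions):

    # On the original system, record the array definition (optional but tidy)
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf

    # On the new system, the superblocks on the member disks are enough:
    mdadm --examine --scan      # show the arrays mdadm can detect
    mdadm --assemble --scan     # assemble everything it finds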
Another form of RAID is hardware-assisted software RAID. This is firmware RAID, also known as «fake RAID», which can be found on motherboards or low-cost RAID cards.
Drawbacks of using this RAID include:
- Performance overhead.
- RAID support is limited.
- Specific hardware equipment is required.
RAID levels
Standard RAID levels in computer storage are a basic collection of RAID configurations that use striping, mirroring, or parity techniques to construct large, reliable data stores from multiple general-purpose hard disk drives.
Some of the common RAID levels are:
RAID 0:
- RAID 0 is a disk configuration that stripes data across two or more disks. Striping data entails dividing it into smaller chunks.
- These chunks are then written across the disks in the array. RAID 0 is highly advantageous for raw performance, but note that it provides no redundancy.
- Generally, the more disks you use, the better the RAID throughput will be.
- The final array size in RAID 0 is essentially the sum of the capacities of the member drives.
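A minimal sketch of creating such an array with mdadm, assuming placeholder partitions /dev/sdb1 and /dev/sdc1 (creation destroys any existing data on them):

    # Stripe two partitions together with a 64 KiB chunk size
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sdb1 /dev/sdc1

    # The reported Array Size should be roughly the sum of the two members
    mdadm --detail /dev/md0 | grep 'Array Size'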
RAID 1:
- In RAID 1, the data is mirrored between devices (two or more). As a result, the data is written to each of the group’s drives. In other words, each disk contains an exact copy of the same data.
- This method is advantageous for establishing redundancy. It is useful if your system has a high risk of disk failures, because if one disk fails, the data from the other working disks can be used to rebuild it.
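A hedged sketch of a two-disk mirror and of how a failed member might be replaced (device names are placeholders):

    # Mirror two partitions
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

    # Mark one member as failed, remove it, then add a replacement;
    # the mirror rebuilds from the surviving disk
    mdadm /dev/md1 --fail /dev/sdc1
    mdadm /dev/md1 --remove /dev/sdc1
    mdadm /dev/md1 --add /dev/sdd1
    cat /proc/mdstat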
RAID 5:
- RAID 5 builds on the striping technique of RAID 0 and adds distributed parity.
- It stripes data across the devices and spreads the parity information across all the drives in the array.
- The parity is computed with simple mathematical (XOR) operations and allows the array to reconstruct the contents of a single failed drive.
- Throughput gains, data restoration, and increased redundancy are among the several benefits of using a RAID 5 architecture. However, there are disadvantages too: the parity calculation slows down write operations, and if one of the array’s drives fails, the performance of the entire array suffers until the drive is replaced and rebuilt.
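A minimal sketch of a three-disk RAID 5 (placeholder devices; usable capacity is roughly that of two of the three disks, with the remainder holding parity):

    # Create the array; the initial parity sync runs in the background
    mdadm --create /dev/md5 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # Watch the sync progress
    cat /proc/mdstat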
RAID 6:
- RAID 6, also known as double-parity RAID, distributes data across multiple disks and allows input/output (I/O) operations to overlap in a balanced way, improving performance.
- RAID 6 provides data redundancy through two independent parity blocks per stripe, so the array can survive the failure of any two drives.
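A short sketch, assuming four placeholder partitions (the minimum for RAID 6; usable capacity is roughly that of n - 2 disks):

    # Four members, any two of which may fail without data loss
    mdadm --create /dev/md6 --level=6 --raid-devices=4 /dev/sd[b-e]1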
RAID 10:
RAID 10 combines RAID 1 (mirroring) and RAID 0 (striping): blocks are mirrored and the mirrors are striped together.
The key features of RAID 10 are:
- A minimum of four disks is required.
- This is also called the «mirror stripe».
- Redundancy is excellent since blocks are mirrored.
- Outstanding performance due to striping of blocks.
- It is the best choice for critical applications (especially databases) if you can afford to spend on the extra disks that higher RAID levels require.
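A hedged sketch of a four-disk RAID 10 using md's default «near» layout (two copies of each block; device names are placeholders):

    # Four members, blocks mirrored in pairs and striped across the pairs
    mdadm --create /dev/md10 --level=10 --raid-devices=4 --layout=n2 /dev/sd[b-e]1
    mdadm --detail /dev/md10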
Conclusion:
In this article, we have discussed the concept of RAID in Linux. We started with a basic introduction to RAID and then moved on to discuss the functionality, working, types, and different RAID levels. You can even use RAID levels with architectures such as Docker containers, VMs, etc.
We sincerely hope that this article gave you a glimpse of what RAID in Linux is. Do let us know in the comments if you have any queries or suggestions.