- Using mdadm on Linux to set up RAID
- Installing mdadm
- Building a RAID
- Preparing the disks
- Creating the array
- Creating the mdadm.conf file
- Creating a file system and mounting the array
- RAID information
- Linux how to raid
- Contents
- RAID levels
- Standard RAID levels
- Nested RAID levels
- RAID level comparison
- Implementation
- Which type of RAID do I have?
- Installation
- Prepare the devices
- Partition the devices
- GUID Partition Table
- Master Boot Record
- Build the array
- Update configuration file
- Assemble the array
- Format the RAID filesystem
- Calculating the stride and stripe width
- Mounting from a Live CD
- Installing Arch Linux on RAID
- Update configuration file
- Configure mkinitcpio
- Configure the boot loader
- Root device
- RAID0 layout
- RAID Maintenance
- Scrubbing
- General notes on scrubbing
- RAID1 and RAID10 notes on scrubbing
- Removing devices from an array
- Adding a new device to an array
- Increasing size of a RAID volume
- Change sync speed limits
- RAID5 performance
- Update RAID superblock
- Monitoring
- Watch mdstat
- Track IO with iotop
- Track IO with iostat
- Email notifications
- Troubleshooting
- Error: "kernel: ataX.00: revalidation failed"
- Start arrays read-only
- Recovering from a broken or missing drive in the raid
- Benchmarking
Using mdadm on Linux to set up RAID
mdadm is a utility for managing software RAID arrays of various levels. This guide walks through examples of its use.
Installing mdadm
mdadm can be installed with a single command.
On CentOS / Red Hat:
yum install mdadm
On Ubuntu / Debian:
apt-get install mdadm
Building a RAID
Before building the array, prepare the disks. Then proceed to creating the RAID array.
Preparing the disks
First, zero the superblocks on the disks that will be used to build the RAID (if the disks were used before, their superblocks may still hold metadata about other RAID arrays):
mdadm --zero-superblock --force /dev/sd{b,c}
* in this example we zero the superblocks on disks sdb and sdc.
If the response is:
mdadm: Unrecognised md component device - /dev/sdb
mdadm: Unrecognised md component device - /dev/sdc
... then the disks have not been used for RAID before. Simply continue with the setup.
Next, remove old metadata and signatures from the disks:
wipefs --all --force /dev/sd{b,c}
Creating the array
To build a redundant array, use the following command:
mdadm --create --verbose /dev/md0 -l 1 -n 2 /dev/sd{b,c}
- /dev/md0: the RAID device that will appear once the array is built;
- -l 1: the RAID level;
- -n 2: the number of disks the array is built from;
- /dev/sd{b,c}: the array is built from disks sdb and sdc.
We should see something like:
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
mdadm: size set to 1046528K
The system will also ask for confirmation that we want to continue and create the RAID; answer y:
Continue creating array? y
We will see something like:
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
Listing the block devices (for example, with lsblk) shows that disks sdb and sdc now carry the md0 device, for example:
...
sdb 8:16 0 2G 0 disk
└─md0 9:0 0 2G 0 raid1
sdc 8:32 0 2G 0 disk
└─md0 9:0 0 2G 0 raid1
...
* the example shows a raid1 array assembled from disks sdb and sdc.
Creating the mdadm.conf file
The mdadm.conf file holds information about the RAID arrays and the components they consist of. To create it, run the following commands:
echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
mdadm --detail --scan --verbose | awk '/ARRAY/ {print}' >> /etc/mdadm/mdadm.conf
Example contents:
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2 name=proxy.dmosk.local:0 UUID=411f9848:0fae25f9:85736344:ff18e41d
* this example stores information about the array /dev/md0: its level is 1 and it is assembled from 2 disks.
Creating a file system and mounting the array
A file system is created on the array in the same way as on a partition:
mkfs.ext4 /dev/md0
* this command creates an ext4 file system on md0.
The array can be mounted with:
mount /dev/md0 /mnt
* here we mounted the array on the /mnt directory.
To have the array mounted at boot as well, add the following to fstab:
/dev/md0 /mnt ext4 defaults 1 2
To check that fstab is correct, unmount the array and remount everything listed in fstab:
umount /mnt
mount -a
The md device should then appear among the mounted file systems (for example, in the output of df -h):
/dev/md0 990M 2,6M 921M 1% /mnt
RAID information
The state of all RAID arrays can be viewed with:
cat /proc/mdstat
The response looks something like:
md0 : active raid1 sdc[1] sdb[0]
1046528 blocks super 1.2 [2/2] [UU]
* where md0 is the name of the RAID device; raid1 sdc[1] sdb[0] is the redundancy level and the disks the array is built from; 1046528 blocks is the size of the array; [2/2] [UU] shows how many units are currently in use.
** the line md0 : active(auto-read-only) may appear; it means that nothing has been written to the array since it was started.
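The status brackets can also be checked from a script. The following is a minimal sketch, not part of mdadm: the function name degraded_arrays is hypothetical, and it assumes the layout shown above, where an underscore inside the final brackets (for example [U_]) marks a missing or failed member.

```shell
#!/bin/sh
# degraded_arrays [FILE]
# Print the name of every md array in an mdstat-style FILE whose status
# brackets contain "_". FILE defaults to /proc/mdstat.
# Hypothetical helper for illustration only.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md0 : active raid1 sdc[1] sdb[0]"
        /^md/ && $2 == ":" { name = $1 }
        # Status line ends in brackets such as [UU] or [U_]
        $NF ~ /^\[[U_]*_[U_]*\]$/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments it reads the live /proc/mdstat; an empty result means no array reported a missing member.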
Detailed information about a specific array can be viewed with:
mdadm -D /dev/md0
* where /dev/md0 is the name of the RAID device.
Version : 1.2
Creation Time : Wed Mar 6 09:41:06 2019
Raid Level : raid1
Array Size : 1046528 (1022.00 MiB 1071.64 MB)
Used Dev Size : 1046528 (1022.00 MiB 1071.64 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Wed Mar 6 09:41:26 2019
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : resync
Name : proxy.dmosk.local:0 (local to host proxy.dmosk.local)
UUID : 304ad447:a04cda4a:90457d04:d9a4e884
Events : 17
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
- Version: metadata version.
- Creation Time: date and time the array was created.
- Raid Level: RAID level.
- Array Size: amount of disk space available to the RAID.
- Used Dev Size: the space used on each member device. It relates to the total differently for each level: for RAID1 it equals half the combined size of the disks, for RAID5 it equals the space used for parity.
- Raid Devices: number of devices used by the RAID.
- Total Devices: number of devices added to the RAID.
- Update Time: date and time the array was last changed.
- State: current state. clean means everything is in order.
- Active Devices: number of devices operating in the array.
- Working Devices: number of devices added to the array that are in working condition.
- Failed Devices: number of failed devices.
- Spare Devices: number of spare devices.
- Consistency Policy: how consistency of the active array is restored after an unexpected failure. The default is resync, a full resynchronisation after recovery. Other possible values are bitmap, journal and ppl.
- Name: the array name, which includes the host name.
- UUID: identifier of the array.
- Events: number of update events.
- Chunk Size (for RAID5): size in kilobytes of the block that is written to each disk in turn.
More about each field can be found in the mdadm manual:
man mdadm
Information about the partitions and disk space of the array can also be viewed with fdisk:
fdisk -l /dev/md0
Linux how to raid
Redundant Array of Independent Disks (RAID) is a storage technology that combines multiple disk drive components (typically disk drives or partitions thereof) into a logical unit. Depending on the RAID implementation, this logical unit can be a file system or an additional transparent layer that can hold several partitions. Data is distributed across the drives in one of several ways called RAID levels, depending on the level of redundancy and performance required. The RAID level chosen can thus prevent data loss in the event of a hard disk failure, increase performance or be a combination of both.
This article explains how to create/manage a software RAID array using mdadm.
RAID levels
Despite redundancy implied by most RAID levels, RAID does not guarantee that data is safe. A RAID will not protect data if there is a fire, the computer is stolen or multiple hard drives fail at once. Furthermore, installing a system with RAID is a complex process that may destroy data.
Standard RAID levels
There are many different levels of RAID; listed below are the most common.
RAID 0: Uses striping to combine disks. Even though it does not provide redundancy, it is still considered RAID. It does, however, provide a big speed benefit. If the speed increase is worth the possibility of data loss (for a swap partition, for example), choose this RAID level. On a server, RAID 1 and RAID 5 arrays are more appropriate. The size of a RAID 0 array block device is the size of the smallest component partition times the number of component partitions.
RAID 1: The most straightforward RAID level, straight mirroring. As with other RAID levels, it only makes sense if the partitions are on different physical disk drives. If one of those drives fails, the block device provided by the RAID array will continue to function as normal. The example will be using RAID 1 for everything except swap and temporary data. Please note that with a software implementation, the RAID 1 level is the only option for the boot partition, because bootloaders reading the boot partition do not understand RAID, but a RAID 1 component partition can be read as a normal partition. The size of a RAID 1 array block device is the size of the smallest component partition.
RAID 5: Requires 3 or more physical drives, and provides the redundancy of RAID 1 combined with the speed and size benefits of RAID 0. RAID 5 uses striping, like RAID 0, but also stores parity blocks distributed across each member disk. In the event of a failed disk, these parity blocks are used to reconstruct the data on a replacement disk. RAID 5 can withstand the loss of one member disk.
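The size rules above reduce to simple arithmetic. The sketch below is illustrative only: raid_capacity is a hypothetical helper, sizes are plain integers in whatever unit the caller uses, and the RAID 5 rule (one member's worth of space goes to parity, so n-1 members hold data) is the standard property of that level rather than something stated in the text above.

```shell
#!/bin/sh
# raid_capacity LEVEL COUNT SMALLEST
# Print the usable size of an array built from COUNT component partitions,
# the smallest of which has size SMALLEST. Hypothetical helper.
raid_capacity() {
    local level=$1 count=$2 smallest=$3
    case $level in
        0)  # striping: every member contributes its full (smallest) size
            echo $(( smallest * count )) ;;
        1)  # mirroring: capacity of a single member
            echo "$smallest" ;;
        5)  # striping with distributed parity: one member's worth is parity
            if [ "$count" -lt 3 ]; then
                echo "RAID 5 needs at least 3 devices" >&2
                return 1
            fi
            echo $(( smallest * (count - 1) )) ;;
        *)  echo "unsupported level: $level" >&2
            return 1 ;;
    esac
}
```

For example, raid_capacity 5 4 500 describes four 500 GB partitions in RAID 5 and reports the space of three of them.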
Nested RAID levels
RAID level comparison
RAID level | Data redundancy | Physical drive utilization | Read performance | Write performance | Min drives |
---|---|---|---|---|---|
0 | No | 100% | nX |
Best; on par with RAID0 but redundant
* Where n stands for the number of dedicated disks.
Implementation
The RAID devices can be managed in different ways:
Software RAID: This is the easiest implementation as it does not rely on obscure proprietary firmware and software to be used. The array is managed by the operating system, either:
- by an abstraction layer (e.g. mdadm);
Which type of RAID do I have?
Since software RAID is implemented by the user, the type of RAID is easily known to the user.
However, discerning between FakeRAID and true hardware RAID can be more difficult. As stated, manufacturers often incorrectly distinguish these two types of RAID and false advertising is always possible. The best solution in this instance is to run the lspci command and look through the output to find the RAID controller. Then do a search to see what information can be located about the RAID controller. Hardware RAID controllers appear in this list, but FakeRAID implementations do not. Also, true hardware RAID controllers are often rather expensive, so if someone customized the system, then it is very likely that choosing a hardware RAID setup made a very noticeable change in the computer's price.
Installation
Install mdadm. mdadm is used for administering pure software RAID using plain block devices: the underlying hardware does not provide any RAID logic, just a supply of disks. mdadm will work with any collection of block devices, even unusual ones. For example, one can make a RAID array from a collection of thumb drives.
Prepare the devices
If the device is being reused or re-purposed from an existing array, erase any old RAID configuration information:
or if a particular partition on a drive is to be deleted:
Partition the devices
It is highly recommended to partition the disks to be used in the array. Since most RAID users are selecting disk drives larger than 2 TiB, GPT is required and recommended. See Partitioning for more information on partitioning and the available partitioning tools.
GUID Partition Table
- After creating the partitions, their partition type GUIDs should be A19D880F-05FC-4D3B-A006-743F0F84911E (it can be assigned by selecting partition type Linux RAID in fdisk or FD00 in gdisk).
- If a larger disk array is employed, consider assigning filesystem labels or partition labels to make it easier to identify an individual disk later.
- Creating partitions that are of the same size on each of the devices is recommended.
Master Boot Record
For those creating partitions on HDDs with an MBR partition table, the partition type IDs available for use are:
- 0xDA for non-FS data (Non-FS data in fdisk). This is the recommended mdadm partition type for RAID arrays on Arch Linux.
- 0xFD for RAID autodetect arrays (Linux RAID autodetect in fdisk). This partition type should only be used if RAID autodetection is desirable (non-initramfs system, old mdadm metadata format).
Build the array
Use mdadm to build the array. See mdadm(8) for supported options. Several examples are given below.
The following example shows building a 2-device RAID1 array:
The following example shows building a RAID5 array with 4 active devices and 1 spare device:
The following example shows building a RAID10,far2 array with 2 devices:
The array is created under the virtual device /dev/mdX, assembled and ready to use (in degraded mode). One can directly start using it while mdadm resyncs the array in the background. It can take a long time to restore parity. Check the progress with:
cat /proc/mdstat
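The three build examples described in this section might look like the following sketch. The array names under /dev/md/ and the member partitions /dev/sdb1 to /dev/sdf1 are assumptions for illustration, not values from this article. The run helper is hypothetical as well: it only prints each command unless DRY_RUN=0 is set, because mdadm --create overwrites whatever is on the member devices.

```shell
#!/bin/sh
# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# 2-device RAID1 array (device names are placeholders).
create_raid1() {
    run mdadm --create --verbose --level=1 --raid-devices=2 \
        /dev/md/MyRAID1Array /dev/sdb1 /dev/sdc1
}

# RAID5 array with 4 active devices and 1 spare; the fifth device listed
# becomes the spare.
create_raid5() {
    run mdadm --create --verbose --level=5 --raid-devices=4 --spare-devices=1 \
        /dev/md/MyRAID5Array /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
}

# RAID10 array in the far2 layout with 2 devices.
create_raid10_far2() {
    run mdadm --create --verbose --level=10 --layout=f2 --raid-devices=2 \
        /dev/md/MyRAID10Array /dev/sdb1 /dev/sdc1
}
```

Calling create_raid5 prints the command for review; DRY_RUN=0 create_raid5, run as root, would actually create the array.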
Update configuration file
By default, most of mdadm.conf is commented out, and it contains just the following:
This directive tells mdadm to examine the devices referenced by /proc/partitions and assemble as many arrays as possible. This is fine if you really do want to start all available arrays and are confident that no unexpected superblocks will be found (such as after installing a new storage device). A more precise approach is to explicitly add the arrays to /etc/mdadm.conf:
This results in something like the following:
This also causes mdadm to examine the devices referenced by /proc/partitions. However, only devices that have superblocks with a UUID of 27664… are assembled into active arrays.
See mdadm.conf(5) for more information.
Assemble the array
Once the configuration file has been updated the array can be assembled using mdadm:
Format the RAID filesystem
The array can now be formatted with a file system like any other partition, just keep in mind that:
- Due to the large volume size not all filesystems are suited (see: Wikipedia:Comparison of file systems#Limits).
- The filesystem should support growing and shrinking while online (see: Wikipedia:Comparison of file systems#Features).
- One should calculate the correct stride and stripe-width for optimal performance.
Calculating the stride and stripe width
To fit the filesystem structure optimally within the underlying RAID structure, two parameters are required: the stride and stripe width. These are derived from the RAID chunk size, the filesystem block size, and the number of "data disks".
The chunk size is a property of the RAID array, decided at the time of its creation. mdadm's current default is 512 KiB. It can be found with mdadm:
The block size is a property of the filesystem, decided at its creation. The default for many filesystems, including ext4, is 4 KiB. See /etc/mke2fs.conf for details on ext4.
The number of "data disks" is the minimum number of devices in the array required to completely rebuild it without data loss. For example, this is N for a raid0 array of N devices and N-1 for raid5.
Once you have these three quantities, the stride and the stripe width can be calculated using the following formulas:
stride = chunk size / block size
stripe width = number of data disks * stride
Example 1. RAID0
Example formatting to ext4 with the correct stripe width and stride:
- Hypothetical RAID0 array is composed of 2 physical disks.
- Chunk size is 512 KiB.
- Block size is 4 KiB.
stride = chunk size / block size. In this example, the math is 512/4 so the stride = 128.
stripe width = # of physical data disks * stride. In this example, the math is 2*128 so the stripe width = 256.
Example 2. RAID5
Example formatting to ext4 with the correct stripe width and stride:
- Hypothetical RAID5 array is composed of 4 physical disks; 3 data disks and 1 parity disk.
- Chunk size is 512 KiB.
- Block size is 4 KiB.
stride = chunk size / block size. In this example, the math is 512/4 so the stride = 128.
stripe width = # of physical data disks * stride. In this example, the math is 3*128 so the stripe width = 384.
For more on stride and stripe width, see: RAID Math.
Example 3. RAID10,far2
Example formatting to ext4 with the correct stripe width and stride:
- Hypothetical RAID10 array is composed of 2 physical disks. Because of the properties of RAID10 in far2 layout, both count as data disks.
- Chunk size is 512 KiB.
- Block size is 4 KiB.
stride = chunk size / block size. In this example, the math is 512/4 so the stride = 128.
stripe width = # of physical data disks * stride. In this example, the math is 2*128 so the stripe width = 256.
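The arithmetic in the three examples can be wrapped in a small function. This is a sketch under stated assumptions: ext4_raid_opts is a hypothetical helper, sizes are given in KiB, and the output uses the stride and stripe_width extended options as spelled in the mke2fs manual page.

```shell
#!/bin/sh
# ext4_raid_opts CHUNK_KIB BLOCK_KIB DATA_DISKS
# Print the value for the -E option of mkfs.ext4 (hypothetical helper).
ext4_raid_opts() {
    local chunk_kib=$1 block_kib=$2 data_disks=$3
    # stride = chunk size / block size
    local stride=$(( chunk_kib / block_kib ))
    # stripe width = number of data disks * stride
    local stripe_width=$(( data_disks * stride ))
    printf 'stride=%d,stripe_width=%d\n' "$stride" "$stripe_width"
}
```

For the RAID5 example above, ext4_raid_opts 512 4 3 gives the option string, which could then be passed along the lines of mkfs.ext4 -b 4096 -E "$(ext4_raid_opts 512 4 3)" /dev/md0, where /dev/md0 is a placeholder device name.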
Mounting from a Live CD
To mount the RAID partition from a Live CD, use:
If your RAID 1 that is missing a disk array was wrongly auto-detected as RAID 1 (as per mdadm --detail /dev/mdnumber) and reported as inactive (as per cat /proc/mdstat), stop the array first:
Installing Arch Linux on RAID
You should create the RAID array between the Partitioning and formatting steps of the Installation Procedure. Instead of directly formatting a partition to be your root file system, it will be created on a RAID array. Follow the Installation section to create the RAID array. Then continue with the installation procedure until the pacstrap step is completed. When using UEFI boot, also read the ESP on software RAID1 section of EFI system partition.
Update configuration file
After the base system is installed, the default configuration file, mdadm.conf, must be updated like so:
Always check the mdadm.conf configuration file using a text editor after running this command to ensure that its contents look reasonable.
Continue with the installation procedure until you reach the Initramfs step of the Installation guide, then follow the next section.
Configure mkinitcpio
Add mdadm_udev to the HOOKS section of the mkinitcpio.conf to add support for mdadm into the initramfs image:
If you use the mdadm_udev hook with a FakeRAID array, it is recommended to include mdmon in the BINARIES array:
Configure the boot loader
Root device
Point the root parameter to the mapped device. E.g.:
If booting from a software raid partition fails using the kernel device node method above, an alternative way is to use one of the methods from Persistent block device naming, for example:
RAID0 layout
Since version 5.3.4 of the Linux kernel, you need to explicitly tell the kernel which RAID0 layout should be used: RAID0_ORIG_LAYOUT (1) or RAID0_ALT_MULTIZONE_LAYOUT (2). You can do this by providing the kernel parameter as follows:
The correct value depends upon the kernel version that was used to create the raid array: use 1 if created using kernel 3.14 or earlier, use 2 if using a more recent version of the kernel. One way to check this is to look at the creation time of the raid array:
Here we can see that this raid array was created on September 24, 2015. The release date of Linux Kernel 3.14 was March 30, 2014, and as such this raid array is most likely created using a multizone layout (2).
RAID Maintenance
Scrubbing
It is good practice to regularly run data scrubbing to check for and fix errors. Depending on the size/configuration of the array, a scrub may take multiple hours to complete.
To initiate a data scrub:
The check operation scans the drives for bad sectors and automatically repairs them. If it finds good sectors that contain bad data (the data in a sector does not agree with what the data from another disk indicates that it should be, for example the parity block + the other data blocks would cause us to think that this data block is incorrect), then no action is taken, but the event is logged (see below). This "do nothing" allows admins to inspect the data in the sector and the data that would be produced by rebuilding the sectors from redundant information and pick the correct data to keep.
As with many tasks/items relating to mdadm, the status of the scrub can be queried by reading /proc/mdstat.
To stop a currently running data scrub safely:
When the scrub is complete, admins may check how many blocks (if any) have been flagged as bad:
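md exposes scrub control through sysfs: writing check to an array's sync_action file starts a scrub, writing idle stops it, and mismatch_cnt holds the count recorded by the last run. The sketch below wraps those files in three hypothetical helper functions. The array name md0 is an assumption, and SYSFS_ROOT is not a real kernel or mdadm setting; it exists only so the functions can be tried against a scratch directory instead of /sys.

```shell
#!/bin/sh
# Hypothetical helpers around the md sysfs interface.
# Writing to the real /sys files requires root.

# md_dir [ARRAY]: path of the md control directory (ARRAY defaults to md0).
md_dir() {
    printf '%s/block/%s/md' "${SYSFS_ROOT:-/sys}" "${1:-md0}"
}

# Start a data scrub on the array.
scrub_start() {
    echo check > "$(md_dir "$1")/sync_action"
}

# Stop a currently running scrub.
scrub_stop() {
    echo idle > "$(md_dir "$1")/sync_action"
}

# Print the mismatch count recorded by the last scrub.
scrub_mismatches() {
    cat "$(md_dir "$1")/mismatch_cnt"
}
```

A typical sequence would be scrub_start md0, watching /proc/mdstat until the check finishes, then scrub_mismatches md0.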
General notes on scrubbing
It is a good idea to set up a cron job as root to schedule a periodic scrub. See the raid-check AUR package, which can assist with this. To perform a periodic scrub using systemd timers instead of cron, see the raid-check-systemd AUR package, which contains the same script along with associated systemd timer unit files.
RAID1 and RAID10 notes on scrubbing
Due to the fact that RAID1 and RAID10 writes in the kernel are unbuffered, an array can have non-0 mismatch counts even when the array is healthy. These non-0 counts will only exist in transient data areas where they do not pose a problem. However, we cannot tell the difference between a non-0 count that is just in transient data or a non-0 count that signifies a real problem. This fact is a source of false positives for RAID1 and RAID10 arrays. It is however still recommended to scrub regularly in order to catch and correct any bad sectors that might be present in the devices.
Removing devices from an array
One can remove a device from the array after marking it as faulty:
Now remove it from the array:
Remove device permanently (for example, to use it individually from now on): Issue the two commands described above then:
Stop using an array:
- Unmount the target array
- Stop the array with: mdadm --stop /dev/md0
- Repeat the three commands described at the beginning of this section on each device.
- Remove the corresponding line from /etc/mdadm.conf.
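The removal sequence described in this section (mark the device faulty, remove it, then clear its superblock) could be scripted roughly as follows. This is a sketch: /dev/md0 and /dev/sdc1 in the usage note are placeholder names, release_member and run are hypothetical helpers, and commands are only printed unless DRY_RUN=0 is set. It relies on mdadm's --fail, --remove and --zero-superblock options.

```shell
#!/bin/sh
# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# release_member ARRAY DEVICE
# Mark DEVICE as faulty, remove it from ARRAY, then wipe its md superblock
# so it can be used on its own. Stops at the first failing step.
release_member() {
    run mdadm --fail "$1" "$2" &&
    run mdadm --remove "$1" "$2" &&
    run mdadm --zero-superblock "$2"
}
```

For example, release_member /dev/md0 /dev/sdc1 prints the three commands in order so they can be checked before running them for real.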
Adding a new device to an array
Adding new devices with mdadm can be done on a running system with the devices mounted. Partition the new device using the same layout as one of those already in the arrays as discussed above.
Assemble the RAID array if it is not already assembled:
Add the new device to the array:
This should not take long for mdadm to do.
Depending on the type of RAID (for example, with RAID1), mdadm may add the device as a spare without syncing data to it. You can increase the number of disks the RAID uses by using --grow with the --raid-devices option. For example, to increase an array to four disks:
You can check the progress with:
Check that the device has been added with the command:
On a RAID0 array the commands above can fail, because they add the new disk as a "spare" but RAID0 does not have spares. If you want to add a device to a RAID0 array, you need to "grow" and "add" in the same command, as demonstrated below:
Increasing size of a RAID volume
If larger disks are installed in a RAID array or partition size has been increased, it may be desirable to increase the size of the RAID volume to fill the larger available space. This process may be begun by first following the above sections pertaining to replacing disks. Once the RAID volume has been rebuilt onto the larger disks it must be "grown" to fill the space.
Next, partitions present on the RAID volume /dev/md0 may need to be resized. See Partitioning for details. Finally, the filesystem on the above mentioned partition will need to be resized. If partitioning was performed with gparted this will be done automatically. If other tools were used, unmount and then resize the filesystem manually.
Change sync speed limits
Syncing can take a while. If the machine is not needed for other tasks the speed limit can be increased.
In the above example, it would seem the max speed is limited to approximately 238 M/sec.
Check the current speed limit:
Set a new maximum speed of raid resyncing operations using sysctl:
Then check out the syncing speed and estimated finish time.
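As a sketch of checking and changing the limits, the helpers below read and write the kernel's dev.raid.speed_limit_min and dev.raid.speed_limit_max settings through their files under /proc/sys/dev/raid/ (values are in KiB/s). The function names and the PROC_ROOT variable are hypothetical; PROC_ROOT exists only so the code can be tried against a scratch directory.

```shell
#!/bin/sh
# Directory holding the md resync speed limits.
raid_limits_dir() {
    printf '%s/sys/dev/raid' "${PROC_ROOT:-/proc}"
}

# Print the current minimum and maximum resync speed.
show_sync_limits() {
    printf 'min=%s max=%s\n' \
        "$(cat "$(raid_limits_dir)/speed_limit_min")" \
        "$(cat "$(raid_limits_dir)/speed_limit_max")"
}

# set_sync_max KIB_PER_SEC
# Change the ceiling; on a real system this needs root and has the same
# effect as: sysctl -w dev.raid.speed_limit_max=KIB_PER_SEC
set_sync_max() {
    echo "$1" > "$(raid_limits_dir)/speed_limit_max"
}
```

A value set this way lasts until reboot; the number to choose depends on the hardware and on how much I/O the machine can spare.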
RAID5 performance
To improve RAID5 performance for fast storage (e.g. NVMe), increase /sys/block/mdx/md/group_thread_cnt to more threads. For example, to use 8 threads to operate on a RAID5 device:
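A minimal sketch of that setting follows. The function name set_raid5_threads is hypothetical, md0 in the usage note is a placeholder array name, and SYSFS_ROOT is an assumption added only so the function can be exercised outside /sys.

```shell
#!/bin/sh
# set_raid5_threads ARRAY COUNT
# Write COUNT to the array's group_thread_cnt file (needs root on /sys).
set_raid5_threads() {
    echo "$2" > "${SYSFS_ROOT:-/sys}/block/$1/md/group_thread_cnt"
}
```

For instance, set_raid5_threads md0 8 asks the kernel to use 8 threads for that array; the setting is not persistent across reboots.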
Update RAID superblock
To update the RAID superblock, you need to first unmount the array and then stop the array with the following command:
mdadm --stop /dev/md0
Then you can update certain parameters by reassembling the array. For example, to update the homehost:
See the arguments of --update for details.
Monitoring
A simple one-liner that prints out the status of the RAID devices:
Watch mdstat
Or preferably using tmux
Track IO with iotop
The iotop package displays the input/output stats for processes. Use this command to view the IO for raid threads.
Track IO with iostat
The iostat utility from sysstat package displays the input/output statistics for devices and partitions.
Email notifications
mdadm provides the systemd service mdmonitor.service which can be useful for monitoring the health of your raid arrays and notifying you via email if anything goes wrong.
This service is special in that it cannot be manually activated like a regular service; mdadm will take care of activating it via udev upon assembling your arrays on system startup, but it will only do so if an email address has been configured for its notifications (see below).
To enable this functionality, edit /etc/mdadm.conf and define the email address:
Then, to verify that everything is working as it should, run the following command:
If the test is successful and the email is delivered, then you are done; the next time your arrays are reassembled, mdmonitor.service will begin monitoring them for errors.
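The address is set with the MAILADDR keyword in mdadm.conf. As a small sketch (the function name has_mailaddr is hypothetical), a script can confirm that the keyword is present and uncommented before relying on notifications. The check is deliberately simple and assumes the keyword is written out in full.

```shell
#!/bin/sh
# has_mailaddr [FILE]
# Succeed if FILE (default /etc/mdadm.conf) contains an uncommented
# MAILADDR line followed by an address. Abbreviated keyword spellings
# are not recognised by this check.
has_mailaddr() {
    grep -Eiq '^[[:space:]]*MAILADDR[[:space:]]+[^[:space:]]+' \
        "${1:-/etc/mdadm.conf}"
}
```

It can be used as has_mailaddr || echo "no MAILADDR configured". Once an address is in place, a test alert for every array can be requested with mdadm --monitor --scan --oneshot --test.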
Troubleshooting
If you are getting an error when you reboot about "invalid raid superblock magic" and you have additional hard drives other than the ones you installed to, check that your hard drive order is correct. During installation, your RAID devices may be hdd, hde and hdf, but during boot they may be hda, hdb and hdc. Adjust your kernel line accordingly. This is what happened to me anyway.
Error: "kernel: ataX.00: revalidation failed"
If you suddenly (after reboot, changed BIOS settings) experience error messages like:
It does not necessarily mean that a drive is broken. You often find panic links on the web which go for the worst. In a word, do not panic. Maybe you just changed APIC or ACPI settings within your BIOS or kernel parameters somehow. Change them back and you should be fine. Ordinarily, turning APIC and/or ACPI off should help.
Start arrays read-only
When an md array is started, the superblock will be written, and resync may begin. To start read-only, set the kernel module md_mod parameter start_ro. When this is set, new arrays get an 'auto-ro' mode, which disables all internal io (superblock updates, resync, recovery) and is automatically switched to 'rw' when the first write request arrives.
To set the parameter at boot, add md_mod.start_ro=1 to your kernel line.
Or set it at module load time from a file in /etc/modprobe.d/ or directly from /sys/:
Recovering from a broken or missing drive in the raid
You might get the above mentioned error also when one of the drives breaks for whatever reason. In that case you will have to force the raid to still turn on even with one disk short. Type this (change where needed):
Now you should be able to mount it again with something like this (if you had it in fstab):
Now the raid should be working again and available to use, however with one disk short! To add a replacement disk, partition it as described above in Prepare the devices. Once that is done you can add the new disk to the raid by doing:
Checking /proc/mdstat, you will probably see that the raid is now active and rebuilding.
You also might want to update your configuration (see: Update configuration file).
Benchmarking
There are several tools for benchmarking a RAID. The most notable improvement is the speed increase when multiple threads are reading from the same RAID volume.
tiobench specifically benchmarks these performance improvements by measuring fully-threaded I/O on the disk.
bonnie++ tests database type access to one or more files, and creation, reading, and deleting of small files which can simulate the usage of programs such as Squid, INN, or Maildir format e-mail. The enclosed ZCAV program tests the performance of different zones of a hard drive without writing any data to the disk.
hdparm should not be used to benchmark a RAID, because it provides very inconsistent results.