System administrators want the most reliable system that can possibly be had. In many cases hardware RAID controllers are either too expensive or simply unavailable for a particular system.
There are many HOWTOs available on the Internet that describe several different schemes for utilizing Linux software RAID to provide mirroring of boot, root, and even other partitions. However, none of these have ever been robust enough to provide uninterrupted service under a great variety of circumstances. Most were also written for the 2.4.x series of Linux kernels and do not take advantage of new features available since the release of 2.6. For instance, most (if not all) proposed solutions involve mirroring partitions and treat a single mirrored partition as a usable standard-mode partition during the boot process.
Linux kernels in the 2.6.x generation have a new feature allowing the use of entire devices as components of a partitionable RAID-1 array. If one uses this new feature, then all data on the drive is mirrored at all times.
So the goal of this HOWTO is to put the partition table and the boot, root, and swap partitions on a bootable RAID-1 mirror device. The RAID device(s) should be assembled by the kernel before any filesystem is mounted.
Replace the <#> with the required number of partitions. Omit the number if the standard 4 is being used.
I suggest using the standard four partitions.
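For reference, a create command along these lines might look like the following. This is only a sketch, assuming the array /dev/md_d0 is built from the two whole drives /dev/hda and /dev/hdb used throughout this HOWTO; --auto=p asks mdadm to create a partitionable array, and the <#> is the partition count described above (the default is 4).
# mdadm --create /dev/md_d0 --auto=p<#> --level=1 --raid-devices=2 /dev/hda /dev/hdb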
I suggest a layout something like the following, particularly if you're only using two drives: p1 for /boot, p2 for the root filesystem, p3 for swap, and p4 as an LVM physical volume holding the remaining filesystems.
# mkreiserfs /dev/md_d0p2
# mkswap /dev/md_d0p3
# swapon /dev/md_d0p3
# pvcreate /dev/md_d0p4
# vgcreate vg /dev/md_d0p4
# lvcreate -L 8G -n usr vg
# lvcreate ...
# mount /dev/md_d0p2 /mnt/gentoo
# mkdir /mnt/gentoo/boot /mnt/gentoo/proc /mnt/gentoo/usr ...
# mount /dev/md_d0p1 /mnt/gentoo/boot
# mkswap /dev/md_d0p3; swapon /dev/md_d0p3
# vgchange -ay vg; mount /dev/vg/usr /mnt/gentoo/usr; ...
Of course, with Gentoo Linux you must build your own kernel. If you are not using Gentoo, you may need to do the same; if so, check the documentation for your distribution for details.
The kernel must meet these basic requirements:
- The md driver and the RAID-1 personality must be built into the kernel (not as modules), so that the array can be assembled before the root filesystem is mounted.
- The filesystem used on the root partition (ReiserFS in this HOWTO) must also be built in.
- The kernel must carry the md-degraded-boot patch described below, or it will refuse to boot from a degraded array.
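In terms of the 2.6 kernel configuration, those requirements correspond roughly to settings like the following (a sketch only; substitute your own root filesystem driver):
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_REISERFS_FS=y
CONFIG_BLK_DEV_DM=m
Device-mapper (CONFIG_BLK_DEV_DM) may be built as a module here because only /usr and the other LVM volumes, not the root filesystem, live on LVM in this layout.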
To patch your kernel, obtain the patch file, change into your kernel source directory, and apply it:
# cd /usr/src/linux
# patch -p1 < /path/to/patch/md-degraded-boot-2.6.12.1.diff
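Once the patch is applied, configure and build the kernel as usual. On an x86 system that might look something like this (a sketch; the image name matches the menu.lst entry used later in this HOWTO):
# make menuconfig
# make && make modules_install
# cp arch/i386/boot/bzImage /boot/bzImage-2.6.12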
When installing packages, make sure that you install lvm2, mdadm (not raidtools), and grub.
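On Gentoo this boils down to emerging the three packages, for example:
# emerge mdadm lvm2 grub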
Since you are using a 2.6 kernel, you will be using LVM version 2, so you need to make sure that the lvm2 tools package is installed.
Most users who are familiar with Linux software RAID are also familiar with raidtools. raidstart, raidstop, raidhotadd, etc. are a time-honored tradition. However, mdadm is a single tool that is much more powerful. Get used to it – you will love it.
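For illustration, the rough mdadm equivalents of those raidtools operations, using this HOWTO's array, look like this (assemble, stop, and hot-add respectively):
# mdadm -A -ap /dev/md_d0 /dev/hda /dev/hdb
# mdadm --stop /dev/md_d0
# mdadm /dev/md_d0 --add /dev/hda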
In my testing, LILO absolutely refused to be installed on a RAID-1 array. GRUB is the only way to go here.
In testing I have found that it is important to reboot using the installation CD before installing grub. You may not be required to do so, but proceed at your own risk: grub will not complain and no errors will be reported, yet the resulting array simply will not boot. This condition is not fatal; rebooting off of the install CD and re-installing grub will do the trick.
Be aware that if you use LVM on a RAID-1 array you must do one of the following when booting from the installation CD: either prevent the install CD from auto-detecting and activating LVM, or shut LVM down (deactivate the volume group) before attempting to assemble the RAID array.
The reason for the latter is that the init system on the install CD may scan for and find the LVM partition type and auto-load the LVM module, activating your volume group. Unfortunately, the LVM module just sees two drives with identical volume groups and simply ignores one of them. If LVM is not shut down, you will not be able to start your RAID array!
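The cleanup mentioned below amounts to deactivating the stray volume group before assembling the array; assuming the volume group is named vg as in the rest of this HOWTO, something like this should do:
# vgchange -an vg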
After booting (and possibly cleaning up), start your RAID array and volume group:
# mdadm -A -ap /dev/md_d0 /dev/hda /dev/hdb
# modprobe dm_mod
# vgscan; vgchange -ay vg
Now you can remount all of the volumes and continue with your installation.
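For example, a remount sequence mirroring the mounts used during the initial installation might look like this (adjust for your own layout):
# mount /dev/md_d0p2 /mnt/gentoo
# mount /dev/md_d0p1 /mnt/gentoo/boot
# swapon /dev/md_d0p3
# mount /dev/vg/usr /mnt/gentoo/usr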
After chrooting into /mnt/gentoo, install grub:
# grub
grub> root (hd0,0)
grub> install /boot/grub/stage1 (hd0) /boot/grub/stage2 p /boot/grub/menu.lst
grub> quit
# grub
grub> root (hd1,0)
grub> install /boot/grub/stage1 (hd1) /boot/grub/stage2 p /boot/grub/menu.lst
grub> quit
Notice that there are two distinct grub sessions. In testing I have found that you must install grub in two distinct steps.
Inside of your /boot/grub/menu.lst file, your boot configuration should look like this:
# By default, boot the first entry.
default 0
# For booting GNU/Linux
title Gentoo
root (hd0,0)
kernel /bzImage-2.6.12 root=/dev/md_d0p2 ro md=d0,hda,hdb
Once that is done, you should be able to unmount all drives, reboot, and remove the installation CD. Congratulations!
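A sketch of that shutdown sequence, assuming the mounts used above:
# exit
# umount /mnt/gentoo/usr /mnt/gentoo/boot
# vgchange -an vg
# swapoff /dev/md_d0p3
# umount /mnt/gentoo
# reboot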
Now you should test the system to make sure that all is working as expected.
After removing the installation CD and rebooting, grub should appear and happily boot your system. Of course, if you encounter a kernel panic or system services start failing on boot, you know there is a problem. Troubleshoot your initial install to get a working system.
Note that if grub reboots the machine when loading itself, or if it freezes on stage 2, this is symptomatic of having installed grub before rebooting (see section 6.2 above). Reboot from the installation CD, remount all devices, and re-install grub.
After booting up, scan the kernel output for the status of your RAID array(s):
# dmesg | grep md
You should see something like this:
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Loading md_d0: /dev/hda,/dev/hdb
md: bind<hda>
md: bind<hdb>
md: kicking non-fresh hda from array!
md: unbind<hda>
md: export_rdev(hda)
raid1: raid set md_d0 active with 1 out of 2 mirrors
md_d0: p1 p2 p3 p4
md_d0: p1 p2 p3 p4
Also, check the current run-time status of the raid module:
# cat /proc/mdstat
You should receive output similar to the following:
md_d0 : active raid1 hda[1] hdb[2]
39082560 blocks [2/2] [UU]
unused devices: <none>
The key words to look for are ’active’ and the ’[UU]’ section. Your array may report itself as ’reconstructing’ if it has not fully synchronized yet.
If you see an underscore in place of one of the ’U’s then one of your drives is not actively part of the array. See section 7.3.1 below on how to add the missing drive.
The first and easiest way to test your array is to do it via software. After making sure that you have a working system, reboot. When grub's menu appears, hit 'e' to edit the command line. Move down to the kernel line and hit 'e' again to edit the kernel command line parameters. Change the text that reads 'md=d0,hda,hdb' to 'md=d0,hdz,hdb' (replacing your primary, or boot, drive with a non-existent drive designation). After hitting Enter to complete the editing session, press the 'b' key to continue booting.
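The edited kernel line, based on the menu.lst entry above, would read something like:
kernel /bzImage-2.6.12 root=/dev/md_d0p2 ro md=d0,hdz,hdb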
At this point the raid module built into the kernel will try to assemble your RAID-1 array using a non-existent drive and your secondary, or mirror, drive.
If the kernel panics because it can not mount the root drive, then the cause is almost certainly that your kernel is missing the md-degraded-boot patch (see section 4.2 above).
If the system boots and everything looks normal then you are well on your way!
Now, if you check your kernel logs (dmesg | grep md) you should see something similar to:
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Skipping unknown device name: hdz
md: Loading md_d0: /dev/hdz,/dev/hdb
md: bind<hdb>
raid1: raid set md_d0 active with 1 out of 2 mirrors
md_d0: p1 p2 p3 p4
md_d0: p1 p2 p3 p4
Notice that it skips the unknown device. An unpatched kernel will notice the unknown device and abort at that point.
Of course, checking the run-time status of the RAID module will show the array running in degraded mode using only one hard drive:
md_d0 : active raid1 hdb[1]
39082560 blocks [2/1] [_U]
unused devices: <none>
If you wish to add the missing device to the array and start a resynchronization, issue the mdadm command:
# mdadm /dev/md_d0 --add /dev/hda
If you are going to continue testing with the hardware section below, then you probably do not want to start the resync yet.
Now comes the real test. Shut down your system after trying the software test above, then unplug the primary drive. This will force the machine to load grub from the second hard drive, load the kernel, and assemble the degraded RAID-1 array.
Power up the machine. Both grub and the kernel should load without any noticeable difference! In fact, the system should boot completely normally. The only difference will be a slight glitch in the kernel logs, similar to the software test above. Checking the kernel logs for RAID references (dmesg | grep md) should return something like:
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Skipping unknown device name: hdz
md: Loading md_d0: /dev/hdz,/dev/hdb
md: bind<hdb>
raid1: raid set md_d0 active with 1 out of 2 mirrors
md_d0: p1 p2 p3 p4
md_d0: p1 p2 p3 p4
The run-time status will be identical to the software test above:
md_d0 : active raid1 hdb[1]
39082560 blocks [2/1] [_U]
unused devices: <none>
After shutting down and re-connecting your primary hard drive, reboot your system. You might think that the RAID array will now begin resynchronization. You might, then, be surprised that it, in fact, does not.
Checking the run-time status of the RAID module shows:
md_d0 : active raid1 hdb[1]
39082560 blocks [2/1] [_U]
unused devices: <none>
You must add the missing device back into the array using the following command:
# mdadm /dev/md_d0 --add /dev/hda
Voila! Your array is now synchronizing.
The only drawbacks to this specific solution that are immediately obvious are that the kernel must carry the md-degraded-boot patch (which has to be re-applied whenever the kernel is upgraded), that grub must be installed on each drive in its own session, and that a drive which has dropped out of the array is not re-added automatically; you must re-add it by hand with mdadm as shown above.