
I am fairly "green" when it comes to setting up RAID and LVM, but I can't work out why this setup doesn't work as intended. I have a server with two physical HDDs, on which I'd like to set things up in a software RAID such that either drive can fail and the machine will remain functional.

                                   +------+------------+
                                   | swap | / (root)   |
                                   | 5 GB | 113.5 ext4 |
                       +-----------+-------------------+
                       | /boot     | LVM               |
                       | 1 GB ext4 | 118.5 GB          |
+----------------------+-----------+-------------------+
| EFI system partition | RAID 1    | RAID 1            |
| 500 MB               | 1 GB      | 118.5 GB          |
+----------------------+-----------+-------------------+
| HDD (120 GB)                                         |
+------------------------------------------------------+

i.e. both drives are configured identically, and the RAID arrays span both drives.

However, when I remove drive #1, the system boots to some state but tells me it cannot find my root LVM volume group and fails to come up fully. When I remove drive #2, the system cannot boot at all.

If more detail is needed I can provide it, but is there some fundamental design flaw with this configuration?

  • Is the EFI system partition also identical on both drives? What is the output of cat /proc/mdstat both when it's working and when you pull drive #1 (you might have to boot from a rescue disk for it)? Also, did you pull drive #2 right after putting drive #1 back? The drives would need to rebuild before you can do that.
    – DerfK
    Commented Feb 8, 2017 at 22:58
  • The output of lsblk and fdisk -l would probably help us a bit.
    – Zoredache
    Commented Feb 8, 2017 at 23:09
  • @DerfK Interestingly, /proc/mdstat doesn't exist. I configured this as part of Ubuntu's setup, and I assumed it was using mdadm. Clearly not 🤔
    – obeattie
    Commented Feb 8, 2017 at 23:47
  • @Zoredache Here's the output of those two while it's running with both drives gist.github.com/obeattie/8fed90cfa3dfb76ba5fcc24ccfe32f36
    – obeattie
    Commented Feb 8, 2017 at 23:47
  • The image is unclear. How do you have a 2-drive RAID 1 setup but only one HDD in the ASCII image? I would create a boot partition (either a 1 MB "BIOS boot" partition with fdisk for old-school BIOS boot, or a 512 MB EFI system partition if you want to use UEFI) as the first partition on both disks, and another "Linux RAID" partition for the rest of each disk (minus 10 MB of empty space at the end to allow easy replacement with an almost identical disk from another manufacturer, since sector counts may differ a bit between manufacturers). Then run grub-install for both drives and put LVM on top of mdraid. (A sketch of this layout follows below.)
    Commented Aug 22 at 7:56
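
A minimal sketch of the layout described in the comment above, assuming the disks are /dev/sda and /dev/sdb, a legacy-BIOS (grub-pc) install, and hypothetical names /dev/md0 and vg0:

    # Partition both disks identically: a 1 MB BIOS boot partition plus a
    # Linux RAID partition, leaving ~10 MB free at the end of each disk.
    for d in /dev/sda /dev/sdb; do
        sgdisk --zap-all "$d"
        sgdisk -n 1:0:+1M  -t 1:EF02 "$d"   # BIOS boot partition for GRUB's core image
        sgdisk -n 2:0:-10M -t 2:FD00 "$d"   # Linux RAID partition for the rest of the disk
    done

    # Mirror the large partitions and put LVM on top of the md device.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 5G -n swap vg0
    lvcreate -l 100%FREE -n root vg0

    # Install the bootloader on both disks so either one can boot on its own.
    grub-install /dev/sda
    grub-install /dev/sdb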

2 Answers


Part of the problem, not booting at all when one of the drives is missing, probably means that the bootloader didn't get installed on both drives.

Since this is Ubuntu, I think we can safely assume you are using GRUB. If so, run the command dpkg-reconfigure grub-pc. Leave most of the options as-is; what we want to change is the list of GRUB install devices. Right now it probably has only /dev/sda selected. Make sure both /dev/sda and /dev/sdb are selected.
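
For example (a sketch, assuming the two disks are /dev/sda and /dev/sdb and a BIOS/grub-pc install):

    sudo dpkg-reconfigure grub-pc   # tick both /dev/sda and /dev/sdb as GRUB install devices
    # or install directly to both disks:
    sudo grub-install /dev/sda
    sudo grub-install /dev/sdb
    sudo update-grub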

Next problem. From your output, it doesn't appear that your EFI partition is set up for any kind of RAID 1, so you might need to manually sync data to the second EFI partition. I am not sure if you can set up a software-based RAID 1 for that.
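
If you end up syncing by hand, something like this works (a sketch, assuming the ESPs are /dev/sda1 and /dev/sdb1 and are the same size):

    # Clone the contents of the working ESP onto the second disk's ESP.
    sudo umount /dev/sdb1 2>/dev/null || true
    sudo dd if=/dev/sda1 of=/dev/sdb1 bs=1M status=progress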

I also don't have enough information to figure out why the LVM wouldn't be recognized with one disk removed. When one disk is removed, do both RAID 1 volumes show up as active in /proc/mdstat?
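
Something like the following should show whether the arrays come up degraded but still active when a disk is pulled (assuming the arrays are /dev/md0 and /dev/md1):

    cat /proc/mdstat
    sudo mdadm --detail /dev/md0
    sudo mdadm --detail /dev/md1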

  • You can do a software RAID 1 for an ESP volume. It must be of metadata version 0.90, as versions 1.1 and onward store their metadata at the beginning of the disk (making reading an underlying member on its own less than workable). Simply create a small RAID 1 with metadata version 0.90, format it as an ESP volume (vfat), and it will work just fine. The EFI firmware will read from the first one it finds and ignore any others. This way we don't have to perform a manual sync every time the EFI files are updated (which is close to never anyway). (A command sketch follows below.)
    – Spooler
    Commented Feb 9, 2017 at 1:14
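
A minimal command sketch of the approach described in the comment above, assuming /dev/sda1 and /dev/sdb1 are the (empty) ESP partitions and /dev/md2 is unused:

    # Mirror the ESPs with 0.90 metadata (stored at the end of each member,
    # so the firmware can still read a member on its own as a plain FAT volume).
    sudo mdadm --create /dev/md2 --metadata=0.90 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    sudo mkfs.vfat -F 32 /dev/md2
    # Mount it at /boot/efi and reinstall the EFI boot files, e.g.:
    # sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi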

While it seems that a RAID 1 (mirror) setup should work with one drive missing, everything in this scheme depends on the method of RAID control in use.

There are two types of RAID from a controller standpoint: hardware-controlled and software-controlled. Hardware-controlled RAID is driven by a RAID controller chipset in the computer, such that the drives are managed at a lower level than the OS itself (usually by a BIOS-level driver, such as those offered by LSI). Software-controlled RAID is managed by the operating system, usually by a kernel-mode I/O driver that is set up to address two or more drives in tandem.

In the OP's case, there's a question that would need to be answered: in the system boot order/boot drive selection, is the RAID controller the only hard drive device chosen, or is one of the drives selected as the boot device? If the former, then the RAID setup should work, as all drive access goes through the RAID controller before reaching the drives. The latter setup bypasses the RAID controller completely and addresses the onboard controller on the drive itself... and this will break RAID.

I would recommend that the OP do the following if they are using a hardware-based RAID controller scheme:

  1. Boot into the BIOS, and ensure that Boot Order and Hard Drive selections do not address either of the RAID drives directly, only the RAID controller "virtual drive".

  2. Go into the RAID controller setup (likely Ctrl-C while its driver is being loaded on boot) and ensure that both drives are selected as RAID drives and properly synced.

  3. Reformat the RAID 1 "drive" (technically a mirrored volume) and use it as the OS boot media.

  4. Re-install the OS distribution such that it doesn't see two 'drives' but one 'volume' as its boot device.

IMO, software or OS-based RAID solutions are not recommended.

  • This is software-based RAID, I'm afraid.
    – obeattie
    Commented Feb 8, 2017 at 23:45
  • I guess this is very subjective, and it does not apply to all "HW RAID" controllers. They're often referred to as FakeRAID for a reason. If you lose your RAID controller, you may not be able to recover unless you have a spare one. In such a case, the open format of software RAID is a great advantage: any Linux should work with it.
    – Martian
    Commented Feb 10, 2017 at 18:58
