Monday, December 20, 2010

Buggy Intel RAID

Recently I solved a very nasty problem with the Intel RAID controller. It cost me and my colleagues many hours and hairs, so I decided to describe it here in the hope that it will help someone.

The Intel RAID controller was configured in the BIOS with two disks in RAID1, and all was good.
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller
Some time ago we added another hard disk (hot-swappable SATA) that had previously been part of a different RAID (this is important!). The machine recognized it, we formatted it and started using it. All was good until the moment we had to reboot the machine.

It just would not boot. The message on the screen was (transcribing for keywords):
Gave up waiting for root device.
Check rootdelay=
Check root=
ALERT does not exist. Dropping to a shell!

"dmraid -ay" would sometimes detect the root filesystem and sometimes (with a different disk) not.

After a while an idea struck me: perhaps the RAID controller, in addition to using the configuration kept in CMOS, also tries to autodetect RAID partitions that might exist on the new disks.

This was the answer. After I cleared both the start and the end of the new disks, the RAID signature on them no longer confused dmraid or prevented the kernel from finding the real root file system.
# Zero the first 200 MB of the new disk.
dd if=/dev/zero bs=1000000 count=200 of=/dev/sdc
# Zero the end of the disk: skip the first 1000 GB, then write until the device is full.
# These values are for a 1TB disk. For a different disk size, calculate bs and seek accordingly.
dd if=/dev/zero bs=1000000000 seek=1000 of=/dev/sdc
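
For a disk of another size, the seek value could be calculated from the size the kernel reports. What follows is only a rough sketch, assuming GNU dd and blockdev from util-linux are installed. The helper name tail_seek and the 200 MiB tail size are assumptions made for illustration and were not part of the original fix. The destructive commands are left commented out on purpose.

```shell
#!/bin/sh
# tail_seek is a hypothetical helper, not part of dd or dmraid.
# Usage: tail_seek SIZE_BYTES TAIL_MIB
# Prints how many 1 MiB blocks dd should skip so that about the last
# TAIL_MIB MiB of the disk (plus any partial trailing block) get overwritten.
tail_seek() {
    size_bytes=$1
    tail_mib=$2
    total_mib=$(( size_bytes / 1048576 ))   # whole 1 MiB blocks on the disk
    seek=$(( total_mib - tail_mib ))
    if [ "$seek" -lt 0 ]; then
        seek=0                              # disk is smaller than the tail: start at 0
    fi
    echo "$seek"
}

# Example for /dev/sdc (assumed device name). This DESTROYS DATA on the disk,
# so it is commented out here:
# DISK=/dev/sdc
# SIZE=$(blockdev --getsize64 "$DISK")      # disk size in bytes
# dd if=/dev/zero of="$DISK" bs=1M seek="$(tail_seek "$SIZE" 200)"
```

With no count given, dd keeps writing until it reaches the end of the device and then stops with a "No space left on device" error, which is expected here.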
If this story helped you, let me know in the comments.
