It was supposed to be a simple memory upgrade for our Lustre nodes, but of course the machines had other ideas. I have my suspicions about how it happened, but the symptom was that one of the OSTs wasn't mounting. Checking, I found that LVM was showing all but one of the OSTs; for some reason its LVM metadata wasn't registering. Well, I'd seen this before: just use the backup metadata to relabel the device. Not so fast…
The pvcreate command wasn't working because it thought there was an existing partition table, and reading up on LVM shows that a raw disk device used as a physical volume cannot carry a partition table. The pvcreate man page does provide the answer, though: use dd to write zeros to the first sector, clearing the table. It worked, and I was able to relabel the device. But again, not so fast…
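The zeroing step can be sketched as follows. The real device path is hypothetical, so the demo below points dd at a scratch file standing in for the disk; this lets you see exactly what the command does before aiming it at real hardware.

```shell
# On the real node the target would be the OST's multipath device,
# e.g. DEV=/dev/mapper/mpath3 (hypothetical name). For a safe demo,
# use a scratch file in place of the disk.
DEV=$(mktemp)
printf 'fake partition table bytes' > "$DEV"

# Zero the first 512-byte sector, clearing the stale partition table.
# conv=notrunc overwrites in place rather than truncating the "disk".
dd if=/dev/zero of="$DEV" bs=512 count=1 conv=notrunc 2>/dev/null

# The first sector now reads back as all zeros.
od -An -tx1 -N16 "$DEV"
```

After this, pvcreate no longer sees a partition table and will accept the device.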
Not only was one device unlabeled; it turned out another device had its label swapped. So when I thought I had it fixed and tried mounting the Lustre OSTs, one of them would not reconnect, and the clients seemed oblivious that it was there. Checking
/proc/fs/lustre/devices to see what IDs the mounted OSTs had told me what happened. I think that because the swapped OSTs were being mounted on their failover nodes, the clients didn't know how to reconnect, hence the issue. Once I swapped the LVM metadata back for those two devices, the clients reconnected and everything came back online. It made for a very long night.
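For reference, this is roughly the check involved; the LV path is illustrative, and the /proc file only exists on a live Lustre server.

```shell
# On the OSS, each attached OST shows up in this listing with its
# device number and name, which is how a mismatch becomes visible:
cat /proc/fs/lustre/devices

# e2label on the backing logical volume shows which OST label the
# device actually carries (hypothetical LV path):
e2label /dev/vg_ost/ost0003
```

Comparing the label each device carries against the OST it is mounted as is what reveals a swap.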
Over the weekend, of course while I was out of town on vacation, the Lustre server decided to take a crap. I checked my email that morning to find notices of Lustre clients unable to connect. I checked the server and found it had reset itself (I'm not sure exactly why; there was a power failure at the time, but the servers have fully redundant power, so one PSU failing shouldn't have reset the system). The reset seemed to be what caused the problem: Lustre wasn't mounting correctly, and one of the OSTs was missing.
Part of the Lustre installation is setting up the OSTs as LVM targets, my guess being that it makes it easier to pass a target from system to system, since a simple scan will show the device. So why was one of the targets not showing up in the scan? Multipath was working and the multipath device was there, but pvscan was not listing it as a physical volume. Luckily CentOS (well, Red Hat) has great documentation, and I found this document to be of great help: http://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/mdatarecover.html
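The triage boiled down to a few commands like these (names illustrative; they need a system with multipath and LVM installed):

```shell
# Confirm the multipath layer is healthy and the device node exists:
multipath -ll

# List the physical volumes LVM currently knows about -- the missing
# OST's device was absent from this output:
pvscan

# Show PV UUIDs so they can be compared against the saved metadata
# in /etc/lvm/backup/<vgname>:
pvs -o pv_name,pv_uuid,vg_name
```

The multipath device being present while pvscan comes up empty is what points at lost LVM metadata rather than a failed disk.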
Unlike the case in that document, the lvs command was not reporting any errors; it simply wasn't showing the missing target. A nice feature of LVM is that it keeps a backup of the metadata used when the LVM targets were created, which can be used to restore that information to the drive. I ran vgcfgrestore to try to restore the data and got an error message saying a particular UUID was not found. Great: with that I could continue.
Using that UUID and the backed-up LVM metadata, I ran pvcreate to recreate the physical volume, writing the saved metadata back to the drive. Now vgcfgrestore was able to find the device and restore the volume group. Then lvchange brought it back online, and I was able to mount the device and get Lustre working again.
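The recovery sequence can be sketched like this. Every name here is illustrative: the UUID comes from the vgcfgrestore error message, and the VG/LV/device paths would be whatever your site uses. These commands rewrite on-disk metadata, so only run them against the correct device.

```shell
# UUID copied from the vgcfgrestore error; paths are hypothetical.
UUID="paste-uuid-from-error-here"

# Recreate the PV with its original UUID, restoring the on-disk
# metadata from LVM's backup copy of the volume group config:
pvcreate --uuid "$UUID" \
         --restorefile /etc/lvm/backup/vg_ost \
         /dev/mapper/mpath3

# With the PV back, restore the volume group configuration:
vgcfgrestore vg_ost

# Activate the logical volume and remount the OST:
lvchange -ay vg_ost/ost0003
mount -t lustre /dev/vg_ost/ost0003 /mnt/ost0003
```

The key detail is that pvcreate with --uuid and --restorefile rebuilds only the LVM label and metadata area; the filesystem data on the rest of the device is untouched, which is why the OST comes back intact.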