Lustre: LVM Metadata Snafu

It was supposed to be a simple memory upgrade for our Lustre nodes, but of course they had something else in mind. I’ve got my suspicions as to how it happened, but the issue was that one of the OSTs wasn’t mounting. Checking, I found that LVM was showing all but one of the OSTs; that device’s LVM metadata wasn’t registering for some reason. Well, I’ve seen this before: just use the backup metadata to relabel the device. Not so fast…

The pvcreate command wasn’t working because it thought there was an existing partition table on the device, and reading up on LVM shows that a raw disk device used as a physical volume cannot have a partition table. The pvcreate manpage does provide the answer, though: use dd to write zeros to the first sector, thus clearing the table. It worked. I was able to relabel the device, but again, not so fast…
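For reference, here is a rough sketch of that relabel sequence, written as a small Python helper that only builds the command lines and prints them for review. The device path, PV UUID, backup file and volume group name are hypothetical placeholders, not the values from that night. The final vgcfgrestore step is the usual follow-up when restoring a PV from backup metadata and is my addition, not something described above.

```python
import shlex


def relabel_commands(device, pv_uuid, backup_file, vg_name):
    """Build (but do not run) the commands for relabeling an LVM PV.

    All four arguments are supplied by the caller; nothing here looks
    them up or touches a disk.
    """
    return [
        # Zero the first 512-byte sector to clear the stale partition
        # table, as the pvcreate manpage suggests.
        ["dd", "if=/dev/zero", f"of={device}", "bs=512", "count=1"],
        # Recreate the PV label with its original UUID, taken from the
        # LVM metadata backup file.
        ["pvcreate", "--uuid", pv_uuid, "--restorefile", backup_file, device],
        # Assumed follow-up: restore the VG metadata from the same backup.
        ["vgcfgrestore", "-f", backup_file, vg_name],
    ]


if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    commands = relabel_commands(
        device="/dev/sdX",
        pv_uuid="XXXXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXXXX",
        backup_file="/etc/lvm/backup/vg_example",
        vg_name="vg_example",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing instead of executing is deliberate: the dd step is destructive, so it is worth double-checking the device path before pasting anything into a root shell.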

Not only was one device not labeled, but it turned out that another device had its label swapped. So when I thought I had it fixed and tried mounting the Lustre OSTs, one of them was not reconnecting and the clients seemed to be oblivious that it was there. Checking /proc/fs/lustre/devices to see which IDs the mounted OSTs had told me what had happened. I think that since the swapped OSTs were being mounted on their failover nodes, the clients were unsure how to reconnect, hence the issue. Once I swapped the LVM metadata for those two devices, the clients were able to reconnect and everything came back online. This made for a very long night.
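The check against /proc/fs/lustre/devices can be sketched roughly as below. This assumes each line has the layout index, state, type, name, UUID, refcount, and that OST targets show up with type obdfilter; both are assumptions about the Lustre version in use, and the OST names in the usage example are made up for illustration.

```python
def parse_lustre_devices(text):
    """Parse text laid out like /proc/fs/lustre/devices.

    Assumed columns: index, state, type, name, uuid, refcount.
    Lines that do not fit that shape are skipped.
    """
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        devices.append({
            "index": int(fields[0]),
            "state": fields[1],
            "type": fields[2],
            "name": fields[3],
            "uuid": fields[4],
        })
    return devices


def mounted_osts(text):
    """Return the sorted names of devices whose type is obdfilter (assumed OSTs)."""
    return sorted(d["name"] for d in parse_lustre_devices(text)
                  if d["type"] == "obdfilter")


def unexpected_osts(expected, text):
    """Return mounted OST names that this node is not expected to serve."""
    return sorted(set(mounted_osts(text)) - set(expected))


if __name__ == "__main__":
    # On a real OSS this text would come from /proc/fs/lustre/devices.
    sample = (
        "  0 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 5\n"
        "  1 UP obdfilter lustre-OST0003 lustre-OST0003_UUID 5\n"
    )
    print(unexpected_osts(["lustre-OST0000", "lustre-OST0001"], sample))
```

An OST name that appears on a node where it is not expected is the kind of mismatch that pointed to the swapped labels.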
