Monitoring Lustre via Ganglia and Collectl

This is just a rudimentary outline of how I set up ganglia to post the Lustre stats gathered by collectl.

All the information used here is available from two sources: Roy Dragseth’s page in the Rocks Clusters wiki and the Lustre Tutorial page on collectl’s website. First and foremost, have ganglia, ganglia-gmond, ganglia-gmond-python, and collectl installed.

Edit /etc/collectl.conf and reconfigure the DaemonCommands with the following:

-f /var/log/collectl -r00:00,7 -m -F60 -sl -P --export lexpr,f=/tmp/L

This gathers the Lustre information and saves it to /tmp/L for ganglia to read later. The important bit here is that the variable prefix exported by collectl depends on whether the machine is a Lustre server or a client: lusmds for the MDS, lusost for OSTs, and lusclt for clients.
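Before wiring this into ganglia, it can help to confirm what collectl is actually writing. Here is a minimal Python sketch, assuming each line of /tmp/L is a whitespace-separated name and value pair (for example "lusost.reads 12"). That format, the sample numbers, and the helper names parse_lexpr and roles_present are my own assumptions for illustration, so check them against a real /tmp/L on your node.

```python
# Map the collectl variable prefixes to the kind of Lustre node they indicate.
PREFIX_ROLES = {"lusmds": "MDS", "lusost": "OST", "lusclt": "client"}

# Made-up sample in the assumed "name value" layout.
SAMPLE = "lusost.reads 12\nlusost.writes 3\nlusost.readkbs 480\nlusost.writekbs 96\n"


def parse_lexpr(text):
    """Return {name: float(value)} for every well-formed 'name value' line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        try:
            stats[parts[0]] = float(parts[1])
        except ValueError:
            continue  # value was not numeric
    return stats


def roles_present(stats):
    """Return the sorted list of Lustre roles whose prefix appears in stats."""
    roles = set()
    for name in stats:
        prefix = name.split(".", 1)[0]
        if prefix in PREFIX_ROLES:
            roles.add(PREFIX_ROLES[prefix])
    return sorted(roles)
```

On a live node, something like roles_present(parse_lexpr(open("/tmp/L").read())) should tell you which prefix to use in the ganglia files.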

For an OST, collectl collects reads, writes, writekbs, and readkbs. For an MDS, it’s gattrP, sattrP, sync, and unlink. Copy collectl.py and collectl.pyconf from the wiki page to their respective locations for ganglia, then edit them and replace the relevant entries. The posted files are set up to monitor Lustre stats on a client, so on an OST, for example, replace lusclt.reads with lusost.reads. Once those files are updated for that server’s needs, start the collectl daemon and restart gmond. Wait a few minutes and the new stats should be graphed on the server’s ganglia page.
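To give an idea of what the ganglia side looks like, here is a minimal sketch of a gmond Python module that reads the collectl output. This is not the collectl.py from the wiki page, just my own illustration of the standard gmond module interface (metric_init, a callback per metric, metric_cleanup). The parameter names stats_file and prefix, the "name value" file layout, the units strings, and the "lustre" group name are all assumptions.

```python
STATS_FILE = "/tmp/L"   # where collectl's lexpr export writes (see DaemonCommands)
PREFIX = "lusost"       # lusost for an OST, lusmds for the MDS, lusclt for a client
FIELDS = ("reads", "writes", "readkbs", "writekbs")  # the OST counters


def read_stat(name):
    """gmond callback: return the current value of one metric, or 0.0 if missing."""
    try:
        with open(STATS_FILE) as fh:
            for line in fh:
                parts = line.split()
                if len(parts) == 2 and parts[0] == name:
                    return float(parts[1])
    except (IOError, OSError, ValueError):
        pass  # file not there yet, or value not numeric
    return 0.0


def metric_init(params):
    """Called once by gmond; returns one descriptor per metric to publish."""
    global STATS_FILE, PREFIX
    STATS_FILE = params.get("stats_file", STATS_FILE)  # hypothetical param name
    PREFIX = params.get("prefix", PREFIX)              # hypothetical param name
    return [
        {
            "name": "%s.%s" % (PREFIX, field),
            "call_back": read_stat,
            "time_max": 90,
            "value_type": "float",
            "units": "KB" if field.endswith("kbs") else "ops",  # guessed labels
            "slope": "both",
            "format": "%f",
            "description": "Lustre %s from collectl" % field,
            "groups": "lustre",
        }
        for field in FIELDS
    ]


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release here."""
    pass
```

For an MDS you would swap PREFIX to lusmds and FIELDS to gattrP, sattrP, sync, and unlink. The matching .pyconf still has to list each metric name so gmond knows to collect it.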
