So much fun, so little time.

A few of you noticed the site was down for an extended period this week. I learned a few things in the process.

I set up my FC system and was so excited to get it running that I neglected to adequately test the equipment. I bought used hardware, with used drives, and put real data on them after a whopping two days of light testing. I never stress-tested the drives or did any kind of exercising to validate that they were worthy of production data.

I also neglected to functionally test the array. While it did offer the ability to configure a hot-spare, I didn't check to see whether the hot-spare actually worked before I moved data over. (Seeing that it was configured was enough for me.)

So what happened was this: I was running on the system and all was well until a drive failed. The hot-spare didn't kick in on its own, and while that drive was sitting in a failed state, a second drive failed. Needless to say, I lost half my LUNs, and three of them were corrupted beyond repair.

Luckily, I'm one of the old hold-outs. I have a tape backup system consisting of a Veritas 6.0 environment with an ATL tape library, and I was able to restore to within 48 hours of the failure from tape.

My *NEW* storage back-end consists of a Dell 2650 with 5x 146GB drives. I installed CentOS 5, carved out a 512GB partition exported over NFS, and mounted it on my VMware servers. The most interesting part is that by bonding the network interfaces I'm getting the same bandwidth I got out of the two 1Gb Fibre Channel ports.
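For anyone curious, here's roughly what that looks like on the CentOS 5 box. This is a sketch rather than my exact configs; the interface names, bond mode, IP addresses, and export path are all placeholders:

    # /etc/modprobe.conf -- load the bonding driver (mode/miimon are placeholder choices)
    alias bond0 bonding
    options bond0 mode=balance-alb miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/exports -- share the 512GB partition with the VMware hosts
    /vmstore  192.168.1.0/24(rw,no_root_squash,sync)

    # apply the export and make sure NFS comes up at boot
    exportfs -ra
    chkconfig nfs on
    service nfs start

    # on a Linux VMware host, the share is then just a regular NFS mount
    mount -t nfs 192.168.1.10:/vmstore /vmstore

(If the VMware boxes are ESX rather than a hosted product, the share would get added as an NFS datastore through the VI client instead of a plain mount.)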

Not being a network guy, though, does anyone have any suggestions for optimising NFS for storage applications?

2 comments

    • on April 11, 2008 at 5:19 pm

    Geez, how much data is on the website?!? Or was it a loss of a lot of data, some being the website?

    If your hardware supports it you can use jumbo frames, but short of a database, it's rarely worth the effort. NFS by itself should treat you right; I get really good performance out of it.

    W.
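
    For what it's worth, the jumbo-frame route W. mentions comes down to raising the MTU end-to-end. A rough sketch, assuming the bond0 interface from the post and a switch that can pass 9000-byte frames (the mount options are generic NFS tuning knobs, not anything from the original setup):

        # raise the MTU on the bonded interface (the switch ports must allow jumbo frames too)
        ifconfig bond0 mtu 9000

        # make it persistent by adding a line to /etc/sysconfig/network-scripts/ifcfg-bond0
        MTU=9000

        # on the NFS client, larger read/write sizes are the other common knob
        mount -t nfs -o rsize=32768,wsize=32768,hard,intr 192.168.1.10:/vmstore /vmstore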

  1. It's not just this site, for one; I run about 12-15 blog sites for different people: friends, family, etc. And this site was but one of 10 LUNs on the box. Among those 10 LUNs were my finance SQL server, my Exchange, my BlackBerry, the whole nine yards.

    Exchange came out of it working, but with the VMDK file so corrupted I couldn't migrate off the LUN. I ended up having to build a new Exchange server, move all of the mailboxes to it, and decommission the old one. The BlackBerry server, my front-end Exchange server, and this web server were all so totally destroyed I had to restore from backups. The BlackBerry and webmail servers were Windows, so even restoring from the backup was tricky and ended up being a rebuild instead of a recovery.

    This one was actually the easiest. Once I rebuilt the server, I recovered /etc/httpd, /www, and /var/lib/mysql from backup, rebooted, and it all just sort of fell together.
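
    In case it helps anyone rebuilding a similar box, that recovery amounts to laying the restored directories back over a fresh install. A sketch, assuming the tape restore lands in a staging directory first (the /restore path is hypothetical):

        # /restore is a hypothetical staging area where the tape restore was written
        rsync -a /restore/etc/httpd/ /etc/httpd/
        rsync -a /restore/www/ /www/
        rsync -a /restore/var/lib/mysql/ /var/lib/mysql/

        # bring the services back up (or just reboot, as above)
        service httpd restart
        service mysqld restart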
