Monday was another one of those days.  When will facilities people get it through their heads that

…maybe it isn’t a GREAT idea to test the generator during the day…

…maybe it’s something better done at night, on a weekend, or when the moon isn’t full…

…maybe it’s a good idea to let the IT people know you’re testing it…

…maybe it’s an even better idea to CLOSE THE BYPASS BREAKER before you start the test.

Monday at about 2pm the planets aligned in their universal task of making me work late.

17 hours later I left the site.

I’m pretty impressed.  We went from a quick-quiet datacenter to back up and running in about 10 hours.  A few more hours working out parts replacements… and all is golden.

Not bad.  Could have been better, and I hope next time it will be, because in order to fix the Generator/UPS problem that caused the issue in the first place, they are going to have to take the power down again…

At least this time it will be graceful…I hope…I think they’ve scheduled it for the next full moon.

I will say this – of the vendors EMC was first on scene, and they had parts in tow before we even knew what we needed.

I’m suitably impressed.


  1. Love the new look!

    Infrastructure guys live on a different planet. At one site, the guys would test the generator once a month, no problem. Then one day, the power did go out, the generator started, ran for about five minutes and stopped. Turned out they’d used up all the diesel during the monthly tests and never thought to check the fuel levels and top that up. And it was a long blackout, like 40 minutes and then three days to fix the mess.

    Oh it gets *MUCH* worse.

    In this mess we discovered a few things.

    Optical SFPs don’t like power outages. I’ve replaced like 4 since the outage.

    We’re also discovering which Tier-1 applications are worthy of such exorbitant expenditures as, oh, dual HBAs.

    One “Tier-1” application was down for an extra day because an HBA had failed some months ago and never got replaced, because the application was “too important” to take down for 20 minutes for a change.

    Unfortunately it turned out the other half was zoned to one of those Optical SFPs I replaced yesterday. 😉

    And to top it all off, today at the DR site, we had a series of 1-minute blackouts…timed to occur precisely 15 seconds before I saved my work.

    Yes – same people do the facilities at both sites. You do the math. 😉

