So my cousin runs a small ISP in the Phoenix, Arizona area. Nothing special, a few hundred DSL and dial-up users, webmail, etc.
A couple of weeks ago, I was down visiting and he was wringing his hands over a technical “glitch.” Apparently a drive in his RAID set had failed, and due to either a controller bug (rare but possible) or user error (much more common), the blank disk started rebuilding over the parity information.
He asked my advice – I told him it was easy: let the RAID set finish rebuilding, restore from your most recent full backup, then lay whatever incremental backups you have over that to bring the data as close to the point of failure (POF) as humanly possible. It’s not a sophisticated enough system to expect that they would be doing any kind of transaction logging.
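To make that concrete, here is a minimal Python sketch of that kind of restore. The backup layout it assumes (a single full-*.tar.gz plus dated inc-*.tar.gz archives under /backups) is invented for the example; the script just unpacks the full backup and then lays each incremental over it in date order.

```python
import tarfile
from pathlib import Path

# Hypothetical backup layout: one full backup plus dated incrementals.
FULL_BACKUP = Path("/backups/full-2006-01-01.tar.gz")
INCREMENTALS_DIR = Path("/backups/incrementals")
RESTORE_TARGET = Path("/restore")

def restore_to_point_of_failure():
    """Restore the full backup, then lay each incremental over it in order."""
    RESTORE_TARGET.mkdir(parents=True, exist_ok=True)

    # 1. Start from the most recent full backup.
    with tarfile.open(FULL_BACKUP, "r:gz") as tar:
        tar.extractall(RESTORE_TARGET)

    # 2. Apply incrementals oldest-to-newest so later changes win.
    for inc in sorted(INCREMENTALS_DIR.glob("inc-*.tar.gz")):
        with tarfile.open(inc, "r:gz") as tar:
            tar.extractall(RESTORE_TARGET)

if __name__ == "__main__":
    restore_to_point_of_failure()
```

The point is the ordering: full backup first, then incrementals oldest to newest, so the newest copy of each file wins and you end up as close to the point of failure as your last incremental allows.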
Apparently they weren’t doing backups at all, the feeling being that since they had their disks protected in a RAID configuration, backups were a uselessly redundant exercise.
Let me explain why this is a bad idea: garbage in, garbage out. RAID, whether it be RAID 1 or RAID 5, only gets you uptime, not data protection, because a corruption will replicate and spread from disk to disk before the user is aware there is a problem.
The same holds true for replication. If you’re replicating a hard drive offsite, a database corruption will replicate right along with the production data. The only exception is in cases of transactional replication (such as Quest Software’s old “NetBase” product, which detects an invalid change and halts replication before sending the change to the target system).
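To illustrate what that kind of safeguard looks like, here is a toy Python sketch of validate-before-apply replication. The record format and the checksum scheme are assumptions made up for this example, not how NetBase or any real product works internally; the idea is simply that a change which fails validation stops the stream instead of being faithfully copied to the target.

```python
import hashlib

class ReplicationHalted(Exception):
    """Raised when a change fails validation before reaching the target."""

def checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()

def replicate(changes, apply_to_target):
    """Send each change to the target only if its checksum still matches."""
    for change in changes:
        if checksum(change["payload"]) != change["expected_sha256"]:
            # An invalid/corrupt change: halt the stream rather than
            # faithfully replicating the corruption to the target system.
            raise ReplicationHalted(f"corrupt change {change['id']}, halting")
        apply_to_target(change)

if __name__ == "__main__":
    good = {"id": 1, "payload": b"UPDATE accounts ...",
            "expected_sha256": checksum(b"UPDATE accounts ...")}
    bad = {"id": 2, "payload": b"garbage", "expected_sha256": "deadbeef"}
    try:
        replicate([good, bad], apply_to_target=lambda c: print("applied", c["id"]))
    except ReplicationHalted as err:
        print(err)
```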
So how do you implement a backup? It’s easy: follow the 2-of-3 rule (faster, cheaper, or smaller – pick any two) and you will have a backup solution you can live with.
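As one hedged example of landing on the cheaper-and-smaller side of that trade-off, here is a Python sketch of a weekly-full, nightly-incremental job. The paths, the Sunday schedule, and the mtime-based change detection are all placeholder assumptions for illustration, not a recommendation of any particular product.

```python
import datetime
import tarfile
from pathlib import Path

# A bare-bones nightly job: weekly fulls, nightly incrementals of anything
# changed since the last run. All paths and the schedule are placeholders.
DATA_DIR = Path("/var/mail")
BACKUP_DIR = Path("/backups")
STAMP = BACKUP_DIR / "last-run"          # its mtime marks the previous run

def nightly_backup(today: datetime.date) -> None:
    full = today.weekday() == 6          # Sundays get a fresh full backup
    name = ("full" if full else "inc") + f"-{today}.tar.gz"
    since = STAMP.stat().st_mtime if (not full and STAMP.exists()) else 0.0

    with tarfile.open(BACKUP_DIR / name, "w:gz") as tar:
        for path in DATA_DIR.rglob("*"):
            # Fulls take everything; incrementals take only files
            # modified since the previous run.
            if path.is_file() and path.stat().st_mtime >= since:
                tar.add(path)

    STAMP.touch()                        # remember when this run finished

if __name__ == "__main__":
    nightly_backup(datetime.date.today())
```

Restoring then follows the earlier sketch: the Sunday full first, then each nightly incremental in order.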
Or keep your resume spell-checked, because it’s not a matter of if, it’s a matter of when.