Backup vs. Archive

There is a fundamental difference between BACKUP and ARCHIVE.

A backup is there to help you deal with a crisis such as “My datacenter is a smoking hole in the ground, now what do I do?” or something not quite as dramatic, like “A virus ate my data.”  You recover from the backup to the last known good state and all is happy, right?  Well, except for the two or three days that might have passed since your last good backup…  (I was at one law firm that lost a drive, only to find out their backups hadn’t been running for two months.  I came back two weeks later to find a COMPLETE change in personnel had taken place while I was gone – lawyers are not very forgiving when they lose two months’ worth of email.)

An archive is data that, while not “active,” might still be required on a day-to-day basis.  Film, video, and image archives are a good example.

So in a disk-based archive you have some platform – typically EMC/Legato DiskXtender or Rainfinity or something along those lines – that will move the data from “active” storage to “archive” storage.  In some applications you can even set up a true HSM, moving data that hasn’t been accessed to Tier-2 (enterprise SATA) and even Tier-3 (yes, tape) as it ages, only to be recalled to Tier-1 when it’s accessed.
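The HSM behavior described above boils down to a tiering policy driven by last-access age.  Here is a minimal sketch of just the placement decision; the tier names and age thresholds are hypothetical, not any vendor’s actual API:

```python
import os
import time

# Hypothetical age thresholds, in days, for demoting data down the tiers.
TIER2_AGE_DAYS = 90    # untouched for 90 days  -> Tier-2 (enterprise SATA)
TIER3_AGE_DAYS = 365   # untouched for a year   -> Tier-3 (tape)

def pick_tier(path, now=None):
    """Decide which tier a file belongs on, based on its last-access time."""
    now = now if now is not None else time.time()
    age_days = (now - os.stat(path).st_atime) / 86400
    if age_days >= TIER3_AGE_DAYS:
        return "tier3"
    if age_days >= TIER2_AGE_DAYS:
        return "tier2"
    return "tier1"
```

A real HSM would also leave a stub behind on Tier-1 so that a later access transparently recalls the file; this only shows how the aging decision works.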

More often than not I’m brought face to face with people who don’t understand that very subtle difference.  One of my recent customers is actually doing it appropriately, using DX and a smallish Centera to archive data that must be retained but is almost never actually accessed.

Then there are the people who use backup technology for archival purposes.

I’m pretty “old school” when it comes down to it.

Tape is for backup.  Tape is *NOT* supposed to be used as nearline storage when there are equally inexpensive (and more reliable) disk methods out there.

My main complaint about tape as archive: you don’t know it’s bad until you try to read it.  And by the time you read it, the simple act of moving the tape into a tape drive that was manufactured under less-than-ideal conditions means you are putting your data at risk.

Spending millions of dollars on a new room-sized tape library doesn’t make sense when Centera storage is fairly inexpensive *AND* provides redundancy of the data automatically.

Spending more millions of dollars on three of them is lunacy when a single EMC Atmos setup could provide redundancy and a single namespace for recall.  (And if you go whole hog, geographically relevant retrieval is an option too, so you automatically get the data from the closest copy.)

It pains me to see it done wrong.  Especially when it involves trying to shoehorn two more STK monsters into an already cramped datacenter when the work could be done in a couple of floor tiles of spinning disk.



    • Jesse on September 15, 2009 at 12:07 pm

    Apologies in advance if this post is rambling, I wrote it at 2am apparently after my brain had started shutting down for the night.


    • william bishop on September 15, 2009 at 6:51 pm

    To me it really depends on what you have. I have an IBM tape frame that can stream at around 300 MB a second and put nearly 2 TB of data on a $140 tape. For seldom-retrieved data, keeping a copy in the frame, as well as having a second copy sent off to a vault, makes perfect sense.

    Personally, I feel like Centera is an evil POS, but I will agree that disk can be a good archive target… providing you can get your data off at some point in the future when you realize you’re backed into the proverbial vendor-lock-in corner… like Centera – which is certainly neither inexpensive nor cheap to support. It’s roughly the cost of Hitachi frame capacity, it’s hard to ever get data off without paying for “migration services”, and it’s freaking always throwing errors.

  1. I think the most important property of an archive is that it is static. Content that you put in an archive doesn’t change anymore. The data is “inactive”.
    The other thing is long-term: at least months, usually years.

    Sven Neirynck

    1. Exactly – if the data is still changing, it’s active data and should be stored appropriately.

      And backed up. Active data should be backed up regularly to fit whatever RPO you’ve promised your users. The problem with backup is that it doesn’t support an RPO of zero in the event of corruption. The best you could hope for is a 15-minute snap on your production data and the ability to detect corruption within your snap window.

      Though I’m finding myself intrigued by the supposed ability of EMC’s new RecoverPoint appliance to allow a DVR-style rollback to a point in time. I’ve not seen it actually happen yet, but I’m told it’s pretty slick.
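      The DVR-style rollback idea is essentially a time-stamped write journal that can be replayed to any point. A toy sketch of that concept (my own illustration, not RecoverPoint’s actual mechanism):

```python
import bisect

class WriteJournal:
    """Toy continuous-data-protection journal: every block write is logged
    with a timestamp, so the volume can be reconstructed as of any moment."""

    def __init__(self):
        self._log = []  # (timestamp, block, data), appended in time order

    def write(self, ts, block, data):
        self._log.append((ts, block, data))

    def image_at(self, ts):
        """Replay the journal up to and including ts to rebuild the volume."""
        cutoff = bisect.bisect_right([entry[0] for entry in self._log], ts)
        state = {}
        for _, block, data in self._log[:cutoff]:
            state[block] = data
        return state
```

      Rolling back past a corruption event is then just a matter of picking a timestamp before the bad write and replaying to it.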

      I’m not a big fan of appliances in any form (more moving parts to break, more latency, etc.), but I’m interested in this one.

    • Spaceman on October 15, 2009 at 10:09 am

    Take a look at Caringo CAStor. It’s Centera grown up in the real world.

    I’ve had my share of crap solutions from most vendors. Can’t get any of them past the economic analysis, and that’s why they suck. Or you get one in house and find out that the vendor’s consultants suck and you are stuck with crap that doesn’t work 100%.

    I built a 4-node, 2-cluster, replicated 12 TB CAS solution for less than NetBackup licensing alone costs for 12 TB.

    USB or netboot. Commodity hardware. Pay one price for all features. A simple and elegant solution. Storing fixed content shouldn’t require a consulting gig and a huge budget. These guys invented Centera and now they can do it right.

    Tape is great for bulk, makes sense for backing up our Celerras and active data, but it sucks for backing up our mail servers (90% of which is fixed content within .nsf files), and huge fixed content filesystems we can’t take down.

    I just love the way a bad LTO drive can silently turn hundreds of LTO tapes into useless junk without telling you about it till it’s too late.

    Want a wake-up call? Get a unit that reads the CM chip counters in your LTO-3 cartridges. Have plenty of antacid on hand when you do.

    We are sending this stuff to CAS. Let the CAS take care of the data integrity (which my tape doesn’t) and replicate it around the world to handle regional disasters. I don’t want to worry about being able to restore every last thing from crappy tape.
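    The integrity property being relied on here is the core of content-addressed storage: each object is keyed by the hash of its own contents, so any corruption is detectable on every read. A minimal in-memory sketch of the idea (my own illustration, not Centera’s or CAStor’s actual interface):

```python
import hashlib

class CASStore:
    """Minimal in-memory content-addressed store: objects are keyed by the
    SHA-256 of their contents, so silent corruption is detectable on read."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        addr = hashlib.sha256(data).hexdigest()
        self._objects[addr] = data
        return addr  # the content address doubles as an integrity check

    def get(self, addr: str) -> bytes:
        data = self._objects[addr]
        # Re-hash on every read: a single flipped bit changes the digest.
        if hashlib.sha256(data).hexdigest() != addr:
            raise IOError(f"integrity check failed for {addr}")
        return data
```

    Contrast that with tape, where a bad drive can write garbage and nothing notices until restore time.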

    Give these guys some business because they have a good technical solution that doesn’t suck and we need more guys like this out there…
    They’ll give you something like 4 TB for free to play with.

    I need to back off the coffee…

    1. Interesting – I’ve also played with the “DSS” product from “Open-E”. They do full NAS/SAN (with FC support) in the same box.

      Up to 2TB support is free with the DSS-Lite product.

      I haven’t played with it recently; getting the Celerra into my production environment kind of negated the need for a third-party app. 🙂
