Oct 23

On tape…

Ok, I have no problem with tape.  It’s a *GREAT* backup medium when your requirement is portability for massive amounts of data and you’re not replicating said data.

If I had to ship 400TB of backups to Iron Mountain to protect against the earthquake-to-end-all-earthquakes, tape would be my FIRST choice (though, as a GIANT CAVE, Iron Mountain might not be). 😉

But… (and this is where it gets fun)

I have a customer who *LOVES* tape.

Wants-to-have-its-children loves it.

Uses-it-as-primary-storage loves it.

Now if you:

A> Have a few hundred terabytes of data to Archive.

B> Have millions of dollars to spend on giant room-sized StorageTek libraries, and the space, power, and cooling that they entail.

C> Really love tape.

and most importantly

D> Live in the early 1980s

Then Archival to tape is *SO* the way to go.

The argument given is as follows.  “Tape is cheaper than Disk”

Well yes, on a terabyte for terabyte scale tape might be cheaper…maybe if you exclude the hardware.

But compare that against something along the lines of EMC’s Atmos product, or even Centera. (I’d even go so far as to say the NetApp box appealed to me at one point. Now that the Celerra supports File Level Retention, I’ve been cured of that.)

When you throw in modern options like replication and, dare I say it, DEDUPLICATION, disk rapidly becomes the better, faster, more cost-effective way to store your long-term data.

Now I wouldn’t recommend anyone go out and buy a DMX-4 for archival purposes. (Though if you want to, let me know ahead of time so I can buy some EMC stock. I’m not currently holding any.)

I checked, and the only Tape vs. Disk comparisons I could find online were done by storage vendors, each of which has its own agenda (and big surprise, the analysis came out favouring whatever they were selling), so none of them can be treated as neutral.  (I have a few things to say about marketing and statistics, but that’s a different post.)

The things I look for when judging where to store data…

A> How many copies of the data do I need?

This is often overlooked and a question not asked.  How many copies of a piece of data do you really need?  And how many do you currently have?  I’ve been in one data center recently where they LITERALLY have boxes of old tapes stacked up along the walls.  (Note: Storing your backups WITH the system you’re backing up doesn’t do much in the event of a fire or natural disaster)

B> How long do I need to keep the data?

Retention policies are a big catch for a lot of people.  For “Backup” purposes (see my last post) I say two full backups are all that is really required.  If there is any kind of likelihood that some critical corruption could be missed for weeks (or months), then adjust your backup strategy accordingly (or find a better way of auditing your production data for errors).

C> Does my data have to be portable?

Ok, this is aimed specifically at Tape.  The answer is this.  If you have a remote DR facility and a high-speed connection to it, there is absolutely NO REASON to go to tape for portability.  By virtue of Replication (whether it be the production data or VTL) you’ve already moved your data off-site.  Now if you’ve only got one data centre and it’s sitting right on the San Andreas fault line (I’ve actually worked at one, not joking) then send tapes off-site.

Lots of them.

5 or 6 times a day if you can.

D> Am I storing a copy of production or my only copy?

If you’re storing a copy of production (running) then chances are you’re not going to need the backup.  If you’re protecting yourself against someone hitting the delete key accidentally, then maybe Celerra (SnapSure – periodic checkpoints that even the users can access themselves) or Centera (Don’tEvenThinkAboutDeletingThis) are better options.

If you’re storing a copy of something so you can make room for something else, then backup tape is probably not your best option.  Consider an archiving solution like Atmos or Centera, or even a Celerra with File Level Retention enabled – and version 5.6.44 and later supports de-duplication (both single-instance storage and compression) natively.

E> Do I have the money to spend now, or am I willing to spend more over time to keep the initial investment down?  (This is a valid question, and I’d like to know if anyone has any ideas on which would be the cheaper initial investment.)

Just remember that you have to count the floor-space as well.  Something many people forget when scoping out storage buys.

If I want 150TB of storage and I want to do it with tape, what’s the supporting hardware going to cost me?  (A single CX4-240 with one rack of disks can provide up to about 220TB of storage with current drive sizes.)
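To make question E concrete, here is a minimal sketch of how the pieces could be added up: up-front hardware and media, plus yearly maintenance and floor space. Everything in it is an assumption for illustration: the `StorageOption` fields, the `total_cost` helper, and above all the dollar figures, which are arbitrary placeholders and not real quotes for any product. The result means nothing until you plug in your own numbers.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    name: str
    hardware_cost: float       # up-front: library and drives, or array
    media_cost_per_tb: float   # tapes or disk shelves, per usable TB
    annual_maintenance: float  # yearly support contract
    floor_tiles: int           # floor space occupied


def total_cost(option, capacity_tb, years, cost_per_tile_year):
    """Up-front spend plus recurring spend over the given number of years."""
    upfront = option.hardware_cost + option.media_cost_per_tb * capacity_tb
    recurring = years * (option.annual_maintenance
                         + option.floor_tiles * cost_per_tile_year)
    return upfront + recurring


# PLACEHOLDER figures only; substitute real quotes before drawing conclusions.
tape = StorageOption("tape library", 1000.0, 10.0, 100.0, 4)
disk = StorageOption("disk array", 500.0, 40.0, 200.0, 1)

for option in (tape, disk):
    cost = total_cost(option, capacity_tb=150, years=5, cost_per_tile_year=50.0)
    print(f"{option.name}: {cost:,.0f}")
```

The point of splitting it this way is that the "cheap media" argument only covers one term of the sum; hardware, maintenance and floor space all sit in the others.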

A final note.  Remember with any “portable” backup solution that you have to keep your backups safe.  Tapes, like disks, don’t respond well to things like…well…dropping.  Anytime you transport a medium from one location to another physically you put that data at risk.

Just my $.02.



  1. william bishop

    Again, I’ll use a personal example. TSM does dedupe, so you’ve no longer got that going for you. My tape library does 300 MB a second and puts 2 TB on a tape.

    Now, let’s think about this. The 20+ TB of Centera I just bought cost approx. 80,000 dollars and will last me 1 year. I can do the same on tape for 2,000 dollars. Can I put the Centera in a car and move it someplace else for restoration? Nope. Does tape have a constant power draw? Nope. Does a tape lose nodes? Nope. Does it cost me an additional 80k to have another copy? Nope. If I already have copy one on spinning disk, then why, in the name of all that is holy, should I pay 40x the cost for a copy that can be done for far, far less? And can be restored on pretty much any compatible tape drive?

  2. william bishop

    Additionally, does the tape cost me a FORTUNE in annual maintenance? Nope. When I pull the tape back to scratch to move the data to a denser medium, does it have the same capital cost as buying newer, denser nodes? Nope. I’m failing to see where disk is even in the running. If I need a second copy for ultra reliability, yep, I’ll keep that first Centera. But honestly, tape would be my second target.

    1. Jesse

      Ok, a few points.

      1. You’re talking about backup, and I conceded the point that for backup purposes tape is the best way to go. Highly portable, extremely cheap.

      2. What I’m talking about is using tape for “nearline” or “online” storage. Files that are being recalled on a regular basis, that need to be done so in a timely fashion. My customer just bought two more of those giant StorageTek libraries, you know, the ones bigger than my first apartment…for storage that is *NOT* backup. This is archival storage, nearline.

      Now the problem with this is, the $2,000 figure you quoted doesn’t include the cost of the hardware, and when you start throwing multiple silos at a solution disk QUICKLY becomes a more viable alternative to tape. These silos are at least a million apiece, not including maintenance. I don’t know the money end of it but I’m pretty sure you can drop a Petabyte of CX4 storage on the floor for about that much, let alone cheaper Atmos or Centera options.

      Secondly, I want to know where you get the idea that Centera storage is only good for a year – I’ve got two running right now that have been going strong since 2004.

      Now my Quantum rep told me that a tape is good for about 5 years. That means that even if you put a tape in a climate-controlled environment, after that point you run the risk of data loss if you don’t read the tape and copy it to new media. Start talking hundreds of terabytes to petabytes and you start requiring a separate environment JUST to handle duplication of your data.

      I stand by my original point. Tape is WONDERFUL for backup. For archival and “nearline” storage it’s just not up to the task.

  3. william bishop

    I mean that I eat through that amount of storage per year. And in 4 years (at the outside), I’m going to have to replace those nodes to keep up.

    You don’t need to have multiple tape silos…you add tape heads. It sounds like he paid WAY too much.

    For nearline, yes, tape would not be adequate for some functions, I agree there….as I also said earlier, 1st target disk, second target tape.

    If there is no penalty for latency on retrieval, tape might work nearline…I must have missed where you were telling us he was using it thusly.

    1. Jesse

      He’s using it for only-copy production data. Not my idea of a good time. I would *NEVER* put only-copy data on anything portable, for the simple reason that if it can be moved or carried it can be dropped, and tapes are just as vulnerable as disk to drop-faults.

  4. Sto Rage

    Have you looked at COPAN Systems? They have a unique MAID system that scales well. We have been using it for all of our Archive/Nearline needs. We now have a few 100 TB on them. Great value.

  5. SANDiety

    In our backup environment, Data Domain + LTO-4 in a ‘blended’ solution is working very well. For 90% of our hosts, backups that have a retention of 1 year go to disk. Just a quick look: our Oracle data (RMAN fulls bi-weekly + more or less hourly archlogs on deduped disk) =
    Total files: 29,176; bytes/storage_used: 6.9
    Original Bytes: 47,634,887,009,428
    Globally Compressed: 27,944,163,293,266
    Locally Compressed: 6,768,282,021,668 (47T compressed to 6.7T)

    vmware (vmdk files/wintel os’s)…
    Total files: 5,361; bytes/storage_used: 24.1
    Original Bytes: 19,759,665,225,250
    Globally Compressed: 1,625,616,555,655
    Locally Compressed: 816,176,984,187 (19T to 800GB)

    We do have monthly full requirements that are greater than 5 years – so we simply put those to tape directly.

    Some of the business ‘sell’ for us was:
    1) Tapes have a lower level of reliability for use as disaster recovery. Once ejected, they SHOULD work…but I’m not betting on ‘should’. Have had too many go bad when needed over the years.
    2) Any Oracle database can be recovered point in time going back 30 days, without having to wait for a tape recall. Huge for us. Also, ever notice how DBAs want/love their nightly fulls? Ever ask them why? Usually, the answer is tape related…if there’s a problem with a tape that happens to be holding 2 days of arch logs…the full is all they have. Disk changes that, or more specifically, redundant disks.
    3) Effort. VTL or using a dedupe array is as simple as it gets. Backup runs, no tapes to rotate, no drives to clean, no tape loss worries, no bad tapes getting hung in the drive, no middle of the night robot gripper/hand failure…it’s great.
    4) Backup success rates are important. We regularly get 100% nightly backups (~12TB nightly), with failures usually related to client-side issues.

    Anyway – as for the annual maintenance point – I guess it depends on if the annual maintenance on a disk array is worth more to a company, than a higher rate of successful backups. When we were tape only, our policy effectively stated “if we missed a night, we had to guarantee the next”. While we still have this policy in place…I don’t worry about the next night nearly as much.

    just my 2 cents…
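As a rough check on the Data Domain figures quoted in the last comment, the overall reduction ratio is just original bytes divided by bytes left after local compression. The sketch below uses only the byte counts from that comment; the `reduction_ratio` helper name is made up for illustration and is not part of any vendor tool.

```python
def reduction_ratio(original_bytes, stored_bytes):
    """How many bytes of backup data each stored byte represents."""
    return original_bytes / stored_bytes


# Byte counts copied from the comment above.
oracle = reduction_ratio(47_634_887_009_428, 6_768_282_021_668)
vmware = reduction_ratio(19_759_665_225_250, 816_176_984_187)

print(f"oracle: {oracle:.1f}x  vmware: {vmware:.1f}x")
```

Both come out within a tenth or two of the reported bytes/storage_used values (6.9 and 24.1); the small gap is presumably overhead that the summary lines do not show, though that is a guess.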
