Network Appliance

I went to a NetApp demo today, and they were trying desperately to show me where they compete with the Centera.

First off, I think the demo went in the wrong direction. I am not the “average” customer; I wouldn’t have been there if I wasn’t interested, so it should have been much less sales pitch and much more nuts-and-bolts, geeky detail.

My first question, and one they were not able to answer, was about the compliance clock.

The coolest part of the NetApp is that the structure of the fileserver itself is stored within the metadata on the disks, as well as in the storage processor. This means that (in theory, because I’ve never seen it happen) you can pull the disks out of one filer, put them in a new one, power it up, and have everything exactly as it was when you shut down the original.

That’s a good thing, except that, as I understand it, the compliance clock has to be initialized within the storage processor, and once it’s set it is locked. The gentleman who ran the demo even admitted he doesn’t know of a way to “clear” it, though I’m sure it can be done through a fairly routine clearing of the NVRAM in the storage processor.

So if you’ve got data on a RAID group that can’t be deleted, you shut the array down, move the disks to a new array, and boot it. Then you initialize the compliance clock in the new unit to 30 years ahead and, poof, you can now delete data from the disks.
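
To make the hole concrete, here’s a toy model of the problem (all names and numbers are mine, not NetApp’s): a retention check that trusts an array-local clock is only as strong as whoever gets to initialize that clock.

    # Toy model of a WORM retention check that trusts an array-local
    # "compliance clock". Hypothetical names, not NetApp's API.
    from dataclasses import dataclass

    @dataclass
    class ComplianceClock:
        now: float  # set once when the array initializes the clock

        @classmethod
        def initialize(cls, start_time):
            # A fresh array can be seeded with ANY starting value.
            return cls(now=start_time)

    @dataclass
    class WormFile:
        name: str
        retain_until: float  # lives in on-disk metadata, travels with the disks

    def can_delete(f, clock):
        # The check only ever consults the local array's clock...
        return clock.now >= f.retain_until

    locked = WormFile("audit.log", retain_until=1000.0)

    old_array = ComplianceClock.initialize(start_time=100.0)
    print(can_delete(locked, old_array))  # False: still under retention

    # Move the disks to a new array and seed its clock "30 years" ahead.
    new_array = ComplianceClock.initialize(start_time=2000.0)
    print(can_delete(locked, new_array))  # True: retention defeated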

Yes, it’s an unrealistic scenario, but I have always seen my job in situations like this as finding the hole in the ruleset and driving a truck through it.

If you can move the disks to a new array and tinker with the clock there, then it’s not a true compliance product.

Can anyone tell me if I’m off base? Is the compliance clock dependent on the disks as well as the array?

My second problem is the idea of “block-level” remote replication. The one thing I liked about the Centera is that its policy-based replication is object-based, meaning that when a file is replicated it’s pushed whole to the remote array. This, among other things, protects the integrity of the remote filesystem (not that the Centera has a filesystem per se). Block-level writes, when interrupted, can cause filesystem-wide corruption and other general weirdness.
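
Here’s a quick sketch of why that matters (my own illustration, not either vendor’s code): an object push lands whole or not at all, while in-place block writes can leave the target torn if the link drops mid-cycle.

    # Object-level vs. block-level replication under an interruption.
    def replicate_object(target, name, data):
        # The file only appears on the target once it has fully arrived:
        # the target sees the old version or the new one, never half.
        target[name] = data

    def replicate_blocks(target, blocks, fail_at=None):
        # Blocks are written in place as they arrive; an interruption
        # leaves a mix of old and new blocks behind.
        for i, block in enumerate(blocks):
            if fail_at is not None and i == fail_at:
                raise ConnectionError("link dropped mid-replication")
            target[i] = block

    target = [b"old"] * 4
    try:
        replicate_blocks(target, [b"new"] * 4, fail_at=2)
    except ConnectionError:
        pass
    print(target)  # [b'new', b'new', b'old', b'old'] -- a torn image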

On another (minor) point, the fact that replication is accomplished by reading back the data just written to disk would double the IO load on the devices. (Why do it that way, when the write could simply go from cache to two locations at once? But that’s just crazy talk, right?)
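
The arithmetic is simple enough (numbers invented for illustration):

    # Back-of-the-envelope IO math for read-back replication.
    host_writes_per_sec = 5000

    # Scheme A: write to disk, then read the data back to ship it.
    backend_iops_read_back = host_writes_per_sec * 2  # one write + one read

    # Scheme B: fork the write from cache to disk and to the replica link.
    backend_iops_from_cache = host_writes_per_sec * 1  # one write, no read-back

    print(backend_iops_read_back, backend_iops_from_cache)  # 10000 vs. 5000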


6 comments

    • on July 3, 2007 at 10:28 am

    I feel your pain. I was a guest speaker at VMware’s symposia in Atlanta, and got to spend some time with some new EqualLogic gear. The sales guy ruined the experience for me, though, because he didn’t really want to answer questions; he wanted to run his “show” for the mostly IT-manager crowd. Even when it was just me and one other guy left, both of whom wanted “geek,” he was still doing the pitch. And every other sentence was “we can beat a DMX.” Number one, I highly doubt he can beat a DMX, but more importantly… I don’t care. I wasn’t looking to replace my DMX, I wanted to know if it was worth putting into my environment!

    I’ve heard great things about EqualLogic, and saw some pretty cool things, but what I wanted was an engineer, not a sales pitch. I wanted to know if it was worth the money and energy to get some EqualLogic for tier 2. I personally don’t like the Centeras; this might be a good way to archive long term, or for “long-term storage” where it might be pulled off tomorrow or might sit for a year (which is what we currently use our Centeras for).

  1. Yeah. Given how well the sales guys know me, I feel like they should have pushed the demo in a more nuts-and-bolts direction.

    I know how to point-and-click; it’s a skill picked up through years of forced exposure to Microsoft products. What I want to know is not only whether it does what they say it does, but how and why. Knowing how and why something works is key to being able to troubleshoot an issue at 2am after a 26-hour day.

    • on July 20, 2007 at 1:22 pm

    Regarding the block-level replication: I presume it was SnapMirror they were demoing. Your concerns about the integrity of the filesystem are moot, since what is being replicated is a snapshot of the filesystem, not the active filesystem itself. The replication is asynchronous, or semi-synchronous in most cases.

    Your question about the Compliance Clock is a frequent one. In short, the Compliance Clock is synced to the system time only when it is initialized. It will slowly resync if the system clock changes, but the drift is bounded, so if it’s too far off (say, more than a few weeks) it can take years for the clock to catch up to the system time. The farther ahead the “new” system time is, the longer it will take to catch up. In short, advancing the time will cause the files to be locked longer than if you’d left it alone. This is well documented. If the SE didn’t know the answer to that, he shouldn’t have been talking about SnapLock. 🙂
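
    A rough model of that bounded drift (the catch-up rate here is invented, purely for illustration):

        # The compliance clock creeps toward system time at a capped rate,
        # so a big forward jump takes a very long time to absorb.
        MAX_CATCHUP_SECONDS_PER_DAY = 60.0  # hypothetical bound

        def days_to_converge(offset_seconds):
            return abs(offset_seconds) / MAX_CATCHUP_SECONDS_PER_DAY

        one_year = 365 * 24 * 3600
        print(days_to_converge(one_year))  # 525600 days for a one-year jump

        # Jumping the system clock 30 years ahead just extends the wait:
        # files stay locked longer than if you had left the clock alone.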

    HTH.

  2. That’s what it seems like. I went in for a follow-up lunch-and-learn with my friends at Strategic, and I think I came away with a much better understanding of how it works.

    When a replication cycle starts, a snapshot is taken of the production filesystem. That snapshot is then replicated to the target system; however, until the replication is complete, the changes are not rolled into the target. So if 100 MB of data is out of sync, that gets copied to the target, and only when the copy is complete does the target system lay the new data over the old.

    That way the point in time on your target is always the start of the replication cycle. It’s very similar to how EMC’s SRDF/A works, though not nearly as high-tech.

    SRDF/A does serial, block-level replication, with a checkpoint between each set of changes (a ‘delta set’ in EMC terminology). Only when both the start and end checkpoints are received by the target system is the delta set rolled into disk. That’s why the target system in an SRDF/A environment has to have more cache than the source.
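
    In sketch form (my own toy version, not EMC’s code), the target buffers a delta set and only rolls it into disk once both checkpoints have arrived:

        class Target:
            def __init__(self):
                self.disk = {}       # committed blocks
                self.pending = None  # delta set being received

            def receive(self, kind, payload=None):
                if kind == "start":
                    self.pending = {}          # open a new delta set
                elif kind == "block" and self.pending is not None:
                    addr, data = payload
                    self.pending[addr] = data  # buffered in cache, not on disk
                elif kind == "end" and self.pending is not None:
                    self.disk.update(self.pending)  # roll in atomically
                    self.pending = None

        t = Target()
        t.receive("start")
        t.receive("block", (7, b"new"))
        # If the link dies here, t.disk is untouched: the target's
        # point-in-time image is still the start of the cycle.
        t.receive("end")
        print(t.disk)  # {7: b'new'}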

    • on July 25, 2007 at 8:43 am

    Well, it’s actually keeping a copy of ALL the snapshots on the destination. It’s a mirror of the volume, including any snapshots on the source volume. And it doesn’t mark a replication as complete unless it’s verified the block map is identical to the source.

    It’s moving blocks, and only the changed blocks. And it isn’t overwriting any blocks on the destination unless they’ve been overwritten on the source. I don’t think it’s low-tech; it’s pretty elegant, to be sure, and the filesystem that enables it is quite sophisticated. A huge cache isn’t needed, because the filesystem is not dumb about handling and versioning blocks. In most cases, the DR system is much less powerful than the source(s). See also: much cheaper. 🙂
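
    Something like this, in toy form (mine, not NetApp’s): diff the block maps of two snapshots, ship only the changes, and call it complete only when the destination map matches the source.

        def changed_blocks(prev_snap, cur_snap):
            # Only blocks that differ between snapshots get shipped.
            return {addr: data for addr, data in cur_snap.items()
                    if prev_snap.get(addr) != data}

        source_prev = {0: b"a", 1: b"b", 2: b"c"}
        source_cur  = {0: b"a", 1: b"B", 2: b"c", 3: b"d"}

        dest = dict(source_prev)  # destination mirrors the last snapshot
        dest.update(changed_blocks(source_prev, source_cur))

        assert dest == source_cur  # block maps match -> replication complete
        print(dest)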

  3. I’ve heard of WAFL before, in passing conversation. Seems like good technology.

    I think the price point is going to be the make-or-break on this one, though. If I can’t show a cost savings right out of the gate, they’re going to go with EMC, or just put the whole damned thing off.
