Cloning for fun and profit.

You know, every time I start thinking of “Cloning” I am afraid the far-right is going to burn me in effigy, just on the principle of it.

But in this case, I’m talking about cloning disks within an array for a data migration.

The decision was made to move our Microsoft Exchange (Corporate Email) from Tier-1 (RAID-1 + SRDF) to Tier-2 (3+1 RAID-5) storage. I guess the logic is that losing a day of email won’t hurt us horribly. (Ok, maybe it will, but I’ll get into the solution to that problem in a different post.)

In this case, I’ve got the following drives:

7x 84G Metavolumes
4x 33G Metavolumes
2x single hypervolumes.

The new Exchange administrator, with whom I’m actually quite impressed so far, would like to add 3x 200+G Metavolumes to the mix.  The main reason for the move is that we’re rapidly running out of Tier-1 storage and need to save it for expansion, production growth, etc.  (Or buy new disks, but that’s yet another story.)

So I am going to use this opportunity to demonstrate the power of TimeFinder/Clone.

I’ve created the new volumes on the RAID-5, mapped them to the front-end ports, and prepared the masking scripts to move the device masking from the old devices to the new.
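
Roughly, that prep work looks like this from SYMCLI. The Symm ID, device numbers, director/port, and HBA WWN below are all made up, and the syntax is from memory, so sanity-check it against your own environment before running anything:

    # Map the new RAID-5 device to a front-end port
    symconfigure -sid 1234 -cmd "map dev 01A0 to dir 7A:0 target=0, lun=010;" commit

    # The masking swap: take the old device away from the host's HBA, present the new one
    symmask -sid 1234 -wwn 10000000c9123456 -dir 7A -p 0 remove devs 0123
    symmask -sid 1234 -wwn 10000000c9123456 -dir 7A -p 0 add devs 01A0

    # Push the updated masking database out to the directors
    symmask -sid 1234 refresh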

Now in the old days, the way you’d do this was: shut the hosts down, do a bit-by-bit copy of the disks (if you’re lucky you can do them in parallel, otherwise it’s single-threaded), change the pointers on the host, and bring them back up, hoping everything is exactly how you left it.  Net downtime could be in the neighborhood of several hours, if you’re lucky.

This is a new beast.  Enter TimeFinder/Clone.  Now that I have the blank devices, I do things in a slightly different order:

  1. Create the clone session – this establishes the pairing of devices. (See the command sketch just after this list.)
  2. Shut down the Exchange services.
  3. (each node) Unmount the existing disks using admhost or symntctl
  4. (each node) Change the device masking so Exchange sees its new disks
  5. (each node) Re-scan the bus to remove the entries for the old disks and create the entries for the new LUNs. (At this point it will see all-new, blank disks.)
  6. Shut the cluster hosts down.
  7. “Activate” the clone session 
  8. Bring the first “Active” node of Exchange up
  9. Bring the passive node up.
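
In SYMCLI terms, steps 1 and 7 boil down to something like the following. The Symm ID and the pairs file are made up – the file just lists source and target Symm device numbers, one pair per line:

    # Step 1 - define which source device pairs with which target
    symclone -sid 1234 -file exch_clone_pairs.txt create -copy

    # ...steps 2 through 6: stop Exchange, remask, rescan, shut the cluster down...

    # Step 7 - set the point in time and kick off the background copy
    symclone -sid 1234 -file exch_clone_pairs.txt activate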

Net downtime is about an hour.  The reason is that once the clone session is activated, a background copy starts immediately.  However, from the target side, any reads to “invalid” (not-yet-copied) tracks on the target disks actually get serviced from the source disks.  As far as the host is concerned, it’s all the original data.

As the copy progresses, more and more of the reads are serviced from the new disks.

When the array receives a write to a track that hasn’t been copied to the target yet, that track is first copied, THEN the write is processed on the target disks only.  This preserves your production disks in their original state.
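
If you want to keep an eye on the background copy, the same pairs file works for monitoring (again, names made up and syntax from memory):

    # Show the copy state and percent copied for each pair
    symclone -sid 1234 -file exch_clone_pairs.txt query

    # Poll every 60 seconds until every pair reports fully Copied
    symclone -sid 1234 -file exch_clone_pairs.txt verify -copied -i 60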

With the advent of TF/Clone, I’m surprised anyone still uses BCVs.  They’re so “old-tech.”  The main hang-up, of course, was that while you could (in theory) protect a BCV using RAID-1, the performance hit you took during establish and split operations was so bad that it wasn’t worth it.  With TF/Clone you can go from RAID-1 to 3+1 RAID-5 to 7+1 RAID-5, etc., with minimal performance impact.

The only downside comes, of course, when you’re cloning production volumes while they’re in use.  Since reads are being serviced by the production disks and not the clone disks (technically, the tracks you’re reading are simply being copied to the target while the host read is serviced from the cached track), you’re impacting the production spindles during the copy process.

It’s a cool bit of magic – and it’s really fun to play with the minds of people who don’t understand the technology.

16 comments

    on March 24, 2007 at 5:42 pm

    As I recall, you get your point-in-time copy of the source devices when creating the session; activating the session simply makes the target devices ready (usable by whichever host sees them). Wouldn’t you want to wait to do step 1 until after you’ve quiesced Exchange so you get a good consistent copy?

    Anyhow, I agree, clones are great! You mention the downside of impacting production when you run clones. We have the advantage of having most of the hosts that need to see clones of production at a different site. There’s another Symmetrix at that same site with synchronous SRDF R2s of all our production devices. We take copy-on-access clones of the R2 devices there rather than of the actual production devices on the other side, which allows us to avoid any production impact. Reads on the clone devices reference back to the R2s (which, while in sync, only take writes over the SRDF link).

    I opened a case with EMC to see if we could do all this within the same Symm, like take a clone of an in-sync BCV device instead of the R2, but apparently this is not currently possible. That’s unfortunate; it’s quite handy to be able to take instantly available clones of production devices and not have to worry about IO contention. Maybe someday…

  1. Actually, no – the point in time is established by the activate command. The create defines which devices are to be paired (since the pairings aren’t maintained in cache like they are with TF/Mirror and BCVs), and the activate sets the point in time.

    The create also defines what type of clone session we’re using.

    ‘-copy’ sets up a background copy. The clone is usable as soon as you activate, but the actual data copy completes in the background over time.

    ‘-precopy’ starts copying tracks in the background as soon as the session is created, before the activate. This is the most similar to TF/Mirror, in that the bulk of the copy is already done by the time you give the “activate.”

    Finally, issuing the create with no option initiates a “Copy On Access” session. This is great for a ‘snap-like’ session, or one that needs to be created, broken, and created again. Since it never results in a full clone (it only copies the accessed tracks, not the whole disk), it can’t be “permanent” per se.
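
    Side by side, the three flavors look roughly like this – the Symm ID and pairs file are made up, so treat it as a sketch rather than gospel:

        symclone -sid 1234 -file pairs.txt create -copy      # full background copy once you activate
        symclone -sid 1234 -file pairs.txt create -precopy   # starts shipping tracks at create time
        symclone -sid 1234 -file pairs.txt create            # no option = Copy On Access, only touched tracks move
        symclone -sid 1234 -file pairs.txt activate          # whichever flavor, this sets the point in time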

  2. Sorry, failed to respond to the rest of the comment.

    The reason that you can’t clone a synchronized BCV is that when a BCV is synchronized to a standard, it doesn’t really have a ‘personality’ of its own, per se. It is moved into and takes up the M3 position and personality of the source device.

    You might ask about SRDF within a Symm – it can be done, though it’s sort of quasi-supported. Basically you create an RDF group to SRDF from one RA to another within the same Symm. For instance, you could RDF an R1 device, say SYM100, to an R2 device, say SYM200, and do your clone off SYM200. (Rough command sketch at the end of this comment.)

    I think it’s more likely that BCVs as such are going to go away, because clones offer the same basic functionality with so many more bells and whistles…

    The SYMAPI also has a “compatibility mode” you can set so that Clone will respond to symmir commands the way BCVs do. Handy if you have existing scripting and want to start using Clone instead of TF/Mirror. (The best reason to switch is to clone from RAID-1 production volumes to RAID-5 development or test volumes, since protected BCVs offer nothing but headaches.)
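
    Here’s roughly what the intra-Symm SRDF trick looks like. The group number, directors, and device pairs are all made up, and since it’s only quasi-supported, double-check the syntax (and your support stance) before trying it:

        # Build a dynamic RDF group that loops back into the same frame
        symrdf addgrp -label LOOPBACK -rdfg 10 -sid 1234 -dir 1C -remote_sid 1234 -remote_dir 2C -remote_rdfg 10

        # Pair the devices (the file lists R1/R2 Symm device pairs, e.g. 0100 0200) and start them syncing
        symrdf createpair -sid 1234 -rdfg 10 -file rdf_pairs.txt -type RDF1 -establish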

    on March 25, 2007 at 10:42 am

    Looks like I need to bone up on my SymClone commands! I guess we’ve never created a session without immediately activating it. Interesting bit about the SRDF within the same Symm… Support didn’t mention that option. Quasi-support = bad in my environment, so I probably won’t be doing this, but it looks useful for the situation I described above. It’d be a good way to confuse new staff, that’s for sure.

  3. The main reason for the “Create” and “Activate” being separate is that if your split is time-sensitive, i.e. it must happen exactly (or as near as possible to) midnight, it’s easier to create the session at 23:45 and activate it at midnight. The activate step takes far less time than the create does, so you’re much closer to your point-in-time requirement. (Rough sketch at the end of this comment.)

    Inter-Symmetrix SRDF used to be used before consistent splits were developed. Because SRDF is serial, you could guarantee a consistent image by splitting the SRDF and then mounting the R2 volumes, much as you would a BCV. The difference is that when you split the SRDF, it essentially takes the link down at a consistent point, so you’re usually protected from the kind of data corruption you can get when splitting a large number of BCV devices. (TimeFinder splits are not “atomic” per se – they don’t always all happen at the same time, which can lead to inconsistent split points throughout a large volume group.)

    Yeah, quasi-support is never a good idea in a production environment, but it makes testing SRDF a breeze. 😉
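
    So the midnight scenario ends up looking something like this (made-up Symm ID and pairs file again, and the consistent activate assumes you’re licensed for it):

        # 23:45 - get the heavy lifting of the create out of the way early
        symclone -sid 1234 -file pairs.txt create -copy

        # 00:00 - the activate itself is quick, and -consistent holds I/O just long
        # enough to give you a dependent-write-consistent image across all the pairs
        symclone -sid 1234 -file pairs.txt activate -consistent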

  4. Not sure if I’m missing something obvious here, but (on goes my Hitachi hat) on Hitachi storage there is a technology called Cruise Control that allows you to do this without interrupting your hosts. It’s basically an addition to the ShadowImage engine (essentially TimeFinder, if I understand EMC products at all) that creates disk clones and then, as a final step, switches LDEV addresses so that the target LDEVs get the LDEV addresses of the source and the host has no idea that anything has happened… well, other than the obvious slowdown in performance while the clone operation is taking place. No rebooting of hosts or repointing anything.

    It’s a really cool piece of technology that I’ve used a lot of times. I was always really impressed with it until one day snig compared it to the dynamic sparing that happens when a disk in a RAID set fails. All vendors allow you to non-disruptively move or rebuild RAID-protected data to another spindle in a different RAID set when a disk fails – and they provide this free of charge.

  5. Keep the Hitachi hat on. 🙂 EMC has something called “Open Replicator” that does that, but only if you’ve already mapped the host to the *NEW* targets. (Though I’m not sure if this can be used within a Symm; it’s usually used for migrating hosts off other target devices, i.e. the Symm acts as a pass-through to the old storage until the data has been completely moved.)

    This is also a $$$ extra charge with EMC, but we’re used to that about them.

    And since in either case you have to move the host to the target ports first, a reboot is still required.

    Sounds like Hitachi has become pretty robust in the years since I was first introduced to it in the form of an HP Rebranded “XP-256”

    Do they charge for their multi-pathing software or is that still offered with the array? PowerPath is an expensive add-on, though I’m the first to admit that in years of using it, I can count the number of problems I’ve had with it on one hand and still scratch.

  6. Hi Jesse,

    I’m not 100% sure about charging for HDLM, but my gut instinct is that it is not charged for. I’d kind of hope it was free, as it’s not a good piece of software.

    It’s not bad at doing its main job – managing multiple paths – and has a couple of load-balancing options:
    1. Basic round robin
    2. Extended round robin, which tries to be a bit more clever than just sending alternate IOs down a different path.

    But it’s the clunky GUI and lack of management features that are annoying. For example, when you remove a LUN from the storage subsystem, the associated path in HDLM goes red and sits in an error condition. There is then no way to remove the path, and the associated error, without a reboot. So you often find that you decommission a LUN non-disruptively to the host, but HDLM continues to mark that LUN as in an error condition with a big red line until you reboot the host – really, really annoying, especially if you’re feeding these errors into an enterprise monitoring tool.

    Even plain Windows manages to stop seeing the LUN when you do a disk rescan, with no problems. So in that respect HDLM is worse than Windows!

    😉

  7. LOL – PowerPath has to be by far one of EMC’s best products. It does different types of load balancing, but the best part is that it works with the Symm to determine mean time to acknowledgement and routes IOs accordingly.

    You can also right click on any failed device in the GUI and remove it. The alert goes away immediately.

    Of course the optimization retails for close to $1500 per host, so it’s not a cheap solution, but if you want to use it in basic failover mode it’s free.

  8. HDLM is at version 5.8 now, and I’ve been pestering people for that sort of functionality (right-click on any failed device in the GUI and remove it) since 5.4, which was about 3 years ago. And everyone I know complains about it. You’d think it was the easiest thing in the world to program into the tool – you’d think!

    It’s an example, in my opinion, of the development guys being too far away from the real world.

  9. PowerPath is, as I’ve said before, one of EMC’s most mature products. It’s existed since the mid-’90s and has evolved quite nicely into a very robust product. On the Windows side, I almost NEVER have to use the GUI after it’s installed, and the ‘powermt display’ and ‘powermt display paths’ commands give me just about everything I need to know about the host’s connections.
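
    For the curious, these are the powermt commands I actually live in day to day – nothing here is environment-specific:

        powermt display            # summary of HBAs and path counts for the host
        powermt display dev=all    # every device with all of its paths and their states
        powermt check              # prompts to clean up dead paths left over after a reconfig
        powermt restore            # re-tests failed paths and puts them back in service if healthy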

    on August 13, 2007 at 4:50 pm

    Hi all,

    I’m new to SAN products. I want to know how to create a snap clone in EVMS (2.5.5) on Linux.
    Please provide the commands to create a snap clone for a volume. One more thing I want to know: how does the snap clone split from the volume if the volume fills 100% of its space? It would be very helpful for me.

    Thanks in advance to all.

    🙂

    on August 13, 2007 at 5:03 pm

    Hi,

    Thanks for the explanation of the clone. I’d like more clarity on that. Are you able to describe how a snap clone and a snapshot differ, and how we remove the link between the snap clone and the volume?

    thanks in advance

  10. Ok, clones are nothing but mirrors of production data. The way the clone works, though, is in the background: while the cloned volume is visible to the target host, any unmodified tracks are in fact being read from the source device until those tracks are copied to the target volume. The copy happens at one of two times:

    a. When a track is accessed by either the source host (production volume) or the target host (clone volume), the track is read from the source drive and copied to the target drive, and the pointer is cleared so that the next time the target host reads that track, it’s reading from the target volume and not the source.

    b. When the background copy (assuming the session was created with ‘-copy’) works its way around to that track, copying it over whether or not anyone has touched it.

    As to your question about the volume filling, cloning is absolutely not going to help you there – it’s an exact mirror, so if your source volume fills, your target volume will also be full.

    Time to extend the volume and resize the filesystem. Growing can be done online in most versions of Linux, provided you’re using the ext3 filesystem.
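
    If the volume happens to sit on LVM, the online grow is just a couple of commands. The volume group and logical volume names here are made up, and on older distros the filesystem-grow tool was ext2online rather than resize2fs:

        # Grow the logical volume by 20G, then grow the mounted ext3 filesystem into it
        lvextend -L +20G /dev/vg_mail/lv_data
        resize2fs /dev/vg_mail/lv_data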

    on August 24, 2007 at 1:27 pm

    Hi SanGod,

    Thanks for the explanation. I have a clear idea now. I want to know how I can create a snap clone for a volume. I.e., I have a volume named “/dev/evms/gopivolume1” and the name of the volume region is lvm2/gopirad/Reggopivolume1. Now I need to create a snap clone named gopisnapclone. What do I need to do, or what is the command for this, using EVMS?

    One thing for you: I am using the XFS and ext3 file systems.

  11. Not a clue – this is probably a good question for your solutions architect. As I know nothing about EVMS, any answer I gave you would be a risky proposition, and as such not a good idea from a liability standpoint. 🙂

    I don’t have my E&O insurance (Errors & Omissions) in place yet. 🙂
