Cisco FCIP and SRDF

Been a while since I’ve written anything – I’m not even sure if I still have a readership.

I’ve been working an average of 60 hours a week on a single project these days.  Doing a datacenter migration and consolidation.  Basically moving 4 Symm-5 generation systems into a single DMX-3.

The funniest part of this has been learning the DMX-3, which I’ve not had a lot of stick-time with.  It seems like a great machine, a good hybrid of the Clariion and the Symmetrix.  I don’t much care for the DAE back-end, too many major points of failure, too many cables.  (Though when you do your first code-load on one, it sure gives you a work-out as far as learning what plugs in where.)

Anyway, as the title suggests, we’re doing a large part of this migration using temporary hardware, in the form of the Cisco MDS9216i.  This is a normal MDS 92xx chassis (2-slot) with a 14/2 FCIP blade in it: simply 14x4gbit FC ports and 2xGig-E ports on the same blade.  So far it’s been one challenge after another, and as of this posting we still don’t have the Georgia and New York datacenters talking to each other.

Part of the problem is the customer’s network infrastructure.  Namely, it sucks.  For those who don’t know, Gig-E ports on the Cisco don’t negotiate down; they are essentially 1000BASE-SX ports.  The customer, who makes a substantial part of their income off network traffic, doesn’t have a single Gig-E port in the entire datacenter.  That was problem number one.

Problem #2 was that in the datacenter that does have Gigabit available, namely the new one in Georgia, there is no optical available.  So we have to go through the painful process of getting an RPQ (an inexact definition is “Request for Price Quote”; what it really means is getting engineering to bless the configuration) to use copper SFPs on the MDS switches.

*THEN* we find out we’re replicating over a DS3 circuit, and even at that we have to “nice” our hardware down to 12.5Meg/Sec so as to not affect their production traffic, which is (of course) running on the same network.  (SRDF has a nasty habit of sucking up all available bandwidth.)

Do you know how long it takes to replicate terabytes of data at 12Meg?  LOL
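For a rough sense of scale, here is a back-of-the-envelope sketch. The post doesn’t say whether “12.5Meg” means megabytes or megabits per second, nor how many terabytes are involved, so the 10 TB figure and both unit interpretations below are assumptions for illustration only.

```python
def transfer_hours(data_tb: float, rate: float, unit: str = "MB/s") -> float:
    """Hours needed to move data_tb terabytes at a sustained rate.

    Uses decimal units (1 TB = 10**12 bytes) and ignores protocol
    overhead, retransmits, and ongoing change on the source devices.
    """
    bytes_per_sec = {"MB/s": rate * 1e6, "Mbit/s": rate * 1e6 / 8}[unit]
    return data_tb * 1e12 / bytes_per_sec / 3600


# Hypothetical 10 TB; the actual data size is not given in the post.
for unit in ("MB/s", "Mbit/s"):
    hours = transfer_hours(10, 12.5, unit)
    print(f"10 TB at 12.5 {unit}: {hours:.0f} hours ({hours / 24:.1f} days)")
```

Either way the answer comes out in days or weeks rather than hours, which is the point of the LOL.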

This is going to be fun.  I’ll keep you posted.

 

20 comments

    • on October 19, 2007 at 6:33 pm

    re: readership, I think you’d be surprised.

    • on October 20, 2007 at 2:31 am

    Hmm. I have a 9216i in my lab, and I thought those were iSCSI ports.

    • on October 20, 2007 at 2:38 am

    Oh, and why not just get a switch with both copper and optical gig ports?

    • on October 20, 2007 at 3:05 am

    I have on my desk right now a Dell PowerConnect 5324 with 24 copper gig ports and 4 optical gig ports.

    http://configure.us.dell.com/dellstore/config.aspx?c=us&cs=04&kc=6W300&l=en&oc=bccwlk1&s=bsd

    under 1K

    • on October 20, 2007 at 11:43 am

    Actually, I imagine there are a lot of people who visit your site daily waiting for you to drop a line. It’s interesting, and you mirror a lot of our lives.

    • on October 20, 2007 at 4:59 pm

    re: readership – I follow the site with RSS so can see when something new is up. No probs. Keep it up – we like the tech talk.

    9216i is an expensive piece of kit. It seems a shame that you are having to cripple it so much.

    • on October 21, 2007 at 5:41 am

    We are doing SRDF now with two MDS 9509s. The qualification process was a nightmare, and they are very nitpicky about those darn WLA files from ECC. What a major PITA. To boot, we were told the RAs can barely handle the load, so we may need to buy additional ones.

    The one annoying thing is that we went through a bin file change and, unbeknownst to me, once the standard devices are converted to R1s you can’t just take them and make a meta. So I had to go through the painstaking process of converting my static R1s back to standards using symconfigure scripts, then converting them back to R1s and reestablishing them with their R2s… Grrrr, but it has been a great learning experience….

    But regarding network infrastructure, I’ve seen worse, lol, and ours is not looking too pretty.

    • on October 21, 2007 at 5:44 am

    Also, regarding the DMX-3, how do you connect the FAs to the switch? I have seen sites and done projects where they connect all FA8s to one switch and all FA9s to the other. I don’t like that configuration. I tend to do it like this, for example: FA8 port 1s and FA9 port 0s on one switch, and the remaining ports on the other. Something about a director going down while all of its FAs sit on one switch I did not like; with the second configuration I at least still load balance across the Ciscos.

  1. I feel so loved. 😉

    RAs rarely have issues with load; more likely what is happening is that the rate of change is faster than they can push out. This happens quite often. You should have two cards, so four ports. That’s a 4 gig pipe. Now the biggest concern will be what you have behind the 9509s. Are you going intra- or inter-site? If inter-site, check the bandwidth between the two and the QOS.

    Yes, the act of making metavolumes out of RDF devices is a pain. That’s why it’s actually best to leave them separate until you need them, then meta the source and target devices, and create the RDF pair. Saves several steps.

  2. I’ve seen it done both ways. 8aA to SAN-A and 8aB to SAN-B. The problem is that when you run PowerPath across ports you’re going to want to keep a separate path from end to end. By putting 8aA and 8aB on separate switches (and conversely 9aA and 9aB) you will end up doing SRDF using 8aA and 9aB to keep your separate paths.

    By putting 8aA and 9aA in separate switches you greatly simplify your management. Using PowerPath it doesn’t matter how you cut it, you’re going to lose half your devices when an FA back-end needs to be replaced.

    At least when you do it using FA8 --> SANA and FA9 --> SANB you can tell exactly which HBA is going to go down on each host. (Because HBA0 should always be in SANA and HBA1 will always be in SANB.)

    The other reason for doing it is because everyone else does it that way. And while I’m not usually a big fan of conformity, there is a great reason for doing so in this case.

    Some day – I, or some other EMC geek, might have to come in and work on that site. It helps us all if the same standard is followed across the board.

    • on October 21, 2007 at 9:17 am

    Yeppers. Fabrics can be complicated enough. You can do a lot for yourself, the next guy to have your job, and any consultants that come in to help by keeping things simple and as universal as possible.

    Of course, you could always just switch to NetApp and take advantage of IP as a replication medium. Then you can let the network guys worry about the bandwidth. 🙂

  3. You NetApp guys, I swear. One of these days I’m going to set the site up to filter out all references to NetApp. (Just kidding, I’m more and more storage agnostic as time goes on)

    Truthfully, you have the same option with the Symmetrix. From the Symm 5.5 on, Gig-E boards have been available. Of course a couple of Gig-E boards will set you back a penny or two, but they can function as both RDF and iSCSI or any combination of the two (though I don’t believe you can blend both over a single port).

    Not a cheap option, but very solid technology.

    • on October 24, 2007 at 2:36 am

    I’m going to echo him on this. From the beginning, I’ve done dual fabrics, 8a to one, 9a to the other. The maxim of naming your fabrics differently, and naming the aliases and such with the name plus the fabric number, helps too. Like, for instance, when some wiring person accidentally wires wrong and you accidentally merge your fabrics. This highly unlikely (and as Pratchett would say, “definitely going to happen because it’s unlikely”) scenario happened to me not that long ago. Luckily, the naming convention and the simplified mapping pattern saved my butt, because it was so STANDARDized that it was easy to figure out what the issue was and where it occurred. Any admin could have come in behind me (had I taken a swan dive from the roof) and fixed it as well. It helps all involved if we stick to de facto standards, and the separation of FAs is exactly that. Additionally, the habit of naming the aliases with a name plus a fabric designation meant that nothing was overwritten, merely added to. Always follow traditional standards.

    • on October 25, 2007 at 2:17 am

    Hey Bill, good idea about denoting the fabric name in the alias name. Can you give an example of how you’re doing that?

    I usually use something like : ali_hostname_port (ali_exch02_4a)

    Do you do something like “alif1_hostname_port” for Fab1?

    • on October 25, 2007 at 11:50 am

    Sure;

    CLUS_NODE1_F2
    CLUS_NODE2_F1

    SRV_OS_APPLICATIONNAME_F1
    STR_DEVICE_CONTROLLA_F1

    etc.
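To see why the fabric suffix matters when two fabrics get merged by accident, here is a small sketch. The alias tables are modeled as plain dictionaries and the WWNs are made up; no real switch behaves exactly like a dictionary merge, so treat this purely as an illustration of name collisions.

```python
def alias(name: str, fabric: int) -> str:
    """Build an alias such as CLUS_NODE1_F2 (name plus fabric number)."""
    return f"{name.upper()}_F{fabric}"


def colliding_names(fabric_a: dict, fabric_b: dict) -> set:
    """Alias names defined in both fabrics with different WWNs."""
    return {n for n in fabric_a.keys() & fabric_b.keys()
            if fabric_a[n] != fabric_b[n]}


# Made-up WWNs, one HBA of the same host in each fabric.
wwn_f1, wwn_f2 = "10:00:00:00:c9:00:00:01", "10:00:00:00:c9:00:00:02"

plain = colliding_names({"CLUS_NODE1": wwn_f1}, {"CLUS_NODE1": wwn_f2})
suffixed = colliding_names({alias("clus_node1", 1): wwn_f1},
                           {alias("clus_node1", 2): wwn_f2})
print(plain)     # one conflicting name
print(suffixed)  # empty: nothing to overwrite
```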

    • on October 26, 2007 at 1:51 am

    Right out of the book. CAPS and all. 🙂

    Off topic, but am I alone in detesting ALLCAPS?

  4. I have always been a big fan of making everything as descriptive as possible.

    Creating aliases for host devices to include the hostname and HBA identifier. (In the case of Sun/Emulex this might be HOSTNAME_LPFC0 or HOSTNAME_FCS0 or something along those lines.)

    For the storage, sometimes the customer names the array something goofy, like ac5456sm, but usually it’s left up to me. So I do it simple, like “Symm1234_FA3aA”

    Then, when creating zoning, it’s easy, it’s:

    Z_<hostname>_<hba>_<array>_<FA port>

    or

    Z_HERCULES_HBA0_Symm1234_FA03aA

    The “Z” at the beginning prevents you from running into problems if there is a host that begins with a number, which has happened. It also clearly identifies it as a zone as opposed to a ZoneSet (ZS_…) or a config (CFG_…).

    Also stick strictly to the “Single Initiator (host) / Single target (Storage)” rule, the last thing you need in a performance sensitive installation is cross-talk between ports that don’t strictly need to talk to each other.
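The naming pattern and the single-initiator/single-target rule can be sketched together as follows. This is only an illustration: the second host name (ZEUS) and the helper functions are hypothetical, and a real fabric would of course be zoned with the switch vendor’s own tools.

```python
from itertools import product


def zone_name(host: str, hba: str, array: str, fa_port: str) -> str:
    """Z_<host>_<hba>_<array>_<FA port>, e.g. Z_HERCULES_HBA0_Symm1234_FA03aA."""
    return f"Z_{host}_{hba}_{array}_{fa_port}"


def single_initiator_zones(initiators, targets):
    """One zone per (initiator, target) pair, two members each.

    initiators: list of (host, hba) tuples
    targets:    list of (array, fa_port) tuples
    """
    return {
        zone_name(host, hba, array, fa): [f"{host}_{hba}", f"{array}_{fa}"]
        for (host, hba), (array, fa) in product(initiators, targets)
    }


# Hypothetical fabric A: HBA0 of each host zoned to one FA port.
zones = single_initiator_zones(
    [("HERCULES", "HBA0"), ("ZEUS", "HBA0")],
    [("Symm1234", "FA03aA")],
)
for name, members in zones.items():
    print(name, members)
```

Every zone ends up with exactly one host alias and one storage alias, so no two initiators ever share a zone.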

    • on October 27, 2007 at 1:42 pm

    That’s in a book?

    Sorry, but I got given the storage position; I’m not by nature a storage administrator. But I’ve been lucky, and grown a single switch environment into 20+ switches and multiple VSANs, multiple arrays, tape libraries, etc. No major glitches, even moving from Brocade to Cisco (while not bringing down the fabrics, no less, or even fully bringing down the attached hosts: 1 HBA at a time). Standards are good, but no one ever gave me a book. The guy I followed used pretty much that naming scheme, and everything else I learned off the internet (which makes it interesting when your symm’s source port for a LUN has two different identities and you’re trying to figure out how in the hell that happened).

  5. You know, I thought it was until I went looking for it, and I think it’s just one of those secrets that EMC people pass down as a part of their oral tradition.

    Actually, the truth is that I remember hearing about it in a “SAN Planning & Design” class I took a while back. The whole class was built around this behemoth of an Excel workbook that was very popular for about 15 minutes in 2004. 2.5MB of Visual Basic macros before you even put the first hostname into the sheet.

    When it worked it worked, when it didn’t it crashed horribly and scattered hours and hours of design work to the four corners of the universe. (Not that I have any ACTUAL experience with that…)

    Yes, I did the same design no less than 10 times because of this stupid thing. 😉

    /jg

  6. Oh, and just for the record, the migration was started at 10pm on Sunday after a 72 hour work-week trying to figure out a problem with the 9216i switches that turned out to be an incorrectly set MTU on the Ethernet ports.

    It defaults to 2300, presumably because a Fibre Channel frame carries up to 2112 bytes of payload and you wouldn’t want to segment one frame into two IP packets, right?

    Unfortunately most Ethernet switches are set to 1500. When you try to cram a 2300 byte packet into a 1500 byte buffer, there is bound to be some spillage.

    That explains though why it was fine up until we tried to start replication. Sitting there passing status packets back and forth is easy, never exceeded the MTU, but once we started pouring the data down the pipe, it choked on it.

    Anyway, we set the Ciscos to 1500 to match their network infrastructure (trying to get this particular behemoth to make a switch change like that would have taken months of red-tape) and we just deal with the packet-straddling that it causes.
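The packet-straddling can be put into numbers with a small sketch. The byte counts below are assumptions, not figures from this thread: a full-size Fibre Channel frame of 2148 bytes including SOF, header, CRC and EOF, a 28-byte FCIP encapsulation header, and plain 20-byte TCP and IP headers with no options.

```python
import math

# Assumed sizes in bytes; not taken from the post.
FC_MAX_FRAME = 4 + 24 + 2112 + 4 + 4   # SOF + header + max payload + CRC + EOF
FCIP_HEADER = 28                        # FC encapsulation header
TCP_IP_HEADERS = 20 + 20                # no options


def packets_per_frame(mtu: int) -> int:
    """IP packets needed to carry one full-size encapsulated FC frame."""
    payload_per_packet = mtu - TCP_IP_HEADERS
    return math.ceil((FC_MAX_FRAME + FCIP_HEADER) / payload_per_packet)


for mtu in (2300, 1500):
    print(f"MTU {mtu}: {packets_per_frame(mtu)} packet(s) per full FC frame")
```

Under these assumptions a full frame fits in one packet at 2300 and needs two at 1500. FCIP rides on a TCP byte stream, so in practice frames are packed back to back and simply straddle packet boundaries; the per-frame count here is a simplification.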

    We still managed to have the data completely migrated by 8am Tuesday morning, a full 12 hours under the 60 hours I had predicted. 🙂

    I’m flying down first part of next week to teach them how to install Emulex. Poor guys have had nothing but JNI in their environment for years.

    –J
