VMware Host Isolation Response…

I learned what “Host Isolation Response” was today.  Well, I already knew what it was, but I learned that in a VMware cluster, if you leave it at the default and the network goes away between the clustered hosts, the HOST then RESPONDS to this ISOLATION by shutting your entire environment down.
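For anyone who wants the logic spelled out, here is a toy model of the decision as I understand it. It is only a sketch: the names (`IsolationResponse`, `action_for_vms`) are made up for illustration and this is not VMware's API. It assumes a host calls itself isolated when it sees no heartbeats from its peers and also can't reach its isolation address.

```python
from enum import Enum


class IsolationResponse(Enum):
    # Hypothetical labels for the per-cluster setting; not VMware identifiers.
    LEAVE_POWERED_ON = "leave powered on"
    SHUT_DOWN = "shut down"
    POWER_OFF = "power off"


def host_is_isolated(peer_heartbeats_seen: bool, isolation_address_reachable: bool) -> bool:
    """Assumption: isolated means no peer heartbeats AND no reply from the isolation address."""
    return not peer_heartbeats_seen and not isolation_address_reachable


def action_for_vms(peer_heartbeats_seen: bool,
                   isolation_address_reachable: bool,
                   configured_response: IsolationResponse) -> str:
    """Return what happens to the VMs on this host, as a plain string."""
    if host_is_isolated(peer_heartbeats_seen, isolation_address_reachable):
        return configured_response.value
    return "no action"


# Unplugging the switch: no heartbeats, no isolation address, response set to shut down.
print(action_for_vms(False, False, IsolationResponse.SHUT_DOWN))
```

The point of the sketch is that the setting only matters on the day the network disappears, which is exactly the day you don't want to find out what it was set to.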


Not that anyone would notice, but from 1:30 to 2:00pm EST the system was offline because I (ironically) unplugged the switch briefly to put a battery behind it.  Needless to say it’s better now, but it wasn’t quite the “momentary interruption” I had hoped for.

Not going to be long with this one; I’m prepping to start traveling again.  Not totally excited about it, but I hear Seattle is nice during the summer in much the same way Virginia isn’t, so at least there’s an upside.

And suddenly…(redux)

**ALERT** I’ve had to…modify this post so it won’t offend someone who doesn’t realize that the storage community is very small and that word will get out regardless…

I’m unemployed.

Unexpectedly too.  Unexpected because right up until the day they told me to go home because I wasn’t getting paid, everyone assured me that the contract renewal was in the bag.

I’m such a sucker.  Believing people like that.  Never again.  I’ll also never believe anyone who tells me “don’t worry about it, I’ve got you covered if there’s a gap.”

It’s ok, next gig is on the horizon already…  And it looks like it will be something that while geographically unpleasant, will be a great job I can learn a LOT from, and truly excel at, which for me is key, because I’ve spent the past two years trying to shoe-horn new ideas into the heads of people who think a new idea is like anthrax, to be avoided at all costs.

(And with that I’d like to say hello to the nice folks at the NSA.  Please forgive me, it was an analogy, if a badly placed one.)

Consulting sucks sometimes.  The worst part of course is not knowing where you’ll be working from year to year, or the fact that you have to keep your eyes open, in permanent recruiter mode.

Of course the money is great, and if you tend to go stagnant on doing the same thing over, and over, and over again…It’s nice to be able to change.

It’s a pity that with being yanked out of an environment with no notice comes no turnover on the projects, and that there are a few implementations that I was in the middle of that might blow up if not tended to properly and in the right time-frame, which sadly isn’t far off.

(Ok, first anthrax and then the phrase ‘blow-up’ – the boys in black are DEFINITELY knocking on my door tonight.)

So the real question is…who is going to get saddled with picking up where I left off, *AND* are they going to ask me to help…

Can’t wait until that happens to give me the opportunity to lecture someone on the value of giving notice. 🙂

Good Cloud, Bad Cloud, a Titanic story…

This week’s abject failure of Amazon.com’s EC2 hosting environment has caused quite the stir.  There are those who say that this incident “Proves Cloud Failure Recovery is a Myth” and others who say that we should just give it a chance.

Facts are facts.  Amazon screwed the pooch big-time last week.  Their outage caused ripple effects nation-wide.  But while it’s easy to throw the blame at Amazon for the failure, it’s important to remember that cloud computing is still only in its infancy, and this mad rush to adopt it is part and parcel of the reason these problems are happening.  Customers rushing for a new product creates demand, and companies looking to be the first to capitalize on that demand create a product that may or may not be ready for prime time.

But because no-one ever (because it’s impossible) thought to test the kind of cascade failure they experienced, they were pushing the high-availability envelope right out of the gate.

So no big deal, right?  Foursquare, parts of Netflix, etc. were down due to the outage.  Other than inconvenience and the inability of narcissistic people to let the world know where they are and what they’re doing, it’s not really that big a deal (for us).

And then this came out: https://forums.aws.amazon.com/thread.jspa?threadID=65649&tstart=0

Specifically this line:

“We are a monitoring company and are monitoring hundreds of cardiac patients at home.  We were unable to see their ECG signals since 21st of April.”

Really?  You have a life-critical application and you hosted it “in the cloud”?  Did it never occur to you that it’s probably *NOT* a good place for a life-or-death application?  While I would consider it as a backup, definitely not my one and only.

People who know me know I have a rule.  I don’t say it works until I’ve seen it work at least once, and even then I’ll qualify my statement with “well I saw it work under THESE conditions.”  I do *NOT* say something works based on what some sales or marketing person tells me works.  (Trust me, this has been a major sticking point between me and my sales team. 😉)

That being said, you have to accept that if you put your critical apps in “the cloud”, by its very nature you are abdicating your control over them and putting your full faith in someone ELSE to fix the problem.  Someone who may not think your application is as important as the one in the rack next to yours.

Are you going to take someone’s word that something is “Highly Available” if you haven’t actually pulled the plug yourself and watched it fail over?  I won’t.  I will candidly couch my answer in “That’s the way it’s supposed to work” or “That’s the way it’s designed to work”  But until you see a failover, that’s not the way it DOES work, because it never has.

I run my own email, my own webserver, my own infrastructure. I prefer it this way, because now if the system goes down, I know exactly whose butt to kick.

As a rule, if I’m paying someone else to provide a service… I make sure I know where, how, and who to call when it blows up.  It’s probably the best advice I can give.

Amazon billed this as being “highly available” and maybe it is, for the most part.  But obviously if you think of a million ways for something to go wrong, you can bet even money on there being at least a million and one ways for it to fail.

Instead of EC2, they should have named it “Titanic” because everyone knows the easiest way to invite disaster is to tell the world you’re immune to it.

IBM XiV – Real-Life impressions…

The Ethernet back-end on an XiV will still be its undoing

That's a lotta ethernet...

First impression of the XiV in “action”

The GUI is fancy.  Looks like a Mac turned on its side. The GUI is also NOT web-based.  It’s an app-install. I do believe however it’s available for multiple platforms.

It really does seem to take all of the guess work out of provisioning since you don’t really have any say on what goes where in your array.

Our first use?  Backing up 6+ TB that was stored on Clariion and moving it to XiV…

Now first off, I’m glad it was decided to do it this way.  Whereas a copy straight from one to the other is possible, utilizing both arrays at the same time, it wouldn’t have provided any comparison as to performance.

The backup was done using Veritas NetBackup, over the network.  The data consisted of a pair of hosts running an extensive XML-type database used for indexing and categorization of unstructured content.  The backup and restore were both done to the same host, over the same network, and the storage was addressed over the same switches, just zoned to different arrays.  The only significant difference was that while the backup was done multiplexed, the restore had to be done single-threaded…(because NBU multiplexed both backups to the same tape)

I have to get the final start/stop-times out of NBU, but from the hallway conversation I had with the NBU guy, the backup took 6-8 hours (for both hosts), the restore took 21+ hours…

The most interesting part of it was the first restore took almost the same amount of time as the backup, which is kind of what we would expect.  The second host took dramatically longer to restore than to back-up.
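To put the backup and restore times in perspective, here is the back-of-the-envelope math as a quick sketch. It assumes exactly 6 TB (the real figure was "6+"), decimal units, and the rough hour counts from the hallway conversation, so treat the results as ballpark only.

```python
def average_mb_per_s(terabytes: float, hours: float) -> float:
    """Average throughput in MB/s for moving `terabytes` (decimal TB) in `hours`."""
    total_bytes = terabytes * 1e12
    return total_bytes / (hours * 3600) / 1e6


# Assumed inputs: 6 TB total, backup in 6 to 8 hours, restore in 21 hours.
for label, hours in [("backup (6h)", 6), ("backup (8h)", 8), ("restore (21h)", 21)]:
    print(f"{label}: {average_mb_per_s(6, hours):.0f} MB/s")
```

Keep in mind the restore was single-threaded off a multiplexed tape, so not all of the gap can be pinned on the array.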

This would indicate to me that, as expected, the XiV didn’t handle the long, sequential write very well.  Since the host only connects to two of the six data nodes, virtually 100% of writes have to be destaged over the Gig-E backend.  My guess is we nailed the cache to the wall with the first restore, and then kept it pegged with the second one.

I like sequential write-tests on this scale because it shows without a doubt whether the cache is masking a back-end issue or not.  If it is, this is exactly what you’ll see.  An initial burst of writes followed by a sharp drop as cache is saturated.  This is even more pronounced in a more utilized array (rather than an idle one) because a certain percentage of cache will already be utilized by host reads/writes.
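Here is a toy simulation of that burst-then-drop pattern. It is a sketch with made-up numbers (a 400 MB/s host stream, a 100 MB/s back end, 1200 MB of write cache), not a model of the XiV or any real array, and the function name is my own.

```python
def simulate_sequential_write(host_mb_s, destage_mb_s, cache_mb, seconds, prefilled_mb=0.0):
    """Return the MB accepted from the host in each one-second step.

    Each second the back end first destages up to `destage_mb_s` out of cache,
    then the host writes as much as fits in the free cache, up to `host_mb_s`.
    """
    used = prefilled_mb
    accepted_per_second = []
    for _ in range(seconds):
        used = max(0.0, used - destage_mb_s)   # back end drains the cache
        free = cache_mb - used
        accepted = min(host_mb_s, free)        # host is throttled once cache is full
        used += accepted
        accepted_per_second.append(accepted)
    return accepted_per_second


# Idle array: full-speed burst, then throughput falls to the destage rate.
print(simulate_sequential_write(400, 100, 1200, 8))
# Busy array (half the cache already in use): the drop arrives sooner.
print(simulate_sequential_write(400, 100, 1200, 8, prefilled_mb=600))
```

Once the cache is full, the host only gets whatever the back end can drain, no matter how fast the front end looks on paper.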

This doesn’t bode well for an application that requires occasional complete reloads of the XML database…

I can’t wait to see it in action.

The Macintosh Experiment – Final

Well, my 30 days are up.

I enjoyed using it, and I definitely see the upside in Apple computers over PC’s.  But I’m going back to my Dell Precision690. (Already have actually)

Most of the “failings” of the Mac G5 Pro I was using can probably be attributed to the fact that it’s a G5.  So much software doesn’t work on the PowerPCs that developers have given up on them (as is probably justified, they’re old), and upgrading to a MacIntel would probably solve a few (but not all) of the problems I was having with compatibility.

A few points:


  • MS Entourage had significant issues.  I was forced to use the EWS (Exchange Web Services) client instead of the standard, because my exchange environment is Exchange 2010.  Maybe I jumped the gun in upgrading to Exchange 2010.  Entourage 2008 doesn’t work with Exchange 2010, because Microsoft did away with WebDAV.
  • The MS RDP Client for Mac (v1.0 due to PPC Support) only supports a single session.  I usually have 3-4 RDP sessions open at a time, so this was a significant limitation.
  • TimeMachine doesn’t like to back up to a network drive.  I found a few workarounds but was never able to try and get it working.  I prefer to backup to an offsite location.
  • NTFS read-write support doesn’t exist in Leopard (10.5.8).  Though read-only support exists, if I can’t write to an NTFS formatted thumbdrive this is useless to me.  I’ve found some third-party drivers but they are both expensive and buggy.  I’m told this exists in SnowLeopard (10.6.x) but again, I’m not willing to shell out that kind of money for a computer to do something I can do with windows.
  • Software is expensive…  The Version of Quickbooks that I paid $99 for on windows was $299 on Mac.  WTF is up with that?!


  • I love having a native BASH shell.  I do a *LOT* of scripting, and it’s nice to be able to do it hands on.
  • The GUI is very intuitive, I like the Dock (Akin to Cairo-Dock for Linux)
  • I enjoyed iPhoto – the face-recognition, while imperfect, was interesting to play with.
  • Application installations were easy, and almost NEVER required a reboot.
  • It’s mostly quiet.  I love a computer I don’t hear running.  Though the Precision is pretty quiet too.  And the Mac “Jet-Engines” when you put it under load whereas the Dell doesn’t.

– And finally:

  • The start-up chime the mac makes *REALLY* annoys my eldest son, who for some reason (couldn’t be his dad, could it?) HATES apple products.  I must have rebooted it ten times one night while he was in the other room playing BlackOps just to hear him complain.

Bottom line, I work with EMC products.  Much of the software I use in my work runs on Windows by virtue of the fact that EMC writes it that way.  (Why Symmwin hasn’t been ported to CentOS or some such yet is beyond me….would save the company MILLIONS every year in software licensing)

But it all comes down to cost.  The starting price of a new Mac Pro is $2499 (Source: Apple)  That’s for a ‘simple’ box with a quad-core processor.  The higher-end systems (12-core, 2x 6-core CPU’s) run $4,999.

Macs are more expensive.  As a side-note, I walked into Micro-Center to buy memory for it.  The G5 uses standard DDR, PC3200 memory.  In the *SAME STORE* memory was two different prices, depending on whether you were in the Mac side or the PC side.  For PC’s the 1GB PC3200 memory was $29/ea.

On the Mac side, it was $59/ea.  What amazed me mostly was the fact that the guy behind the counter said that people would GLADLY pay the extra $30 for the exact same memory because it said “Mac Ready” on the label.  (It was even the same manufacturer)

Wow.  That’s all I can say about that.  Wow.  That’s abusive.  That’s taking advantage of people who don’t know any better.  Double?  Really Apple?  (Well this wasn’t apple, but it is the general problem.)

Let’s put this into perspective.  The Dell Precision 690 I have runs 2x Dual-Core 3.0Ghz Xeon CPU’s, 8G of ram, and it cost me less than $1,000 when bought separately.  It’s a faster box (twice as many CPU cores, DDR2, PC5300 memory, etc).

Now I’m not the type to buy the latest and greatest.  I’ve never bought a “new” laptop in my life, (I prefer refurbs, especially since Dell sells them with the exact same warranty as new at half the price.) I drive a 6-year old Prius, my wife drives a 10-year old Chevy.  I have a modest house in the suburbs that’s slightly crooked but fits my needs, and isn’t flashy by any stretch.  And every piece of computer equipment I buy for the datacenter is second-hand.  (We just acquired a pair of Cisco 9140 Switches, how many generations back is that?)

To go out and buy a “NEW” Mac for those prices is completely INSANE.  Now I could probably buy one used on ebay.  (Apple people tend to upgrade often, so there are lots of them out there.)

So in my humble view – Macs are great personal computers, and wonderful graphics arts systems.  They *CAN* be used in business if you’re willing to make some sacrifices, but again, if you want stuff to just work, Windows is still the way to go for business.

I *MAY* consider a used MacBook Pro though.  I can see where the portable version would come in VERY handy, and you can get Intel-based MacBooks on Ebay (lease-returns) pretty cheap.  (I’m amazed Apple doesn’t have an outlet store like Dell does)

This concludes my latest experiment.

P.S. For Sale – Mac Pro G5 Tower.  Dual 2.5Ghz PPC, 8GB Ram, 2x 250G HardDisks, dual-port Video, Keyboard/Mouse (new).  MacOS 10.5.6 Leopard (Installed, no media)

Make me an offer.

Day-24 (Mac Experiment)

I told you I had no concept of days right?

Well I think it’s an “I can use this” thing.  The only downside I’ve found so far probably has more to do with my outdated hardware than anything else.

I’ve since upgraded the Single Processor 1.6Ghz G5 to a dual-processor 2.5Ghz G5.  The difference in performance is obviously grand, plus the dual 2.5 has 8 DIMM slots for memory instead of 4.

So now I’m up to 8G of Ram.

What I found most interesting is that to move from the old system to the new it was simply a matter of moving the drives over.  I guess simplification and standardization of the hardware means that unlike windows/PC hosts, you never have to worry about whether or not the drivers are installed when you upgrade.

I also had a pretty good time with “TimeMachine”

It seems like it does a great “Grandfather-Father-Son” backup automatically, and without the user having to understand what a “GFS” backup is.  So you can restore to any hour in the last 24, day in the last month, or month in the last (however much disk-space you’ve got.)
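For the curious, the thinning rule I just described can be sketched in a few lines. This follows my description above (every snapshot from the last 24 hours, one per day for the last 30 days, one per month beyond that) and is NOT Apple's actual algorithm; the function name and the 30-day cutoff are my own assumptions.

```python
from datetime import datetime, timedelta


def thin_snapshots(snapshots, now):
    """Return the snapshots a GFS-style policy would keep, oldest first."""
    kept = []
    seen_days = set()
    seen_months = set()
    for snap in sorted(snapshots, reverse=True):      # walk newest to oldest
        age = now - snap
        if age <= timedelta(hours=24):
            kept.append(snap)                         # "son": keep every recent snapshot
        elif age <= timedelta(days=30):
            if snap.date() not in seen_days:          # "father": newest snapshot per day
                seen_days.add(snap.date())
                kept.append(snap)
        else:
            month = (snap.year, snap.month)
            if month not in seen_months:              # "grandfather": newest per month
                seen_months.add(month)
                kept.append(snap)
    return sorted(kept)


# Two days of hourly snapshots: the last 24 hours survive intact, older days collapse.
now = datetime(2011, 3, 31, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(48)]
print(len(hourly), "->", len(thin_snapshots(hourly, now)))
```

The nice part is that the user never has to see any of this; the disk just fills up more slowly than you'd expect.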

What I liked is that nothing special was required to restore from disk.  Just the OSX boot CD.  Boot, select “restore from backup” and poof, or tah-dah, or whichever.  Windows7 has something fairly similar, but you have to build a recovery CD for it to work, probably because it has to store whatever raid-specific drivers you’re using.

All in all, a positive experience.  I may still go back to my Precision690 though…Dual Xeon 2.8Ghz processors and 8G of ram can run circles around the older G5 hardware.

I haven’t decided.


Day-5 (Mac Experiment)

Ok, I might be sold.

Though the outdated hardware has posed a few limitations, I’m not so worried about that.  I did just order a Dual 2.5Ghz G5 off Ebay for $300 because the one thing I *AM* driven nuts by is the pounding that this single 1.6Ghz PPC chip is completely incapable of taking.  I’m hoping that the new one has more than the 4 DIMM slots this one has…more memory couldn’t hurt. 🙂

I’ve obtained a copy of Mac:Office 2008, Photoshop, and a few other neat pieces to play with, but so far I’ve not dove completely into it.  (My laptop is still on my desk as well, just in case I should need it)  Entourage 2008 Web Service edition was added because of course, I’m running Exchange2010, and WebDAV has been removed after Exchange2k7.

The other thing I’ve noticed is the backup/restore process worked wonderfully.  I wiped the drives and built a new 1.8TB RaidSet on the new drives, which finally gave me a partition of appropriate size, and booted from the CD, Selecting “Restore from TimeMachine backup”

Impressively enough it took less than 2.5 hours to restore the OS and everything I had done on it to that point.  When it rebooted, there were no strange messages, though on opening the “Mail” app, I found that it had to go and re-import all of the mail that had already been downloaded.

Oh well, not THAT huge a deal.

I’ve done something similar with Windows7 recently, but it required a “RecoveryCD” be made before you could run the restore.

The best part of this new setup, by far, is access to BASH. I *HATE* that there doesn’t seem to be a decent CygWin shell anywhere on the market for windows.  I do a *LOT* of shell scripting both for work and because I find it fun, and this makes life very easy.

We interrupt this experiment to bring you this special bulletin…

The government’s “Continuing Resolution” will be expiring a week from today.  As a government contractor, this directly affects me.

They have two choices.  They can pass ANOTHER C.R. or they can actually pass a budget.

I don’t post political statements here too often.  However I don’t know about you, but from where I stand this travesty that the House has floated is a disaster.  1.2 million jobs lost by the estimates I’m hearing, and to top it all off, it doesn’t do SQUAT to balance the budget because the places that need to be cut / reformed, i.e. Defense, etc. are off the table.  So this will be for nothing.

If the posturing peacocks on Capitol Hill don’t get their collective crap together and one side (or the other) forces the government to shut down, I may have some time on my hands.

Part of me is hoping that cooler heads prevail.

Part of me is looking forward to a little time off. 😉  I’m told it is actually a CRIMINAL offense for me to work if the government shuts down.

Bring it.

Day-4 (Mac Experiment)

So I’m trying a disk upgrade.  One of the spare 250G drives I threw into the Mac when I got it failed (Lower-bay, which I’m assuming was disk-2 of 2) so I decided I would try my hand at upgrading the disk.

I realized that the 250G drives I put in there won’t be enough to hold my pictures and music alone, let alone the rest of the stuff that I usually keep on my desktop.

So I found a 2TB drive to put in its place.

The Swap went swimmingly.  Shut down, pop the old drive out, pop the new drive in, took a little playing around (and a quick google search) to get the new disk added to the raid-set, and a few hours later, Tah-Dah.  😉

So then it comes to adding the other 2TB drive in and mirroring back.  Simple process right?

I shut down again, pulled the top drive out, put the new 2TB drive in its place, etc etc etc.


It seems that while the data is contained on both disks, the host is only set to look at the first disk for its boot device.  Oversight?

So I used the google, something I’m priding myself on my ability to do these days.  (What do people with only one computer do when they get into trouble?)  And found that I have to boot from the CD and manually mirror the disk back using the disk-utilities on the CDRom.

Makes sense.  Though you’d think it wouldn’t be too hard, on failing to retrieve boot information from disk0, to look on disk1 before giving up.

Does any one of my five readers know if there is a way to force a boot to the “up” drive in the set?

I would gravitate to the storage end of things wouldn’t I…

So here’s the underlying problem.

Note the 1.8TB RaidSet1.  Now Note the 232.8GB RaidSet1.

The problem seems to be that while both devices are owned by the raid set, and the “RAID Slice” itself on each drive is 1.8TB, the partition on the underlying slice is stubbornly stuck at 232.8G.


This makes the filesystem *WAY* smaller than it could be.
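The 232.8 figure isn't random, by the way. A quick sketch of the unit math, assuming Disk Utility is reporting binary units (GiB/TiB) while the drive labels use decimal ones:

```python
def decimal_gb_to_gib(gigabytes: float) -> float:
    """Convert drive-label gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return gigabytes * 1e9 / 2**30


def decimal_tb_to_tib(terabytes: float) -> float:
    """Convert drive-label terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return terabytes * 1e12 / 2**40


print(round(decimal_gb_to_gib(250), 1))   # the old 250G drive
print(round(decimal_tb_to_tib(2), 1))     # the new 2TB drive
```

Under that assumption, 232.8 is exactly what a 250G drive shows up as, which suggests the partition is still the size of the original disk and simply never grew when the bigger drives went in.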

Since I don’t have anything appreciable on it…  I’m going to back it up using Time Machine, and reinstall.  (I’ve heard complaints about Time Machine, so I’m going to want to see it work before I actually come to depend on it.)

*THAT* will be tomorrow’s post.

Day-3 (Mac Experiment)

My definition of “Day” changes…well…daily.

I’m having to do some shuffling of data off my old PC Workstation and I found something interesting.

MacOS can’t WRITE to an NTFS volume without a third-party driver.  It can read from it just fine.

I ran into a situation where I had to consolidate data from 2x 2TB drives onto one to free space for “The Next Step” (which you’ll read about tomorrow if you care) and so I connected them both to the Mac to hopefully do a quick ‘copy’ from one to the other.

Nope.  Not a chance.

MacOS can’t write to NTFS volumes.  Now I found a few third-party drivers and tried one that had a 15 day trial version…  Only to find my way into what I can only assume is the Mac version of the “Blue Screen of Death”  (The “You must turn your computer off NOW” screen – how rude.)

What I find interesting about this is this.  Linux can read/write from an NTFS volume just fine.  Since MacOS is built on BSD Unix (a close cousin of Linux, not Linux itself), I can only assume that Apple has made the conscious decision not to support NTFS writes.  Probably because it provides people with a simple migration path OFF Apple hardware, which, as any hardware vendor would like to believe, no-one would ever want to do.

Still – my experience is largely positive.  Got my work VPN up and running on it without too much angst and work.  And my son is sufficiently horrified at the sound of the power-on chord (there’s probably a fancy word for it) that I make up reasons to reboot when he’s in the room. 😉

I can see that the entertainment value of this investment will be limitless.