Recovering from a Windows AD failure…

A couple of years ago my PDC, the one physical server in my environment, died.

I was 2,700 miles away.  I wasn’t going to be back any time soon, and stuff was broken.  (Thankfully, customer data was on the Linux Webhosting environment, so nothing lost there, except their backups)

My setup involves 1 physical server and about 14 VMs (on two physical hosts).  The physical server does a number of things.  In addition to being the PDC/Infrastructure Master, etc., it holds my backups, gives me a place to run consoles for various management agents… etc.

It died.  Rebooted after a power-failure in the hosted datacenter I was throwing good money away on. (don’t EVEN get me started)

Anyway, technical mumbo-jumbo.

Recovered the original DC as a domain member using the following steps:

1. On DC1, disconnect the network, then boot the host. *VERY* important…

2. On DC1, force-remove the secondary/tertiary Active Directory servers. (DC3, DC4)

3. On DC1, run DCPROMO and remove Active Directory. (There were a couple of minor gotchas to do this. Like an idiot I didn’t write them down, but they were easy fixes, easily googleable. It is too a word.) This removes all AD membership and makes it a stand-alone server.

4. Shut down DC1.

5. On the new PDC (DC2), remove DC1 as an AD server.

6. On DC1, reconnect the network and boot the server.

7. Join DC1 to AD as a domain member.

The quest for 100% uptime…

Are you the type of IT shop that won’t take downtime?  I mean won’t take downtime to the point that there are EOSL applications running on EOSL hardware, with redundancy gone because an HBA has failed and a replacement simply isn’t available (or is, and you won’t take the outage to replace it)?

It got me to thinking about this quest for “100% uptime.”  Is it possible?

In my experience downtime is absolutely required.  Not only is it required, it’s guaranteed.  It’s *GOING* to happen eventually.  Whether it happens on your schedule or on the universe’s schedule is the only thing you have any control over. (and even then, sometimes not)

In my experience, virtualized platforms (VMware, Hyper-V, or any of the myriad others) are almost a requirement if your aim is to minimize hardware-related outages.  Virtualization lets you move a guest from one host to another, replace/upgrade hardware, move it back, etc.  It also allows for a certain level of storage virtualization, letting you move off End-Of-Life storage and keep things “fresh.”

But then there is the operating system.  To my knowledge, *ALL* Midrange OS’s require patching, upgrades, reboots.  All of them.

When you add to that, the fact that operating systems were written by human beings, and in large part by thousands of human beings, some of whom never talk to one another, well you get the idea.  Computers are an imperfect construction of imperfect beings.

So in short, because I know that TL;DR really is a thing these days:  Don’t promise anyone “100% uptime” because if you do, you’re a liar.  It simply cannot happen.
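To put some numbers on it, here’s a quick Python sketch of how much downtime a given uptime promise actually leaves you in a year.  This is plain arithmetic, not anyone’s actual SLA; the function name and the list of targets are just mine for illustration, and it assumes a 365-day year.

```python
# Allowed downtime per year for a given availability target.
# Assumption: a 365-day year (8760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime leaves {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) of downtime per year")
```

Even “five nines” leaves you a few minutes a year, which is less than a single reboot on most of the hardware I’ve worked with.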

Privacy In The Clouds….

I’m not sure why this never got discussed before, but suddenly, with the “shocking” revelation that the government has been collecting data from the cloud in bulk, the concept of “Privacy” is on everyone’s mind.

I’m telling you.  Anyone who thought their “Cloud” storage was secure from prying eyes has deluded themselves with visions of puppy-dogs and unicorns.

Personally I’m not worried about it.  I never expected anything I put in the cloud to be private anyway.

Bottom line, the internet wasn’t designed to be secure, it was designed to be redundant, transparent, resilient.  But when you send information out “to the cloud” you’re trusting your electronic information to equipment that other people own/control, and as such you have no guarantee as to the security of your data.

I got into my share of “discussions” on news message boards when Edward Snowden broke the news that the NSA was spying on Americans… (Duh)   When I told people “I don’t care” and “I assumed it was anyway” I got lambasted.

So how *DO* you secure your data?

Endpoint Encryption

The only way to be reasonably sure that your data is secure in transit is to implement endpoint encryption: an encryption device on the source, another on the target, and if you *REALLY* want to be secure, encryption keys that you’ve HAND CARRIED from Point-A to Point-B.  (Sending your private key over email is, by definition, stupid.)

Then, you’re only at the mercy of Barracuda, Cisco, EMC, or whoever built your encryption appliance.  Here’s a thought though… Do you know there is no back-door to decrypt data?  How do you know?  The code that runs on these appliances is proprietary, you don’t know ANYTHING about its internals, and I’m sure none of the above will release the source-code to you for inspection (nor do you have any reasonable assurance that the source code you’re shown is what’s compiled and running on your encryption appliance).  Again, it’s a matter of trust, but there is always the possibility.

Closed System

This is the only REAL chance for security: a campus-wide, closed system, with no external connection to the internet, optical (as opposed to copper) connections between buildings, etc.  But is it worth it?

I had a colleague when I worked for the student loan company (thankfully defunct) that used to say that the best way to secure a system was to turn it off.  He probably wasn’t far from the truth.  When I took my Windows NT MCP certification course (dating myself huh?) my instructor told us that Windows NT was the most secure operating system on the planet, provided the computer wasn’t connected to a network.  (Then, presumably, all bets were off)

In conclusion, just know that the more of your application/data you put in the “cloud”, the more vulnerable you are to plundering, not to mention outages that are completely out of your control. (Right, Amazon AWS?)

If you keep your data in-house, under your control, not only do you have a neck to choke when your system goes down, but you can be reasonably sure of its security.

(Unless you plug it into the internet – then all bets are off)


For sale:

Just wondering if there is any interest out there. I have three working storage arrays I’m looking for a new home for.

1 Clariion cx300
1 Clariion cx500
1 Celerra ns500

The ns500 can be split (I have the cables to split the back-end) but I want to sell it as one unit.

Everything works, though while I have cables, I don’t have SPS units for them.  (Those are pretty cheap/easy to find.)

The cx500 and ns500 have 146g vault sets, the cx300 has 73g vault drives.

I have a bunch of 73g and 146g drives available to go with if you’re similarly interested.

Make me an offer, but expect $150 to $200 in shipping charges.

Please know how to set them up. I’m not signing on for lifetime support.

The “Public” Cloud…

It’s really easy to point-and-click yourself to a leased cloud or public cloud infrastructure.  Throw a credit-card # in and you can start working immediately.

But it has to come down to the real question.

Should you?

Putting your application “In The Cloud” whether it be Google’s new service, Amazon, or anyone who you rent a few hours of CPU time from can be the easiest way to start something big.

But what’s the downside?  Is there one?

Anyone who knows me knows, I’m a bit OCD.  I want to be in control.  When you put your application “in the cloud” you are putting your faith in someone you don’t know, in a building you’ve never seen, using hardware you have no idea about.  Most “Cloud” providers use hardware that barely qualifies as “Enterprise”

An example, the place I used to rent space from, (whom I will not mention right now because I don’t want a subpoena from them) used the CRAPPIEST supermicro servers, single power-supply, single-harddrive, no backup infrastructure, etc.

This was their cloud.  The *STATED* (Yes, they actually told me this) logic being that it’s cheaper to settle with the occasional customer over a failure than it is to buy real hardware.

Now I’m not, by any extension, saying that they are all like that.  I’m saying that if you haven’t seen the hardware, you have no idea where they’re putting you, and what the reliability rate is.  You only know what their sales-people / website tell you.

Anyone who believes marketing at face value needs to talk to me about this bridge I’ve got for sale in the Everglades…

If you want to *KNOW* it’s done right, you have to do it yourself.  Bottom line.  Anything else is faith.

If you want the flexibility of a capital-C Cloud infrastructure, build one yourself.  VMware, EMC, Cisco, or pick your brand.

Just don’t delude yourself about what you’re getting.  If you buy cheap, you get cheap.  It’s the law.



To say I’m overwhelmed is an understatement.  Hip deep in a major Tech-Refresh for a (non-EMC) vendor that is sucking my life dry, I sometimes forget that this blog is ever here. (As evidenced by the lack of activity)

On the blog/webhosting front things have been interesting.  CATBytes is hosting about 50 users or so.  Mostly informally, just bloggers and the like looking for a cheap place to park their WordPress sites.

I guess the part of it I forgot about was security.  I am *NOT* a big security wonk, and I’m learning this stuff as I go.  One of my users used a simple password and allowed their site to be hacked, and while that SHOULDN’T have been a big deal, it allowed some user to start sending out Denial-Of-Service attacks using one of my webservers.

For about a month.

And it didn’t occur to me because I wasn’t getting any complaints about bandwidth, speed, etc. (My equipment is good, my internet uplink is good, so it was hardly noticeable.)

Until the bill came.  See I pay $38/Mbit for a 10Mbit commit, but it’s a 100Mbit pipe.  They don’t bill me for the extra bandwidth so long as I don’t exceed my 10Mbit more than like 5% of the time.  And normally I don’t, by a long-shot.

Except for this month.  And since the hack managed to straddle two billing cycles, it double-hit me.

Now my provider “Neglected” to tell me about this overage until months later, stating that they had a glitch in their billing.  But going 90Mbit over my 10 for almost 30 solid days makes for a SEVEN THOUSAND DOLLAR bandwidth bill.
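For anyone who hasn’t been bitten by burstable billing, here’s a rough Python sketch of how the math works out.  It’s a sketch under assumptions, not my provider’s actual billing code: I’m assuming standard 95th-percentile billing with one sample every 5 minutes, overage charged at the same $38 per megabit as the commit, a made-up 3 Mbit of “normal” traffic, and the 30 bad days split evenly across two 30-day cycles.  All the function names are mine.

```python
SAMPLES_PER_DAY = 24 * 12  # one sample every 5 minutes (assumed)


def percentile_95(samples):
    """Sort the samples, throw away the top 5%, return the highest one left."""
    ordered = sorted(samples)
    keep = len(ordered) * 95 // 100  # how many samples survive the cut
    return ordered[max(keep - 1, 0)]


def overage_charge(samples_mbit, commit_mbit=10, rate_per_mbit=38):
    """Dollars billed on top of the commit for one billing cycle."""
    billable = percentile_95(samples_mbit)
    return max(billable - commit_mbit, 0) * rate_per_mbit


def cycle(normal_days, flood_days, normal_mbit=3, flood_mbit=100):
    """Build one billing cycle of samples (traffic levels are hypothetical)."""
    return ([normal_mbit] * (normal_days * SAMPLES_PER_DAY)
            + [flood_mbit] * (flood_days * SAMPLES_PER_DAY))


quiet_month = overage_charge(cycle(30, 0))
one_bad_day = overage_charge(cycle(29, 1))  # about 3.3% of samples, under the 5% allowance
hacked = overage_charge(cycle(15, 15)) + overage_charge(cycle(15, 15))

print(quiet_month)  # 0
print(one_bad_day)  # 0
print(hacked)       # 6840
```

One bad day gets forgiven.  Fifteen bad days in each of two cycles means the pipe is “full” as far as both invoices are concerned, and 90 Mbit over at $38 each, twice, lands right around that seven grand.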

Crap.  So now I’ve rapidly taught myself how to limit bandwidth in VMware (something I should have been doing the whole time) but I have a mad fight on my hands to try and get this provider to see that they’ll bankrupt me if they pursue this, and that won’t be good for either of us.

I hope they see logic.  Because if not, I have to explain to 50-75 bloggers why their sites are going down.  And I *WILL* name names.

The Great Conversion (Part 2)

(Image: CX3-20c, take one)

Yes, it’s been a while.

Over 18 months ago, I started converting an old CX300 to a CX3-20c.

(Set the wayback machine and see The Great Conversion)

Shortly after I started this, I got wrapped up in work, got laid off, found new work in the wrong state, spent 12 months commuting from Virginia to Washington, etc. etc. etc.

So I’ve been back for just over six weeks now, and I’m finally settled into the new job, so I *FINALLY* got motivated to finish this project.


Needless to say, there wasn’t much to it.  Once I got the SP’s to realize I was booting off a DAE full of 2G drives (the LCC link came up at 4G until I forced the issue by adding another, working DAE to the loop), everything went almost completely automatically.

Torquemada would be proud...

But, like a good techie, I followed the procedure to the letter (as generated by the Clariion Procedure Generator), and at the end of the day (literally, less than 4 hours from when I started the second half of the migration) it’s done, and I have a CX3-20c.

I *ALSO* ended up with a CX3-10c through an eBay fluke.  (I found one for $750, which was a bit much except that the DAE came with 15x 300G FC drives.  Had to have the drives, so I bought the whole thing instead.)  I *THOUGHT* the CX3 I had was also a CX3-10c, so I was looking for an easy workaround.  (The only external difference is that the CX3-20c has 4 IP ports per SP, the CX3-10c has only two.)

Now I have a potential target system if I ever need to replicate, because obviously one of the benefits of the “C” version vs. the “F” version is the presence of Ethernet storage ports.


So next week I start the process of moving over to the new storage….  When all is said and done, I’ll have a few nifty paperweights…


Anyone wanna buy:

1. A CX300

2. A CX500

3. An NS500 (captured or split back-end, your choice)

I also have a couple of Brocade DS16B2 switches floating around. (These were from before my Cisco upgrade last year)

Everything works, and yes, the NAS comes with a working control station.

On Storage and Security.

Hi there.  Remember me?  I’m your friendly (not) neighborhood (probably not) storage blogger (ok, well sometimes)

So the situation keeps coming up, and it’s worthy of a post, so here I am.  You’ve all (all six of you) probably gathered that I’m not here to promote anything.  I don’t play favorites, and I certainly don’t get any money from the stupid ads I placed on the sidebar… (why are those still there?)  Hell I’m well aware that I am sorely lacking as a writer even.

Tonight’s topic:

Storage Security

Why are companies locking the storage admins out of the hosts?

Why, for the love of Pete?  I have a customer where the storage-admin’s job stops at the connection to the server, for ‘security reasons.’

It’s a useless endeavor, it doesn’t gain you ANYTHING as far as security goes, and in fact *WILL* end up costing you more than you would ever have dreamed of saving.

It also makes your storage admins feel untrusted and unappreciated… (but employers don’t care about that so much these days.)

So, in a nutshell, I will list the three common reasons for locking the storage people out of the server environments, and why doing so is a complete waste:


  • My computers have sensitive information on them that the storage people shouldn’t have access to.

If you trust someone to manage your storage environment you trust them with your data.  I can name two different ways off the top of my head that a storage admin could gain access to data without ever going NEAR the server, either physically or over the network.  And one of those would be COMPLETELY UNTRACEABLE.

Long story short, the storage admin has access to the data.  Just get used to that fact and stop making up ways to make their lives more difficult.

If you have doubts about the people you’re hiring, look at your hiring practices.

  • The storage admin could inadvertently crash the host.

Well gee.  Anyone with access to the power cord could do that.  Again, I can think of at LEAST two different ways a storage admin could do that without even trying, and both happen on a daily basis.  (Remove device masking, remove zones)  Again, you’re fixing the chicken-coop with the fox inside.

Try trusting the people you hire to do what you pay them to do.

  • The storage admin doesn’t require access.

Well, this is kind of a generalization.  Many companies practice an “if your job doesn’t directly relate to the server, you aren’t granted access to it” policy.  If troubleshooting only extended to the point where the server connected to the SAN, the above would be a true statement.  But as with most systems, there are inter-relationships that are crucial.  Multi-path software, HBA management software, drivers/firmware, *ALL* are a part of the storage environment.

And the bottom line is this:  Storage touches EVERYTHING.

If, like most sane companies, backup is included in the storage job, that’s 100% everything, otherwise there are SOME occasions where non-SAN attached hosts don’t require storage-admin access.

Troubleshooting in an environment where the storage admin’s access ends at the HBA connection can take HOURS longer than it would normally take, and requires at least twice the manpower.

Storage doesn’t stop at the physical layer.  Storage management software counts!

My scenario – Here’s why giving your storage administrator access to the servers *WILL* save you money.

It’s 4:15 on a Friday afternoon. The dual-port PCI-e HBA you put into the server (to save money and slots, which are tight in 1U servers) has failed. Not the port (which, granted, is infinitely more likely) but the chip itself. The SAN storage for the host is down.

As the storage admin, I got a page when the switch ports went dark. Assuming the storage environment is managed properly, I instantly know what host is experiencing the problem. (it’s also safe to assume that the host owner knows because his disks are MISSING)

Now as the storage admin, I’ve tested the connections, the switch ports and I’ve narrowed it down to an HBA issue. The host needs to be shut down (assuming it’s not Windows and blue-screened at the first sign of trouble)

Now if I have to coordinate the reboot, the installation of the new HBA’s, flash up-to-date firmware, pull WWPN’s, rezone, remask and reboot the host again, we’re talking about time. Maybe not much, maybe the host admin is on the ball, and maybe if you’re clever you can zone/mask before the initial boot, but you still need to flash firmware to stay within supportability and not risk further problems.

I’ve done this. If I’m doing it myself the system is back up by now, and the only thing I need the application owner to do is validate the app is functioning correctly.

If you don’t have access but are sitting in the same room with the person it’s still fairly simple but takes a little longer, though not much.

So let’s hope the failure happens during business hours—If it’s after hours, you’ve got two people driving in instead of one. Hours of downtime, total, that is, if you’re lucky enough to be able to get ahold of the host admin.

Now this came about because I had an outage happen. A VMware LUN disappeared and the owners of the “secure” VMware environment were nowhere to be found. (On what planet is it ok for an IT person to not respond to a page?)

The owners of the “unsecure” VMware environment and I sat around for a while twiddling our thumbs before it became clear that the host owner wasn’t going to get back to us, and the management decision was made to leave it for the night.

That’s a whole night this host will be down because the people who were there didn’t have the information needed to finish fixing the problem.

I’ve said it before, I’ll say it again.  If you don’t trust the people you hire, maybe who has access to what isn’t your primary problem.

Skynet is coming…


“Why process automation isn’t always a good thing.”

I’m *GOOD* at scripting and automation.  This is not to say I’m the best there is, I think I worked with him about 10 years ago in California…  I learned most of what I know about BASH from him.

There are some WONDERFUL uses for shell scripting.  Automating mind-searingly simple and repetitive processes.  Providing a little extra functionality and error-checking into a process or to augment the output of a program that probably could have been written better.

And yes, automating a complex task can save you HOURS in the long run.  (Though more often than not, you’re going to spend more time writing and debugging the script than you would have simply executing the commands manually.)

The best reason to do it is, for me at least, it’s fun.

Three Laws

  • A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  • A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  • A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Ok, maybe fun isn’t the right word.  But when you can do with one simple command and a few arguments what would take you 14 different commands and a legal pad to do, it’s…an elegant solution to a convoluted problem.

The downside to process automation is that a script is only as good as the guy who wrote it, and in the absence of the guy who wrote it, it’s only as good as the guy running it.

When you write a script to do an advanced function, you can only plan ahead so far before something comes back to bite you.  Maybe you grep out the number 2140 and then a month later run into a situation where the output is going to include the number 52140.  (Hint: always include surrounding spaces in an awk statement, or for beginning of line use “^” to ensure nothing precedes the value you’re looking for.)
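Here’s the 2140 problem in miniature.  I’ve sketched it in Python rather than grep/awk so it’s easy to run anywhere; the sample lines are made up for illustration and don’t come from any real tool’s output.

```python
import re

# Hypothetical output lines; only two of them are really about 2140.
lines = [
    "dev 2140 RW",
    "dev 52140 RW",
    "2140 WD",
    "dev 21400 RW",
]

# Naive substring match: the script that worked fine until 52140 showed up.
naive = [line for line in lines if "2140" in line]

# Word-boundary match: nothing may touch the number on either side.
exact = [line for line in lines if re.search(r"\b2140\b", line)]

# Field comparison, awk-style: split on whitespace and compare whole fields.
by_field = [line for line in lines if "2140" in line.split()]

print(naive)     # all four lines
print(exact)     # ['dev 2140 RW', '2140 WD']
print(by_field)  # ['dev 2140 RW', '2140 WD']
```

The naive version isn’t wrong on the day you write it.  It’s wrong a month later, when the data changes and nobody remembers what the script assumed.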

This is where it gets tricky, and where scripting isn’t always a good idea.

A script is never, EVER a replacement for actually knowing how to do the job.


“DARPA is apparently investing $20-million in a project to come up with bi-pedal combat robots that can be operated remotely or automatically.–Apparently the finest minds in the US Military have yet to learn ANYTHING from the finest minds in Hollywood.”

Here’s a real-world example:

I just wrapped up month 10 of a 12 month contract planning and implementing a datacenter move.  The move won’t be complete until LONG after I’m gone…  (It’s a slow, one at a time process)

So they commission a scripted move and I supply it for them.  Some beautiful scripts.  (In my opinion)

One set creates (and deletes) the SRDF pairings based on the source/target storage groups, choosing the next available RDF Group, pairing up boot/LUNs, etc.

Another set to do the graceful SRDF/A Establish/Split, checking/changing modes, etc. (Made available to the end users)

Simple processes, right?

It is…To anyone who knows how to do it manually.

Here’s the catch – When we first started this process the proper disclaimers were made.  EMC will not support these scripts, they’re only temporary.  EMC’s official stance is they won’t support *ANY* scripts that don’t come out of the Cork, Ireland scripting team.  I don’t blame them.  They can’t be responsible for every Tom, Dick or…well…Jesse’s scripting, especially custom scripts that only apply to ONE environment.

Disclaimers were made to management, agreed to, settled.  Right?

Oops.  New management team comes in not even two months later.  One *MORE* month goes by and the Senior EMC guy leaves for another opportunity.

The new Management are very good people and seem to have a good understanding of what’s what, and all is fine.

Until I’m having a conversation with the new boss tonight and remind him that these scripts aren’t supported by EMC and ask him if there is anyone on the Linux team I can sit down with and cross train to take over the scripts?

Blank Stare.  Um… not only isn’t there anyone *ON* the Linux team… there really isn’t a “Linux Team” per se.  (I figured that despite it being primarily a Windows shop there had to be SOMEONE there who knew Linux.  There was, see the part about the Senior EMC guy leaving…)

So I’m hip deep in editing my scripts.  I shall comment EVERY. SINGLE. LINE. so that if something breaks, they at least have a fighting chance.

Scripting is a wonderful thing when the scripts are used to automate processes using knowledge that your storage team already has.  When it’s used to REPLACE knowledge your storage team should have…well that’s a problem.

Don’t automate a process you don’t know how to do from memory.

I am not a PC…

The new member of the family…

So here it is.  I bought a MacBook.  After literally 10+ years of being a “Dell Guy”, well Dell finally ran out of laptops that I found interesting, (you should *NOT* hear the display creak when moving it during a video call)

My last notebook, the Adamo13 (which doesn’t exist anymore) was one of my favorites.  Ultra slim, solid state just about everything.  Could do a 5 hour plane-ride almost without issue.

But I needed something else.  After flipping back and forth between Linux and Windows I realized I needed something that could go both ways.  The more I thought about it, Apple seemed like the way to go.  OS X is built on a BSD-derived Unix core after all, has a Unix command line (if you know where to get to it) and pretty good compatibility.

So when I finally got it in my head to upgrade, well I went ahead and dropped the hammer on a 15″ MacBook Pro.  (So to speak, no actual hammers were involved.)

So far I’m pretty happy with my choice.  But when the first person at work saw me on it and asked me the idiot question I got pissy.

“Are you a Mac now?”

Under breath: “No idiot, I’m a person.  I’m *USING* a Mac.”

Let me break it down.  I have in my arsenal the following systems.

In my household and business I have:

3 Desktop PC’s running Windows 7
3 Laptops running Windows 7
1 Dell 1850 running Windows 2003 Server. (That despite all my cajoling, refuses to survive a P2V)

4 VMware ESXi hosts containing the following:

11 Windows 2008 Servers
2 Windows 2003 Servers
10 CentOS 5 Servers
5 CentOS 6 Servers
2 SUSE Enterprise 11 Linux

and now

1 15″ MacBook Pro

This is the thing.  I’m a technology pragmatist.  I use what works best and does what I need it to.  In the limited scope of a transportable computer, a Mac seems to do what I need nicely, and yes, it comes in an attractive and (so far) fairly durable package.

But I’m not a Mac.  Nor am I a PC.  I’m a *PERSON* who uses a computer.  (Several actually)

Religion has no place in technology.  Leave it in the church.

Oh, and I’m still not buying a #$!@!? iPhone.