T-SQL Tuesday: Disasters don’t just come in huge

So we’re supposed to talk about disasters we’ve had or discuss disaster recovery technologies. I’m going to take a slightly different approach…

<soapbox>

I’m in the fortunate position of living in a region of the world that’s relatively free of natural disasters. We’re reasonably geologically stable; the nearest fault lines are the Great Rift Valley and somewhere in the Antarctic Ocean. We don’t get tornadoes, and we’re a long way from the ocean (and Madagascar partially protects the coast from typhoons and tsunamis).

Given that, and looking at the recent events in Japan and the southern USA, local IT managers might be grateful that their critical infrastructure is here, not there. But that is no reason for complacency, and no reason to neglect putting a disaster recovery plan in place.

Huge national disasters, while they attract a whole lot of attention (and rightly so), are probably not the main cause of IT disasters. For the most part, IT disasters are more likely to be caused by smaller events like these (1):

  • A drive in the RAID 5 array fails, and the SAN admin switches out the wrong drive.
  • A SAN controller fails and overwrites a LUN or two with binary garbage.
  • The server room floor collapses, dumping the SAN two floors down into a garage, with the server on top of it.
  • A water leak a floor above the server room results in the UPS getting a shower, and the resultant power surge fries the servers’ main boards.
  • A developer with far too many permissions truncates an important table on the production server, thinking he is working on the development environment.
  • The mains power fails but the generators don’t come online because their fuel was drained a day earlier in preparation for maintenance.

Events like those (or probably even more mundane events) are the ones that we need to plan for: relatively minor disasters that can leave a business without critical infrastructure or critical data for hours or days.

You need to plan for the small disasters as well as the large ones. Plan for the dropped table. Plan for two drives failing in the RAID 5 array. Plan for the server’s power supply failing. Plan for the big disasters too; just don’t think that they’re the only things that endanger your data and your business.

(1) I’ve personally seen three of those events happen, I’ve heard from people who have seen two more, and I know of a company that’s at risk of one. They’re not just made-up, improbable occurrences.

8 Comments

  1. Steve Jones

    I’d like to think that great minds think alike since that would put me in your company. My post was about similar issues.

  2. Thomas Rushton

    Nice to see that I’m not the only #tsql2sday blogger talking about the little stuff rather than the big…

  3. Rich Yarger

    Coming from the little guy’s point of view, it is nearly impossible at times to get those decision makers to make the right moves, and just like the boy who cried wolf, there’s no one around to help them out once one of these unplanned-for outages takes place. Then, and only then, do they finally listen (and yet it seems that one of the little guys, who coincidentally was always trying to get them to make the moves needed to avoid the outages, is the one who winds up being blamed for it all).

    Great article as always Gail. Thank you for never being afraid to speak the truth! We need those decision makers to know that this is not some pie in the sky conversation.

  4. Gail (Post author)

    I’ve had and lost that conversation at a recent client. They’re all “It can’t happen to us.” Sometimes we just have to make them aware of the risks and what the likely cost will be if something happens, and then live with the results (and the documentation of the decisions taken).

  5. Dave

    How about a variation on one of your points: The power company (*cough*Eskom*cough*) implements rolling blackouts to conserve electricity, and the generator fails because someone siphoned (i.e. stole) all the diesel. 😉

  6. Gail (Post author)

    I could see that happening.

    In the case that I saw, it was during the load shedding, which is why the diesel had been drained early: they had to finish the maintenance and refuel before the next scheduled blackout. We just got hit with an unscheduled blackout as well.

  7. Pingback: T-SQL Tuesday #19 Wrapup | Allen Kinsel - SQL DBA

  8. Kevin Davis

    Let’s not leave out the business users changing the wrong data and waiting 24 hours to ask that you recover back to when the data was changed and keep all the good data from the current day’s work. Talk about miracles!
