"No campaign plan survives first contact with the enemy" - Carl von Clausewitz

I attended the Lansing Information Systems Security Association meeting yesterday and took a few notes that I thought others might be interested in. The presentation was by Amerisure Insurance (http://www.amerisure.com/) on how they handled an unexpected system crash on February 22, 2012, during the 9 a.m. hour, that affected their ~400 servers (I think they said 80% or so were virtual) and network equipment.

The back story is that Amerisure's IT department had prepared for unexpected events in the past on a quarterly basis (I believe I recorded that correctly), even traveling to an off-site backup location in NY for a weekend to bring up the systems and test their plans and procedures. They had built a fancy "Command Center" (CC) in the Michigan data center with large tables, Internet connections, computers, world clocks, etc.

There had been some problems with the Michigan data center's Uninterruptible Power Supply (UPS) previously. The UPS vendor sent their top technician, who was trying to diagnose the problem and accidentally hit the primary circuit breaker, taking down power to the data center.

During this event they found multiple gaps in their processes to handle unexpected events, such as:

Some of the resolutions they've implemented to prevent these problems are:

In conclusion, the IT team felt that the situation was a disaster, since it took almost 48 hours to restore all of the services, test for data loss, and perform recovery. In contrast, the rest of the company thought it was pretty extraordinary that they recovered things so quickly, so IT learned how their own perception of the situation had made it feel worse than it was. Humorously, as details leaked out to the rest of the company about what had occurred, the story became that the UPS driver was delivering a package to IT and tripped on a power cord as he cut through the data center.



Troy Murray
Michigan State University
College of Medicine
Life Science
1355 Bogue St, B-136D
East Lansing, MI 48824
P: 517-432-2760
F: 517-355-7254
RedHat 5 Certified Technician
RedHat 5 Certified Systems Administrator
HL7 V2.6/2.5 Certified Control Specialist