Disaster Recovery – Part 2

The power came back on around 1615, and once the UPS batteries were charged and the network team gave the all-clear, we were allowed to start bringing systems up.

First to come up was our Fibre Channel SAN infrastructure, which consists of two director-class switches and five pairs of controllers. In parallel we brought up the failover DNS/DHCP server. Side note: we have ninety percent of our campus covered with 802.11n and use very short lease times so as not to exhaust our lease space. So the first thing the DHCP server did when it came up was spend about twenty minutes expiring the 14,000 previously active wireless leases.
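To see why short lease times keep a busy wireless pool from being exhausted, Little's law gives a rough sketch: the number of leases held at once is roughly the client arrival rate times the lease length. The figures below are illustrative assumptions, not numbers from this outage.

```python
def active_leases(new_clients_per_minute: float, lease_minutes: float) -> float:
    """Steady-state count of leases held concurrently (Little's law: L = lambda * W)."""
    return new_clients_per_minute * lease_minutes

# Assumed, illustrative scope: a /18 gives roughly 16,000 usable addresses.
pool_size = 16_000

# With 30-minute leases, 500 distinct clients arriving per minute hold
# about 15,000 addresses at steady state -- tight, but inside the pool.
short_leases = active_leases(500, 30)

# The same churn with 2-hour leases would need about 60,000 addresses,
# several times what the pool provides.
long_leases = active_leases(500, 120)

print(short_leases, long_leases, pool_size)
```

The trade-off is more renewal traffic on the DHCP server, which is also why so many expired leases piled up for it to clear after the outage.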

Once the core infrastructure was up and running, it was time to bring up the virtualized infrastructure: 18 hosts running around 500 virtual guests. So first I had to find the management server. Of course it wouldn't be a disaster if it were just a power outage; the host configured to run the various servers that manage the VMware environment was toast and would not boot. Dell ended up replacing every component in the server to get it back up and running.

I spent the next 30 minutes tracking down which hosts the other management servers lived on and powering them on. I ended up adding the management server to one of the functioning hosts to get it back up. Once the management servers were running, we started bringing up the other production systems. By then it was close to 2100 hours.


Filed under disaster, server, vmware
