Preparing for the Moment of Disaster
Monitoring, knowing what to monitor, and knowing whom to tell are still not the whole solution for proper network monitoring: you must also know what to do when monitoring discovers a problem.
Implement Automatic Actions: “Self-healing”
One of the basic preparations for emergencies is to put automatic, self-repairing actions in place. For example, configuring a server to reboot automatically when monitoring detects that it has stopped responding is in many cases the fastest way to get it back online.
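A self-healing action can be sketched as a small watchdog loop. The Python below is a minimal sketch, not a feature of any particular monitoring product: it assumes a health check that returns True or False (here an HTTP probe of a hypothetical URL) and a recovery callback of your choosing, and it triggers recovery only after several consecutive failures so that a single dropped check does not cause a reboot.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False


class Watchdog:
    """Runs a recovery action after `max_failures` consecutive failed health checks."""

    def __init__(self, probe: Callable[[], bool], recover: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe
        self.recover = recover
        self.max_failures = max_failures
        self.failures = 0

    def check_once(self) -> bool:
        """Run one health check; return True if the recovery action was triggered."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.recover()
            self.failures = 0  # start counting afresh once recovery has been attempted
            return True
        return False


def run_forever(dog: Watchdog, interval_s: float = 30.0) -> None:
    """Check the service every `interval_s` seconds until the process is stopped."""
    while True:
        dog.check_once()
        time.sleep(interval_s)


# Hypothetical usage: restart a service called "example-web" after three failed probes.
#   import subprocess
#   dog = Watchdog(lambda: http_probe("http://localhost:8080/health"),
#                  lambda: subprocess.run(["systemctl", "restart", "example-web"], check=True))
#   run_forever(dog)
```

Requiring several consecutive failures before acting is a design choice that trades a slightly slower reaction for fewer unnecessary restarts.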
Implement Notifications: “Alarms”
Instant notifications streamline the response: the moment there is a problem with your server, the people needed to resolve it receive an email, SMS, or instant message describing the problem. Those responsible can then take the necessary measures to resolve the issue.
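A minimal sketch of an email notification using Python's standard smtplib and email modules. The SMTP host, port, credentials, and addresses are assumptions to replace with your own; SMS and instant-message delivery usually go through a provider-specific gateway or API and are not shown here.

```python
import smtplib
from email.message import EmailMessage
from typing import List, Optional


def build_alert(sensor: str, status: str, recipients: List[str], sender: str) -> EmailMessage:
    """Compose a short plain-text alert email for a sensor whose status changed."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {sensor} is {status}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Monitoring reports that {sensor} changed to status '{status}'.\n"
        "Please investigate and acknowledge this alert."
    )
    return msg


def send_alert(msg: EmailMessage, host: str, port: int = 587,
               user: Optional[str] = None, password: Optional[str] = None) -> None:
    """Deliver `msg` through an SMTP server that supports STARTTLS on `port`."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()
        if user is not None and password is not None:
            smtp.login(user, password)
        smtp.send_message(msg)


# Hypothetical usage (host and addresses are placeholders):
#   alert = build_alert("web01 HTTP", "Down", ["oncall@example.com"], "monitor@example.com")
#   send_alert(alert, "smtp.example.com", user="monitor", password="...")
```

Keeping message composition separate from delivery makes it easy to route the same alert to different recipient lists depending on who is responsible for the affected system.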
Prepare and Test Disaster Recovery Plans
Preparing contingency plans for emergencies is only half the battle; you also need to test whether they work. For example, if your plan includes moving customer traffic to a backup server, you need to test whether that server can handle the extra load.
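One way to check whether a backup server can take the extra load is to send it a burst of concurrent requests and look at the error rate and latency. The sketch below assumes a plain HTTP URL on the backup (hypothetical) and uses only the standard library; it is a rough smoke test, not a replacement for a dedicated load-testing tool or for replaying realistic traffic.

```python
import math
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LoadResult:
    ok: int
    failed: int
    latencies_s: List[float]

    @property
    def error_rate(self) -> float:
        total = self.ok + self.failed
        return self.failed / total if total else 0.0

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile (0 < p <= 100) of the recorded latencies, in seconds."""
        if not self.latencies_s:
            return 0.0
        ordered = sorted(self.latencies_s)
        rank = min(len(ordered), max(1, math.ceil(p / 100 * len(ordered))))
        return ordered[rank - 1]


def fetch(url: str, timeout: float = 5.0) -> bool:
    """One request against the server under test; True means a 2xx/3xx answer."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return 200 <= resp.status < 400


def _timed_call(request: Callable[[], bool]) -> Tuple[bool, float]:
    start = time.perf_counter()
    try:
        ok = request()
    except Exception:  # any error (timeout, refused connection, HTTP 5xx) counts as a failure
        ok = False
    return ok, time.perf_counter() - start


def run_load_test(request: Callable[[], bool], total_requests: int, concurrency: int) -> LoadResult:
    """Issue `total_requests` calls to `request`, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: _timed_call(request), range(total_requests)))
    ok = sum(1 for success, _ in results if success)
    return LoadResult(ok=ok, failed=len(results) - ok, latencies_s=[lat for _, lat in results])


# Hypothetical usage against a backup server:
#   r = run_load_test(lambda: fetch("http://backup.example.com/"), total_requests=500, concurrency=50)
#   print(f"errors: {r.error_rate:.1%}, p95 latency: {r.percentile(95) * 1000:.0f} ms")
```

Run it with roughly the concurrency you expect after a failover, that is, the backup's normal traffic plus the traffic it would inherit from the primary.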
Consider Load Balancing and Hot Standby Redundancy for Mission-Critical Systems
Having a standby for mission-critical systems is very important: in an emergency, such as a server crash, you can simply redirect your traffic to the standby system. For example, our company runs our main website, www.paessler.com, on a load-balanced dual-server setup in the U.S., and keeps a full copy of it, updated nightly, running 24/7 on a second, dedicated server in Europe. If there is any problem with the primary site, we simply change the DNS entry to move all traffic to the backup system. If you require even higher availability, or if your website is transaction-based (such as an auction website), using load balancers to move traffic to another machine automatically when a failure occurs is the right way to go.
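The automatic failover a load balancer performs can be sketched as a selection rule: spread requests round-robin over the healthy primary servers and fall back to the standby only when every primary is down. The class below is an illustrative model in Python, with hypothetical backend names echoing the U.S./Europe setup described above; in practice the health flags would come from your monitoring checks, and the routing itself is done by load-balancer software or hardware.

```python
import itertools
from typing import Dict, List, Optional


class FailoverBalancer:
    """Round-robin over healthy primaries; use standbys only if every primary is down."""

    def __init__(self, primaries: List[str], standbys: List[str]) -> None:
        self.primaries = list(primaries)
        self.standbys = list(standbys)
        # Every backend starts out marked healthy until a check says otherwise.
        self.healthy: Dict[str, bool] = {b: True for b in self.primaries + self.standbys}
        self._counter = itertools.count()

    def set_health(self, backend: str, is_up: bool) -> None:
        """Record the latest health-check result for a known backend."""
        if backend not in self.healthy:
            raise KeyError(backend)
        self.healthy[backend] = is_up

    def pick(self) -> Optional[str]:
        """Return the backend for the next request, or None if nothing is healthy."""
        for tier in (self.primaries, self.standbys):
            up = [b for b in tier if self.healthy[b]]
            if up:
                return up[next(self._counter) % len(up)]
        return None  # nothing healthy: time to raise an alarm


# Hypothetical usage with the names "us-1", "us-2" (primaries) and "eu-1" (standby):
#   lb = FailoverBalancer(["us-1", "us-2"], ["eu-1"])
#   lb.set_health("us-1", False)   # e.g. after a failed monitoring check
#   backend = lb.pick()
```

Keeping the standby out of rotation until every primary fails mirrors the hot-standby idea; a setup that also shares load with the standby during normal operation would simply list it among the primaries.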