Overview
This lecture covers the essential elements of a disaster recovery plan for IT support, focusing on minimizing downtime and data loss during emergencies.
Disaster Recovery Plan Overview
- A disaster recovery plan is a documented set of procedures for handling emergencies affecting IT operations.
- The plan includes actions to take before, during, and after a disaster to ensure business continuity.
- The primary goals are to minimize system downtime and prevent significant data loss.
- Disaster recovery plans address preventive, detection, and corrective (recovery) measures.
Preventive Measures
- Preventive measures are proactive steps taken to reduce the impact of potential disasters.
- Examples include regular system backups and redundant hardware components.
- Redundant power supplies fed from independent sources (for example, mains power on one and a battery backup on the other) keep systems running when one power source fails.
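A backup is only useful if the copy is intact, so it helps to verify each one as part of the routine. The following is a minimal sketch, not a full backup tool: it copies one file and compares checksums. The function names `sha256_of` and `backup_file` and the directory layout are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy by checksum.

    Hypothetical helper: a real plan would also keep multiple
    versions and store at least one copy off site.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup verification failed for {source}")
    return target
```

Calling `backup_file(Path("payroll.db"), Path("/mnt/backup"))` (both paths hypothetical) would return the path of the verified copy, or raise an error if the copy does not match the original.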
Detection Measures
- Detection measures alert teams to disasters that could affect operations.
- Timely notifications are crucial because some disaster recovery steps are time-sensitive to prevent data loss or equipment damage.
- Systems should have alerts for events like power loss and environmental changes in server rooms (e.g., flood, temperature, smoke).
- Monitoring environmental conditions with flood sensors and temperature monitors, for example, lets teams respond before equipment fails.
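The alerting described above can be sketched as a simple threshold check over sensor readings. This is an illustrative sketch only: the `Reading` type, the sensor names, and the 30 °C limit are assumptions, and a real deployment would read from actual monitoring hardware and notify staff rather than return strings.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical limit; a real threshold depends on the equipment and the site.
MAX_TEMPERATURE_C = 30.0


@dataclass
class Reading:
    sensor: str   # "temperature_c", "flood", "smoke", or "mains_power"
    value: float  # degrees Celsius, or 1.0/0.0 for on/off sensors


def check_reading(reading: Reading) -> Optional[str]:
    """Return an alert message for an abnormal reading, or None if normal."""
    if reading.sensor == "temperature_c" and reading.value > MAX_TEMPERATURE_C:
        return f"Temperature {reading.value:.1f} C exceeds {MAX_TEMPERATURE_C:.1f} C"
    if reading.sensor in ("flood", "smoke") and reading.value >= 1.0:
        return f"{reading.sensor.capitalize()} detected in server room"
    if reading.sensor == "mains_power" and reading.value < 1.0:
        return "Mains power lost"
    return None


def collect_alerts(readings: List[Reading]) -> List[str]:
    """Check every reading and return the alert messages in input order."""
    alerts = []
    for reading in readings:
        message = check_reading(reading)
        if message is not None:
            alerts.append(message)
    return alerts
```

Because some recovery steps are time-sensitive, a check like this would typically run on a short interval and send each alert straight to the on-call team.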
Personnel and Building Considerations
- Disaster recovery planning must also account for personnel safety and employees' ability to continue working.
- IT support works with building management on power, cooling systems, and evacuation procedures.
- Plans should allow employees to work from home if a disaster makes the workplace inaccessible.
Corrective or Recovery Measures
- Recovery measures involve restoring lost data from backups and rebuilding or reconfiguring damaged systems.
- Restoration begins after the disaster has been detected and immediate steps to limit further damage have been taken.
- Loss of a system in a redundant pair creates a single point of failure, increasing vulnerability until redundancy is restored.
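To make the last point concrete, a recovery checklist could include a scan for redundancy groups that are down to one working member. The sketch below assumes a hypothetical inventory format, a mapping from group name to the health of each member; it is not tied to any particular monitoring product.

```python
from typing import Dict, List


def find_single_points_of_failure(groups: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the names of redundancy groups with exactly one healthy member.

    `groups` maps a group name to {member name: is_healthy}. This
    inventory structure is an assumption made for illustration.
    """
    at_risk = []
    for group_name, members in groups.items():
        healthy = [name for name, is_healthy in members.items() if is_healthy]
        if len(healthy) == 1:
            at_risk.append(group_name)
    return at_risk
```

For an inventory where one of two power supplies has failed and both database servers are healthy, the function returns only the power supply group, which is the one to repair first to restore redundancy.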
Key Terms & Definitions
- Disaster Recovery Plan — Documented procedures for responding to and recovering from emergencies that impact IT operations.
- Preventive Measures — Actions taken before a disaster to reduce its potential impact.
- Detection Measures — Systems or procedures that alert teams when a disaster occurs.
- Corrective/Recovery Measures — Steps taken after a disaster to restore normal operations.
- Single Point of Failure — A component whose failure alone would bring down an entire system, such as the surviving member of a redundant pair after its partner fails.
Action Items / Next Steps
- Review your organization's current disaster recovery plan and identify preventive, detection, and recovery measures.
- Ensure you know evacuation and remote work procedures in case of an emergency.