Disaster Recovery Planning

Jun 20, 2025

Overview

This lecture covers the essential elements of a disaster recovery plan for IT support, focusing on minimizing downtime and data loss during emergencies.

Disaster Recovery Plan Overview

  • A disaster recovery plan is a documented set of procedures for handling emergencies affecting IT operations.
  • The plan includes actions to take before, during, and after a disaster to ensure business continuity.
  • The primary goals are to minimize system downtime and prevent significant data loss.
  • Disaster recovery plans address preventive, detection, and corrective (recovery) measures.

Preventive Measures

  • Preventive measures are proactive steps taken to reduce the impact of potential disasters.
  • Examples include regular system backups and redundant hardware components.
  • Redundant power supplies fed from different sources, such as mains power and a battery backup (UPS), help keep systems running during power failures.
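
As an illustration of the backup measure above, here is a minimal sketch of copying a file to a backup location and confirming that the copy matches the original. The function names `sha256_of` and `backup_file`, the file name, and the directory layout are hypothetical choices made for this example. A real backup system would also handle scheduling, off-site copies, and retention.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy against the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup verification failed for {source}")
    return target


if __name__ == "__main__":
    # Demonstration in a throwaway directory (hypothetical file name).
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "payroll.db"
        original.write_bytes(b"example data")
        copy = backup_file(original, Path(tmp) / "backups")
        print(f"Verified backup written to {copy}")
```

Verifying the checksum right after the copy is one simple way to catch a corrupted backup before it is needed for recovery.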

Detection Measures

  • Detection measures alert teams to disasters that could affect operations.
  • Timely notifications are crucial because some disaster recovery steps must happen quickly to prevent data loss or equipment damage.
  • Systems should have alerts for events like power loss and environmental changes in server rooms (e.g., flood, temperature, smoke).
  • Monitoring environmental conditions with flood sensors and temperature monitors helps prevent equipment failure.
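
The alerting described above can be sketched as a simple check that turns one sensor reading into a list of alert messages. The `Reading` fields, the `check_reading` function, and the 27 °C threshold are assumptions made for this example and do not come from the lecture or from any particular monitoring product. Real thresholds depend on the equipment and the room.

```python
from dataclasses import dataclass

# Assumed threshold for illustration only; set it from your equipment's specs.
MAX_TEMPERATURE_C = 27.0


@dataclass
class Reading:
    """One snapshot of server-room sensors (hypothetical fields)."""
    temperature_c: float
    water_detected: bool
    smoke_detected: bool
    on_mains_power: bool


def check_reading(reading: Reading) -> list[str]:
    """Return one alert message per abnormal condition; empty if all is well."""
    alerts = []
    if reading.temperature_c > MAX_TEMPERATURE_C:
        alerts.append(
            f"Temperature {reading.temperature_c:.1f} C exceeds "
            f"{MAX_TEMPERATURE_C:.1f} C"
        )
    if reading.water_detected:
        alerts.append("Water detected by flood sensor")
    if reading.smoke_detected:
        alerts.append("Smoke detected")
    if not reading.on_mains_power:
        alerts.append("Mains power lost; running on backup power")
    return alerts


if __name__ == "__main__":
    sample = Reading(temperature_c=31.5, water_detected=False,
                     smoke_detected=False, on_mains_power=False)
    for message in check_reading(sample):
        print("ALERT:", message)
```

In practice each message would be sent to the on-call team by email, SMS, or a paging service. That delivery step is left out here.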

Personnel and Building Considerations

  • Disaster recovery planning must also account for personnel safety and employees' ability to continue working.
  • IT support works with building management on power, cooling systems, and evacuation procedures.
  • Plans should allow employees to work from home if a disaster makes the workplace inaccessible.

Corrective or Recovery Measures

  • Recovery measures involve restoring lost data from backups and rebuilding or reconfiguring damaged systems.
  • Restoration begins once the disaster has been detected and the initial steps to prevent further damage have been taken.
  • Losing one system in a redundant pair leaves the surviving system as a single point of failure, so the service stays vulnerable until redundancy is restored.
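
To make the point about redundant pairs concrete, the following minimal sketch classifies a group of systems by how many members are still healthy. The function name `redundancy_status`, the status labels, and the host names are hypothetical and chosen only for illustration.

```python
def redundancy_status(members: dict[str, bool]) -> str:
    """Classify a redundancy group from a mapping of member name to health.

    True means the member is healthy; False means it has failed.
    """
    healthy = sum(1 for is_healthy in members.values() if is_healthy)
    if healthy == 0:
        return "down"
    if healthy == 1:
        # One survivor: any further failure takes the whole service down.
        return "single point of failure"
    return "redundant"


if __name__ == "__main__":
    # Hypothetical pair of database servers, one of which has failed.
    database_pair = {"db-primary": True, "db-secondary": False}
    print(redundancy_status(database_pair))
```

A group reported as a single point of failure is still serving users, which is why the loss of redundancy is easy to overlook and worth flagging during recovery.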

Key Terms & Definitions

  • Disaster Recovery Plan — Documented procedures for responding to and recovering from emergencies that impact IT operations.
  • Preventive Measures — Actions taken before a disaster to reduce its potential impact.
  • Detection Measures — Systems or procedures that alert to the occurrence of a disaster.
  • Corrective/Recovery Measures — Steps taken after a disaster to restore normal operations.
  • Single Point of Failure — Situation where one failure could bring down an entire system due to lost redundancy.

Action Items / Next Steps

  • Review your organization's current disaster recovery plan and identify preventive, detection, and recovery measures.
  • Ensure you know the evacuation and remote work procedures to follow in an emergency.