Lecture Notes on High Availability Storage Solutions

Jul 10, 2024

Lecture on Storage Solutions for High Availability and Reliability

Whonnock 10: The Ultimate High Availability Server

Introduction

  • Fast and reliable storage is crucial for video production teams.
  • The main editing server, Whonnock, has served well, but a single minute of downtime now costs over $50 in payroll.
  • Solution: Redundancy.
    • The drives are already redundant, but they all sit in a single server, which remains a single point of failure.
    • New server: Whonnock 10 (high availability).
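
To put the payroll figure in perspective, the sketch below scales the quoted $50 per minute to longer outages. The outage durations are illustrative assumptions, not figures from the lecture, and since the notes say "over $50" the results are lower bounds.

```python
# Payroll cost of downtime, using the notes' figure of roughly $50 per minute.
COST_PER_MINUTE_USD = 50  # lower bound from the notes ("over $50")


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated payroll cost (USD) of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Outage lengths below are hypothetical examples.
    for label, minutes in [("10 minutes", 10), ("1 hour", 60), ("8-hour workday", 480)]:
        print(f"{label}: at least ${downtime_cost(minutes):,.0f}")
```

At this rate a one-hour outage already costs at least $3,000, which is the motivation for spending on redundancy.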

Hardware Components

  • Supermicro Grand Twin A+ Server AS-2115GT-HNTR
    • 2U, containing four independent computers.
    • Each node has its own motherboard, 384GB of memory, a 64-core AMD EPYC Genoa processor, dual M.2 slots for boot drives, and six PCIe Gen 5 NVMe slots.
    • 200 Gb/s ConnectX-6 NICs for network redundancy.
    • 2200-watt power supplies (80 PLUS Titanium).

High Availability Configuration

  • Redundant, NVMe-first file system (Weka).
    • Can sustain two entire server dropouts without interruption.
    • The entire team was moved onto it for a real-world test during the workday.
  • Redundant switches for network connectivity.
    • Limitation: a switch failure is still a concern with the current NICs; ideally each node would have two single-port NICs.
    • The overall system is designed to keep operating even if one NIC fails.
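
The value of tolerating two server dropouts can be estimated with a simple binomial model. This is a minimal sketch, not Weka's own availability model: it assumes node failures are independent, and both the node count and the per-node availability used in the example are hypothetical inputs rather than figures from the lecture.

```python
from math import comb


def cluster_availability(nodes: int, node_availability: float,
                         tolerated_failures: int = 2) -> float:
    """Probability that at most `tolerated_failures` of `nodes` are down at once.

    Assumes each node is up independently with probability `node_availability`.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    q = 1.0 - node_availability  # probability that a single node is down
    return sum(
        comb(nodes, k) * q**k * node_availability**(nodes - k)
        for k in range(min(tolerated_failures, nodes) + 1)
    )


if __name__ == "__main__":
    # Hypothetical inputs: 8 nodes, each up 99% of the time.
    for tolerance in (0, 1, 2):
        a = cluster_availability(8, 0.99, tolerance)
        print(f"tolerating {tolerance} failures: availability = {a:.6f}")
```

Under these assumptions, each additional tolerated failure removes the most likely remaining outage scenario, which is why a cluster built from individually fallible nodes can be far more available than any one of them.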

Software Configuration

  • Weka File System
    • CPU cores are divided into compute, drive, and front-end roles for task management.
    • Dynamically assigns cores for different tasks (e.g., parity calculation, inter-cluster communication).
  • Potential to run additional services like Proxmox and high-availability Plex server.
  • Boot drives: Sabrent Rocket 512GB Gen 3 drives (reliable enough for OS boot).
  • Storage drives: Kioxia CD6 Gen4 NVMe drives (future plan for 4x 15TB drives per node).
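
The planned drive layout translates into raw capacity as sketched below. It uses the nominal 15TB drive size and the four nodes per chassis from the notes. Usable capacity would be lower once the file system's protection overhead is subtracted, and that overhead is not given in the notes, so it is left out here.

```python
DRIVES_PER_NODE = 4     # planned drives per node, from the notes
DRIVE_TB = 15           # nominal drive size in TB, from the notes
NODES_PER_CHASSIS = 4   # one 2U chassis holds four independent nodes


def raw_capacity_tb(nodes: int, drives_per_node: int = DRIVES_PER_NODE,
                    drive_tb: float = DRIVE_TB) -> float:
    """Raw (pre-protection) capacity in TB for a given number of nodes."""
    return nodes * drives_per_node * drive_tb


if __name__ == "__main__":
    print(f"per node:    {raw_capacity_tb(1)} TB raw")
    print(f"per chassis: {raw_capacity_tb(NODES_PER_CHASSIS)} TB raw")
```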

Deployment and Testing

  • Once the servers and network were set up, the system was tested under real-world conditions to confirm seamless operation.
  • The system handled a server being unplugged without impacting video editing work.
  • Analysis and AI integration for media asset management
    • Detects objects, faces, and scenes for searchability.

Key Points

  • Weka file system optimized for low latency and high throughput.
  • Nvidia SN3700 switch used for network connectivity.
  • Achieved 70 GB/s read performance and 4 million read IOPS.
  • AI-driven media asset management from axel.ai provides powerful search capabilities.
  • Proxy generation to avoid huge latency when dealing with large original clips.
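
To relate the 70 GB/s read figure to the 200 Gb/s network links, the sketch below converts bytes to bits and computes the minimum number of links needed to carry that much traffic. It ignores protocol overhead and assumes the figure is an aggregate across the cluster, which the notes do not state explicitly.

```python
import math

LINK_GBIT_PER_S = 200  # per-port speed of the NICs, from the notes


def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert GB/s to Gb/s (8 bits per byte)."""
    return gbytes_per_s * 8


def min_links_needed(throughput_gbytes_per_s: float,
                     link_gbit_per_s: float = LINK_GBIT_PER_S) -> int:
    """Minimum number of links to carry the throughput, ignoring overhead."""
    return math.ceil(gbytes_to_gbits(throughput_gbytes_per_s) / link_gbit_per_s)


if __name__ == "__main__":
    read_gbytes = 70  # read throughput from the notes
    print(f"{read_gbytes} GB/s = {gbytes_to_gbits(read_gbytes):.0f} Gb/s")
    print(f"minimum 200 Gb/s links: {min_links_needed(read_gbytes)}")
```

Since 70 GB/s is 560 Gb/s, no single 200 Gb/s port could deliver it; the load has to be spread over at least three links.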

Summary

  • Whonnock 10 provides major improvements in high availability and redundancy for video editing workloads.
  • Redundant hardware and software setups minimize downtime.
  • Future potential for even more advanced configurations and AI-driven features for media management.

Conclusion

  • Overall, a significant upgrade over the single-machine setup.
  • Provides a reliable, fast, and redundantly protected server environment for the video team.
  • Emphasis on real-world testing and practical benefits of high availability systems.