Cloudflare Incident: February 18, 2017

Jul 7, 2024

Background

  • Location: Cloudflare HQ
  • Date: February 17, 2017, Friday afternoon PST (February 18 in UTC)
  • Event: Severe memory leak bug discovered (later nicknamed Cloudbleed)

Timeline of Events

Initial Discovery

  • Time: 4:11 PM PST
  • Discoverer: Google Project Zero team
  • Issue: Severe data leak in Cloudflare's system
  • Initial Contact: Made within minutes of discovery
  • Report Details: Suggestion of a widespread data leak

Cloudflare Response

  • 4:32 PM PST: Cloudflare receives alarming details of the report
  • Primary Product: CDN (Content Delivery Network)
  • Function: Delivers internet content from multiple edge servers
  • Problem: Edge servers returned sensitive data from memory (cookies, keys, customer data) in HTTP responses

Incident Impact

  • Data Compromised: Full HTTPS requests, IP addresses, responses, passwords
  • Affected Areas: Search engine caches such as Google's could have stored pages containing leaked data

Immediate Actions

Identification of Correlation

  • Feature Suspected: Email obfuscation feature
  • Recent Deployment: Partial migration to a new HTML parser
  • Initial Measures: Global kill for email obfuscation by 5:22 PM PST

Further Debugging

  • Identified Problematic Features: Automatic HTTPS Rewrites, Server-Side Excludes
  • Actions Taken:
    • Immediate shutdown of Automatic HTTPS Rewrites
    • Development of a patch for Server-Side Excludes

Global Deployment

  • 9:24 PM PST: Engineers working on the global kill for server-side excludes
  • 11:22 PM PST: Patch deployed worldwide
  • Key Remaining Task: Purging cached data from search engines

Root Cause Analysis

Common Denominator

  • Features Affected: Email obfuscation, Server-Side Excludes, Automatic HTTPS Rewrites
  • Commonality: Parsing and modifying HTML content
  • HTML Parser: cf-html (new parser)

Detailed Analysis

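  • Old Parser: Ragel-based parser (C code generated by the Ragel state machine compiler)
  • Issue Trigger: The new parser changed how buffers were handed to the old parser, exposing a latent bug in the old parser
  • Error Mechanism:
    • An unfinished attribute at the end of a buffer caused a failed match
    • On that error path the pointer stepped past the end of the buffer; the end-of-buffer test used equality (==) rather than >=, so it never fired
    • The parser kept running and read undefined memory beyond the buffer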
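
The error mechanism above can be sketched as a small simulation. This is not Cloudflare's code: the "memory" is a single Python bytes object, the names BUFFER, ADJACENT, and consume are hypothetical, and the extra pointer step on the error path is an assumption made to reproduce the jump over the end pointer. It only illustrates why an equality test misses a pointer that has already passed the end.

```python
# Simulated process memory: the parser's buffer followed by unrelated data.
BUFFER = b'<img src="x" alt'   # ends inside an unfinished attribute
ADJACENT = b"SECRET-COOKIE"    # hypothetical neighbouring memory
MEMORY = BUFFER + ADJACENT


def consume(memory: bytes, pe: int, safe: bool) -> bytes:
    """Read from memory[0:pe] and return every byte the parser touched.

    pe is the end-of-buffer position. With safe=False the end test is
    `p == pe` (the buggy form); with safe=True it is `p >= pe`.
    """
    p = 0
    read = bytearray()
    while True:
        at_end = (p >= pe) if safe else (p == pe)
        # The len(memory) guard exists only to keep the simulation finite.
        if at_end or p >= len(memory):
            break
        read.append(memory[p])
        p += 1
        # Assumed error path: when the buffer ends with an unfinished tag,
        # the pointer is advanced one extra step and jumps over pe.
        if p == pe and not memory[:pe].endswith(b">"):
            p += 1
    return bytes(read)


leaked = consume(MEMORY, len(BUFFER), safe=False)
clean = consume(MEMORY, len(BUFFER), safe=True)
```

In this model the buggy variant returns bytes from ADJACENT in addition to BUFFER, while the `>=` variant stops at the buffer boundary.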

Key Observations

  • Buffer Handling: When used alone, the old parser always received an extra, empty dummy buffer as its last buffer
  • New Parser: With cf-html in the chain, the last buffer was no longer empty, so the latent overrun was exposed
  • Mixed Parser Usage: The issue arose only when the old and new parsers were used together
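
A simplified model of the buffer-handling difference described above. The Buf type, the two chain builders, and the "ends mid-tag" heuristic are all hypothetical; real NGINX buffer chains are more involved. The sketch only shows why a trailing empty buffer kept the buggy path from being reached.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Buf:
    data: bytes
    last_buf: bool  # marks the final buffer of the response


def old_style_chain(chunks: List[bytes]) -> List[Buf]:
    """Old behaviour (as modelled here): data buffers, then an empty last buffer."""
    return [Buf(c, False) for c in chunks] + [Buf(b"", True)]


def new_style_chain(chunks: List[bytes]) -> List[Buf]:
    """New behaviour (as modelled here): the last data buffer is itself final."""
    return [Buf(c, i == len(chunks) - 1) for i, c in enumerate(chunks)]


def ends_mid_tag(data: bytes) -> bool:
    """Crude heuristic: a '<' was opened after the last '>' and never closed."""
    return data.rfind(b"<") > data.rfind(b">")


def hits_buggy_path(buf: Buf) -> bool:
    """The overrun needs a final buffer that ends inside an unfinished tag."""
    return buf.last_buf and ends_mid_tag(buf.data)


page = [b"<p>hi</p>", b'<img src="x" alt']
old_hit = any(hits_buggy_path(b) for b in old_style_chain(page))
new_hit = any(hits_buggy_path(b) for b in new_style_chain(page))
```

Under these assumptions the same malformed page is harmless in the old chain, because the final buffer is empty, and reaches the buggy path in the new chain.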

Lessons Learned

Backwards Compatibility

  • Challenges: Maintaining compatibility with legacy systems
  • Overlooked Details: Small changes causing significant breakdowns

Mitigation Strategies

  • Fuzzing Generated Code: Fuzz the generated parser code to search for pointer overruns
  • Memory Management Techniques: Reduce the impact of any future memory overrun
  • Best Practices: Avoid modifying compiled code directly
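
The fuzzing idea can be sketched with a toy random fuzzer. Everything here is illustrative: parse is a stand-in parser with a deliberately planted bug (a trailing '<' advances the pointer by two), and fuzz is a hypothetical helper. A real effort would more likely run a coverage-guided fuzzer with memory sanitizers against the actual generated code.

```python
import random
from typing import Optional


def parse(buf: bytes, safe: bool) -> int:
    """Toy parser under test; returns the highest index it read (-1 if none)."""
    p, pe = 0, len(buf)
    highest = -1
    while True:
        if (p >= pe) if safe else (p == pe):
            break
        if p >= pe + 8:  # simulation guard so the buggy variant terminates
            break
        highest = p  # "read" the byte at position p
        byte = buf[p:p + 1]  # slicing never raises, even past the end
        # Planted bug: a tag opened on the very last byte skips two positions.
        p += 2 if (byte == b"<" and p == pe - 1) else 1
    return highest


def fuzz(safe: bool, iterations: int = 2000, seed: int = 0) -> Optional[bytes]:
    """Feed random short inputs; return the first one that causes an overrun."""
    rng = random.Random(seed)
    alphabet = b'<>a= "'
    for _ in range(iterations):
        length = rng.randint(0, 16)
        buf = bytes(rng.choice(alphabet) for _ in range(length))
        if parse(buf, safe) >= len(buf):  # read at or beyond the buffer end
            return buf
    return None


crash_input = fuzz(safe=False)
fixed_result = fuzz(safe=True)
```

The invariant checked is that the parser never reads at or beyond the end of its input; the buggy variant violates it on inputs ending in '<', and the `>=` variant does not.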

Closing Comments

  • Fix Implementation: Pointer checks added, then the disabled features were re-enabled
  • Collaboration: Worked with search engines to purge cached copies of leaked data
  • Impact Analysis: Small overall impact, with no evidence that the leak was exploited