Intel CPU Issues and Crash Analysis

Jul 12, 2024

Lecture Notes: Intel CPU Issues and Crash Analysis

Overview

  • Speakers: The main presenter, joined by Wendell of Level1Techs
  • Topic: Stability issues with Intel Core i9-13900K and i9-14900K CPUs in gaming and server environments
  • Sponsor: NZXT C1500 Platinum power supply

Key Topics

Intel CPU Issues

  • Issue: Crashes with Intel Core i9-13900K and i9-14900K CPUs and their variants
  • Red Herrings: Crashes were initially blamed on VRAM limitations or BIOS configuration
  • Common Belief: The problem lies in power profiles and microcode, so updates would fix it
  • Current Stance: Intel may not be able to fully resolve the issue with software updates alone
  • Timeframe: Ongoing for about five months

Analysis and Findings

  • Game Development Side: Game crashes linked to specific Intel CPUs
  • Data Source: Full crash databases for two different Unreal Engine games
  • Common Errors: Errors reporting that the GPU ran out of VRAM, alongside CPU-related errors
  • Time Running: Servers crashed in ways that their BIOS configuration could not explain

Server Side Investigation

  • Server Deployments: High failure rates for Core i9-13900K and i9-14900K CPUs used in servers
  • Failure Rates: 50% of servers experienced some failure within 7 days
  • Data Sources: 250 systems across three providers, on a mix of Supermicro and ASUS W680 motherboards
  • Failure Types: Varied and inconsistent errors, sometimes CPU-related, sometimes memory-related
  • Power Limits: Failures occurred even with reduced power (TDP) settings
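The headline server figure is a simple proportion: the share of systems whose first failure occurred within a fixed window. The sketch below shows that calculation under stated assumptions. The `ServerRecord` schema, its field names, and the sample values are hypothetical and are not the presenters' data; grouping by motherboard mirrors the mixed-board fleet described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

SEVEN_DAYS_HOURS = 7 * 24


@dataclass
class ServerRecord:
    """One monitored system (hypothetical schema for illustration)."""
    system_id: str
    motherboard: str                      # hypothetical label, e.g. "board_a"
    first_failure_hours: Optional[float]  # None if no failure was observed


def failure_rate(records, window_hours=SEVEN_DAYS_HOURS):
    """Fraction of systems whose first failure occurred within the window."""
    if not records:
        return 0.0
    failed = sum(
        1
        for r in records
        if r.first_failure_hours is not None and r.first_failure_hours <= window_hours
    )
    return failed / len(records)


def failure_rate_by_board(records, window_hours=SEVEN_DAYS_HOURS):
    """Same proportion, computed separately for each motherboard label."""
    groups = defaultdict(list)
    for r in records:
        groups[r.motherboard].append(r)
    return {board: failure_rate(rs, window_hours) for board, rs in groups.items()}


if __name__ == "__main__":
    # Made-up sample records; NOT real measurements.
    sample = [
        ServerRecord("a1", "board_a", 30.0),
        ServerRecord("a2", "board_a", None),
        ServerRecord("b1", "board_b", 150.0),
        ServerRecord("b2", "board_b", 200.0),
    ]
    print(f"{failure_rate(sample):.0%}")  # 50%
    print(failure_rate_by_board(sample))
```

This assumes every system was observed for at least the full window. Systems observed for less time would be undercounted as failures and would need survival-analysis methods instead of a plain proportion.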

Consumer Impact

  • Consumer Market: The picture is less clear, but there may be an aging effect in consumer systems
  • Failure Rate: Consumer systems may become increasingly unstable over the long term
  • Server vs. Consumer: Server operators report failures more vocally; consumer failures are handled differently

Solutions and Recommendations

  • DDR5 Memory Speeds: Lowering DDR5 speeds helped stabilize some systems
  • Possible Recalls: OEMs have replaced some 13th-gen CPUs with 14th-gen parts
  • Competitive Comparison: AMD Ryzen CPUs (e.g., Ryzen 9 7950X) may be a more stable alternative
  • Actions for Gamers: Seek support from Intel and, where warranted, a replacement CPU

Industry Response

  • OEMs (Dell, Lenovo, HP): Reportedly replacing significant numbers of CPUs
  • Intel's Acknowledgement: There are indications that Intel is aware of the issue
  • Further Steps: Intel needs to address gamers' concerns directly and step up replacements

Recommendations for Developers

  • Telemetry Analysis: Game companies should revisit crash telemetry
  • Adjust Bans: Some developers mistakenly banned players whose unstable CPUs caused client/server inconsistencies
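Revisiting crash telemetry can start with a per-CPU crash rate: crashes divided by sessions for each CPU model, with models flagged when their rate far exceeds the overall rate. The sketch below is a minimal illustration, assuming a hypothetical `Session` record with `cpu_model` and `crashed` fields; the threshold `factor`, the `min_sessions` cutoff, and the CPU labels are illustrative choices, not values from the presenters.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Session(NamedTuple):
    """One play session from client telemetry (hypothetical schema)."""
    cpu_model: str
    crashed: bool


def crash_rates_by_cpu(sessions: Iterable[Session]) -> dict[str, float]:
    """Crashes per session for each CPU model."""
    totals: Counter[str] = Counter()
    crashes: Counter[str] = Counter()
    for s in sessions:
        totals[s.cpu_model] += 1
        if s.crashed:
            crashes[s.cpu_model] += 1
    return {cpu: crashes[cpu] / n for cpu, n in totals.items()}


def flag_outlier_cpus(
    sessions: Iterable[Session], factor: float = 2.0, min_sessions: int = 100
) -> list[str]:
    """CPU models with enough sessions whose crash rate exceeds factor x overall."""
    sessions = list(sessions)
    if not sessions:
        return []
    overall = sum(s.crashed for s in sessions) / len(sessions)
    counts = Counter(s.cpu_model for s in sessions)
    rates = crash_rates_by_cpu(sessions)
    return sorted(
        cpu
        for cpu, rate in rates.items()
        if counts[cpu] >= min_sessions and rate > factor * overall
    )


if __name__ == "__main__":
    # Made-up telemetry for three anonymous CPU models; NOT real data.
    data = (
        [Session("cpu_a", i < 30) for i in range(200)]   # 15% crash rate
        + [Session("cpu_b", i < 4) for i in range(200)]  # 2% crash rate
        + [Session("cpu_c", i < 4) for i in range(200)]  # 2% crash rate
    )
    print(flag_outlier_cpus(data))  # ['cpu_a']
```

Comparing against sessions, rather than raw crash counts, keeps a popular CPU from looking unstable simply because many players use it. A developer could then check whether banned players disproportionately ran a flagged model.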

Conclusion

  • Call to Intel: Publicly acknowledge the issue and provide replacements where necessary
  • Consumer Advice: If you experience these issues, contact either presenter for more details
  • Further Research: Both presenters plan deeper dives into the issue as new credible tips come in

Check out Level1Techs' videos for more details on these topics.