Strategies to Cut Data Infrastructure Costs

Aug 2, 2024

Reducing Overspending on Data Infrastructure

Introduction

  • Many companies are overspending on data infrastructure.
  • Smaller companies overspend by tens of thousands of dollars; larger companies by hundreds of thousands to millions.
  • Poorly optimized infrastructure can cost roughly double what it should.
  • Example: one company saved $500,000 just by fixing its Snowflake configuration.

Speaker Introduction

  • Ben Rogojan (Seattle Data Guy, now based in Denver).
  • Data engineering and infrastructure consultant; previously at Facebook and several startups.

The Importance of Cost Consideration

  • Cost should always be a factor when building solutions, whether bridges or software.
  • Early client conversations should cover the infrastructure budget and expected monthly spend.
  • There are real trade-offs between open-source solutions and vendor solutions.

Common Areas of Overspending

  1. Costly Data Ingestion Solutions

    • Example: a client was quoted $200,000 to ingest data from a single database.
    • Costs like this can often be cut by building a simpler solution.
    • Using Estuary helped reduce the cost by 80%.
  2. View on View on View Problem

    • Stacking views on views on views leads to poor performance and high costs, because every query re-runs the entire chain of logic.
    • Dashboards that take a long time to load (e.g., 10 minutes) are often a symptom of this.
    • Consider building permanent data models (materialized tables) instead of chains of real-time views; see the first sketch after this list.
  3. Real-Time Data Challenges

    • Real-time data is expensive in cloud environments (e.g., Snowflake charges per task run).
    • Batch data processing reduces costs significantly.
    • Example: Running 60 tasks live incurs costs for the entire hour, while batching them could cost only for one minute.
  4. Dashboarding Solutions

    • Companies often overspend on dashboard tools that pull and process raw data on every load.
    • Misguided vendor advice (e.g., doing all the processing of raw data inside the dashboard tool) can lead to unnecessary expense.
  5. Inefficient Data Models

    • Bad data models can cost companies significantly.
    • How models are built matters: for example, appending only new data is usually far cheaper than merging or rebuilding tables on every run; see the append sketch after this list.
    • Weigh the trade-offs between data modeling strategies (normalized vs. denormalized).
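
A minimal sketch of the "permanent data model" idea from point 2 (and, by extension, point 4): instead of dashboards querying a view that sits on a view that sits on a view, the final model is materialized as a table on a schedule and the dashboard reads the precomputed result. All table, schema, and function names here (analytics.daily_revenue, raw.orders, execute_sql) are hypothetical illustrations, not anything from the talk.

```python
# Sketch: materialize the end of a view chain as a scheduled table build.
# Dashboards then query analytics.daily_revenue directly instead of
# re-running the whole chain of view logic on every load.

BUILD_DAILY_REVENUE = """
CREATE OR REPLACE TABLE analytics.daily_revenue AS
SELECT
    order_date,
    region,
    SUM(order_total) AS revenue
FROM raw.orders
GROUP BY order_date, region
"""

def refresh_models(execute_sql) -> None:
    """Rebuild the reporting model once per schedule tick (e.g., hourly or nightly).

    `execute_sql` is assumed to be any callable that runs a SQL statement
    against the warehouse, such as a Snowflake cursor's execute method.
    """
    execute_sql(BUILD_DAILY_REVENUE)
```

The compute cost is then paid once per refresh rather than once per dashboard viewer, which is usually the cheaper trade unless the data genuinely must be real-time.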
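
To make point 3 concrete, here is a back-of-the-envelope cost sketch. The warehouse rate, the 60-second billing minimum per resume, and the 5-second job duration are assumptions for illustration, not figures from the talk; the point is only that frequent small runs pay the billing minimum sixty times an hour, while one batched run pays it once.

```python
# Rough cost comparison: a task that fires every minute vs. the same work
# batched into one hourly run. Rates and durations below are assumptions.

CREDITS_PER_HOUR = 1.0      # assumed warehouse rate (e.g., an XS warehouse)
MIN_BILLED_SECONDS = 60     # per-resume billing minimum (Snowflake-style)
WORK_SECONDS_PER_RUN = 5    # assumed time the actual work takes per run

def billed_credits_per_hour(runs_per_hour: int) -> float:
    # Each run is billed for at least the minimum, even if the work is shorter.
    billed_per_run = max(WORK_SECONDS_PER_RUN, MIN_BILLED_SECONDS)
    return runs_per_hour * billed_per_run / 3600 * CREDITS_PER_HOUR

print(billed_credits_per_hour(60))  # every minute: 1.0 credit -> the full hour is billed
print(billed_credits_per_hour(1))   # hourly batch: ~0.017 credits -> about one minute billed
```

In practice the batched run processes more rows and takes longer than a single small run, but it still avoids paying the billing minimum sixty times per hour.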
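
Finally, a small sketch of the append-vs-merge point from item 5. Appending only rows newer than a watermark scans and writes far less data than re-merging or fully rebuilding the table each run; a MERGE is only worth its cost when existing rows actually change. Table and column names (analytics.fct_events, raw.events, event_ts) are hypothetical.

```python
# Sketch: incremental append using a simple high-watermark, instead of
# merging or rebuilding the whole table on every run.

APPEND_NEW_EVENTS = """
INSERT INTO analytics.fct_events
SELECT *
FROM raw.events
WHERE event_ts > (
    SELECT COALESCE(MAX(event_ts), '1970-01-01'::TIMESTAMP)
    FROM analytics.fct_events
)
"""

def incremental_load(execute_sql) -> None:
    """Append only new events; avoids rescanning history on every run."""
    execute_sql(APPEND_NEW_EVENTS)
```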

Key Takeaways

  • Regularly assess and optimize data infrastructure costs.
  • Understand the trade-offs to make informed decisions.
  • Aim for a balance between performance needs and cost efficiency.
  • Do thorough groundwork when evaluating costs before committing to a tool or architecture.

Conclusion

  • Goal to save companies over $1 million in data infrastructure costs.
  • Offer for quick consultations to help reduce costs.
  • Encouragement to reach out for assistance.