Under the Hood Hybrid Tables

Jul 2, 2024

Notes on Lecture: Under the Hood Hybrid Tables

Introduction

Presenters:

  • Christian: Engineer at Snowflake
    • Responsible for the architecture of Unistore and FoundationDB (FDB)
  • David: Senior Engineer in Performance at Snowflake's Berlin office

Purpose:

  • Discuss the architecture and construction of Unistore hybrid tables
  • Explain the business case for Unistore

The Business Case for Unistore

Problems with Traditional Databases:

  • Split-brain problem: transactional (OLTP) and analytical (OLAP) workloads run on separate engines
  • Resulting issues: security, data transfer costs, latency, governance

Unistore's Solution:

  • Create a single store for both OLTP and OLAP workloads

Snowflake Architecture Overview

  • Three-layer architecture: Cloud services, query processing, and storage tiers
    • Cloud services: Orchestrates the database and manages metadata
    • Query processing: Executes queries
    • Storage: Manages data artifacts

Unistore Hybrid Tables

  • Main difference from traditional Snowflake tables: the storage hardware
    • Traditional tables use blob storage
    • Unistore hybrid tables use direct-attached storage for low latency

Hardware Considerations

  • Blob storage: High latency, suitable for large scans
  • Direct-attached storage: Low latency, crucial for OLTP workloads

FoundationDB (FDB) in Hybrid Tables

  • FDB as the bottom layer in hybrid table stack
    • Open-source cluster-hosted key-value store
    • Reliable (used by tier 1 data companies and internally at Snowflake)
    • Highly available: multiple replicas spread across Availability Zones (AZs)

Enhancements and Properties of FDB

  • Performance improvements
  • Fast recovery and online updates
  • Key-value store with microtransactions (small, short-lived transactions)
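
To make "key-value store with transactions" concrete, here is a minimal in-memory sketch of an ordered key-value store with optimistic transactions. It is not FDB's API: ToyKVStore, ToyTransaction, and ConflictError are hypothetical names, and the conflict check is a simplified stand-in for what a real system does.

```python
import bisect


class ConflictError(Exception):
    """Raised when a key read by a transaction changed before commit."""


class ToyKVStore:
    """Toy ordered key-value store (illustrative only, not FDB's API)."""

    def __init__(self):
        self._data = {}      # key -> current value
        self._versions = {}  # key -> commit version of the last write
        self._keys = []      # sorted keys, used for range reads
        self.version = 0     # global commit version

    def begin(self):
        return ToyTransaction(self)

    def get_range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


class ToyTransaction:
    """Buffers writes locally and validates its reads at commit time."""

    def __init__(self, store):
        self._store = store
        self._read_version = store.version  # version seen when we started
        self._reads = set()
        self._writes = {}

    def get(self, key):
        if key in self._writes:  # read your own uncommitted write
            return self._writes[key]
        self._reads.add(key)
        return self._store._data.get(key)

    def set(self, key, value):
        self._writes[key] = value

    def commit(self):
        store = self._store
        # Abort if any key we read was overwritten after we started.
        for key in self._reads:
            if store._versions.get(key, 0) > self._read_version:
                raise ConflictError(key)
        store.version += 1
        for key, value in self._writes.items():
            if key not in store._data:
                bisect.insort(store._keys, key)
            store._data[key] = value
            store._versions[key] = store.version
        return store.version
```

A caller would typically retry a transaction that raises ConflictError. Keeping transactions small and short keeps such conflicts rare.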

Key Improvements for Hybrid Tables

Multi-tenancy

  • Sandboxing enforces usage quotas per tenant
  • Prevents data leakage between tenants and over-utilization by any single tenant
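
A rough sketch of the two ideas above: tenant-scoped keys for isolation and a per-tenant quota for fair use. The token bucket is an assumption chosen for illustration, since the talk does not say how quotas are enforced, and all class names are hypothetical.

```python
import time


class QuotaExceeded(Exception):
    """Raised when a tenant has used up its current allowance."""


class TenantQuota:
    """Token bucket: `rate` operations per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self, cost=1):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        refill = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class MultiTenantStore:
    """Toy store whose keys are scoped by tenant (illustrative only)."""

    def __init__(self, quotas):
        self._quotas = quotas  # tenant id -> TenantQuota
        self._data = {}        # (tenant id, key) -> value

    def _charge(self, tenant):
        if not self._quotas[tenant].try_acquire():
            raise QuotaExceeded(tenant)

    def put(self, tenant, key, value):
        self._charge(tenant)
        # The tenant id is part of the key, so key spaces never overlap.
        self._data[(tenant, key)] = value

    def get(self, tenant, key):
        self._charge(tenant)
        return self._data.get((tenant, key))
```

Because every operation is charged to the calling tenant's own bucket, one tenant exhausting its quota does not slow down the others.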

Continuous Backup and Historical Queries

  • Continuous backup to blob storage
  • Enables clone databases and point-in-time queries
  • Adaptive querying for different data requirements
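
One way to picture how a continuous backup enables point-in-time queries and clones: if every mutation is appended to a log together with its commit version, the state at any past version can be rebuilt by replaying the log up to that version. The sketch below keeps the log in memory. The ChangeLog class is hypothetical and says nothing about the actual backup format in blob storage.

```python
class ChangeLog:
    """Append-only log of (version, key, value) mutations.

    A value of None records a delete. Illustrative only.
    """

    def __init__(self):
        self._entries = []

    def append(self, version, key, value):
        if self._entries and version < self._entries[-1][0]:
            raise ValueError("versions must be non-decreasing")
        self._entries.append((version, key, value))

    def snapshot_at(self, version):
        """Rebuild the key-value state as of `version` by replaying the log."""
        state = {}
        for entry_version, key, value in self._entries:
            if entry_version > version:
                break  # entries are ordered, so nothing later applies
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state
```

In this model a clone is just the snapshot at the chosen version, used as the starting state of a new database.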

Native Bulk Load

  • Efficiently handles large input operations
  • Uses the same logic as backup and restore

Control Plane Enhancements

  • Elasticity in scaling clusters
  • Rolling out updates without downtime

Quality of Service Improvements

  • Reducing the variance between the best and worst client interactions
  • Example: variance in transaction commit times
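
A common way to quantify the gap between typical and worst-case interactions is to compare a high percentile of a latency sample with its median. The helpers below are a sketch of that measurement. The function names are hypothetical, and the talk does not say which metric Snowflake tracks.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def tail_ratio(samples, low=50, high=99):
    """How many times slower the `high` percentile is than the `low` one."""
    return percentile(samples, high) / percentile(samples, low)
```

A ratio close to 1 means clients see consistent commit times, while a large ratio means a small share of commits is much slower than the typical one.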

Optimizing Other Layers

Cloud Services Layer (Brain of Snowflake)

  • Telemetry management: Reduce system CPU usage
    • Log lines are essential but costly
    • Implemented asynchronous logging
  • FDB Metadata Store Optimization
    • Local caches to reduce network latency for metadata access
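
The asynchronous-logging idea can be sketched with Python's standard library: the request path only places a record on an in-memory queue, and a background thread does the slow formatting and I/O. This illustrates the general pattern, not Snowflake's implementation, and make_async_logger is a hypothetical helper.

```python
import logging
import logging.handlers
import queue


def make_async_logger(name, slow_handler):
    """Return (logger, listener); log calls only enqueue records.

    `slow_handler` is any logging.Handler that does the expensive work,
    such as writing to disk or the network. It runs on the listener's
    background thread instead of the caller's thread.
    """
    log_queue = queue.Queue(-1)  # unbounded queue
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from root handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, slow_handler)
    listener.start()
    return logger, listener
```

Calling listener.stop() at shutdown drains the queue, so records logged just before exit are still handed to the slow handler.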

Query Processing Layer (Muscle of Snowflake)

  • Process reuse
    • Moved to reusing processes across queries to handle them more efficiently
    • Reworked the process model to preserve isolation while improving performance

Example: Hybrid Tables as a Fantasy Football Backend

  • Use case: Apps with high concurrent access and low latency needs
    • No separate backend needed
    • Simplifies the deployment of such applications
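
The access pattern behind this example can be sketched with one table serving both keyed point operations and an analytical aggregate. SQLite is used here only as a locally runnable stand-in. The table, columns, and function names are hypothetical, and nothing below is Snowflake syntax or API.

```python
import sqlite3


def open_store():
    """Create an in-memory table keyed by (team_id, player_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE roster_points ("
        " team_id INTEGER NOT NULL,"
        " player_id INTEGER NOT NULL,"
        " points REAL NOT NULL,"
        " PRIMARY KEY (team_id, player_id))"
    )
    return conn


def record_points(conn, team_id, player_id, delta):
    """OLTP-style write: add points for one player in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE roster_points SET points = points + ?"
            " WHERE team_id = ? AND player_id = ?",
            (delta, team_id, player_id),
        )
        if cur.rowcount == 0:  # first score for this player
            conn.execute(
                "INSERT INTO roster_points VALUES (?, ?, ?)",
                (team_id, player_id, delta),
            )


def player_points(conn, team_id, player_id):
    """OLTP-style point lookup by primary key."""
    row = conn.execute(
        "SELECT points FROM roster_points"
        " WHERE team_id = ? AND player_id = ?",
        (team_id, player_id),
    ).fetchone()
    return row[0] if row else None


def leaderboard(conn):
    """OLAP-style aggregate over the same table, with no copy or export."""
    return conn.execute(
        "SELECT team_id, SUM(points) AS total FROM roster_points"
        " GROUP BY team_id ORDER BY total DESC, team_id"
    ).fetchall()
```

The point of the sketch is that the application writes and reads individual rows while the leaderboard query scans the same data, which is the workload mix the notes describe for hybrid tables.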

Conclusion and Future Plans

  • Goals for Unistore:
    • Focus on price performance for competitiveness
    • Performance and cost improvements ongoing
    • Optimize customer workloads already running on hybrid tables

  • GA on AWS expected this year

Call to Action

  • Encouragement to use hybrid tables and provide feedback
  • Snowflake engineers continuously optimize workloads based on customer usage