Lecture Notes: Under the Hood of Hybrid Tables
Introduction
Presenters:
- Christian: Engineer at Snowflake
  - Responsible for the architecture of Unistore and FDB
- David: Senior Engineer on the Performance team, Snowflake Berlin office
Purpose:
- Discuss the architecture and construction of Unistore hybrid tables
- Explain the business case for Unistore
The Business Case for Unistore
Problems with Traditional Databases:
- Split-brain problem: OLTP and OLAP workloads run on separate engines
- Resulting issues: security, data-transfer costs, latency, and governance
Unistore's Solution:
- Create a single store for both OLTP and OLAP workloads
Snowflake Architecture Overview
- Three-layer architecture: cloud services, query processing, and storage
  - Cloud services: orchestrates the system and manages metadata
  - Query processing: executes queries
  - Storage: manages data artifacts
Unistore Hybrid Tables
- Key difference from standard Snowflake tables: the underlying storage hardware
  - Standard tables use blob storage
  - Hybrid tables use direct-attached storage for low latency
Hardware Considerations
- Blob storage: high per-request latency, but well suited to large scans
- Direct-attached storage: low latency, which is crucial for OLTP workloads
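A back-of-the-envelope sketch of why per-request latency decides the hardware choice. It assumes an OLTP transaction whose reads depend on one another (so they run sequentially) and uses round illustrative latencies (20 ms per blob request, 0.1 ms per local SSD read, 100 MB/s scan bandwidth) that are assumptions, not figures from the talk.

```python
# Illustrative numbers only: assumed round values, not measurements.
BLOB_REQUEST_MS = 20.0   # hypothetical object-storage GET latency
LOCAL_READ_MS = 0.1      # hypothetical direct-attached SSD read latency


def dependent_reads_ms(num_reads: int, per_read_ms: float) -> float:
    """Latency of reads that must run one after another (each needs the previous result)."""
    return num_reads * per_read_ms


def scan_overhead_fraction(chunk_mb: float, bandwidth_mb_s: float, per_request_ms: float) -> float:
    """Share of a large sequential scan spent waiting on request latency instead of transferring data."""
    transfer_ms = chunk_mb / bandwidth_mb_s * 1000.0
    return per_request_ms / (per_request_ms + transfer_ms)


# A transaction that walks 5 dependent rows.
blob_txn = dependent_reads_ms(5, BLOB_REQUEST_MS)    # 100.0 ms
local_txn = dependent_reads_ms(5, LOCAL_READ_MS)     # about 0.5 ms

# A scan reading 16 MB per request: the fixed latency is mostly amortized.
blob_scan_overhead = scan_overhead_fraction(16.0, 100.0, BLOB_REQUEST_MS)  # about 0.11
```

Under these assumptions the same blob latency that costs a point-lookup transaction 100 ms adds only about 11% to a large scan, which is the asymmetry the two bullets describe.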
FoundationDB (FDB) in Hybrid Tables
- FDB is the bottom layer of the hybrid table stack
- Open-source, distributed (cluster-hosted) key-value store
- Reliable: used by tier-1 data companies and internally at Snowflake
- Highly available: data is replicated across multiple Availability Zones (AZs)
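To picture how a table can sit on a key-value store, the sketch below maps each row to a key built from a table name and primary key and keeps keys sorted, so a prefix range scan returns one table's rows in order. The `OrderedKV` class and the `(table, primary_key)` layout are hypothetical illustrations; they are neither FDB's client API nor Snowflake's actual encoding.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Key = Tuple  # tuples compare element by element, like keys in an ordered KV store


class OrderedKV:
    """Toy in-memory ordered key-value store (illustrative stand-in, not FDB's API)."""

    def __init__(self) -> None:
        self._keys: List[Key] = []        # kept sorted for range scans
        self._vals: Dict[Key, dict] = {}

    def set(self, key: Key, value: dict) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: Key) -> Optional[dict]:
        return self._vals.get(key)

    def range_prefix(self, prefix: Key) -> List[Tuple[Key, dict]]:
        """All entries whose key starts with `prefix`, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for k in self._keys[start:]:
            if k[:len(prefix)] != prefix:
                break  # sorted order: no later key can match
            out.append((k, self._vals[k]))
        return out


# Hypothetical layout: key = (table, primary_key), value = the remaining columns.
kv = OrderedKV()
kv.set(("players", 7), {"name": "Ada", "points": 12})
kv.set(("players", 3), {"name": "Bo", "points": 8})
kv.set(("teams", 1), {"name": "Owls"})

player_keys = [k for k, _ in kv.range_prefix(("players",))]  # [("players", 3), ("players", 7)]
```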
Enhancements and Properties of FDB
- Performance improvements
- Fast recovery and online updates
- Ordered key-value store with ACID (multi-key) transactions
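FDB implements its transactions with optimistic concurrency control: a transaction reads at a version, buffers its writes, and is rejected at commit if something it read was changed by a later commit. The sketch below is a toy model of that idea; the `Store` and `Txn` classes are hypothetical and do not reflect FDB's implementation or client API.

```python
from typing import Dict, Optional, Set


class ConflictError(Exception):
    """Raised when a key this transaction read was changed by a later commit."""


class Store:
    """Toy versioned store for illustration only."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.last_write: Dict[str, int] = {}  # key -> commit version of its latest write
        self.version = 0                      # latest committed version

    def begin(self) -> "Txn":
        return Txn(self, read_version=self.version)


class Txn:
    def __init__(self, store: Store, read_version: int) -> None:
        self.store = store
        self.read_version = read_version
        self.reads: Set[str] = set()
        self.writes: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        self.reads.add(key)  # remember the read so commit can check it
        if key in self.writes:
            return self.writes[key]
        return self.store.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.writes[key] = value  # buffered locally until commit

    def commit(self) -> int:
        s = self.store
        # Optimistic check: abort if any key we read was written after our read version.
        for key in self.reads:
            if s.last_write.get(key, 0) > self.read_version:
                raise ConflictError(key)
        s.version += 1
        for key, value in self.writes.items():
            s.data[key] = value
            s.last_write[key] = s.version
        return s.version


store = Store()
setup = store.begin()
setup.set("balance", "100")
setup.commit()

a, b = store.begin(), store.begin()  # both start from the same version
a.set("balance", str(int(a.get("balance")) - 30))
b.set("balance", str(int(b.get("balance")) - 50))
a.commit()                           # first committer wins
try:
    b.commit()
    b_committed = True
except ConflictError:
    b_committed = False              # b read a stale balance and would have to retry
```

One simplification: a real snapshot read returns the value as of the read version, while this toy returns the latest value and relies on the commit check to reject any transaction that saw a newer write.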
Key Improvements for Hybrid Tables
Multi-tenancy
- Sandboxing enforces per-tenant usage quotas
- Prevents data leakage between tenants and stops any one tenant from over-consuming resources
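The talk does not say how quotas are enforced. A common technique for per-tenant limits is one token bucket per tenant, used here as a stand-in; the `TenantQuotas` class is hypothetical, and the clock is injectable so the behavior is deterministic.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # largest burst a tenant may spend at once
        self.tokens = capacity    # start full
        self.last = now           # clock reading at the last refill

    def try_take(self, now: float, cost: float) -> bool:
        # Refill for the elapsed time (capped at capacity), then try to pay the cost.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantQuotas:
    """Hypothetical per-tenant limiter: each tenant draws from its own bucket."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant: str, cost: float = 1.0) -> bool:
        now = self.clock()
        bucket = self.buckets.get(tenant)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[tenant] = bucket
        return bucket.try_take(now, cost)


fake_now = [0.0]
quotas = TenantQuotas(rate=1.0, capacity=2.0, clock=lambda: fake_now[0])
burst = [quotas.allow("tenant-a") for _ in range(3)]  # [True, True, False]
other = quotas.allow("tenant-b")                       # True: separate bucket
fake_now[0] = 1.0
after_refill = quotas.allow("tenant-a")                # True: one token refilled
```

Isolating buckets per tenant is what keeps one tenant's burst (the third `tenant-a` call is rejected) from affecting `tenant-b`.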
Continuous Backup and Historical Queries
- Continuous backup to blob storage
- Enables database cloning and point-in-time (historical) queries
- Adaptive querying for different data requirements
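A minimal model of how a continuous backup can serve both clones and point-in-time reads: every committed mutation is appended, in version order, to a log, and replaying the log up to a chosen version rebuilds the state at that moment. `ContinuousBackup` is a hypothetical class; the talk does not describe the backup format.

```python
from typing import Dict, List, Optional, Tuple

# (commit_version, key, value); a value of None means the key was deleted.
Mutation = Tuple[int, str, Optional[str]]


class ContinuousBackup:
    """Toy model: committed mutations are appended to a log in version order."""

    def __init__(self) -> None:
        self.log: List[Mutation] = []  # stands in for log files written to blob storage

    def record(self, version: int, key: str, value: Optional[str]) -> None:
        self.log.append((version, key, value))

    def restore_as_of(self, version: int) -> Dict[str, str]:
        """Rebuild the key-value state as it was right after `version` committed."""
        state: Dict[str, str] = {}
        for v, key, value in self.log:
            if v > version:
                break  # log is in version order, so nothing later applies
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


backup = ContinuousBackup()
backup.record(1, "team:1", "Owls")
backup.record(2, "team:2", "Hawks")
backup.record(3, "team:1", None)       # team 1 deleted at version 3

clone_at_v2 = backup.restore_as_of(2)  # {"team:1": "Owls", "team:2": "Hawks"}
current = backup.restore_as_of(3)      # {"team:2": "Hawks"}
```

Replaying from the beginning only works for a toy; a practical design would typically pair periodic snapshots with the log so a restore replays just the tail.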
Native Bulk Load
- Efficiently handles large ingestion operations
- Reuses the logic already built for backup and restore
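A restore writes large, sorted key ranges, and FDB limits how much data one transaction may write (its documentation gives 10,000,000 bytes). One plausible ingredient of a bulk-load path is therefore to sort rows by key and cut them into size-bounded batches. This is an assumption for illustration; `batch_sorted` and its byte budget are hypothetical, not Snowflake's loader.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[bytes, bytes]  # (key, value)


def batch_sorted(rows: Iterable[Row], max_batch_bytes: int) -> Iterator[List[Row]]:
    """Sort rows by key and yield batches whose total key+value size stays within the budget.

    A single row larger than the budget is emitted as a batch of its own.
    """
    batch: List[Row] = []
    size = 0
    for key, value in sorted(rows):
        row_size = len(key) + len(value)
        if batch and size + row_size > max_batch_bytes:
            yield batch  # this batch would overflow: flush it first
            batch, size = [], 0
        batch.append((key, value))
        size += row_size
    if batch:
        yield batch


rows = [(b"c", b"333"), (b"a", b"1"), (b"b", b"22")]
batches = list(batch_sorted(rows, max_batch_bytes=5))
# [[(b"a", b"1"), (b"b", b"22")], [(b"c", b"333")]]
```

Sorting first means each batch covers a contiguous key range, which is the same shape of write a restore produces.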
Control Plane Enhancements
- Elastic scaling of clusters
- Rolling out updates without downtime
Quality of Service Improvements
- Reducing the variance between the best- and worst-case latencies that clients experience
- Example: transaction commit times
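One way to put a number on "variance between best and worst" is to compare a tail percentile of commit times with the median. The sketch uses the nearest-rank percentile and a p99/median ratio; the metric choice and the sample data are assumptions for illustration, not how the talk measured it.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_spread(samples: Sequence[float]) -> float:
    """Ratio of p99 to median; 1.0 means the p99 commit is no slower than the median one."""
    return percentile(samples, 99) / percentile(samples, 50)


# Hypothetical commit times in ms: most commits are fast, a few are slow outliers.
commit_ms = [5.0] * 98 + [80.0] * 2
spread = tail_spread(commit_ms)  # p99 = 80.0, median = 5.0, so 16.0
```

A QoS improvement in this sense shrinks `spread` even if the median stays the same.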
Optimizing Other Layers
Cloud Services Layer (Brain of Snowflake)
- Telemetry management: reduce system CPU usage
  - Log lines are essential but costly
  - Implemented asynchronous logging
- FDB metadata store optimization
  - Local caches reduce network round trips for metadata access
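The asynchronous logging idea from the telemetry bullet can be illustrated with Python's standard `logging.handlers.QueueHandler` and `QueueListener`: the thread that logs only enqueues the record, and a background thread performs the slower output. Python is a stand-in here; the talk does not say which language or mechanism Snowflake uses, and `make_async_logger` is a hypothetical helper.

```python
import io
import logging
import logging.handlers
import queue


def make_async_logger(name: str, sink: logging.Handler):
    """Route `name`'s records through a queue to `sink`, which runs on a listener thread."""
    q: queue.Queue = queue.Queue()  # unbounded queue between callers and the writer
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(q))  # caller side: enqueue only
    listener = logging.handlers.QueueListener(q, sink)   # background side: does the I/O
    listener.start()
    return logger, listener


buffer = io.StringIO()
sink = logging.StreamHandler(buffer)
sink.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

log, listener = make_async_logger("telemetry-demo", sink)
log.info("query %s finished in %d ms", "q1", 12)
listener.stop()  # waits for the listener thread to drain the queue
```

The design choice is to move output work off the request path; the trade-off is that records still queued when a process exits unexpectedly can be lost, which is why `stop()` is called before reading the output.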
Query Processing Layer (Muscle of Snowflake)
- Process reusability
  - Moved to reusing processes across queries for efficiency
  - Refined the model to preserve isolation between queries while improving performance
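The trade-off in these bullets (reuse expensive workers, but never let one query see another's state) can be sketched as a pool that hands out long-lived workers and wipes per-query state before and after each run. `Worker` and `WorkerPool` are hypothetical; this toy runs inside one Python process and does not model OS-level process isolation.

```python
from typing import Any, Callable, Dict, List

Scratch = Dict[str, Any]


class Worker:
    """Hypothetical long-lived worker; assume creating one is expensive."""

    def __init__(self) -> None:
        self.scratch: Scratch = {}  # per-query state that must not leak across queries
        self.queries_run = 0

    def run(self, query: Callable[[Scratch], Any]) -> Any:
        self.scratch.clear()  # start from a clean slate
        try:
            return query(self.scratch)
        finally:
            self.scratch.clear()  # wipe state even if the query failed
            self.queries_run += 1


class WorkerPool:
    """Reuses a fixed set of workers instead of spawning one per query."""

    def __init__(self, size: int) -> None:
        self.idle: List[Worker] = [Worker() for _ in range(size)]

    def execute(self, query: Callable[[Scratch], Any]) -> Any:
        worker = self.idle.pop()  # this toy assumes a worker is always free
        try:
            return worker.run(query)
        finally:
            self.idle.append(worker)  # return the worker for the next query


pool = WorkerPool(size=1)
first = pool.execute(lambda s: s.setdefault("secret", 42))  # 42
second = pool.execute(lambda s: "secret" in s)              # False: state was wiped
```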
Example: Hybrid Tables as a Fantasy Football Backend
- Use case: apps with highly concurrent access and low-latency requirements
- No separate backend needed
- Simplifies the deployment of such applications
Conclusion and Future Plans
- Goals for Unistore:
  - Focus on price-performance to stay competitive
  - Ongoing performance and cost improvements
  - Optimize customer workloads already running on hybrid tables
- General availability (GA) on AWS expected this year
Call to Action
- Encouragement to use hybrid tables and provide feedback
- Snowflake engineers continuously optimize workloads based on customer usage