PostgreSQL Usage at Zerodha - Key Learnings and Practices
Introduction
- Speaker: Key member of Zerodha's tech team
- Purpose: Share experiences and learnings from using PostgreSQL in Zerodha
- Disclaimer: Learnings are specific to Zerodha's context and may not apply broadly
Personal Background
- Role: Part of Zerodha's tech team since its inception
- Experience: Extensive back-end work with databases such as MySQL, Redis, and MongoDB
Zerodha Overview
- What is Zerodha: India's leading stockbroker, comparable to Robinhood in the USA
- Services: Facilitates stock market trades and handles significant financial data
- Market Hours: 9:15 AM to 3:30 PM IST
- Platforms: Kite (trading platform), Console (back-office platform)
PostgreSQL in Zerodha
- Starting Point: Imported about 100 MB of data per day for 20,000 clients
- Current: Handles hundreds of GBs daily
- PostgreSQL Versions: Started with PostgreSQL 8, currently using PostgreSQL 13
Keys to Handling Data - Zerodha's Approach
- Indexing
- Use indexes judiciously to balance speed and storage
- Employ partial indexing so that frequently queried subsets of rows are indexed efficiently (see the partial-index sketch after this list)
- Materialized Views and Denormalization
- Avoid complex joins by denormalizing data
- Use materialized views to speed up frequent queries (see the example after this list)
- Understanding Data
- Choose the database based on data type and query needs
- Avoid over-engineering by choosing the right tool for the job
- PostgreSQL Tuning
- Tune individual tables differently based on their use-cases
- Autovacuum: may not be ideal for every workload; consider per-table settings and manual vacuuming (see the sketch after this list)
- Query Planning
- Deep understanding of query plans is crucial for optimization
- Use EXPLAIN to identify and fix slow queries (see the example after this list)
- Vacuuming
- Prefer manual vacuuming over autovacuum for bulk data loads (covered in the vacuum sketch after this list)
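As a rough illustration of the partial-indexing point above (the table and column names are hypothetical, not Zerodha's actual schema), a partial index covers only the rows that hot queries actually touch, keeping the index small and fast:

```sql
-- Only "open" orders are queried constantly, so index just that slice
-- instead of the whole table (hypothetical schema).
CREATE INDEX idx_orders_open_user
    ON orders (user_id, placed_at)
    WHERE status = 'open';

-- This query can be served from the small partial index:
SELECT order_id, placed_at
FROM orders
WHERE user_id = 42 AND status = 'open'
ORDER BY placed_at DESC;
```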
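The materialized-view point, sketched with hypothetical names: precompute a frequently requested aggregate once after the nightly import instead of re-running the aggregation on every request.

```sql
-- Precompute per-client daily P&L instead of aggregating raw trades on
-- every Console request (hypothetical schema).
CREATE MATERIALIZED VIEW daily_pnl AS
SELECT user_id,
       trade_date,
       SUM(quantity * (sell_price - buy_price)) AS pnl
FROM trades
GROUP BY user_id, trade_date;

-- A unique index lets the view be refreshed without blocking readers.
CREATE UNIQUE INDEX daily_pnl_pk ON daily_pnl (user_id, trade_date);

-- Refresh after each nightly import.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_pnl;
```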
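For the query-planning point, a minimal example of inspecting a plan (the query itself is hypothetical):

```sql
-- EXPLAIN ANALYZE executes the query and reports the chosen plan, actual
-- row counts, buffer usage, and timings, making sequential scans and
-- misestimated joins easy to spot.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, SUM(quantity * price) AS turnover
FROM trades
WHERE trade_date = '2021-04-01'
GROUP BY user_id;
```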
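And a minimal sketch of the per-table tuning and manual vacuuming points, assuming a table that is bulk-loaded once a night (the setting and table name are illustrative):

```sql
-- Disable autovacuum only on the bulk-loaded table; other tables keep the
-- default behaviour.
ALTER TABLE trades SET (autovacuum_enabled = false);

-- After the nightly import, reclaim dead tuples and refresh planner
-- statistics manually, at a time of our choosing.
VACUUM (ANALYZE) trades;
```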
Additional Strategies
- Sharding: Use Foreign Data Wrappers (FDW) to partition data across servers by financial year (see the sketch below)
- Caching Layer: An external caching layer, with PostgreSQL itself used as a secondary store for computed query results (see the sketch below)
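A rough sketch of the financial-year sharding idea with postgres_fdw, assuming each financial year lives on its own server (host names, credentials, and table layout are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point a foreign server at the box holding financial year 2020-21 data.
CREATE SERVER fy2021_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'fy2021.internal', dbname 'console', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER fy2021_server
    OPTIONS (user 'report_reader', password 'secret');

-- Expose that year's trades as a local foreign table; queries for that
-- financial year are pushed to the remote server.
CREATE FOREIGN TABLE trades_fy2021 (
    user_id    bigint,
    trade_date date,
    symbol     text,
    quantity   integer,
    price      numeric
)
SERVER fy2021_server
OPTIONS (schema_name 'public', table_name 'trades');
```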
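And one way the PostgreSQL-backed result cache could look; the notes only say that query results are stored in PostgreSQL as a secondary layer, so the unlogged table and its layout below are purely an assumption for illustration:

```sql
-- UNLOGGED skips WAL, which is acceptable for disposable cache data and
-- keeps writes cheap.
CREATE UNLOGGED TABLE report_cache (
    cache_key   text PRIMARY KEY,
    payload     jsonb NOT NULL,
    computed_at timestamptz NOT NULL DEFAULT now()
);

-- Writers upsert the computed result; readers look up the key and fall
-- back to recomputing if the row is missing or stale.
INSERT INTO report_cache (cache_key, payload)
VALUES ('pnl:42:2021-04-01', '{"pnl": 1234.5}')
ON CONFLICT (cache_key)
DO UPDATE SET payload = EXCLUDED.payload, computed_at = now();
```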
Error Management and Scalability
- Managing Errors: Backups stored in S3, manual restart in case of failure
- Scaling: Handle increased load by scaling hardware and caching effectively
Q&A Highlights
- Data Import: Nightly jobs for importing trading data to Console
- Caching: Predominantly Redis; PostgreSQL for specific needs such as sorting and pagination (see the pagination sketch below)
- Overall Architecture: Lean and efficient, focusing on asynchronous processing and minimizing load on primary databases
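As a small illustration of the sorting/pagination point from the Q&A (hypothetical table; keyset pagination is one common way to do this in PostgreSQL, not necessarily the exact approach used):

```sql
-- First page: newest 50 trades for a client.
SELECT trade_id, trade_date, symbol, quantity, price
FROM trades
WHERE user_id = 42
ORDER BY trade_date DESC, trade_id DESC
LIMIT 50;

-- Next page: resume from the last row seen rather than using OFFSET, so
-- deep pages stay as cheap as the first one.
SELECT trade_id, trade_date, symbol, quantity, price
FROM trades
WHERE user_id = 42
  AND (trade_date, trade_id) < ('2021-04-01', 987654)
ORDER BY trade_date DESC, trade_id DESC
LIMIT 50;
```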
Conclusion
- Summary: PostgreSQL has been highly resilient and effective for Zerodha's needs
- Future Plans: Exploring other databases like ClickHouse for different requirements
Key Takeaway: Effective database management is context-specific; rely on firsthand experience and pragmatic solutions, avoid over-engineering, and leverage the strengths of PostgreSQL.