
PostgreSQL Usage in Zerodha - Key Learnings and Practices

Jul 16, 2024

Introduction

  • Speaker: Key member of Zerodha's tech team
  • Purpose: Share experiences and learnings from using PostgreSQL in Zerodha
  • Disclaimer: Learnings are specific to Zerodha's context and may not apply broadly

Personal Background

  • Role: Part of Zerodha's tech team since its inception
  • Experience: Extensive backend experience with various databases such as MySQL, Redis, and MongoDB

Zerodha Overview

  • What is Zerodha: India's leading stockbroker, comparable to Robinhood in the USA
  • Services: Facilitates stock market trades and handles large volumes of financial data
  • Market Hours: 9:15 AM to 3:30 PM
  • Platforms: Kite (trading platform), Console (back-office platform)

PostgreSQL in Zerodha

  • Starting Point: Imported 100 MB of data per day for 20,000 clients
  • Current: Handles hundreds of GBs daily
  • PostgreSQL Versions: Started with PostgreSQL 8, currently using PostgreSQL 13

Keys to Handling Data - Zerodha's Approach

  1. Indexing
    • Use indexes judiciously to balance query speed against storage and write overhead
    • Employ partial indexes so that only the rows queries actually touch are indexed (see the sketch after this list)
  2. Materialized Views and Denormalization
    • Avoid complex joins by denormalizing data where it pays off
    • Use materialized views to precompute frequent queries (also sketched below)
  3. Understanding Data
    • Choose the database based on data type and query needs
    • Avoid over-engineering by choosing the right tool for the job
  4. PostgreSQL Tuning
    • Tune individual tables differently based on their use cases
    • Autovacuum: may not be ideal for all setups; consider manual vacuuming instead (see the tuning sketch below)
  5. Query Planning
    • A deep understanding of query plans is crucial for optimization
    • Use EXPLAIN to identify and fix slow queries (see the EXPLAIN sketch below)
  6. Vacuuming
    • Vacuum manually rather than relying on autovacuum for specific data loads, such as bulk nightly imports
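
The sketches below illustrate these techniques in generic PostgreSQL SQL; the trades table, its columns, and every other name are hypothetical stand-ins for illustration, not Zerodha's actual schema. First, a partial index and a materialized view (items 1 and 2):

```sql
-- Hypothetical trades table used throughout these sketches.
CREATE TABLE trades (
    id         bigserial PRIMARY KEY,
    client_id  bigint         NOT NULL,
    trade_date date           NOT NULL,
    status     text           NOT NULL,  -- e.g. 'open', 'settled'
    quantity   integer        NOT NULL,
    price      numeric(12, 2) NOT NULL
);

-- Partial index: index only the rows most queries touch, keeping the
-- index small and writes cheap.
CREATE INDEX trades_open_client_idx
    ON trades (client_id, trade_date)
    WHERE status = 'open';

-- Materialized view: precompute a frequent aggregate instead of
-- re-running the aggregation on every request.
CREATE MATERIALIZED VIEW daily_client_turnover AS
SELECT client_id,
       trade_date,
       sum(quantity * price) AS turnover
FROM trades
GROUP BY client_id, trade_date;

-- Refreshed on a schedule, e.g. after the nightly import.
REFRESH MATERIALIZED VIEW daily_client_turnover;
```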
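
A sketch of inspecting a query plan with EXPLAIN (item 5), against the same hypothetical table:

```sql
-- EXPLAIN shows the planner's chosen plan; ANALYZE executes the query
-- and reports actual timings and row counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT client_id, trade_date, sum(quantity * price) AS turnover
FROM trades
WHERE status = 'open'
  AND trade_date >= date '2024-04-01'
GROUP BY client_id, trade_date;
-- Watch for sequential scans on large tables, badly misestimated row
-- counts, and sorts spilling to disk, then adjust indexes or queries.
```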
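
A sketch of per-table tuning and manual vacuuming (items 4 and 6); the parameter values are placeholders, not recommendations:

```sql
-- Tune autovacuum per table instead of relying only on global defaults.
ALTER TABLE trades SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);

-- For tables rewritten by batch jobs, autovacuum can be disabled and a
-- manual VACUUM run once the load finishes.
ALTER TABLE trades SET (autovacuum_enabled = false);
VACUUM (ANALYZE) trades;
```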

Additional Strategies

  • Sharding: Foreign Data Wrappers (FDW) are used to partition data across servers by financial year (see the postgres_fdw sketch below)
  • Caching Layer: caching is largely handled outside the primary database, with PostgreSQL itself used as a secondary layer for storing computed query results
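
As a rough illustration of the FDW-based sharding described above, a minimal postgres_fdw sketch; the server name, host, credentials, and table layout are assumptions for illustration only:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- One foreign server per financial year's database (hypothetical host).
CREATE SERVER reports_fy2023
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'fy2023.db.internal', dbname 'console', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER reports_fy2023
    OPTIONS (user 'console_ro', password 'secret');

-- The older year's data lives on its own server but stays queryable
-- from the primary database as a foreign table.
CREATE FOREIGN TABLE trades_fy2023 (
    id         bigint,
    client_id  bigint,
    trade_date date,
    quantity   integer,
    price      numeric(12, 2)
)
SERVER reports_fy2023
OPTIONS (schema_name 'public', table_name 'trades');
```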

Error Management and Scalability

  • Managing Errors: Backups are stored in S3; recovery after a failure is a manual restart
  • Scaling: Increased load is handled by scaling hardware and using effective caching

Q&A Highlights

  • Data Import: Nightly jobs import trading data into Console (a typical COPY-based load is sketched below)
  • Caching: Redis is used for most caching; PostgreSQL handles specific needs such as sorting and pagination of cached result sets (also sketched below)
  • Overall Architecture: Lean and efficient, focusing on asynchronous processing and minimizing load on the primary databases
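
For the nightly import mentioned above, bulk loads in PostgreSQL are typically done with COPY; the file path and column list here are hypothetical:

```sql
-- Bulk-load the day's trade file; COPY is far faster than row-by-row
-- INSERTs for nightly imports.
COPY trades (client_id, trade_date, status, quantity, price)
    FROM '/data/imports/trades_2024-07-16.csv'
    WITH (FORMAT csv, HEADER true);

-- Refresh dependent materialized views and vacuum after the load.
REFRESH MATERIALIZED VIEW daily_client_turnover;
VACUUM (ANALYZE) trades;
```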
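
And a sketch of the "PostgreSQL for sorting and pagination" idea: an expensive report is computed once into a throwaway cache table, and subsequent page requests sort and slice it cheaply. The report_cache table and request_id scheme are hypothetical:

```sql
-- Unlogged table: cheap writes, contents lost on crash, fine for a cache.
CREATE UNLOGGED TABLE report_cache (
    request_id uuid   NOT NULL,
    row_num    bigint NOT NULL,
    payload    jsonb  NOT NULL,
    PRIMARY KEY (request_id, row_num)
);

-- Run the expensive report once and store the ordered rows.
INSERT INTO report_cache (request_id, row_num, payload)
SELECT 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::uuid,
       row_number() OVER (ORDER BY trade_date, id),
       to_jsonb(t)
FROM trades t
WHERE client_id = 42;

-- Subsequent page requests just slice the cached result.
SELECT payload
FROM report_cache
WHERE request_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
ORDER BY row_num
LIMIT 50 OFFSET 100;
```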

Conclusion

  • Summary: PostgreSQL has been highly resilient and effective for Zerodha's needs
  • Future Plans: Exploring other databases like ClickHouse for different requirements

Key Takeaway: Effective database management is context-specific: rely on firsthand experience and pragmatic solutions, avoid over-engineering, and leverage PostgreSQL's strengths.