Introduction to Cassandra

Jul 3, 2024

Overview

  • Cassandra: A distributed NoSQL database known for high availability and scalability.
  • Key Topics Covered: Features of Cassandra, Read/Write Path, Data Modeling, Use Cases.

Features of Cassandra

  • Distributed Database: Data is replicated across multiple machines.
  • Always Available: Works even if one or more machines fail.
  • Eventual Consistency: The default behavior; the consistency level is tunable per query, up to strong consistency if needed.
  • Fast Writes: Suitable for write-intensive applications.
  • Leaderless Architecture: No single point of failure; all nodes can handle writes.
  • Peer-to-Peer: Every node plays the same role, which makes the cluster more resilient than the master-slave setups common in relational databases.
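
The tunable consistency mentioned above is often summarized by the rule that reads see the latest write when the read and write replica counts together exceed the replication factor (R + W > RF). The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority quorum."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(rf, q, q))   # True: quorum writes + quorum reads
print(is_strongly_consistent(rf, 1, 1))   # False: one replica each way
```

With a replication factor of 3, quorum reads and writes overlap on at least one replica, while single-replica reads and writes trade that guarantee for lower latency.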

Distributed Nature and Availability

  • Data Replication: Same data resides on multiple nodes for redundancy.
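  • High Availability: A query can be fulfilled as long as enough replicas are available to meet its consistency level; at the lowest level, one replica is enough.
  • Resilience: Because all nodes are peers, any node can coordinate a request, and the remaining replicas keep serving a failed node's data.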
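
One way to picture replication is a hash ring: each key maps to a position on the ring, and the next few nodes clockwise hold its replicas. The following is a simplified sketch under stated assumptions, not Cassandra's actual placement code: it uses MD5 from the standard library as a stand-in hash, gives each node a single ring position, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an assumption for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def replicas_for(partition_key: str, nodes: list[str],
                 replication_factor: int) -> list[str]:
    """Walk clockwise from the key's token and take the next distinct nodes."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
owners = replicas_for("sensor-42", nodes, replication_factor=3)

# Simulate the first replica failing: the other replicas still hold the data.
alive = [n for n in owners if n != owners[0]]
print(owners, alive)
```

Because the same key always lands on the same set of nodes, any node can work out which replicas to contact without consulting a central coordinator.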

Storage and Data Structure

  • Partitions: The basic unit of data storage and distribution in Cassandra; akin to files within a file system.
  • Designing Partitions:
    • Related data should be in the same partition.
    • Smaller partitions lead to better query performance.
  • Schema Design:
    • Needs to be carefully planned with query patterns in mind to maintain performance.
  • Example Schema:
    • Single Field Primary Key: Defines one field as the partition key.
    • Composite Partition Key: Combines multiple fields into the partition key.
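
To make the effect of the partition key choice concrete, the sketch below groups a few sample rows by a single-field key and by a composite key. It is a toy model in plain Python, not a driver call; the table names, column names, and sample data are all invented for illustration, and the CQL in the comments is only one plausible way to declare such keys.

```python
from collections import defaultdict

# Hypothetical CQL for the two key styles (table and column names are invented):
#   CREATE TABLE users (user_id uuid PRIMARY KEY, name text);
#   CREATE TABLE readings (sensor_id text, day date, ts timestamp, value double,
#                          PRIMARY KEY ((sensor_id, day), ts));


def group_by_partition(rows, partition_key_columns):
    """Group rows by the tuple of their partition key column values."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_key_columns)
        partitions[key].append(row)
    return dict(partitions)


readings = [
    {"sensor_id": "s1", "day": "2024-07-01", "ts": "08:00", "value": 20.5},
    {"sensor_id": "s1", "day": "2024-07-01", "ts": "09:00", "value": 21.0},
    {"sensor_id": "s1", "day": "2024-07-02", "ts": "08:00", "value": 19.8},
    {"sensor_id": "s2", "day": "2024-07-01", "ts": "08:00", "value": 30.1},
]

by_sensor = group_by_partition(readings, ["sensor_id"])
by_sensor_day = group_by_partition(readings, ["sensor_id", "day"])
print(len(by_sensor))      # 2 partitions, one per sensor
print(len(by_sensor_day))  # 3 smaller partitions, one per sensor per day
```

Adding the day to the partition key keeps related readings together while preventing any one sensor's partition from growing without bound.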

Write Path Efficiency

  • Log-Structured Writes: Data first goes to a commit log (on disk) and a memtable (in memory).
    • Commit Log: An append-only log, used to replay writes after a crash.
    • Memtable: An in-memory data structure that holds recent writes.
  • Speed: Writes are acknowledged once they are in the commit log and the memtable, without waiting for a flush to SSTables.
  • Disk Storage: Data is later flushed from the memtable to immutable on-disk files (SSTables) periodically.
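
The write path above can be sketched as a toy model: append to a log, update a memtable, acknowledge, and flush to an SSTable once the memtable fills up. This is a simplified illustration only. The class name, the flush threshold, and the use of plain Python lists and tuples for the log and SSTables are assumptions made for the example, not how Cassandra is implemented.

```python
class ToyNode:
    """Toy model of the write path: commit log -> memtable -> SSTable flush."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every unflushed write
        self.memtable = {}     # in-memory key -> value, latest write wins
        self.sstables = []     # flushed, immutable snapshots (oldest first)
        self.flush_threshold = flush_threshold  # hypothetical tuning knob

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"  # acknowledged without waiting for an SSTable

    def flush(self):
        """Freeze the memtable into a sorted, immutable SSTable and start fresh."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


node = ToyNode(flush_threshold=2)
node.write("a", 1)
node.write("b", 2)   # reaches the threshold and triggers a flush
node.write("a", 3)   # newer value lives in the memtable
print(node.read("a"), node.read("b"), len(node.sstables))  # 3 2 1
```

Nothing on the write path rewrites existing data in place, which is a large part of why writes stay fast in this design.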

Use Cases

  • High Write Throughput: Ideal for applications requiring frequent and fast writes.
  • Internet of Things (IoT): Suitable for high-frequency sensor data.
  • Web Activity Tracking: Efficient for logging user interactions on busy websites.
  • Predictable Read Patterns: Highly efficient if queries are defined ahead of time.
  • Health Data Applications: Fast write performance supports constant health monitoring data.

Conclusion

  • Cassandra excels in resilience, availability, and write performance.
  • Ideal for applications with high write requirements and predictable read patterns.