Introduction to Cassandra
Jul 3, 2024
Overview
Cassandra
: A distributed NoSQL database known for high availability and scalability.
Key Topics Covered: Features of Cassandra, Read/Write Path, Data Modeling, Use Cases.
Features of Cassandra
Distributed Database
: Data is replicated across multiple machines.
Always Available
: Works even if one or more machines fail.
Eventual Consistency
: Tunable to strong consistency if needed.
Fast Writes
: Suitable for write-intensive applications.
Leaderless Architecture
: No single point of failure; all nodes can handle writes.
Peer-to-Peer
: Nodes are equals that talk directly to each other, which is more resilient than the master-slave setups common in relational databases.
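The "tunable" part of eventual consistency comes down to simple arithmetic. With a replication factor RF, a read that contacts R replicas is guaranteed to see a write acknowledged by W replicas whenever R + W > RF. The sketch below is only an illustration of that rule; the function names are made up for these notes and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas, the count used by the QUORUM consistency level."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3
# Writing and reading at ONE: fast, but a read may miss the latest write.
print(is_strongly_consistent(RF, write_replicas=1, read_replicas=1))  # False
# Writing and reading at QUORUM: read and write sets always share a replica.
print(is_strongly_consistent(RF, quorum(RF), quorum(RF)))  # True
```

The trade-off is latency and availability: higher R and W mean more replicas must respond before a request succeeds.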
Distributed Nature and Availability
Data Replication
: Same data resides on multiple nodes for redundancy.
High Availability
: A query can be served as long as enough replicas are reachable to satisfy its consistency level; at consistency level ONE, a single live replica is enough.
Resilience
: No node has a special role that can be lost; any node can coordinate a request, and the surviving replicas keep serving a failed node's data.
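A rough way to picture replication and availability is a token ring: each node owns a position on the ring, a partition key hashes to a position, and the next few nodes clockwise hold the replicas. The sketch below is a simplified model under stated assumptions, not Cassandra's implementation: it gives each node a single token, uses MD5 from the standard library as a stand-in for the default Murmur3 partitioner, and the `Ring` and `can_read` names are hypothetical.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: one token per node, replicas chosen clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(key: str) -> int:
        # Stand-in hash for illustration; Cassandra defaults to Murmur3.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        start = bisect.bisect_left(self._tokens, self._token(partition_key))
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


def can_read(ring, partition_key, live_nodes, required=1):
    """A read succeeds if at least `required` replicas of the key are alive."""
    live_replicas = [n for n in ring.replicas(partition_key) if n in live_nodes]
    return len(live_replicas) >= required


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("sensor-42")

# Two of the three replicas are down; only the last owner survives.
survivors = {owners[-1]}
print(can_read(ring, "sensor-42", survivors, required=1))  # True
print(can_read(ring, "sensor-42", survivors, required=2))  # False
```

With one live replica, a read at consistency level ONE still succeeds, while a read that needs two replicas (QUORUM with RF = 3) does not.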
Storage and Data Structure
Partitions
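: The basic unit of storage and distribution in Cassandra; all rows that share a partition key are stored together on the same replicas, much like related records kept in one file.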
Designing Partitions
:
Related data should be in the same partition.
Keep partitions reasonably small; smaller partitions lead to better query performance.
Schema Design
:
Needs to be carefully planned with query patterns in mind to maintain performance.
Example Schema
:
Single Field Primary Key
: A single column serves as both the primary key and the partition key.
Composite Partition Key
: Several columns together form the partition key, so rows are grouped by the combination of their values.
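To make the partition-key idea concrete, here is a small sketch that groups rows the way a composite partition key would. The `readings` table, its columns, and the helper names are hypothetical examples chosen for these notes; the CQL in the comment shows the kind of table definition being imitated.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical CQL for the table this sketch imitates (not from the lecture):
#   CREATE TABLE readings (
#       sensor_id text, day date, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   );
# (sensor_id, day) is the composite partition key; ts orders rows within it.


def partition_key(row):
    """Rows with the same sensor and calendar day share a partition."""
    return (row["sensor_id"], row["ts"].date())


def group_into_partitions(rows):
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r["ts"])  # mimic ordering by the clustering column
    return dict(partitions)


rows = [
    {"sensor_id": "s1", "ts": datetime(2024, 7, 3, 9, 0), "value": 20.5},
    {"sensor_id": "s1", "ts": datetime(2024, 7, 3, 8, 0), "value": 20.1},
    {"sensor_id": "s1", "ts": datetime(2024, 7, 4, 8, 0), "value": 19.8},
    {"sensor_id": "s2", "ts": datetime(2024, 7, 3, 8, 0), "value": 31.0},
]
partitions = group_into_partitions(rows)
print(len(partitions))  # 3
```

Adding the day to the partition key keeps each sensor's partition from growing without bound, and a query for one sensor on one day touches exactly one partition.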
Write Path Efficiency
Memory-Based Writes
: Each write is appended to the commit log (on disk, for durability) and applied to the memtable (in memory).
Commit Log
: An append-only log on disk, replayed after a crash to recover writes that had not yet been flushed.
Memtable
: An in-memory structure that holds recent writes, sorted by key, until they are flushed.
Speed
: Writes are acknowledged once they are in the commit log and memtable, without waiting for the data to be flushed to SSTables.
Disk Storage
: Memtables are periodically flushed from memory to disk as immutable SSTables.
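The write path can be sketched as a toy single-node store: append to a log, update an in-memory table, acknowledge, and flush to a sorted file once the table fills up. This is a deliberately simplified model, not how Cassandra is implemented; `ToyNode`, the tab-separated file format, and the tiny `memtable_limit` are assumptions made for illustration, and compaction and replication are left out.

```python
import os
import tempfile


class ToyNode:
    """Toy write path: commit log append, memtable update, periodic flush."""

    def __init__(self, data_dir, memtable_limit=3):
        self.data_dir = data_dir
        self.commit_log_path = os.path.join(data_dir, "commit.log")
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.sstables = []  # file paths, oldest first

    def write(self, key, value):
        # 1. Sequential append to the commit log (cheap, no seeks).
        with open(self.commit_log_path, "a", encoding="utf-8") as log:
            log.write(f"{key}\t{value}\n")
        # 2. Apply the write to the in-memory memtable.
        self.memtable[key] = value
        # 3. Flush only when the memtable is full, not on every write.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        return "ack"

    def flush(self):
        # Write the memtable out as a sorted file that is never modified again.
        path = os.path.join(self.data_dir, f"sstable-{len(self.sstables)}.txt")
        with open(path, "w", encoding="utf-8") as out:
            for key in sorted(self.memtable):
                out.write(f"{key}\t{self.memtable[key]}\n")
        self.sstables.append(path)
        self.memtable = {}
        # Flushed data no longer needs replay, so the log can be truncated.
        open(self.commit_log_path, "w", encoding="utf-8").close()

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables newest first.
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):
            with open(path, encoding="utf-8") as sstable:
                for line in sstable:
                    k, v = line.rstrip("\n").split("\t", 1)
                    if k == key:
                        return v
        return None


node = ToyNode(tempfile.mkdtemp(), memtable_limit=2)
node.write("a", "1")
node.write("b", "2")  # memtable is full, so it is flushed to an SSTable
node.write("a", "3")  # newer value lives in the memtable only
print(node.read("a"), node.read("b"))  # 3 2
```

Writes stay fast because the only disk work on the hot path is a sequential append; the sorted files are produced later, in bulk.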
Use Cases
High Write Throughput
: Ideal for applications requiring frequent and fast writes.
Internet of Things (IoT)
: Suitable for high-frequency sensor data.
Web Activity Tracking
: Efficient for logging user interactions on busy websites.
Predictable Read Patterns
: Reads are highly efficient when the queries are known ahead of time and the tables are designed around them.
Health Data Applications
: Fast write performance supports constant health monitoring data.
Conclusion
Cassandra excels in resilience, availability, and write performance.
Ideal for applications with high write requirements and predictable read patterns.