DynamoDB Data Modeling Overview

Jul 29, 2024

Notes on DynamoDB Data Modeling

Introduction

  • DynamoDB is AWS's NoSQL database service.
  • Data modeling in DynamoDB differs from traditional relational database modeling.
  • Newcomers often carry relational habits over (one table per entity, joins done in application code), which multiplies requests and raises costs.
  • Proper design can reduce AWS bills and keep latency in the single-digit milliseconds whether the table holds 1 GB or 10 TB of data.

What is Data Modeling?

  • Data modeling refers to how an application stores data related to real-world entities.
  • Two broad families of databases:
    • Relational Databases (SQL): e.g., MySQL, Oracle, Microsoft SQL Server
    • NoSQL Databases: e.g., DynamoDB, MongoDB, Cassandra; each is optimized for particular use cases and access patterns.

Relational Databases

  • Use data normalization to split data across multiple tables to reduce redundancy.
  • Queries that join many tables become more expensive as data grows, so performance tends to degrade at scale.
  • Typically have a strict schema.

NoSQL Databases

  • Optimize for compute rather than storage. Storage is now cheap, so they accept extra copies of data to reduce the work done per query.
  • Allow duplicated (denormalized) data and minimize joins, which reduces the compute needed to retrieve data.
  • Provide flexible schemas that can accommodate diverse data structures.
  • Scale horizontally very well.
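The following TypeScript sketch makes the storage-for-compute trade-off concrete. The data, IDs, and names are illustrative assumptions and do not come from the notes. In the normalized layout, every project needs a lookup into the organizations collection. In the denormalized layout, the organization name is copied onto each project item, so the read is answered from the items alone.

```typescript
// Illustrative data only; IDs and names are made up.
interface OrgRow { orgId: string; name: string }
interface ProjectRow { projectId: string; orgId: string; name: string }
interface ProjectView { name: string; orgName: string }

// Normalized layout: projects reference their organization by ID only.
const orgRows: OrgRow[] = [{ orgId: "o1", name: "Acme" }];
const projectRows: ProjectRow[] = [
  { projectId: "p1", orgId: "o1", name: "Website" },
  { projectId: "p2", orgId: "o1", name: "Mobile App" },
];

// Reading projects with their org name requires a join: one extra lookup per project.
function listProjectsNormalized(orgId: string): ProjectView[] {
  return projectRows
    .filter((p) => p.orgId === orgId)
    .map((p) => {
      const org = orgRows.find((o) => o.orgId === p.orgId); // the "join"
      return { name: p.name, orgName: org ? org.name : "" };
    });
}

// Denormalized layout: the org name is duplicated onto every project item.
interface ProjectItem { projectId: string; orgId: string; name: string; orgName: string }
const projectItems: ProjectItem[] = [
  { projectId: "p1", orgId: "o1", name: "Website", orgName: "Acme" },
  { projectId: "p2", orgId: "o1", name: "Mobile App", orgName: "Acme" },
];

// No join needed: each item already carries everything the read needs.
function listProjectsDenormalized(orgId: string): ProjectView[] {
  return projectItems
    .filter((p) => p.orgId === orgId)
    .map((p) => ({ name: p.name, orgName: p.orgName }));
}
```

Denormalization has a cost on writes: renaming the organization means updating every copied orgName.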

Five-Step Process for DynamoDB Data Modeling

  1. Draw an Entity Diagram: Identify the main entities in your application.
  2. Identify Relationships: Understand how entities are related (one-to-many, many-to-many).
  3. List Access Patterns: Identify CRUD operations and data retrieval needs.
  4. Decide on Primary Keys and Indexes: Choose partition and sort keys that satisfy as many access patterns as possible.
  5. Identify Secondary Indexes if Needed: Use Global Secondary Indexes (GSIs) or Local Secondary Indexes (LSIs) for additional access patterns.

Practical Example: Multi-Tenant Project Management Tool

  • Entities: Organization, Projects, Employees
  • Attributes:
    • Organization: ID, Name, Tier
    • Projects: ID, Name, Type (Agile/Fixed Bid), Status
    • Employees: ID, Name, Date of Birth, Email
  • Relationships:
    • One-to-many: Organization to Projects
    • One-to-many: Organization to Employees
    • Many-to-many: Employees to Projects (requires an additional entity: Project-Employees).
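The entities and relationships above could be written as TypeScript types along the lines below. This is a sketch: the field names, the use of plain strings, and the two helper functions are illustrative assumptions. Each one-to-many relationship is captured by an orgId on the "many" side. The many-to-many relationship goes through a separate ProjectEmployee record, which can be read from either direction.

```typescript
// Field names and types are illustrative assumptions based on the attribute list.
type ProjectType = "Agile" | "FixedBid";

interface Organization { id: string; name: string; tier: string }
// One-to-many: each project and employee records the organization it belongs to.
interface Project { id: string; orgId: string; name: string; type: ProjectType; status: string }
// dateOfBirth kept as a string (e.g. an ISO 8601 date) for simplicity.
interface Employee { id: string; orgId: string; name: string; dateOfBirth: string; email: string }
// Many-to-many: one ProjectEmployee record per (project, employee) assignment.
interface ProjectEmployee { orgId: string; projectId: string; employeeId: string }

// Both directions of the many-to-many relationship resolve through the join entity.
function projectsForEmployee(links: ProjectEmployee[], employeeId: string): string[] {
  return links.filter((l) => l.employeeId === employeeId).map((l) => l.projectId);
}

function employeesForProject(links: ProjectEmployee[], projectId: string): string[] {
  return links.filter((l) => l.projectId === projectId).map((l) => l.employeeId);
}
```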

Access Patterns Example

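  • Organization: CRUD; list an organization's projects and employees (staff).
  • Projects: CRUD; filter by type, name, or status.
  • Employees: CRUD; find the projects an employee is assigned to.
  • Project-Employees: assign employees to projects and query the relationship in both directions (employees on a project, projects for an employee).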

Choosing Primary Keys

  • Use composite keys (Partition Key + Sort Key).
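  • Pattern:
    • Organization ID as the Partition Key and "#METADATA" as the Sort Key for the organization's own item.
  • Each Partition Key + Sort Key pair must uniquely identify one item; items that share a Partition Key can be fetched together with a single Query.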
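The sketch below is a minimal in-memory model of the composite-key idea. It reads "Hash Metadata" as the literal sort-key value #METADATA and assumes the hypothetical key prefixes ORG#, PRO#, and EMP#. These prefixes are a common single-table convention, not something DynamoDB requires.

InMemoryTable is a stand-in written for illustration, not an SDK class:
- put replaces any item with the same Partition Key + Sort Key, as PutItem does.
- query mimics a Query with an equality condition on the partition key and begins_with on the sort key.

```typescript
// Hypothetical key scheme: the ORG#/PRO#/EMP# prefixes are an assumed convention.
interface Item {
  PK: string;
  SK: string;
  [attr: string]: unknown;
}

const orgKey = (orgId: string) => ({ PK: `ORG#${orgId}`, SK: "#METADATA" });
const projectKey = (orgId: string, projectId: string) => ({ PK: `ORG#${orgId}`, SK: `PRO#${projectId}` });
const employeeKey = (orgId: string, employeeId: string) => ({ PK: `ORG#${orgId}`, SK: `EMP#${employeeId}` });

// Illustrative in-memory stand-in for a table with a composite primary key.
class InMemoryTable {
  private items = new Map<string, Item>();

  // PK + SK identify exactly one item, so a second put with the same key
  // replaces the first (as PutItem does).
  put(item: Item): void {
    this.items.set(JSON.stringify([item.PK, item.SK]), item);
  }

  // Mimics a Query with "PK = :pk AND begins_with(SK, :prefix)", returning
  // items in ascending sort-key order (plain string comparison here).
  query(pk: string, skPrefix = ""): Item[] {
    return Array.from(this.items.values())
      .filter((i) => i.PK === pk && i.SK.startsWith(skPrefix))
      .sort((a, b) => (a.SK < b.SK ? -1 : a.SK > b.SK ? 1 : 0));
  }
}
```

With these prefixes, querying an organization's partition with the prefix "PRO#" returns only its projects, and "EMP#" returns only its employees. Because "#" sorts before uppercase letters, the #METADATA item comes first when the whole partition is read.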

Utilizing Indexes

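  • GSI: Defines an alternate partition key (and optional sort key) over the table's items, enabling access patterns the primary key cannot serve.
  • GSIs are typically keyed on specific attributes, such as employee ID or project type, so that queries like "projects for an employee" or "projects of a given type" become key-based queries.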
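One common way to serve "projects for an employee" with a GSI is an inverted index. Each assignment item stores the project/employee pair in its table keys and the same pair, flipped, in its GSI keys. The sketch below assumes hypothetical attribute names (PK/SK for the table, GSI1PK/GSI1SK for the index) and simulates both queries over an in-memory array. In DynamoDB, the index is maintained automatically on every write.

```typescript
// Hypothetical attribute names: PK/SK for the table, GSI1PK/GSI1SK for the index.
interface Item {
  PK: string;
  SK: string;
  GSI1PK?: string;
  GSI1SK?: string;
}

const bySortKey = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// One item per assignment. Table keys group employees under their project;
// GSI keys store the same pair flipped, grouping projects under their employee.
function assignmentItem(projectId: string, employeeId: string): Item {
  return {
    PK: `PRO#${projectId}`,
    SK: `EMP#${employeeId}`,
    GSI1PK: `EMP#${employeeId}`,
    GSI1SK: `PRO#${projectId}`,
  };
}

// Table query: "which employees are on this project?"
function queryTable(items: Item[], pk: string): Item[] {
  return items.filter((i) => i.PK === pk).sort((a, b) => bySortKey(a.SK, b.SK));
}

// GSI query: "which projects is this employee on?" Only items that carry both
// GSI key attributes appear in the index.
function queryGsi1(items: Item[], gsiPk: string): Item[] {
  return items
    .filter((i) => i.GSI1PK === gsiPk && i.GSI1SK !== undefined)
    .sort((a, b) => bySortKey(a.GSI1SK ?? "", b.GSI1SK ?? ""));
}
```

Unlike the table's primary key, GSI key values do not have to be unique, so many items can share the same GSI1PK.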

Sparse Indexing and Filtering

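  • Sparse indexing: a GSI whose key attribute exists only on some items (e.g., only on "on-hold" projects) contains only those items, so querying it retrieves just that subset cheaply.
  • Avoid relying on filter expressions for efficiency. DynamoDB applies them after items are read, so read capacity is consumed for every item read, including those the filter discards.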
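The sketch below contrasts the two approaches for "on-hold projects of an organization". It assumes a hypothetical GSI partition-key attribute, OnHoldPK, that is written only while a project is on hold. It counts items read as a rough proxy for read capacity; real read capacity is metered by data size in 4 KB units, not per item.

```typescript
// Hypothetical project items; OnHoldPK is an assumed GSI partition-key attribute
// that is present only while a project's status is "ON_HOLD".
interface ProjectItem {
  PK: string;
  SK: string;
  status: string;
  OnHoldPK?: string;
}

function makeProject(orgId: string, projectId: string, status: string): ProjectItem {
  const item: ProjectItem = { PK: `ORG#${orgId}`, SK: `PRO#${projectId}`, status };
  // Only on-hold projects carry the index key, which is what makes the index sparse.
  if (status === "ON_HOLD") item.OnHoldPK = `ORG#${orgId}`;
  return item;
}

interface Result { items: ProjectItem[]; itemsRead: number }

// Query + filter expression: every item matching the key condition is read
// (and billed) before the filter discards the non-matching ones.
function queryWithFilter(table: ProjectItem[], pk: string, status: string): Result {
  const read = table.filter((i) => i.PK === pk);
  return { items: read.filter((i) => i.status === status), itemsRead: read.length };
}

// Query on the sparse GSI: the index holds only items that have OnHoldPK,
// so nothing has to be discarded after reading.
function querySparseIndex(table: ProjectItem[], pk: string): Result {
  const index = table.filter((i) => i.OnHoldPK !== undefined);
  const read = index.filter((i) => i.OnHoldPK === pk);
  return { items: read, itemsRead: read.length };
}
```

When a project leaves the on-hold state, the application has to remove OnHoldPK (for example, with a REMOVE clause in an UpdateExpression) so that the item drops out of the index.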

Implementing in AWS

  • Use the AWS CLI and the AWS SDKs to interact with DynamoDB.
  • Sample CRUD operations and queries are shown using the Node.js SDK.
  • Focused on how to model and query within the limits and capabilities of DynamoDB.
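The function below is a hedged sketch in the style of those Node.js examples; it is not the original code. It only builds the request object for a Query call. The fields TableName, KeyConditionExpression, ExpressionAttributeNames, ExpressionAttributeValues, and IndexName are standard Query parameters. The table name ProjectManagement and the PK/SK attribute names are assumptions.

Building the object separately keeps the snippet runnable without the SDK installed. The comment at the end shows how it could be sent with the AWS SDK for JavaScript v3 Document Client.

```typescript
// Field names follow the DynamoDB Query request; the table name and the
// PK/SK attribute names are assumptions for illustration.
interface QueryParams {
  TableName: string;
  IndexName?: string; // set only when querying a GSI or LSI
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

const TABLE_NAME = "ProjectManagement"; // hypothetical

// Builds a Query for items in one organization's partition whose sort key
// starts with skPrefix (e.g. "PRO#" for projects, "EMP#" for employees).
function queryOrgItemsParams(orgId: string, skPrefix: string): QueryParams {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression: "#pk = :pk AND begins_with(#sk, :prefix)",
    ExpressionAttributeNames: { "#pk": "PK", "#sk": "SK" },
    // The Document Client accepts plain JS values here (no { S: "..." } wrappers).
    ExpressionAttributeValues: { ":pk": `ORG#${orgId}`, ":prefix": skPrefix },
  };
}

// With @aws-sdk/client-dynamodb and @aws-sdk/lib-dynamodb installed, the
// object could be sent like this (inside an async function):
//   const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));
//   const { Items } = await doc.send(new QueryCommand(queryOrgItemsParams("o1", "PRO#")));
```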

Conclusion

  • Effective data modeling in DynamoDB requires understanding application access patterns and entity relationships.
  • Use indexes deliberately to keep costs low while still serving every access pattern efficiently.
  • DynamoDB’s flexible schema allows for efficient data handling if modeled correctly.