Notes on DynamoDB Data Modeling

Introduction

DynamoDB is AWS's NoSQL database service.
Data modeling in DynamoDB differs from traditional relational database modeling.
Newcomers often try to replicate a relational data model in DynamoDB, which leads to higher costs.
Proper design can reduce AWS bills and keep latency in the millisecond range as the data grows (for example, from 1 GB to 10 TB).

Data modeling refers to how an application stores data related to real-world entities.
Two types of databases:
- Relational Databases (SQL): e.g., MySQL, Oracle, Microsoft SQL Server
- NoSQL Databases: e.g., DynamoDB, MongoDB, Cassandra; each is optimized for particular use cases.

Relational databases:
- Use data normalization to split data across multiple tables and reduce redundancy.
- Performance can degrade as the database scales, because queries grow complex and need multiple joins.
- Typically enforce a strict schema.

NoSQL databases:
- Optimize for compute rather than storage, since storage is now the cheaper resource.
- Allow duplicated data and minimize joins, reducing the compute needed for data retrieval.
- Provide flexible schemas that can accommodate diverse data structures.
- Scale horizontally very well.
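
The trade-off between the two styles can be sketched in plain JavaScript, with no database involved. This is only an illustration: the sample data and the function names (`listProjectsNormalized`, `listProjectsDenormalized`) are assumptions made for this example. The first layout combines two collections at read time, as a join would; the second duplicates the organization name on each project so a single keyed lookup answers the question.

```javascript
// Normalized layout: organizations and projects live in separate "tables"
// and are combined at read time, similar to a SQL join.
const organizations = [{ orgId: "o1", name: "Acme" }];
const projects = [
  { projectId: "p1", orgId: "o1", name: "Website" },
  { projectId: "p2", orgId: "o1", name: "Mobile app" },
];

function listProjectsNormalized(orgId) {
  const org = organizations.find((o) => o.orgId === orgId);
  if (!org) return [];
  return projects
    .filter((p) => p.orgId === orgId)
    .map((p) => ({ projectId: p.projectId, name: p.name, orgName: org.name }));
}

// Denormalized layout: the organization name is duplicated on every project,
// and projects are grouped under the key they are read by.
const projectsByOrg = new Map([
  ["o1", [
    { projectId: "p1", name: "Website", orgName: "Acme" },
    { projectId: "p2", name: "Mobile app", orgName: "Acme" },
  ]],
]);

function listProjectsDenormalized(orgId) {
  // One keyed lookup, no join.
  return projectsByOrg.get(orgId) ?? [];
}

console.log(listProjectsNormalized("o1"));
console.log(listProjectsDenormalized("o1"));
```

Both functions return the same result; the denormalized version pays for it with duplicated data that must be kept in sync on writes.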

Steps for modeling data in DynamoDB:
- Draw an Entity Diagram: Identify the main entities of your application.
- Identify Relationships: Understand how entities are related (one-to-many, many-to-many).
- List Access Patterns: Identify CRUD operations and data retrieval needs.
- Decide on Primary Keys and Indexes: Choose effective primary keys that satisfy access patterns.
- Identify Secondary Indexes if Needed: Use Global Secondary Indexes (GSIs) or Local Secondary Indexes (LSIs) for additional access patterns.
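
One way to keep the access-pattern step concrete is to write the patterns down as data before choosing keys. The sketch below is an assumption-laden illustration, not a prescribed method: the patterns, the index names (`GSI1`, `GSI2`), and both helper functions are hypothetical.

```javascript
// Hypothetical access patterns; each one records the DynamoDB operation
// and the index expected to serve it ("table" means the base table).
const accessPatterns = [
  { pattern: "Get an organization by ID", operation: "GetItem", servedBy: "table" },
  { pattern: "List projects of an organization", operation: "Query", servedBy: "table" },
  { pattern: "List employees of an organization", operation: "Query", servedBy: "table" },
  { pattern: "List projects of an employee", operation: "Query", servedBy: "GSI1" },
  { pattern: "List on-hold projects", operation: "Query", servedBy: "GSI2" },
];

// Secondary indexes that the listed patterns require beyond the base table.
function requiredIndexes(patterns) {
  const names = patterns.map((p) => p.servedBy).filter((n) => n !== "table");
  return [...new Set(names)].sort();
}

// Patterns that would fall back to a full table Scan deserve a second look.
function patternsNeedingScan(patterns) {
  return patterns.filter((p) => p.operation === "Scan").map((p) => p.pattern);
}

console.log(requiredIndexes(accessPatterns));
console.log(patternsNeedingScan(accessPatterns));
```

Listing patterns this way makes it visible early which ones the primary key covers and which ones need a secondary index.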

Entities: Organization, Projects, Employees
Attributes:
- Organization: ID, Name, Tier
- Projects: ID, Name, Type (Agile/Fixed Bid), Status
- Employees: ID, Name, Date of Birth, Email
Relationships:
- One-to-many: Organization to Projects
- One-to-many: Organization to Employees
- Many-to-many: Employees to Projects (requires an additional entity: Project-Employees).
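
The entities above could be laid out in a single table with a composite primary key. The following is a minimal sketch under stated assumptions: the attribute names `PK` and `SK`, the `ORG#`/`PRO#`/`EMP#` prefixes, the sample values, and all function names are hypothetical choices for illustration, not a layout taken from these notes. The in-memory filter only stands in for a DynamoDB Query with a `begins_with` condition on the sort key.

```javascript
// Hypothetical key builders for a single-table layout.
const orgKey = (orgId) => ({ PK: `ORG#${orgId}`, SK: `#METADATA#${orgId}` });
const projectKey = (orgId, projectId) => ({ PK: `ORG#${orgId}`, SK: `PRO#${projectId}` });
const employeeKey = (orgId, employeeId) => ({ PK: `ORG#${orgId}`, SK: `EMP#${employeeId}` });
// Linking item (Project-Employees) for the many-to-many relationship.
const projectEmployeeKey = (projectId, employeeId) => ({
  PK: `PRO#${projectId}`,
  SK: `EMP#${employeeId}`,
});

const items = [
  { ...orgKey("o1"), Name: "Acme", Tier: "Pro" },
  { ...projectKey("o1", "p1"), Name: "Website", Type: "Agile", Status: "Active" },
  { ...employeeKey("o1", "e1"), Name: "Ada", Email: "ada@example.com" },
  { ...projectEmployeeKey("p1", "e1") },
];

// In-memory stand-in for a Query with a begins_with condition on the sort key.
function queryBySortKeyPrefix(allItems, pk, skPrefix) {
  return allItems.filter((it) => it.PK === pk && it.SK.startsWith(skPrefix));
}

// One-to-many: all projects of organization o1.
console.log(queryBySortKeyPrefix(items, "ORG#o1", "PRO#"));
// Many-to-many: all employees assigned to project p1.
console.log(queryBySortKeyPrefix(items, "PRO#p1", "EMP#"));
```

Because an organization's projects and employees share a partition key, each one-to-many relationship is answered by one query on that partition with a different sort-key prefix.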

GSI: Supports additional access patterns by letting queries use a different key than the table's primary key.
A GSI is keyed on specific item attributes chosen for the query it serves, such as an employee ID or a project type.
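
A query against a GSI looks like a normal query with an index name added. The sketch below only builds the request object, so it runs without AWS credentials. The index name `GSI1`, the attribute names `GSI1PK` and `GSI1SK`, the key prefixes, the table name, and the function name are all assumptions for illustration. With the AWS SDK for JavaScript v3, an object of this shape would typically be passed to `QueryCommand` from `@aws-sdk/lib-dynamodb`.

```javascript
// Builds the input for a Query against a hypothetical GSI named "GSI1",
// assumed to have partition key attribute "GSI1PK" and sort key "GSI1SK".
function buildProjectsForEmployeeQuery(tableName, employeeId) {
  return {
    TableName: tableName,
    IndexName: "GSI1",
    // Key condition: exact match on the partition key, prefix match on the sort key.
    KeyConditionExpression: "#pk = :pk AND begins_with(#sk, :skPrefix)",
    ExpressionAttributeNames: { "#pk": "GSI1PK", "#sk": "GSI1SK" },
    ExpressionAttributeValues: {
      ":pk": `EMP#${employeeId}`,
      ":skPrefix": "PRO#",
    },
  };
}

console.log(buildProjectsForEmployeeQuery("ProjectsTable", "e1"));
```

Only the key condition decides which items are read; anything that narrows the result further has to be encoded in the index keys to stay efficient.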

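Sparse indexes: a GSI contains only the items that have its key attribute, so an index keyed on an attribute set only for certain items (e.g., "on-hold" projects) stays small and makes retrieving them cheap.
Avoid relying on filter conditions when efficiency matters: a filter is applied after the items are read, so the query consumes read capacity for every item that matches the key condition, including those the filter discards.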
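
The difference between filtering and a sparse index can be simulated in memory. This is a simplified model, not DynamoDB's actual billing: it counts items read, whereas real read capacity depends on item size and consistency mode. The attribute `OnHoldSince`, the sample items, and both function names are hypothetical.

```javascript
// Projects in one organization; only on-hold projects carry OnHoldSince,
// the hypothetical attribute used as the sparse GSI's key.
const projectItems = [
  { SK: "PRO#p1", Status: "Active" },
  { SK: "PRO#p2", Status: "On-hold", OnHoldSince: "2024-01-10" },
  { SK: "PRO#p3", Status: "Active" },
  { SK: "PRO#p4", Status: "On-hold", OnHoldSince: "2024-02-02" },
  { SK: "PRO#p5", Status: "Completed" },
];

// Query + filter: every item matching the key condition is read,
// then the filter discards the ones that do not match.
function queryWithFilter(items, predicate) {
  return { itemsRead: items.length, returned: items.filter(predicate) };
}

// Sparse index: only items that have the index key attribute are in the index,
// so the query reads exactly the items it returns.
function querySparseIndex(items, keyAttribute) {
  const indexed = items.filter((it) => keyAttribute in it);
  return { itemsRead: indexed.length, returned: indexed };
}

const filtered = queryWithFilter(projectItems, (it) => it.Status === "On-hold");
const sparse = querySparseIndex(projectItems, "OnHoldSince");
console.log(filtered.itemsRead, filtered.returned.length);
console.log(sparse.itemsRead, sparse.returned.length);
```

Both approaches return the same two projects, but the filtered query reads all five items in the partition while the sparse index reads only the two it returns.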

Use the AWS CLI and the AWS SDKs to work with DynamoDB from scripts and application code.
Sample CRUD operations and queries are demonstrated with Node.js SDK examples.
The focus is on how to model and query data within the limits and capabilities of DynamoDB.
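
As a rough sketch of what such CRUD calls involve, the functions below build request inputs for a project item without contacting AWS. The table name, key layout, and function names are assumptions for illustration. With the AWS SDK for JavaScript v3, objects of these shapes would typically be passed to `PutCommand`, `GetCommand`, `UpdateCommand`, and `DeleteCommand` from `@aws-sdk/lib-dynamodb` and sent through a `DynamoDBDocumentClient`.

```javascript
const TABLE = "ProjectsTable"; // hypothetical table name
// Hypothetical key layout: projects are stored under their organization.
const key = (orgId, projectId) => ({ PK: `ORG#${orgId}`, SK: `PRO#${projectId}` });

// Create: input for a put; the condition prevents overwriting an existing item.
const createProject = (orgId, project) => ({
  TableName: TABLE,
  Item: {
    ...key(orgId, project.id),
    Name: project.name,
    Type: project.type,
    Status: project.status,
  },
  ConditionExpression: "attribute_not_exists(PK)",
});

// Read: input for a get by full primary key.
const readProject = (orgId, projectId) => ({ TableName: TABLE, Key: key(orgId, projectId) });

// Update: the attribute name is aliased through ExpressionAttributeNames,
// which avoids clashes with DynamoDB reserved words.
const updateProjectStatus = (orgId, projectId, status) => ({
  TableName: TABLE,
  Key: key(orgId, projectId),
  UpdateExpression: "SET #status = :status",
  ExpressionAttributeNames: { "#status": "Status" },
  ExpressionAttributeValues: { ":status": status },
});

// Delete: input for a delete by full primary key.
const deleteProject = (orgId, projectId) => ({ TableName: TABLE, Key: key(orgId, projectId) });

console.log(createProject("o1", { id: "p1", name: "Website", type: "Agile", status: "Active" }));
console.log(updateProjectStatus("o1", "p1", "On-hold"));
```

Keeping the request builders separate from the client call makes the key design easy to inspect and test before any data is written.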

Effective data modeling in DynamoDB requires understanding the application's access patterns and entity relationships.
Use indexes deliberately to minimize costs while optimizing performance.
DynamoDB’s flexible schema allows for efficient data handling when the data is modeled correctly.