Overview
Data warehouse design requires thoughtful architecture beyond simply consolidating data. The two most common frameworks are the Inmon and Kimball models, each with distinct approaches and trade-offs.
Inmon Model (Enterprise Data Warehouse)
The Inmon model, created by Bill Inmon, uses a top-down approach centered on an Enterprise Data Warehouse (EDW).
Core Structure:
- Central repository integrating data from all organizational operational systems
- Data stored in normalized form; EDW not queried directly
- Data feeds into departmental data marts filtered for specific needs
- Applications connect to departmental data marts rather than the EDW
Advantages:
- Single source of truth with standardized definitions agreed upon upfront
- Easier modeling since sources remain normalized throughout
- Supports any analysis by containing all organizational data
- Low redundancy due to normalized storage; minimal conflicting values
Disadvantages:
- Normalized form requires more joins and depths, resulting in slower query performance
- Slow startup time with significant upfront work required
- Combining data across departmental data marts can be difficult
- Top-down approach can be time-consuming for large, complex organizations
Kimball Model (Dimensional Data Warehouse)
The Kimball model, popularized by Ralph Kimball, uses a bottom-up approach with denormalized dimensional modeling.
Core Structure:
- Data denormalized into flat or dimensional models (star schemas)
- Subject-specific data marts created first with measurable processes and attributes
- Multiple data marts connect via shared attributes to form the warehouse
- Ground-up approach: build one process at a time
Advantages:
- Rapid deployment of critical business processes for quick reporting
- Denormalized model makes data easy to consume for users
- Enables quick reporting and self-service capabilities
- Faster user adoption through incremental, visible progress
Disadvantages:
- Time-consuming ETL process to denormalize data into dimensional format
- Creates duplicate data across multiple data marts
- Dilutes single source of truth with potential conflicting data
- Requires ongoing development as new processes are introduced
Model Comparison
| Aspect | Inmon Model | Kimball Model |
|---|
| Approach | Top-down, enterprise-wide | Bottom-up, incremental |
| Data Storage | Normalized | Denormalized (star schema) |
| Initial Setup | Slow, high upfront work | Fast, iterative deployment |
| Query Performance | Slower (more joins required) | Faster (fewer joins) |
| Data Redundancy | Low | Higher (duplicate data) |
| Ongoing Work | Less after initial setup | Continuous development |
| User Accessibility | Requires data marts | Direct, easier consumption |
| Single Source of Truth | Strong | Potentially diluted |
Key Terms & Definitions
- Enterprise Data Warehouse (EDW): Central repository with standardized, integrated data from all operational systems
- Data Mart: Subset of data filtered for specific departmental or functional needs
- Normalized Form: Data organized to reduce redundancy through multiple related tables
- Dimensional Model/Star Schema: Denormalized structure with measurable facts connected to descriptive attributes
- Top-Down Approach: Starting with comprehensive enterprise view, then creating specific views
- Bottom-Up Approach: Building individual data marts first, then connecting them
Action Items / Next Steps
- Assess organizational environment and existing data structures before choosing architecture
- Consider hybrid approach combining both models' strengths
- Evaluate whether operational data store or previous warehouse exists
- Determine if critical business processes need rapid deployment (Kimball) or comprehensive integration needed first (Inmon)
- Balance user needs against technical constraints of each model