Data Warehouses
Central repositories for clean, organized business data.
Originally deployed on-premises; now increasingly moving to the cloud.
Data Lakes
Store raw, unstructured data for later cleansing and analysis.
Data Lakehouse
Combines the flexibility of a data lake with the data quality of a data warehouse.
Supports operational data as well as new analytical and machine learning use cases.
Methodologies
Data Fabric (Focus of the lecture)
Introduction to Data Fabric
Definition: An architectural approach and set of technologies that break down data silos and enable data access, ingestion, integration, and sharing across the enterprise in a governed manner.
Purpose: Access data from different locations, integrate it, and manage it without creating governance issues or data quality problems.
Comparison: Data Mesh focuses more on organizational change than Data Fabric does.
Responsibilities of Data Fabric
Accessing Data
Data is spread across data warehouses, data lakes, and SaaS applications.
Use a data virtualization layer to aggregate access without massive data movement (see the sketch after this list).
Use robust data integration or ETL tools for latency-sensitive applications.
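To make the virtualization idea concrete, here is a minimal sketch. Everything in it is hypothetical (an in-memory SQLite database standing in for a warehouse, a dict standing in for a SaaS API, and a federated_query helper): the point is that the layer answers queries by reading each source in place rather than bulk-copying data into a central store.

```python
# Minimal data-virtualization sketch; the source names and the
# federated_query helper are hypothetical illustrations.
import sqlite3

# Source 1: a "data warehouse" (in-memory SQLite for the example).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "EU"), (2, "Grace", "US")],
)

# Source 2: a "SaaS application" (a dict standing in for an API response).
saas_records = {1: {"loyalty_tier": "gold"}, 2: {"loyalty_tier": "silver"}}

def federated_query(region: str) -> list[dict]:
    """Answer a query by reading each source in place -- no bulk copy."""
    rows = warehouse.execute(
        "SELECT id, name FROM customers WHERE region = ?", (region,)
    ).fetchall()
    # Join warehouse rows with SaaS attributes at query time.
    return [
        {"id": cid, "name": name, **saas_records.get(cid, {})}
        for cid, name in rows
    ]

print(federated_query("EU"))  # [{'id': 1, 'name': 'Ada', 'loyalty_tier': 'gold'}]
```

When query-time federation is too slow for a latency-sensitive application, an ETL pipeline replaces the read-in-place step with a scheduled physical copy of the same data.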
Managing the Data Lifecycle
Governance & Privacy: Role-based access control, active metadata to enforce policies (masking, redaction).
Data Sources: Historical data from data warehouses, sentiment analysis, customer reviews, co-branded credit card data.
Master Data Management: Ensure accurate customer information, apply governance policies (masking sensitive info).
Publishing and Application Development: Publish governed data to catalogs, enable developers to build personalized applications (recommendation engines, guest services tools).
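The sketch below makes the governance and publishing steps concrete. All names in it (MASKING_POLICIES, PRIVILEGED_ROLES, apply_policies, publish, the catalog dict) are hypothetical: it shows metadata-driven masking enforced per role at read time, with the governed view then registered in a catalog for application developers.

```python
# Metadata-driven governance sketch. Policy names, roles, and the
# catalog are hypothetical illustrations, not a real product API.

# "Active metadata": which fields are sensitive and how to mask them.
MASKING_POLICIES = {
    "email": lambda v: v[0] + "***@" + v.split("@")[1],    # partial mask
    "card_number": lambda v: "****-****-****-" + v[-4:],   # keep last 4 only
}

PRIVILEGED_ROLES = {"data_steward"}  # roles allowed to see raw values

def apply_policies(record: dict, role: str) -> dict:
    """Return a role-appropriate view of the record, with masking enforced."""
    if role in PRIVILEGED_ROLES:
        return dict(record)
    return {
        field: MASKING_POLICIES.get(field, lambda v: v)(value)
        for field, value in record.items()
    }

# Publishing: register the governed view in a catalog that application
# developers (recommendation engines, guest-services tools) can query.
catalog: dict[str, list[dict]] = {}

def publish(dataset: str, records: list[dict], role: str = "analyst") -> None:
    catalog[dataset] = [apply_policies(r, role) for r in records]

publish("customers_governed", [
    {"name": "Ada", "email": "ada@example.com", "card_number": "4111111111111111"},
])
print(catalog["customers_governed"])
# [{'name': 'Ada', 'email': 'a***@example.com', 'card_number': '****-****-****-1111'}]
```

The design point is that masking rules live in metadata (the policy table) rather than being hard-coded into each application, so a policy change propagates everywhere the data is served.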
Conclusion
Importance: Data fabric is crucial for delivering personalized, high-quality experiences across various industries.
Further Learning: Check out more resources on IBM's data fabric solution.