Overview of Microsoft Fabric Solutions

Apr 22, 2025

Microsoft Fabric Lecture Notes

Introduction

  • Microsoft Fabric offers a Software as a Service (SaaS) solution for enterprise data needs.
  • Addresses the issue of data silos within organizations due to diverse data storage formats and tools.

Current Challenges

  • Organizations have data spread across various tools and engines (SQL, Spark, etc.).
  • Different groups within an organization have their own data silos, leading to data staleness and governance issues.
  • The traditional Extract, Transform, Load (ETL) process has shifted to Extract, Load, Transform (ELT): raw data is loaded first and transformed later, which gives more flexibility.
  • Challenges include data governance, auditing, discovery, classification, and protection.
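
The ETL-to-ELT shift can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 as a stand-in engine (an assumption for illustration only; the notes do not name an engine here), and the table names and sample rows are hypothetical. Raw data is landed untouched first, then transformed inside the engine with SQL.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "not-a-number"),  # bad amount, still kept in the raw layer
]


def run_elt(conn):
    """Load raw rows as-is, then transform them inside the engine."""
    # Extract + Load: land the data unchanged, every column stored as text.
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)

    # Transform: applied afterwards with SQL, so the rules can change later
    # without re-extracting anything from the source.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*.[0-9]*'
        """
    )
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()
```

Calling `run_elt(sqlite3.connect(":memory:"))` returns only the cleaned rows, while `raw_orders` still holds all three original records, including the malformed one.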

Microsoft Fabric Overview

  • Microsoft Fabric: A single platform that bundles the following components for enterprise data.
    • Lakehouse: Supports structured, semi-structured, and unstructured data.
    • Warehouse: Schema-based structured data.
    • Spark Engine: For data engineering and data science.
    • Data Activator: Alerting system based on events.
    • Real-Time Analytics: Analyzes streaming data such as logs and events.
    • Data Factory: For pipelines and data movement.
    • Power BI Integration: Enhances data visualization and reporting.

Core Concepts of Microsoft Fabric

  • OneLake: A unified namespace for organizational data, akin to OneDrive but for enterprise data.

    • Provisioned automatically, one per tenant, and holds all Fabric data within that tenant.
    • Supports governance and auditing without data silos.
  • Fabric Capacity: Virtual buckets of serverless compute capacity.

    • Can pause/resume capacity based on needs.
    • Allows 'bursting' above provisioned capacity; the extra usage is then smoothed (spread out) over time.
  • Workspaces: Organizational units within Fabric.

    • Associate workspaces with specific capacities.
    • Allow for organization and permission management of data items.
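
The bursting and smoothing idea can be sketched with a toy model. This is not Fabric's actual accounting: the function name, the unit of "cost", and the window length are all assumptions made for illustration. Each job's total cost is spread evenly over a fixed number of time slots, so a job that briefly needs more than the provisioned capacity can still fit once its usage is averaged.

```python
def smooth_usage(jobs, window, horizon):
    """Spread each job's total cost evenly over `window` slots.

    jobs    -- list of (start_slot, total_cost) pairs (hypothetical units)
    window  -- number of slots each job's cost is spread across
    horizon -- number of slots to report
    Returns the smoothed load per slot as a list of floats.
    """
    load = [0.0] * horizon
    for start, cost in jobs:
        per_slot = cost / window
        # Slots beyond the horizon are simply not reported.
        for t in range(start, min(start + window, horizon)):
            load[t] += per_slot
    return load


def slots_over_capacity(load, provisioned):
    """Return the slot indices where smoothed load exceeds provisioned capacity."""
    return [t for t, used in enumerate(load) if used > provisioned]
```

For example, with 10 units provisioned per slot, a job costing 40 units at slot 0 would exceed capacity if charged all at once, but with `window=4` it contributes 10 units to each of slots 0 to 3. Adding a second job of 8 units at slot 2 pushes only slots 2 and 3 over the provisioned level.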

Data Storage and Formats

  • Uses the Delta Parquet format (Parquet data files plus a Delta Lake transaction log) for storing structured data.
    • Enables interoperability across different engines (Spark, SQL, etc.).
  • Supports Iceberg metadata for compatibility with platforms like Snowflake.
  • Uses Apache XTable to convert between table metadata formats (for example, Delta and Iceberg).
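
Why one storage format works across engines can be sketched as follows. A Delta table is a set of Parquet files plus a transaction log of JSON actions, and any engine that replays the log sees the same set of live files. The sketch below handles only the `add` and `remove` actions; real logs also carry other actions (such as metadata and protocol entries) and checkpoints, and the file names are hypothetical.

```python
import json


def active_files(commits):
    """Replay Delta-style commits in order and return the live data files.

    commits -- list of commits, each a list of JSON strings (one action per line)
    """
    live = set()
    for commit in commits:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other action types are ignored in this simplified sketch.
    return sorted(live)


# Hypothetical two-commit history: the second commit rewrites one file.
EXAMPLE_COMMITS = [
    [
        '{"add": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0001.parquet"}}',
    ],
    [
        '{"remove": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0002.parquet"}}',
    ],
]
```

`active_files(EXAMPLE_COMMITS)` yields the two files that make up the current table version. Replaying only the first commit yields the earlier version, which is the basis for reading older snapshots of a table.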

Integration and Compatibility

  • ADLS Gen2 API: Provides open integration for existing tools and platforms to access OneLake.
  • Shortcuts and Mirroring: Allow integration of external data without migration.
    • Shortcuts: Symbolic links to external data sources.
    • Mirroring: Replicates non-Delta Parquet data into OneLake using change data capture.
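
Because OneLake is reached through the ADLS Gen2 API, existing tools address it with ADLS-style paths. The helpers below are a minimal sketch of how such paths are commonly formed. The URI layout is an assumption that should be checked against the current OneLake documentation, and the workspace and item names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # assumed OneLake endpoint


def onelake_abfss_uri(workspace, item, item_type, path):
    """Build an abfss:// URI of the kind Spark-style tools use."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{path.lstrip('/')}"


def onelake_https_url(workspace, item, item_type, path):
    """Build an https:// URL of the kind ADLS Gen2 REST clients use."""
    return f"https://{ONELAKE_HOST}/{workspace}/{item}.{item_type}/{path.lstrip('/')}"
```

For a hypothetical workspace `SalesWS` containing a lakehouse `SalesLH`, `onelake_abfss_uri("SalesWS", "SalesLH", "Lakehouse", "Tables/orders")` gives a path that points at the `orders` table folder. Shortcut data appears under the same kind of path, which is why no migration is needed to read it.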

Data Processing and Tools

  • Lakehouse and Warehouse: The Lakehouse stores structured, semi-structured, and unstructured data; the Warehouse stores schema-based structured data.
  • Data Engineering and Science:
    • Notebooks and Spark for data processing.
    • Data Factory for pipeline management.
  • Semantic Models in Power BI:
    • Direct Lake mode reads Delta tables in OneLake directly, improving performance by avoiding a data translation step.

Governance and Security

  • Purview: Integrates with OneLake for data governance, discovery, and protection.
  • Planned features include workspace-level private link settings for enhanced network security.

Conclusion

  • Microsoft Fabric centralizes and simplifies data management, eliminating silos and enhancing data accessibility.
  • Continued development, including integration with Purview and AI features, adds to its value.
  • The roadmap includes workspace private link and further integration capabilities.