Overview of the DataHub Actions Framework

Dec 16, 2024

Lecture Notes: DataHub Actions Framework

Introduction

  • Speaker: Hyejin Yoon from the DevRel team
  • Focus: Introducing the DataHub Actions Framework
  • Agenda:
    • Explanation of the framework
    • Purpose and usage
    • Real-life use cases
    • How to create actions
    • Demo of tag propagation across datasets

What is the DataHub Actions Framework?

  • Purpose: Triggers actions in response to changes happening within DataHub.
  • Components:
    • Changes: Events occurring in DataHub (for example, a tag being added to a dataset)
    • Actions: What runs in response, such as notifications or propagation

Functionality

  • Data Ingestion:
    • Metadata flows from data sources (e.g., BigQuery, Snowflake) into DataHub via the CLI, APIs, or SDKs.
  • Data Export:
    • The opposite direction: subscribing to events, sending notifications, syncing changes to other systems, and propagating aspects.
  • Event Types Supported:
    • Entity Change Event (EntityChangeEvent_v1)
    • Metadata Change Log (MetadataChangeLog_v1)
  • Event Source:
    • Kafka (the only supported source at the time of the talk)
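
To make the event types concrete, here is a minimal sketch of what an Entity Change Event for "tag added to a dataset" might carry, written as a plain Python dict. The field names follow the documented EntityChangeEvent_v1 shape; the URNs are made up, and the matches helper is a hypothetical stand-in that only mimics how a filter narrows events by field value. It is not part of the framework.

```python
from typing import Any, Dict

# Illustrative payload for a "tag added to a dataset" change.
# The dataset and tag URNs are invented for this example.
tag_added_event: Dict[str, Any] = {
    "entityType": "dataset",
    "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.orders,PROD)",
    "category": "TAG",
    "operation": "ADD",
    "modifier": "urn:li:tag:pii",
    "parameters": {"tagUrn": "urn:li:tag:pii"},
}


def matches(event: Dict[str, Any], wanted: Dict[str, Any]) -> bool:
    """Hypothetical helper: True when every wanted field equals the event's value."""
    return all(event.get(key) == value for key, value in wanted.items())


print(matches(tag_added_event, {"category": "TAG", "operation": "ADD"}))  # True
print(matches(tag_added_event, {"category": "OWNER"}))  # False
```

An action that only cares about tag additions would ignore the second kind of event entirely.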

Execution

  • Run from the CLI, much like a normal ingestion job.
  • Requires installing the DataHub Actions package (acryl-datahub-actions).
  • Configuration:
    • Action config file: defines the action pipeline, including its source (Kafka), event filters, and action type.
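
As a rough illustration of the config file described above, the sketch below writes a small pipeline definition from Python. The layout (name, source, filter, action) follows the documented Hello World example. The pipeline name, the file name, and the localhost addresses are assumptions for a local setup and should be adapted to the actual environment.

```python
from pathlib import Path
from textwrap import dedent

# YAML text for a pipeline that listens on Kafka, keeps only "tag added"
# events, and hands them to the built-in hello_world action.
# The addresses below assume a local deployment (an assumption, not from the talk).
ACTION_CONFIG = dedent(
    """\
    name: "hello_world_tags"
    source:
      type: "kafka"
      config:
        connection:
          bootstrap: "localhost:9092"
          schema_registry_url: "http://localhost:8081"
    filter:
      event_type: "EntityChangeEvent_v1"
      event:
        category: "TAG"
        operation: "ADD"
    action:
      type: "hello_world"
    """
)


def write_config(path: Path) -> Path:
    """Write the pipeline definition to disk and return the path."""
    path.write_text(ACTION_CONFIG, encoding="utf-8")
    return path


if __name__ == "__main__":
    config_path = write_config(Path("hello_world.yaml"))
    print(f"wrote {config_path}; run it with: datahub actions -c {config_path}")
```

Keeping the filter narrow means the action is only invoked for the events it actually handles.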

Types of Actions

  • Pre-defined Actions:
    • Hello World: Prints events as JSON
    • Slack: Sends notifications to a configured Slack channel
  • Propagation:
    • Built-in propagation actions are available from version 0.0.13, covering tags, glossary terms, and tag propagation to Snowflake
  • Customization:
    • Custom actions can be created by extending the Action base class in Python.
    • Methods to implement: create, act, close (act contains the main logic)
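
A minimal sketch of the create / act / close lifecycle, assuming the base class is importable as datahub_actions.action.action.Action (the path used in the project's documentation). The class name CountingAction, its prefix option, and the fallback stub are hypothetical; the stub exists only so the example runs without the package installed.

```python
from typing import Any, Dict, List

try:
    # Real base class, available once acryl-datahub-actions is installed.
    from datahub_actions.action.action import Action
except ImportError:
    # Stand-in so this sketch still runs without the package (assumption).
    class Action:  # type: ignore[no-redef]
        pass


class CountingAction(Action):
    """Hypothetical custom action that records and counts the events it sees."""

    @classmethod
    def create(cls, config_dict: Dict[str, Any], ctx: Any) -> "CountingAction":
        # Called once at pipeline start with the action's config block.
        return cls(config_dict, ctx)

    def __init__(self, config_dict: Dict[str, Any], ctx: Any) -> None:
        self.prefix = config_dict.get("prefix", "event")
        self.ctx = ctx
        self.seen: List[str] = []

    def act(self, event: Any) -> None:
        # Main logic: invoked once per event that passes the filter.
        self.seen.append(f"{self.prefix}: {event}")
        print(self.seen[-1])

    def close(self) -> None:
        # Cleanup hook: invoked when the pipeline shuts down.
        print(f"processed {len(self.seen)} events")


# Drive the lifecycle by hand; in a real run the framework makes these calls.
action = CountingAction.create({"prefix": "tag"}, ctx=None)
action.act("first event")
action.act("second event")
action.close()
```

In a real pipeline the event argument is an envelope object rather than a string; plain strings are used here only to keep the sketch self-contained.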

Creating Custom Actions

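  • Define the action as a Python class
  • Install it by placing the file in a directory where Python can import it, or by installing it as a package
  • Run it by declaring the custom package, file (module), and class names as the action type in the config file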
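
The package, file, and class names are commonly combined into a single string such as my_package.my_file:MyAction in the action's type field. The sketch below shows, under that assumption, how such a string can be resolved to a class with importlib. resolve_action_type is a hypothetical helper, and the framework's own loader may differ in detail.

```python
import importlib
from typing import Any


def resolve_action_type(type_string: str) -> Any:
    """Split 'module.path:ClassName' and import the named attribute."""
    module_name, sep, class_name = type_string.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {type_string!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the example runs anywhere.
resolved = resolve_action_type("collections:OrderedDict")
print(resolved.__name__)  # OrderedDict
```

This is also why the action file has to be importable: if Python cannot find the module, the class cannot be loaded.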

Demo: Tag Propagation

  • Tag Propagation Concept:
    • Automatic attachment of tags to downstream datasets once a tag is added to a dataset.
    • Utilizes dataset lineage to identify downstream assets.
  • Demo Steps:
    • Deploy a DataHub instance locally
    • Configure the action config file
    • Run the action with: datahub actions -c <file>
    • Observe the tag being attached automatically to downstream datasets
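
The idea behind the demo can be sketched without DataHub at all: given a lineage graph, walk downstream from the tagged dataset and attach the tag along the way. The dataset names, the in-memory DOWNSTREAMS map, and propagate_tag are all hypothetical. The talk does not say whether the built-in action walks one hop or the whole graph, so this sketch simply assumes every reachable downstream dataset gets the tag.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each dataset maps to its direct downstream datasets.
DOWNSTREAMS: Dict[str, List[str]] = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.customers"],
    "mart.revenue": [],
    "mart.customers": [],
}


def propagate_tag(tag: str, start: str, tags: Dict[str, Set[str]]) -> List[str]:
    """Attach `tag` to every dataset reachable downstream of `start`.

    Returns the datasets that received the tag, in breadth-first order.
    """
    updated: List[str] = []
    visited = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in DOWNSTREAMS.get(current, []):
            if child in visited:
                continue  # already handled via another lineage path
            visited.add(child)
            tags.setdefault(child, set()).add(tag)
            updated.append(child)
            queue.append(child)
    return updated


tags_by_dataset: Dict[str, Set[str]] = {"raw.orders": {"pii"}}
print(propagate_tag("pii", "raw.orders", tags_by_dataset))
# ['staging.orders', 'mart.revenue', 'mart.customers']
```

The visited set keeps the walk from looping or tagging a dataset twice when lineage paths converge.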

Conclusion

  • Demonstrated the flexibility of the DataHub Actions Framework through customizable actions.
  • Encouraged exploration of the framework for more complex use cases.
  • Thanked the audience and signed off.