
Creating Your First Pipeline in Azure Data Factory

Jul 16, 2024

Introduction

  • Summary of a video tutorial on creating your first pipeline in Microsoft Azure Data Factory (ADF)
  • Walks through the Azure portal and ADF Studio steps needed to set up the pipeline

Steps to Create Pipeline

Step 1: Access Azure Data Factory

  • Go to the Azure dashboard
  • Open the Data Factory resource
  • Click "Launch Studio" to open Azure Data Factory Studio

Step 2: Authoring in Azure Data Factory

  • Click the Author (pencil) icon in the left navigation
  • This opens the authoring page, where pipelines and datasets are defined

Step 3: Create New Pipeline

  • Click the ellipsis (...) next to Pipelines > New pipeline
  • Set the pipeline name: PL_ingest_WS_sales_to_data_Lake
    • PL: Pipeline
    • WS: Web Store
  • Add the description: "Ingest Web Store online sales data into the Data Lake" (an equivalent SDK sketch follows below)
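For readers who prefer code over the Studio UI, the same pipeline shell can be created with the Azure Data Factory management SDK for Python. This is a hedged sketch, not the tutorial's method: the subscription, resource group, and factory names are placeholders introduced here, only the pipeline name and description come from the video, and exact model signatures can vary between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import PipelineResource

# Placeholder values -- not named in the video; substitute your own.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-dataeng"
FACTORY_NAME = "adf-dataeng"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# An empty pipeline with the name and description used in the tutorial;
# the copy activity is added in a later step.
pipeline = PipelineResource(
    description="Ingest Web Store online sales data into the Data Lake",
    activities=[],
)
adf_client.pipelines.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "PL_ingest_WS_sales_to_data_Lake", pipeline
)
```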

Step 4: Create a Dataset Folder

  • Create a folder named "webstore" to keep the webstore datasets organized
  • In the Datasets section, click "New folder" > name it "webstore" > click "Create"

Step 5: Set Up the Source Dataset

  • Click the ellipsis (...) on the "webstore" folder > New dataset
  • Choose the data source: Azure Blob Storage
    • Format: JSON
  • Name the dataset: BS_online_sales_json
  • Click the "Linked service" dropdown > New
    • Name the linked service: LS_abs_webstore
    • Description: Connection to Azure Blob Storage

Step 6: Configure the Linked Service Connection

  • Use the default integration runtime (AutoResolveIntegrationRuntime) with Azure Active Directory authentication
  • Select the Azure subscription: free Azure subscription
  • Pick the Blob Storage account: SE_webstore_data001
  • Test connection > Success
  • Click "Create" (an equivalent SDK sketch follows below)

Step 7: Fetch Data

  • Browse Blob Storage and select the file: sales.json
  • Click "OK" to finish creating the dataset (see the dataset sketch below)

Build Pipeline

Step 8: Add the Copy Activity

  • In the Activities pane, expand "Move and transform" and drag the "Copy data" activity onto the canvas
  • Name the activity: "Copy webstore online sales data"
  • Description: "Copy online sales data from webstore and ingest into Data Lake"
  • Adjust the timeout: change it from the default 12 hours to 10 minutes (see the snippet below)
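ADF expresses activity timeouts in the D.HH:MM:SS timespan format, so the 10-minute value becomes "0.00:10:00". A minimal snippet of how that policy would be set via the SDK (the full copy activity is assembled after Step 9):

```python
from azure.mgmt.datafactory.models import ActivityPolicy

# Default copy-activity timeout is 12 hours ("0.12:00:00"); the tutorial shortens it.
copy_policy = ActivityPolicy(timeout="0.00:10:00", retry=0)
```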

Step 9: Configure Source and Sink Datasets

  • Source dataset: BS_online_sales_json
  • Sink dataset (create new):
    • Azure Data Lake Storage Gen2
    • Format: JSON
    • Name: DL_online_sales_json
    • Create a new linked service: LS_ADLs_data_engineering_DL
    • Description: Connection to the data engineering Data Lake
    • Select the Azure subscription and storage account
    • Test connection > Success
  • Choose the destination path: web development/webstore/raw/online_sales
  • Click "OK" to save (the full SDK sketch follows below)

Step 10: Validation and Publishing

  • Validate the pipeline > no errors
  • Publish the pipeline to save the changes to the service
  • Click "Publish all"

Step 11: Trigger and Monitor Pipeline

  • Trigger the pipeline ("Trigger now") > it runs with the last published configuration
  • Monitor the pipeline run > check its status and duration (10 minutes)
  • Check the final run details: 68.3 MB of data read, transfer successful (an SDK sketch for triggering and polling follows below)
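Triggering and polling a run can also be done programmatically. A sketch reusing the placeholders from the Step 3 sketch; the 15-second polling interval is arbitrary.

```python
import time

# adf_client, RESOURCE_GROUP, FACTORY_NAME: as defined in the Step 3 sketch.
run = adf_client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, "PL_ingest_WS_sales_to_data_Lake"
)

# Poll the run status until the pipeline finishes (or its 10-minute timeout hits).
while True:
    pipeline_run = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print(pipeline_run.status, pipeline_run.duration_in_ms)
```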

Confirmation

  • Navigate to the storage account > confirm sales.json landed in Data Lake Storage Gen2 (see the verification sketch below)
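To confirm the landing file without opening the portal, the Data Lake Storage SDK can be used. A sketch assuming the output file keeps the name sales.json under the path chosen in Step 9; the account URL and filesystem name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<adls-account>.dfs.core.windows.net",  # placeholder
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("<filesystem>")              # placeholder
file_client = fs.get_file_client("webstore/raw/online_sales/sales.json")

# Fetch file metadata and report its size (~68.3 MB in the video).
props = file_client.get_file_properties()
print(f"{props.name}: {props.size / 1024 / 1024:.1f} MB")
```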

Conclusion

  • Successfully created a first Azure Data Factory pipeline
  • Ingested data from Azure Blob Storage into Azure Data Lake Storage
  • Video concludes with a thank you note and a sign-off