
Databricks Job Parameterization

Jul 10, 2025

Overview

This lecture covers how to parameterize and productionalize Databricks jobs using Databricks Utilities (dbutils) widgets, focusing on writing data to Snowflake with dynamic parameters.

Parameterizing Databricks Notebooks

  • Use dbutils.widgets to add interactive input options (widgets) such as dropdowns or text boxes in Databricks notebooks.
  • Widgets allow users to specify parameters like database, table, and schema names at runtime, making notebooks dynamic.
  • dbutils.widgets.dropdown("database", "test_database", ["test_database", "prod_database"]) creates a dropdown widget with configurable options.
  • Retrieve widget values in notebook code using dbutils.widgets.get("database").
  • Avoid hardcoding values by referencing widget values, which supports reusability and easier modification (a short sketch follows this list).
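A minimal sketch of this pattern, runnable in a Databricks notebook cell (dbutils is predefined there; the widget names and defaults beyond the dropdown above are illustrative):

    # Create widgets once; they render as inputs at the top of the notebook.
    dbutils.widgets.dropdown("database", "test_database", ["test_database", "prod_database"])
    dbutils.widgets.text("table", "flight_delays")  # free-form text input
    dbutils.widgets.text("schema", "public")

    # Read the current widget values instead of hardcoding them.
    database = dbutils.widgets.get("database")
    table = dbutils.widgets.get("table")
    schema = dbutils.widgets.get("schema")

    print(f"Target: {database}.{schema}.{table}")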

Productionalizing Jobs

  • Transform an interactive notebook into a reusable production job by parameterizing it with widgets.
  • In the Databricks Jobs UI, widget values can be passed as job parameters; the sketch after this list shows the same override done programmatically.
  • Job parameters passed at run time override the widgets' default values in the notebook.
  • This approach enables running the same notebook with different inputs (e.g., different tables or databases) without changing code.
  • Typically use job clusters for production jobs, but interactive clusters can be used for testing.
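Outside the UI, the same parameter override can be triggered programmatically. A sketch using the Databricks SDK for Python (pip install databricks-sdk); the job ID and parameter values here are assumptions:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # picks up auth from env vars or a Databricks config profile

    # notebook_params override the widget defaults in the notebook,
    # just as job parameters set in the Jobs UI do.
    run = w.jobs.run_now(
        job_id=123,  # hypothetical job ID; replace with your job's ID
        notebook_params={"database": "prod_database", "table": "flight_delays"},
    ).result()  # block until the run terminates
    print(run.state.result_state)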

Workflow Example

  • Load and transform a Databricks dataset using PySpark.
  • Filter, aggregate, and apply window functions to identify the most-delayed airlines.
  • Write the results to a Snowflake table, taking the table, database, schema, and warehouse names from widgets.
  • Use mode='overwrite' to replace existing data in the target table (a sketch of the full workflow follows this list).
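A condensed sketch of the workflow, assuming the airlines sample dataset under /databricks-datasets, the Spark Snowflake connector, and widgets named database/schema/table/warehouse; the column names, Snowflake URL, and secret scope are assumptions:

    from pyspark.sql import functions as F, Window

    # Widget values supplied at run time (or overridden by job parameters).
    database = dbutils.widgets.get("database")
    schema = dbutils.widgets.get("schema")
    table = dbutils.widgets.get("table")
    warehouse = dbutils.widgets.get("warehouse")

    # Load a sample Databricks dataset (path and column names are assumptions).
    df = (spark.read.format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("/databricks-datasets/airlines/part-00000"))

    # Filter to delayed flights and compute average arrival delay per carrier.
    # Cast first: "NA" values can make the delay column come in as strings.
    delays = (df.withColumn("ArrDelay", F.col("ArrDelay").cast("int"))
                .filter(F.col("ArrDelay") > 0)
                .groupBy("UniqueCarrier")
                .agg(F.avg("ArrDelay").alias("avg_delay")))

    # Rank carriers by average delay with a window function; keep the top 10.
    rank_window = Window.orderBy(F.desc("avg_delay"))
    top_delayed = (delays
                   .withColumn("rank", F.row_number().over(rank_window))
                   .filter(F.col("rank") <= 10))

    # Write to Snowflake; mode("overwrite") replaces existing rows in the target table.
    sf_options = {
        "sfURL": "myaccount.snowflakecomputing.com",         # assumption
        "sfUser": dbutils.secrets.get("snowflake", "user"),  # hypothetical secret scope/keys
        "sfPassword": dbutils.secrets.get("snowflake", "password"),
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }
    (top_delayed.write.format("snowflake")
        .options(**sf_options)
        .option("dbtable", table)
        .mode("overwrite")
        .save())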

Key Terms & Definitions

  • Databricks Utilities (dbutils) — Helper functions in Databricks for tasks like creating widgets and accessing parameters.
  • Widget — UI input element (dropdown, text) used to parameterize notebook execution.
  • Parameterization — Process of making notebooks dynamic by using variables instead of hardcoded values.
  • Productionalize — Converting a notebook for repeatable, automated execution in production settings.
  • Job Cluster — A cluster that is created and terminated automatically for a job run.

Action Items / Next Steps

  • Practice creating widgets and retrieving their values in a Databricks notebook.
  • Update existing notebooks to replace hardcoded values with widgets.
  • Set up and test a Databricks job with parameters in the UI.
  • Review the write mode option to ensure a job overwrites data only when intended.