Delta Table Merging Overview

Jul 10, 2025

Overview

This lecture explains how to use the merge operation in Delta Tables, focusing on efficiently performing insert and update (upsert) actions using the MERGE INTO command in Databricks.

Introduction to Merging in Delta Tables

  • Merging is used to perform upserts: update existing records or insert new records based on specific conditions.
  • Delta Tables provide the MERGE INTO statement to simplify merge and upsert operations.
  • Before Delta Tables, merging data was more complex and difficult.

Setting Up Demo Tables

  • The example uses two tables with the same schema: id, first name, middle name, last name, gender, date of birth, SSN number, and salary.
  • Table 1 is created from a default Databricks dataset and saved as "people10m."
  • Table 2 ("people10m_upload01") is created manually, and a few records (already present in Table 1) are inserted into it.

Performing the Merge Operation

  • The MERGE INTO statement merges data from Table 2 into Table 1 using the "id" as the join key.
  • Syntax: MERGE INTO target_table USING source_table ON <condition> WHEN MATCHED THEN UPDATE WHEN NOT MATCHED THEN INSERT.
  • After running the merge, statistics indicate 3 rows were updated (matched), 0 deleted, and 0 inserted.

Handling New Insertions via Merge

  • New records not present in the target table are inserted into Table 2.
  • Running the merge again results in 3 more rows being inserted into the target table.
  • The merge operation can perform both updates (for existing IDs) and inserts (for new IDs) simultaneously.

Key Terms & Definitions

  • Delta Table — A Databricks table format supporting ACID transactions and efficient merge/upsert operations.
  • Merge (Upsert) — Combining two tables; updates existing records and inserts new ones based on a condition.
  • MERGE INTO Statement — SQL command that merges data from a source table into a target table with specified update and insert actions.
  • Managed/Unmanaged Table — Managed tables are fully controlled by Databricks; unmanaged tables have externally stored data.

Action Items / Next Steps

  • Watch the first video in the Delta Tables series for details on managed vs. unmanaged tables.
  • Practice using the MERGE INTO statement with sample data on Databricks.
  • Review syntax and experiment with different merge conditions.