
Delta Table Management

Jul 10, 2025

Overview

This lecture explains how Delta tables manage history through transaction logs, enabling time travel and versioning for data recovery, auditing, or rollback.

Delta Table Structure

  • Delta tables store data in Parquet files and maintain a transaction log in a folder named _delta_log.
  • The transaction log contains JSON commit files (one transaction record per table version) and periodic checkpoint files, which together capture every operation on the table.
  • Each change to the table creates a new version, tracked in the transaction log.
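
The log layout described above can be inspected with plain file operations. The following is a minimal sketch, assuming the table sits on a local filesystem and that commit files follow the Delta protocol naming (the version number zero-padded to 20 digits, with a .json suffix). The helper name list_commit_versions is hypothetical and not part of any Delta library.

```python
import re
from pathlib import Path

# Commit files look like 00000000000000000003.json (20-digit, zero-padded version).
# Checkpoint files and other entries in _delta_log do not match this pattern.
_COMMIT_NAME = re.compile(r"^(\d{20})\.json$")


def list_commit_versions(table_path):
    """Return the sorted version numbers that have a JSON commit file in _delta_log."""
    log_dir = Path(table_path) / "_delta_log"
    versions = []
    for entry in log_dir.iterdir():
        match = _COMMIT_NAME.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)
```

Older commit files may be cleaned up once they pass the log retention period, so the list does not always start at version 0.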

Accessing Table History

  • The history command (DeltaTable.history() in Python, DESCRIBE HISTORY in SQL) shows all changes made to the Delta table, including version, timestamp, user, and operation type.
  • Transaction log JSON files contain metadata about each modification (user, operation, notebook ID, etc.).
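
To make the metadata point concrete, the sketch below reads the commitInfo entry from a single commit file using only the standard library. It assumes a local table path and that each commit file holds one JSON action per line. The commitInfo action is optional in the Delta protocol and its fields vary by writer, so the function may return None. The name read_commit_info is a hypothetical helper.

```python
import json
from pathlib import Path


def read_commit_info(table_path, version):
    """Return the commitInfo dict for one table version, or None if absent."""
    # Assumption: commit files are named with the 20-digit zero-padded version.
    commit_file = Path(table_path) / "_delta_log" / f"{version:020d}.json"
    with commit_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            action = json.loads(line)  # each line is a single action object
            if "commitInfo" in action:
                return action["commitInfo"]
    return None
```

In a Spark session the same information is normally obtained through DeltaTable.forPath(spark, path).history() or DESCRIBE HISTORY, which read the log for you.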

Overwriting and Versioning Data

  • Overwriting the Delta table with new data writes new Parquet files and records a new version in the transaction log; the previous files stay in storage and are only marked as removed, which is what keeps older versions readable.
  • After several modifications, the history output lists one row per version.
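
A minimal sketch of the overwrite step, assuming df is a Spark DataFrame created in a session configured with Delta Lake. The function name overwrite_delta_table and any path passed to it are illustrative only.

```python
def overwrite_delta_table(df, table_path):
    """Replace the table contents; Delta records the write as a new version."""
    (
        df.write
        .format("delta")    # write in Delta format
        .mode("overwrite")  # replace the current contents instead of appending
        .save(table_path)
    )
```

Each call adds one more version, so running it twice on a new path and then checking the history should show two entries.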

Time Travel and Data Recovery

  • In Python or Scala, read a specific version by adding option("versionAsOf", version_number) to the read command.
  • Alternatively, use option("timestampAsOf", timestamp) to retrieve the data as of a specific time.
  • In SQL, add VERSION AS OF or TIMESTAMP AS OF to the select query, directly after the table name or location.
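
The time travel options above can be sketched as follows. This assumes spark is a SparkSession configured with Delta Lake and that the table is addressed by path. The function names are hypothetical helpers, and the SQL string uses the VERSION AS OF clause with the delta.`path` form of table reference.

```python
def read_version(spark, table_path, version):
    """Load the table as it existed at a given version number."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(table_path)
    )


def read_as_of_timestamp(spark, table_path, timestamp):
    """Load the table as it existed at a timestamp string such as '2025-07-10'."""
    return (
        spark.read.format("delta")
        .option("timestampAsOf", timestamp)
        .load(table_path)
    )


def version_query(table_path, version):
    """Build the equivalent SQL query; pass the result to spark.sql(...)."""
    # int() guards against anything other than a version number in the query.
    return f"SELECT * FROM delta.`{table_path}` VERSION AS OF {int(version)}"
```

Only one of versionAsOf and timestampAsOf should be set on a given read.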

Restoring to Previous Versions

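  • The RESTORE TABLE command in SQL (or the restoreToVersion and restoreToTimestamp methods in Python) resets a table to a previous version, undoing unwanted changes.
  • A restore is itself recorded as a new version: data files added after the chosen version are marked as removed in the transaction log, and the table reverts to the chosen version's state.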
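
A restore can be sketched through the SQL command. This assumes spark is a SparkSession with the Delta SQL extension enabled and a path-based table. The helper name restore_to_version is hypothetical.

```python
def restore_to_version(spark, table_path, version):
    """Reset the table at table_path to an earlier version via RESTORE TABLE."""
    # int() guards against anything other than a version number in the statement.
    statement = (
        f"RESTORE TABLE delta.`{table_path}` TO VERSION AS OF {int(version)}"
    )
    return spark.sql(statement)
```

The Python API offers the same operation as DeltaTable.forPath(spark, table_path).restoreToVersion(version). Checking the table history afterwards is a quick way to confirm that the restore appears as the latest entry.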

Key Terms & Definitions

  • Delta Table: A table format that stores data in Parquet files and records every change in a transaction log, providing ACID transactions.
  • Transaction Log: Files in the _delta_log folder recording all modifications to the Delta table.
  • Versioning: Each table change is a new sequential version, allowing data snapshots at different points in time.
  • Time Travel: The ability to access data as it existed at a previous version or timestamp, as long as the underlying data files and log entries are still retained.

Action Items / Next Steps

  • Practice reading Delta table history and accessing specific versions using .history and versionAsOf.
  • Try overwriting a Delta table and restoring it to an earlier version.
  • Review Delta table SQL and Python commands for time travel and restoration.