Overview
This lecture explains how Delta tables manage history through transaction logs, enabling time travel and versioning for data recovery, auditing, or rollback.
Delta Table Structure
- Delta tables store data in Parquet files and maintain a transaction log in a folder named _delta_log.
- The transaction log contains JSON files (transaction records) and checkpoint files, capturing all table operations.
- Each change to the table creates a new version, tracked in the transaction log.
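The log layout described above can be inspected with nothing but the filesystem. The sketch below assumes the open-source Delta layout, in which each commit is a JSON file named with a 20-digit zero-padded version number (for example 00000000000000000003.json). The function name list_commit_versions and the table path passed to it are hypothetical.

```python
import re
from pathlib import Path

# Assumption: commit files are named <20-digit version>.json
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def list_commit_versions(table_path):
    """Return the sorted version numbers found in <table_path>/_delta_log."""
    log_dir = Path(table_path) / "_delta_log"
    versions = []
    for entry in log_dir.iterdir():
        match = COMMIT_FILE.match(entry.name)
        if match:  # checkpoint files and other entries do not match
            versions.append(int(match.group(1)))
    return sorted(versions)
```

Calling list_commit_versions on a table that has been written three times would be expected to return the three version numbers, starting at 0.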
Accessing Table History
- The .history command (DeltaTable.history() in Python, DESCRIBE HISTORY in SQL) shows all changes made to the Delta table, including version, timestamp, user, and operation type.
- Transaction log JSON files contain metadata about each modification (user, operation, notebook ID, etc.).
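To see where the history metadata comes from, a commit file can be parsed directly. This is a minimal sketch, assuming each commit file is newline-delimited JSON in which one line holds a commitInfo action. The sample record and its values are invented for illustration, and the exact fields present vary by platform.

```python
import json


def read_commit_info(commit_text):
    """Return the commitInfo action from one commit file's text, or None."""
    for line in commit_text.splitlines():
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)  # one JSON object (action) per line
        if "commitInfo" in action:
            return action["commitInfo"]
    return None


# Invented sample resembling the contents of a commit file
sample = "\n".join([
    json.dumps({"commitInfo": {
        "timestamp": 1700000000000,
        "operation": "WRITE",
        "userName": "someone@example.com",
        "operationParameters": {"mode": "Overwrite"},
    }}),
    json.dumps({"add": {"path": "part-00000.snappy.parquet", "dataChange": True}}),
])

info = read_commit_info(sample)
print(info["operation"], info["userName"])
```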
Overwriting and Versioning Data
- Overwriting the Delta table with new data writes new Parquet files and records a new version in the transaction log. The files from earlier versions stay on disk, which is what keeps those versions readable.
- The .history command displays multiple versions once the table has been modified multiple times.
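In PySpark, the overwrite and the follow-up history check might look like the sketch below. It assumes a Spark session with the delta-spark package configured. The function names and the path argument are hypothetical, and the delta import is deferred so the module loads even where Spark is not installed.

```python
def overwrite_delta_table(df, path):
    """Replace the table contents at `path`; Delta records this as a new version."""
    df.write.format("delta").mode("overwrite").save(path)


def show_history(spark, path):
    """Print version, timestamp, and operation for each commit on the table."""
    from delta.tables import DeltaTable  # requires the delta-spark package

    history_df = DeltaTable.forPath(spark, path).history()
    history_df.select("version", "timestamp", "operation").show(truncate=False)
```

After two calls to overwrite_delta_table on the same path, show_history should list one row per write.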
Time Travel and Data Recovery
- You can read a specific version of the data by passing option("versionAsOf", version_number) to the read command (in Python or Scala).
- Alternatively, use option("timestampAsOf", timestamp) to retrieve the data as of a specific time.
- In SQL, specify the version or timestamp directly in the SELECT query, after the table name or location, using VERSION AS OF or TIMESTAMP AS OF.
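The read options and the SQL form above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper names (time_travel_options, read_as_of, select_as_of_sql) are hypothetical, the table is assumed to be addressed by path, and spark is assumed to be a Delta-enabled Spark session.

```python
def time_travel_options(version=None, timestamp=None):
    """Build the reader option for exactly one of version or timestamp."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return {"versionAsOf": version}
    return {"timestampAsOf": timestamp}


def read_as_of(spark, path, version=None, timestamp=None):
    """Load the Delta table at `path` as it was at a version or timestamp."""
    reader = spark.read.format("delta")
    for key, value in time_travel_options(version, timestamp).items():
        reader = reader.option(key, value)
    return reader.load(path)


def select_as_of_sql(path, version=None, timestamp=None):
    """Build the equivalent SQL query for a path-based Delta table."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM delta.`{path}` VERSION AS OF {int(version)}"
    return f"SELECT * FROM delta.`{path}` TIMESTAMP AS OF '{timestamp}'"
```

For example, read_as_of(spark, path, version=0) would load the table as first written, and the string from select_as_of_sql can be passed to spark.sql.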
Restoring to Previous Versions
- The RESTORE TABLE command in SQL (restoreToVersion or restoreToTimestamp in Python) resets a table to a previous version, undoing unwanted changes.
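- Restoration reverts the table to the chosen version's state: files added by later versions are removed from the table's current snapshot, and the restore itself is recorded as a new version in the history.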
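A restore can be issued from either SQL or the Python API. The sketch below assumes a Delta-enabled Spark session and the delta-spark package. The wrapper function names, the table name, and the path are hypothetical, while RESTORE TABLE ... TO VERSION AS OF and DeltaTable.restoreToVersion are the underlying calls.

```python
def restore_sql(table_name, version):
    """Build a RESTORE statement for a named table and target version."""
    return f"RESTORE TABLE {table_name} TO VERSION AS OF {int(version)}"


def restore_with_sql(spark, table_name, version):
    """Run the restore through Spark SQL."""
    return spark.sql(restore_sql(table_name, version))


def restore_with_python(spark, path, version):
    """Run the restore through the DeltaTable API on a path-based table."""
    from delta.tables import DeltaTable  # requires the delta-spark package

    return DeltaTable.forPath(spark, path).restoreToVersion(version)
```

Checking the history after a restore should show the restore as the newest entry, with the earlier versions still listed.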
Key Terms & Definitions
- Delta Table: A table format that tracks data changes with ACID transaction logs.
- Transaction Log: Files in the _delta_log folder recording all modifications to the Delta table.
- Versioning: Each table change is a new sequential version, allowing data snapshots at different points in time.
- Time Travel: The ability to access data as it existed at any previous version or timestamp.
Action Items / Next Steps
- Practice reading Delta table history and accessing specific versions using .history and versionAsOf.
- Try overwriting a Delta table and restoring it to an earlier version.
- Review Delta table SQL and Python commands for time travel and restoration.