Delta Table VACUUM Overview

Jul 10, 2025

Overview

This lecture explains the VACUUM command in Delta Tables, detailing its purpose, operation, syntax, restrictions, and practical usage.

Delta Table Recap

  • Delta Tables store data as a set of files with versioning for tracking changes.
  • Table history can be viewed to see versions and operations performed (create, overwrite, restore).

VACUUM Command Purpose

  • VACUUM removes data files that are no longer referenced by the current Delta Table state and are older than the retention period.
  • Running VACUUM regularly keeps storage use under control and prevents old, unreferenced files from piling up in the table directory.

How VACUUM Works

  • By default, VACUUM retains files for 7 days (168 hours) before deletion.
  • The command VACUUM <delta_table> RETAIN <hours> HOURS removes unreferenced files older than the specified retention period.
  • Files currently referenced by the Delta Table are never deleted.
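
The selection rule described above can be sketched in plain Python. This is a simplified model and not Delta's actual implementation: it assumes each file has a single last-modified timestamp and that the set of referenced files is known up front. The function name `vacuum_candidates` and the file names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_HOURS = 168  # 7 days, the default noted above


def vacuum_candidates(files, referenced, now, retain_hours=DEFAULT_RETENTION_HOURS):
    """Return the paths a VACUUM-like cleanup would remove (simplified model).

    files:      dict mapping file path -> last-modified datetime
    referenced: set of paths used by the current table state
    """
    cutoff = now - timedelta(hours=retain_hours)
    return sorted(
        path
        for path, modified in files.items()
        # A file is removed only if it is unreferenced AND older than the cutoff.
        if path not in referenced and modified < cutoff
    )


now = datetime(2025, 7, 10, 12, 0, tzinfo=timezone.utc)
files = {
    "part-0001.parquet": now - timedelta(days=10),  # old and unreferenced
    "part-0002.parquet": now - timedelta(days=10),  # old but still referenced
    "part-0003.parquet": now - timedelta(hours=2),  # unreferenced but recent
}
referenced = {"part-0002.parquet"}

print(vacuum_candidates(files, referenced, now))
# ['part-0001.parquet']
print(vacuum_candidates(files, referenced, now, retain_hours=0))
# ['part-0001.parquet', 'part-0003.parquet']
```

In this model, lowering the retention to 0 hours makes the recent unreferenced file eligible as well, while the referenced file is never selected.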

VACUUM Syntax and Options

  • VACUUM <table> RETAIN <n> HOURS sets the retention period in hours; a path-based table is addressed as delta.`<path>`.
  • Adding DRY RUN shows which files would be deleted without actually deleting them.
  • Attempting to VACUUM with retention < 168 hours triggers an error unless the safety check is disabled.
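
To make the option order concrete, here is a minimal sketch of a helper that assembles the statement text. The helper `build_vacuum_sql`, the table name `sales`, and the path `/data/sales` are hypothetical. The resulting string would typically be passed to `spark.sql(...)` in a session with Delta enabled, which is assumed here and not shown.

```python
def build_vacuum_sql(table, retain_hours=None, dry_run=False):
    """Assemble a VACUUM statement as a string (hypothetical helper)."""
    if retain_hours is not None and retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    parts = [f"VACUUM {table}"]
    if retain_hours is not None:
        # Omitting RETAIN leaves the default retention period in effect.
        parts.append(f"RETAIN {retain_hours} HOURS")
    if dry_run:
        # DRY RUN only lists the files; nothing is deleted.
        parts.append("DRY RUN")
    return " ".join(parts)


print(build_vacuum_sql("sales"))
# VACUUM sales
print(build_vacuum_sql("sales", retain_hours=168, dry_run=True))
# VACUUM sales RETAIN 168 HOURS DRY RUN
print(build_vacuum_sql("delta.`/data/sales`", retain_hours=240))
# VACUUM delta.`/data/sales` RETAIN 240 HOURS
```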

Overriding Retention Period

  • The setting spark.databricks.delta.retentionDurationCheck.enabled controls the 7-day retention safety check.
  • Set this to false to allow a shorter retention period, but use caution: with the check off, a VACUUM with RETAIN 0 HOURS deletes every unreferenced file immediately.
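
One cautious way to sequence a short-retention VACUUM is sketched below as a list of statements to run in order: disable the check, preview with DRY RUN, delete, then re-enable the check. The helper `short_retention_plan` and the table name `sales` are hypothetical, and the 168-hour threshold is the default from these notes; a table configured with a different retention would need a different value.

```python
RETENTION_CHECK_KEY = "spark.databricks.delta.retentionDurationCheck.enabled"
DEFAULT_RETENTION_HOURS = 168  # assumed default threshold for the safety check


def short_retention_plan(table, retain_hours):
    """Return the statements to run, in order, for a VACUUM (hypothetical helper)."""
    vacuum = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    if retain_hours >= DEFAULT_RETENTION_HOURS:
        # At or above the default, the safety check does not need to be touched.
        return [f"{vacuum} DRY RUN", vacuum]
    return [
        f"SET {RETENTION_CHECK_KEY} = false",  # allow retention below the default
        f"{vacuum} DRY RUN",                   # preview the files first
        vacuum,                                # actual deletion
        f"SET {RETENTION_CHECK_KEY} = true",   # restore the safety check
    ]


for statement in short_retention_plan("sales", 0):
    print(statement)
```

Re-enabling the check at the end is a design choice in this sketch, so that a later VACUUM in the same session cannot run with a short retention by accident.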

Effects and Warnings

  • Running VACUUM with a short retention period makes restoring older versions impossible once the files they depend on are removed.
  • The Delta Table’s history is updated after each VACUUM operation.
  • Attempting to restore a table to a version whose files have been deleted results in an error.
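
The effect on restore can be illustrated with a toy model in which each table version needs a set of files, and a version stays restorable only while all of those files still exist. The function `restorable_versions`, the version numbers, and the file names are hypothetical and for illustration only.

```python
def restorable_versions(version_files, existing_files):
    """Return the versions whose required files are all still present (toy model)."""
    return [
        version
        for version, needed in sorted(version_files.items())
        if needed <= existing_files  # subset test: every needed file exists
    ]


# Version 1 overwrote the table, so a.parquet is only needed by version 0.
version_files = {
    0: {"a.parquet"},
    1: {"b.parquet"},
    2: {"b.parquet", "c.parquet"},
}

before_vacuum = {"a.parquet", "b.parquet", "c.parquet"}
after_vacuum = before_vacuum - {"a.parquet"}  # aggressive VACUUM removed the old file

print(restorable_versions(version_files, before_vacuum))
# [0, 1, 2]
print(restorable_versions(version_files, after_vacuum))
# [1, 2]
```

Version 0 is still listed in the table history after the cleanup, but restoring it would fail because its only data file is gone.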

Best Practices

  • Use VACUUM only to remove files that are no longer needed and to free up storage space.
  • Avoid running VACUUM with a low retention period unless you are certain the older versions are no longer needed, because the deleted files cannot be recovered.

Key Terms & Definitions

  • Delta Table — A table format that stores data in files with version control and transactional support.
  • VACUUM — A command to delete files not referenced by the latest state of a Delta Table.
  • Retention Period — The minimum time files are kept before they can be deleted by VACUUM.
  • DRY RUN — A VACUUM option that lists files to be deleted without actually deleting them.

Action Items / Next Steps

  • Review previous videos on Delta Tables to solidify foundational knowledge.
  • Try running VACUUM with various retention settings and observe the effects.
  • Practice using DRY RUN before actual deletion to ensure only unnecessary files are targeted.