Overview
This lecture explains the VACUUM command in Delta Tables, detailing its purpose, operation, syntax, restrictions, and practical usage.
Delta Table Recap
- Delta Tables store data as a set of files with versioning for tracking changes.
- Table history can be viewed to see versions and operations performed (create, overwrite, restore).
VACUUM Command Purpose
- VACUUM removes files that are no longer referenced by the current Delta Table state.
- Regular use of VACUUM helps manage storage space and prevents performance degradation from accumulating old files.
How VACUUM Works
- By default, VACUUM retains files for 7 days (168 hours) before deletion.
- The command VACUUM <delta_table> RETAIN <hours> HOURS removes unreferenced files older than the specified retention period.
- Files currently referenced by the Delta Table are never deleted.
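The selection rule described above can be sketched in plain Python. This is a simplified model, not Delta Lake's implementation: the DataFile class, its fields, and the file names are hypothetical, and a single boolean stands in for the transaction-log lookup that decides whether a file is still referenced.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_RETENTION_HOURS = 168  # 7 days, the default stated above


@dataclass(frozen=True)
class DataFile:
    path: str
    last_modified: datetime
    referenced: bool  # True if the current table version still uses this file


def files_to_vacuum(files, now, retain_hours=DEFAULT_RETENTION_HOURS):
    """Return paths that are unreferenced AND older than the retention cutoff."""
    cutoff = now - timedelta(hours=retain_hours)
    return [
        f.path
        for f in files
        if not f.referenced and f.last_modified < cutoff
    ]


now = datetime(2024, 1, 31, 12, 0)
files = [
    DataFile("part-0001.parquet", now - timedelta(days=10), referenced=False),
    DataFile("part-0002.parquet", now - timedelta(days=2), referenced=False),
    DataFile("part-0003.parquet", now - timedelta(days=10), referenced=True),
]

print(files_to_vacuum(files, now))                  # ['part-0001.parquet']
print(files_to_vacuum(files, now, retain_hours=0))  # ['part-0001.parquet', 'part-0002.parquet']
```

The referenced file is never selected, regardless of its age, which mirrors the last point above.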
VACUUM Syntax and Options
- VACUUM delta.<table> RETAIN <n> HOURS sets the retention period in hours.
- Adding DRY RUN shows which files would be deleted without actually deleting them.
- Attempting to VACUUM with a retention period below 168 hours triggers an error unless the safety check is disabled.
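As a minimal sketch of how the syntax options combine, the helper below assembles a VACUUM statement as a string. The function name build_vacuum_sql and the table name sales are hypothetical. In a Spark session with Delta Lake configured, the resulting string would typically be passed to spark.sql(...), which is assumed here and not shown.

```python
def build_vacuum_sql(table, retain_hours=None, dry_run=False):
    """Assemble a VACUUM statement from the options described above."""
    if retain_hours is not None and retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    parts = ["VACUUM", table]
    if retain_hours is not None:
        # Omitting RETAIN leaves the default retention period in effect.
        parts += ["RETAIN", str(retain_hours), "HOURS"]
    if dry_run:
        # DRY RUN only lists candidate files; nothing is deleted.
        parts.append("DRY RUN")
    return " ".join(parts)


print(build_vacuum_sql("sales"))
# VACUUM sales
print(build_vacuum_sql("sales", retain_hours=168, dry_run=True))
# VACUUM sales RETAIN 168 HOURS DRY RUN
```

Building the DRY RUN variant first and inspecting its output before running the real command is a cautious way to use the helper.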
Overriding Retention Period
- The setting spark.databricks.delta.retentionDurationCheck.enabled controls the 7-day retention safety check.
- Set this to false to allow a shorter retention period, but use caution: with the check disabled, VACUUM with RETAIN 0 HOURS deletes all unreferenced files immediately.
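The behavior of the safety check can be modeled in a few lines of Python. This is an illustrative sketch only: check_retention and its ValueError are hypothetical stand-ins for the error Delta Lake raises, while the SET statement uses the setting named above.

```python
MIN_SAFE_RETENTION_HOURS = 168  # the 7-day threshold described above


def check_retention(retain_hours, retention_check_enabled=True):
    """Raise if retention is below the threshold while the check is on."""
    if retention_check_enabled and retain_hours < MIN_SAFE_RETENTION_HOURS:
        raise ValueError(
            f"retention of {retain_hours} hours is below "
            f"{MIN_SAFE_RETENTION_HOURS} hours; disable the safety check to proceed"
        )
    return retain_hours


# SQL statement that turns the check off for the current Spark session.
DISABLE_CHECK_SQL = (
    "SET spark.databricks.delta.retentionDurationCheck.enabled = false"
)

try:
    check_retention(0)
except ValueError as err:
    print("blocked:", err)

print(check_retention(0, retention_check_enabled=False))  # 0
```

Turning the check back on after the one-off VACUUM keeps later runs protected by the default threshold.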
Effects and Warnings
- Running VACUUM with a short retention period makes it impossible to restore older versions once the data files they depend on have been removed.
- The Delta Table’s history is updated after each VACUUM operation.
- Restoring a table to a version with deleted files results in an error.
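The following toy model illustrates why a restore fails after an aggressive VACUUM. Everything in it is hypothetical (the restore function, MissingFilesError, and the file names), and it assumes each version maps to a fixed set of data files. It is not how Delta Lake implements RESTORE.

```python
class MissingFilesError(Exception):
    """Raised in this sketch when a version's data files no longer exist."""


def restore(version_files, files_on_disk, version):
    """Return the files for `version`, or fail if any of them were removed."""
    needed = version_files[version]
    missing = sorted(needed - files_on_disk)
    if missing:
        raise MissingFilesError(f"version {version} needs deleted files: {missing}")
    return needed


# Hypothetical table: version 0 used a.parquet, version 1 rewrote it as b.parquet.
version_files = {0: {"a.parquet"}, 1: {"b.parquet"}}
files_on_disk = {"a.parquet", "b.parquet"}

print(restore(version_files, files_on_disk, 0))  # {'a.parquet'}

files_on_disk.discard("a.parquet")  # what a low-retention VACUUM would do
try:
    restore(version_files, files_on_disk, 0)
except MissingFilesError as err:
    print("restore failed:", err)
```

The current version (1 in this model) stays restorable because VACUUM never touches files it still references.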
Best Practices
- Only use VACUUM to remove unnecessary files and free up space.
- Avoid running VACUUM with a low retention period unless you are certain the older versions are no longer needed, because the removed files cannot be recovered.
Key Terms & Definitions
- Delta Table — A table format that stores data in files with version control and transactional support.
- VACUUM — A command to delete files not referenced by the latest state of a Delta Table.
- Retention Period — The minimum time files are kept before they can be deleted by VACUUM.
- DRY RUN — A VACUUM option that lists files to be deleted without actually deleting them.
Action Items / Next Steps
- Review previous videos on Delta Tables to solidify foundational knowledge.
- Try running VACUUM with various retention settings and observe the effects.
- Practice using DRY RUN before actual deletion to ensure only unnecessary files are targeted.