Echo Kanak

Deletion vectors in Delta Lake

Talking about copy on write and merge on read

Apr 16, 2026

Deletion vectors are a really interesting topic in Delta Lake.

Before that, let's look at the two main strategies that data lake table formats use to handle changes to Parquet files.

Parquet is a compressed, columnar, binary file storage format where data is internally organized into row groups, column chunks, pages, and metadata. Parquet files are immutable. That means we cannot update a file in place: to change even a single row, we need to write a new file.

Copy on write

  • Copy on write means that on every change (update/delete/merge), each file containing an affected row is rewritten as a new file.

When deletion vectors are disabled, Delta follows copy on write.

On Databricks, we can disable deletion vectors for a table by setting the table property delta.enableDeletionVectors to false.
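A minimal sketch of what such a statement might look like, built as a string in Python. The property name delta.enableDeletionVectors is the Delta table property; the table name `sales` and the helper `set_deletion_vectors_sql` are made up for illustration, and actually running the statement assumes a Spark session on Databricks (shown only as a comment).

```python
def set_deletion_vectors_sql(table: str, enabled: bool) -> str:
    """Build an ALTER TABLE statement that toggles deletion vectors.

    `table` is a hypothetical table name supplied by the caller.
    """
    value = "true" if enabled else "false"
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.enableDeletionVectors' = {value})"
    )


disable_stmt = set_deletion_vectors_sql("sales", enabled=False)
print(disable_stmt)

# Assumption: a SparkSession named `spark` is available (as in a Databricks notebook).
# spark.sql(disable_stmt)
```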

Let's assume the table has 100 records, all stored in a single data file, and we update 1 of them. After the update, we can check the history of our Delta table (for example with DESCRIBE HISTORY).

Let's look at the operation metrics for the update operation:

  • numRemovedFiles: 1 and numAddedFiles: 1, so it rewrote the previous file
  • numCopiedRows: 99, so it copied all the other rows unchanged
  • numUpdatedRows: 1, so it updated just that 1 row
  • the deletion vector metrics show no activity, since deletion vectors are disabled
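To make those numbers concrete, here is a toy in-memory model of copy on write. It is only a sketch: the function `copy_on_write_update`, the file name `part-0.parquet`, and the dict-based "files" are assumptions for illustration, not how Delta stores or rewrites data on disk. The metric names mirror the ones from the table history.

```python
def copy_on_write_update(files, row_id, new_value):
    """Update one row by rewriting every file that contains it.

    `files` maps a file name to a dict of {row_id: value}.
    Returns the new set of files and a dict of metrics.
    """
    metrics = {"numRemovedFiles": 0, "numAddedFiles": 0,
               "numCopiedRows": 0, "numUpdatedRows": 0}
    result = {}
    for name, rows in files.items():
        if row_id not in rows:
            result[name] = rows  # file without the row is kept as-is
            continue
        new_rows = dict(rows)         # copy every row into a new file
        new_rows[row_id] = new_value  # apply the single change
        result[name + ".rewritten"] = new_rows
        metrics["numRemovedFiles"] += 1
        metrics["numAddedFiles"] += 1
        metrics["numUpdatedRows"] += 1
        metrics["numCopiedRows"] += len(rows) - 1  # rows carried over unchanged
    return result, metrics


# One file holding 100 records, then a single-row update.
table = {"part-0.parquet": {i: f"value-{i}" for i in range(100)}}
new_table, cow_metrics = copy_on_write_update(table, 42, "updated")
print(cow_metrics)
```

Even though only one row changed, the other 99 rows had to be copied into the new file, which is the cost copy on write pays on every small change.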
 

Merge on read

  • Merge on read leaves the original Parquet file unchanged; the rows we delete or change are marked as removed in a separate file, which is called a deletion vector
  • When we read data, the deletion vector is checked and the rows it marks are skipped
  • Delta follows merge on read when deletion vectors are enabled

We can enable deletion vectors by setting the table property delta.enableDeletionVectors to true.
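As a sketch, the statement could look like the one below. The table name `sales` is a made-up example, and the commented-out `spark.sql` call assumes a Spark session such as the one a Databricks notebook provides.

```python
# Hypothetical table name, used only for illustration.
table = "sales"

enable_stmt = (
    f"ALTER TABLE {table} "
    "SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)"
)
print(enable_stmt)

# Assumption: a SparkSession named `spark` is available.
# spark.sql(enable_stmt)
```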

Now assume this table also has 100 records and I delete one record. When we look at the operation metrics, we see something like:

  • numCopiedRows is 0, since the delete did not rewrite the data file
  • numDeletionVectorsAdded is 1
  • numDeletedRows is 1
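The same idea as a toy in-memory model: the data file is never modified, and the delete only records a row position in a side structure that reads consult. This is a simplified sketch. The functions `merge_on_read_delete` and `read_file`, the file name, and the use of a Python set per file are assumptions for illustration, not Delta's actual deletion vector format.

```python
def merge_on_read_delete(deletion_vectors, file_name, row_position):
    """Mark one row position as deleted without touching the data file."""
    metrics = {"numCopiedRows": 0, "numDeletionVectorsAdded": 0,
               "numDeletedRows": 0}
    if file_name not in deletion_vectors:
        deletion_vectors[file_name] = set()  # first delete for this file
        metrics["numDeletionVectorsAdded"] = 1
    if row_position not in deletion_vectors[file_name]:
        deletion_vectors[file_name].add(row_position)
        metrics["numDeletedRows"] = 1
    return metrics


def read_file(rows, deleted_positions):
    """Return the rows of a file, skipping positions marked as deleted."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]


# One file holding 100 records, then a single-row delete.
data_file = [f"value-{i}" for i in range(100)]
dvs = {}
mor_metrics = merge_on_read_delete(dvs, "part-0.parquet", 42)
visible = read_file(data_file, dvs["part-0.parquet"])
print(mor_metrics, len(data_file), len(visible))
```

The write is cheap because nothing is copied, but every read now has to filter against the deletion vector. That is the trade-off merge on read makes.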
 

Instead of rewriting large Parquet files, merge on read:

  • stores small changes separately
  • merges them later, during reads or compaction

This gives us faster updates/deletes.

Over time, as more changes happen, we can end up with many small deletion vector (DV) files. This is where compaction helps, and that's where OPTIMIZE and VACUUM come in. I'll cover them in another blog post along with some other topics.



 
