Utilities to remove duplicate rows from a sorted batch.
Structs
- DedupMetrics - Metrics for deduplication.
- LastFieldsBuilder 🔒 - Buffer to store fields in the last row to merge.
- LastNonNull 🔒 - Dedup strategy that keeps the last non-null field for the same key.
- LastNonNullIter 🔒 - An iterator that deduplicates rows using the LastNonNull strategy. The input iterator must return sorted batches.
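
The "last non-null" idea can be illustrated with a minimal sketch. This is not the module's implementation: `SketchRow` and `dedup_last_non_null` are hypothetical names, the real types operate on batches rather than single rows, and the sketch assumes that rows sharing a key are adjacent and ordered from oldest to newest (the actual sort order is not stated on this page).

```rust
/// Hypothetical simplified row: a key plus nullable field values.
#[derive(Debug, Clone, PartialEq)]
struct SketchRow {
    key: String,
    fields: Vec<Option<i64>>,
}

/// Merges adjacent rows that share a key, keeping the last non-null
/// value seen for each field.
///
/// Assumption: `rows` is sorted by key, and within a key older rows
/// come before newer rows.
fn dedup_last_non_null(rows: &[SketchRow]) -> Vec<SketchRow> {
    let mut out: Vec<SketchRow> = Vec::new();
    for row in rows {
        // Does this row continue the run of the previous key?
        let same_key = out.last().map_or(false, |last| last.key == row.key);
        if same_key {
            let last = out.last_mut().expect("output is non-empty here");
            // Overwrite a field only when the newer row has a value for it.
            for (dst, src) in last.fields.iter_mut().zip(&row.fields) {
                if src.is_some() {
                    *dst = *src;
                }
            }
        } else {
            // First row of a new key starts a new output row.
            out.push(row.clone());
        }
    }
    out
}
```

Under these assumptions, two rows for key `a` with fields `[Some(1), None]` and `[None, Some(2)]` collapse into one row with fields `[Some(1), Some(2)]`. Because the input is sorted, only the previous output row needs to be compared, so no hash map is required.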
Traits
- DedupMetricsReport - Trait for reporting dedup metrics.
- DedupStrategy - Strategy to remove duplicate rows from sorted batches.
Functions
- filter_deleted_from_batch 🔒 - Removes deleted rows from the batch and updates metrics.
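
As a rough illustration of filtering deleted rows while updating a counter, the sketch below uses hypothetical stand-ins (`SketchOp`, `TaggedRow`, `SketchMetrics`, `filter_deleted`). The real function's signature, batch type, and metric fields are not shown on this page, so none of these names should be read as the module's API.

```rust
/// Hypothetical operation tag attached to each row.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchOp {
    Put,
    Delete,
}

/// Hypothetical row carrying a value and its operation tag.
#[derive(Debug, Clone, PartialEq)]
struct TaggedRow {
    value: i64,
    op: SketchOp,
}

/// Hypothetical metrics holder; the field name is an assumption.
#[derive(Debug, Default)]
struct SketchMetrics {
    num_deleted_rows: usize,
}

/// Drops rows tagged as deleted and records how many were removed.
fn filter_deleted(rows: &mut Vec<TaggedRow>, metrics: &mut SketchMetrics) {
    let before = rows.len();
    // Keep only rows that are not delete markers.
    rows.retain(|row| row.op != SketchOp::Delete);
    // The difference in length is the number of rows removed.
    metrics.num_deleted_rows += before - rows.len();
}
```

Counting via the length difference avoids a second pass over the rows, and accumulating with `+=` lets one metrics value be shared across many batches.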