Utilities to remove duplicate rows from a sorted batch.
Structs
- BatchLastRow 🔒: State of the last row in a batch for dedup.
- DedupMetrics 🔒: Metrics for deduplication.
- DedupReader 🔒: A reader that deduplicates sorted batches from a source based on the dedup strategy.
- LastFieldsBuilder 🔒: Buffer to store fields in the last row to merge.
- LastNonNull 🔒: Dedup strategy that keeps the last non-null field for the same key.
- LastNonNullIter 🔒: An iterator that deduplicates rows using the LastNonNull strategy. The input iterator must return sorted batches.
- LastRow 🔒: Dedup strategy that keeps the row with the latest sequence for each key.
Traits
- DedupStrategy 🔒: Strategy to remove duplicate rows from sorted batches.
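
The difference between the two strategies listed above can be illustrated with a minimal sketch. Everything here is hypothetical and simplified for illustration: the `Row` type, the `Strategy` trait, and the `KeepLastRow` / `KeepLastNonNull` structs are not the module's real API, which works on batches rather than single rows. The sketch also assumes the input is sorted by key ascending and, within a key, by sequence descending, so the first row seen for a key is the newest.

```rust
// Hypothetical, simplified row; the real module operates on batches.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: u32,                 // stands in for the full row key
    sequence: u64,            // larger means newer
    fields: Vec<Option<i64>>, // None models a null field
}

// Hypothetical trait mirroring the idea of a dedup strategy.
trait Strategy {
    /// Assumes `rows` is sorted by key ascending, then sequence descending.
    fn dedup(&self, rows: &[Row]) -> Vec<Row>;
}

struct KeepLastRow;
struct KeepLastNonNull;

impl Strategy for KeepLastRow {
    fn dedup(&self, rows: &[Row]) -> Vec<Row> {
        let mut out: Vec<Row> = Vec::new();
        for row in rows {
            // The first row seen for a key carries the latest sequence,
            // so later rows with the same key are dropped.
            if out.last().map(|r| r.key) != Some(row.key) {
                out.push(row.clone());
            }
        }
        out
    }
}

impl Strategy for KeepLastNonNull {
    fn dedup(&self, rows: &[Row]) -> Vec<Row> {
        let mut out: Vec<Row> = Vec::new();
        for row in rows {
            let same_key = out.last().map_or(false, |r| r.key == row.key);
            if same_key {
                // Fill fields that are still null with values from the older row.
                let last = out.last_mut().unwrap();
                for (dst, src) in last.fields.iter_mut().zip(&row.fields) {
                    if dst.is_none() {
                        *dst = *src;
                    }
                }
            } else {
                out.push(row.clone());
            }
        }
        out
    }
}
```

Under these assumptions, the last-row variant discards older duplicates entirely, while the last-non-null variant keeps the newest row and backfills its null fields from older rows with the same key.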
Functions
- Removes deleted rows from the batch and updates metrics.
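
As a rough illustration of removing deleted rows while updating metrics, the sketch below filters out rows marked as deletes and records how many were dropped. The `OpType` enum, the `Metrics` struct with its `num_deleted_rows` field, and the `remove_deleted` function are all hypothetical names chosen for this example; the module's actual function and types may differ.

```rust
// Hypothetical operation marker attached to each row.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpType {
    Put,
    Delete,
}

// Hypothetical metrics holder; only one counter is modeled here.
#[derive(Debug, Default, PartialEq)]
struct Metrics {
    num_deleted_rows: usize,
}

// Drops rows whose op type is Delete and adds the dropped count to the metrics.
// Each row is modeled as a (key, op type) pair for simplicity.
fn remove_deleted(rows: &mut Vec<(u64, OpType)>, metrics: &mut Metrics) {
    let before = rows.len();
    rows.retain(|(_, op)| *op != OpType::Delete);
    metrics.num_deleted_rows += before - rows.len();
}
```

Counting by length difference keeps the filter itself free of side effects; the metrics update happens once per call.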