Utilities to remove duplicate rows from a sorted batch.
Structs
- BatchLastRow 🔒 - State of the last row in a batch for dedup.
- DedupMetrics 🔒 - Metrics for deduplication.
- DedupReader 🔒 - A reader that deduplicates sorted batches from a source based on the dedup strategy.
- LastFieldsBuilder 🔒 - Buffer to store fields in the last row to merge.
- LastNonNull 🔒 - Dedup strategy that keeps the last non-null field for the same key.
- LastNonNullIter 🔒 - An iterator that deduplicates rows by the LastNonNull strategy. The input iterator must return sorted batches.
- LastRow 🔒 - Dedup strategy that keeps the row with the latest sequence of each key.
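The difference between the LastRow and LastNonNull strategies can be illustrated with a minimal sketch. It is not the module's implementation: the `Row` type, the two function names, and the assumed sort order (key ascending, then sequence descending, so the newest row of each key comes first) are simplifications introduced here for illustration only.

```rust
/// Hypothetical simplified row; the real module works on sorted batches.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    /// Stands in for whatever identifies a row (assumed for this sketch).
    key: u64,
    /// Larger sequence means a more recent write.
    sequence: u64,
    /// Nullable field values.
    fields: Vec<Option<i64>>,
}

/// LastRow-style dedup: keep only the newest row of each key.
/// Assumes `rows` is sorted by key ascending, then sequence descending.
fn dedup_last_row(rows: &[Row]) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        // The first row seen for a key is the newest one under the assumed order.
        if out.last().map_or(true, |last| last.key != row.key) {
            out.push(row.clone());
        }
    }
    out
}

/// LastNonNull-style dedup: for each key, keep the newest non-null value
/// of every field, merging older rows into the newest one.
/// Assumes the same sort order as `dedup_last_row`.
fn dedup_last_non_null(rows: &[Row]) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        let same_key = out.last().map_or(false, |last| last.key == row.key);
        if same_key {
            let last = out.last_mut().expect("checked above");
            // Fill only the fields that are still null with older values.
            for (dst, src) in last.fields.iter_mut().zip(&row.fields) {
                if dst.is_none() {
                    *dst = *src;
                }
            }
        } else {
            out.push(row.clone());
        }
    }
    out
}
```

With three rows for one key whose newest row has a null first field, `dedup_last_row` returns that newest row unchanged, while `dedup_last_non_null` fills the null from the next older row that has a value.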
Traits
- DedupStrategy 🔒 - Strategy to remove duplicate rows from sorted batches.
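A strategy applied to a stream of sorted batches has to remember the last row it saw, because duplicates of one key can straddle a batch boundary. The sketch below shows that idea under loud assumptions: the `Strategy` trait, its `push_batch` method, the `KeepFirst` type, and the `(key, value)` tuple rows are all hypothetical and are not the signatures of `DedupStrategy`.

```rust
/// Hypothetical strategy trait for this sketch; the real trait may differ.
trait Strategy {
    /// Consumes one sorted batch and returns the rows that survive dedup.
    fn push_batch(&mut self, batch: Vec<(u64, u64)>) -> Vec<(u64, u64)>;
}

/// Keeps the first row of each key and carries the last seen key
/// across batches so duplicates spanning two batches are still removed.
#[derive(Default)]
struct KeepFirst {
    last_key: Option<u64>,
}

impl Strategy for KeepFirst {
    fn push_batch(&mut self, batch: Vec<(u64, u64)>) -> Vec<(u64, u64)> {
        let mut out = Vec::with_capacity(batch.len());
        for (key, value) in batch {
            // A row is a duplicate if its key equals the previous row's key,
            // even when the previous row arrived in an earlier batch.
            if self.last_key != Some(key) {
                self.last_key = Some(key);
                out.push((key, value));
            }
        }
        out
    }
}

/// Drives a strategy over all batches and concatenates the surviving rows.
fn dedup_all<S: Strategy>(strategy: &mut S, batches: Vec<Vec<(u64, u64)>>) -> Vec<(u64, u64)> {
    batches
        .into_iter()
        .flat_map(|batch| strategy.push_batch(batch))
        .collect()
}
```

Keeping the state inside the strategy rather than in the reader means the reader can stay generic over strategies, which is one plausible reason for a trait-based design.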
Functions
- filter_deleted_from_batch 🔒 - Removes deleted rows from the batch and updates metrics.
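Filtering deleted rows while updating metrics can be sketched as follows. The `OpType`, `OpRow`, and `Metrics` types, the `num_deleted_rows` counter, and the `filter_deleted` function are assumptions made for this example. They do not reproduce the signature of `filter_deleted_from_batch` or the fields of `DedupMetrics`.

```rust
/// Hypothetical operation type attached to each row.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpType {
    Put,
    Delete,
}

/// Hypothetical simplified row carrying its operation type.
#[derive(Debug, Clone, PartialEq)]
struct OpRow {
    key: u64,
    op_type: OpType,
}

/// Hypothetical metrics holder; the counter name is an assumption.
#[derive(Debug, Default)]
struct Metrics {
    num_deleted_rows: usize,
}

/// Removes rows marked as deleted and records how many were dropped.
fn filter_deleted(rows: &mut Vec<OpRow>, metrics: &mut Metrics) {
    let before = rows.len();
    // `retain` keeps the relative order of the surviving rows.
    rows.retain(|row| row.op_type != OpType::Delete);
    metrics.num_deleted_rows += before - rows.len();
}
```

Running the filter after dedup, rather than before it, matters in this sketch: a delete marker has to take part in dedup first so that it can shadow older puts of the same key.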