Dedup implementation for flat format.
Structs
- BatchLastRow (private) - State of the batch with the last row for dedup.
- FlatDedupIterator - An iterator to dedup sorted batches from an iterator based on the dedup strategy.
- FlatDedupReader - An async reader to dedup sorted record batches from a stream based on the dedup strategy.
- FlatLastNonNull - Dedup strategy that keeps the last non-null field for the same key.
- FlatLastRow - Dedup strategy that keeps the row with the latest sequence for each key.
Traits
- RecordBatchDedupStrategy - Strategy to remove duplicate rows from sorted record batches.
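Since both the iterator and the async reader dedup "based on the dedup strategy", a strategy trait presumably lets them share the same logic. The sketch below only illustrates that idea on simplified batches of `(key, sequence)` pairs. The `DedupStrategy` trait with `push_batch`/`finish`, the `LastRow` type, and the `dedup_all` driver are hypothetical and are not the signature of `RecordBatchDedupStrategy`. The strategy remembers the last emitted key, similar in spirit to what `BatchLastRow` describes, so a key whose duplicates continue into the next batch is not emitted twice.

```rust
/// Hypothetical simplified batch: (key, sequence) pairs sorted by key
/// ascending, then sequence descending.
pub type Batch = Vec<(String, u64)>;

/// Hypothetical strategy interface, not the module's real trait.
pub trait DedupStrategy {
    /// Consumes one sorted batch and returns the rows that are final so far.
    fn push_batch(&mut self, batch: Batch) -> Option<Batch>;
    /// Flushes any buffered row once the input is exhausted.
    fn finish(&mut self) -> Option<Batch>;
}

#[derive(Default)]
pub struct LastRow {
    /// Key of the last emitted row; carried across batch boundaries.
    prev_key: Option<String>,
}

impl DedupStrategy for LastRow {
    fn push_batch(&mut self, batch: Batch) -> Option<Batch> {
        let mut out = Batch::new();
        for (key, seq) in batch {
            if self.prev_key.as_deref() == Some(key.as_str()) {
                continue; // older duplicate of a key already emitted
            }
            self.prev_key = Some(key.clone());
            out.push((key, seq));
        }
        if out.is_empty() { None } else { Some(out) }
    }

    fn finish(&mut self) -> Option<Batch> {
        None // nothing is buffered: the newest row is emitted immediately
    }
}

/// Drives any strategy over a sequence of batches, as an iterator might.
pub fn dedup_all<S: DedupStrategy>(
    strategy: &mut S,
    batches: impl IntoIterator<Item = Batch>,
) -> Vec<Batch> {
    let mut out = Vec::new();
    for batch in batches {
        if let Some(b) = strategy.push_batch(batch) {
            out.push(b);
        }
    }
    if let Some(b) = strategy.finish() {
        out.push(b);
    }
    out
}
```

Last-row can emit a row as soon as it sees it because the newest version of a key arrives first, so `finish` has nothing to flush. A last-non-null strategy would instead have to hold the current row until the key changes, since an older row in a later batch might still supply a missing field. That held row is what its `finish` would return.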
Functions
- filter_deleted_from_batch (private) - Removes deleted rows from the batch and updates metrics.
- find_boundaries (private) - Returns a mask with bits set whenever the value or nullability changes
- maybe_filter_deleted (private) - Filters deleted rows from the record batch if `filter_deleted` is true.
- primary_key_at (private) - Gets the primary key at `index`.
- timestamp_value (private) - Gets the timestamp value from the timestamp array.
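Two of these helpers can be pictured with a small sketch over plain slices instead of Arrow arrays. In this sketch `find_boundaries` compares each pair of adjacent values, so `n` rows yield `n - 1` flags, with flag `i` set when row `i + 1` differs from row `i` in value or nullability. This mirrors the description above but is not the module's code. `OpType` and the `maybe_filter_deleted` signature (returning kept row indices and counting removed rows in a plain counter instead of metrics) are hypothetical.

```rust
/// Hypothetical operation type attached to each row.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OpType {
    Put,
    Delete,
}

/// Boundary mask sketch over a nullable column: flag `i` is true when row
/// `i + 1` differs from row `i`. Comparing `Option`s covers both value
/// changes and null/non-null changes.
pub fn find_boundaries(col: &[Option<i64>]) -> Vec<bool> {
    col.windows(2).map(|w| w[0] != w[1]).collect()
}

/// Keeps only `Put` rows when `filter_deleted` is true, adding the number of
/// removed rows to `deleted_rows`. Returns the indices of the kept rows.
pub fn maybe_filter_deleted(
    ops: &[OpType],
    filter_deleted: bool,
    deleted_rows: &mut usize,
) -> Vec<usize> {
    if !filter_deleted {
        return (0..ops.len()).collect();
    }
    let kept: Vec<usize> = ops
        .iter()
        .enumerate()
        .filter(|(_, op)| **op == OpType::Put)
        .map(|(i, _)| i)
        .collect();
    *deleted_rows += ops.len() - kept.len();
    kept
}
```

For the column `[Some(1), Some(1), None, None, Some(2)]` the mask is `[false, true, false, true]`. In an LSM-style store a delete usually has to take part in dedup so it can hide older puts of the same key, which suggests filtering deleted rows after dedup; treat that ordering as an assumption.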