Module flat_dedup

Dedup implementation for flat format.

Structs

BatchLastRow 🔒
State of the batch with the last row for dedup.
FlatDedupIterator
An iterator to dedup sorted batches from an iterator based on the dedup strategy.
FlatDedupReader
An async reader to dedup sorted record batches from a stream based on the dedup strategy.
FlatLastNonNull
Dedup strategy that keeps the last non-null field for the same key.
FlatLastRow
Dedup strategy that keeps the row with the latest sequence for each key.
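The difference between the two strategies can be sketched on plain rows instead of Arrow record batches. Everything here is hypothetical and for illustration only: the `Row` type, the functions `dedup_last_row` and `dedup_last_non_null`, and the assumption that rows are sorted by key ascending and then by sequence descending, so the first row of each key run is the latest one. This is not the crate's actual code.

```rust
/// Hypothetical row type: a key, a sequence number and nullable fields.
/// Not the crate's real representation (which uses Arrow record batches).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: u32,
    sequence: u64,
    fields: Vec<Option<i64>>,
}

/// Last-row dedup sketch: keeps, for each key, the row with the highest
/// sequence. Assumes `rows` is sorted by key ascending, then sequence
/// descending, so the first row of each key run is the one to keep.
fn dedup_last_row(rows: &[Row]) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        // Keep a row only if it starts a new key run.
        let new_key = out.last().map_or(true, |last| last.key != row.key);
        if new_key {
            out.push(row.clone());
        }
    }
    out
}

/// Last-non-null dedup sketch: for each key, every field takes its value
/// from the latest row in which that field is non-null. Same sort-order
/// assumption as above; the output keeps the latest row's sequence.
fn dedup_last_non_null(rows: &[Row]) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        let same_key = out.last().map_or(false, |last| last.key == row.key);
        if same_key {
            let last = out.last_mut().expect("non-empty when same_key is true");
            // An older row for the same key only fills fields that are still null.
            for (dst, src) in last.fields.iter_mut().zip(&row.fields) {
                if dst.is_none() {
                    *dst = *src;
                }
            }
        } else {
            out.push(row.clone());
        }
    }
    out
}
```

With rows `(key 1, seq 3, [10, null])`, `(key 1, seq 2, [9, 20])` and `(key 2, seq 1, [null, 5])`, last-row dedup keeps `[10, null]` for key 1, while last-non-null dedup produces `[10, 20]` because the older row fills the null field.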

Traits

RecordBatchDedupStrategy
Strategy to remove duplicate rows from sorted record batches.
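Because input arrives as a stream of sorted batches, rows with the same key can straddle a batch boundary, so a strategy has to carry some state between batches. The sketch below illustrates that idea with a hypothetical `DedupStrategy` trait (`push_batch` and `finish`), a hypothetical `Batch` of `(key, nullable value)` pairs, and a streaming last-non-null strategy that holds back the row for the current key until a different key appears. These names and signatures are assumptions, not the actual `RecordBatchDedupStrategy` API.

```rust
/// Hypothetical row: (key, single nullable field). Rows are assumed sorted
/// by key, newest first within a key.
type Row = (u32, Option<i64>);
/// Hypothetical stand-in for a sorted record batch.
type Batch = Vec<Row>;

/// Hypothetical trait illustrating a streaming dedup strategy.
trait DedupStrategy {
    /// Consumes one sorted batch and returns the rows that are already final.
    fn push_batch(&mut self, batch: Batch) -> Option<Batch>;
    /// Flushes any row still held back when the input ends.
    fn finish(&mut self) -> Option<Batch>;
}

/// Streaming last-non-null sketch: the row for the current key stays pending
/// because a later batch may still hold older rows that fill its nulls.
#[derive(Default)]
struct StreamingLastNonNull {
    pending: Option<Row>,
}

impl DedupStrategy for StreamingLastNonNull {
    fn push_batch(&mut self, batch: Batch) -> Option<Batch> {
        let mut out = Batch::new();
        for (key, value) in batch {
            let same_key = self.pending.as_ref().map_or(false, |p| p.0 == key);
            if same_key {
                // Older row for the pending key: fill the field only if null.
                if let Some(p) = self.pending.as_mut() {
                    if p.1.is_none() {
                        p.1 = value;
                    }
                }
            } else if let Some(done) = self.pending.replace((key, value)) {
                // A new key means the previous key can no longer change.
                out.push(done);
            }
        }
        if out.is_empty() {
            None
        } else {
            Some(out)
        }
    }

    fn finish(&mut self) -> Option<Batch> {
        self.pending.take().map(|row| vec![row])
    }
}
```

For example, pushing `[(1, 1), (2, null)]` and then `[(2, 7), (3, 9)]` first emits key 1, then key 2 with its null filled from the second batch, and `finish` finally emits key 3.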

Functions

filter_deleted_from_batch 🔒
Removes deleted rows from the batch and updates metrics.
find_boundaries 🔒
Returns a mask with bits set wherever the value or nullability changes between adjacent rows.
maybe_filter_deleted 🔒
Filters deleted rows from the record batch if filter_deleted is true.
primary_key_at 🔒
Gets the primary key at the given index.
timestamp_value 🔒
Gets the timestamp value from the timestamp array.
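Two of the helpers above can be illustrated on plain slices. In the sketch below, `find_boundaries` compares each element with its successor (two nulls count as equal, a null next to a value counts as a change), `runs` is a hypothetical helper that turns the mask into one half-open row range per run of equal keys, and `filter_deleted` drops rows whose hypothetical `OpType` is `Delete` while adding the removed count to a metrics counter. These are simplified analogues over `Vec`s, not the module's Arrow-based implementations.

```rust
use std::ops::Range;

/// Hypothetical operation type attached to each row.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OpType {
    Put,
    Delete,
}

/// Slice analogue of `find_boundaries`: `mask[i]` is true when element `i`
/// differs from element `i + 1` in value or nullability.
fn find_boundaries<T: PartialEq>(values: &[Option<T>]) -> Vec<bool> {
    values.windows(2).map(|w| w[0] != w[1]).collect()
}

/// Hypothetical helper: converts a boundary mask into half-open ranges,
/// one per run of equal values (e.g. one per primary key in a sorted batch).
fn runs(len: usize, boundaries: &[bool]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for (i, &is_boundary) in boundaries.iter().enumerate() {
        if is_boundary {
            ranges.push(start..i + 1);
            start = i + 1;
        }
    }
    if start < len {
        ranges.push(start..len);
    }
    ranges
}

/// Simplified analogue of deleted-row filtering: keeps non-deleted rows and
/// adds the number of removed rows to a metrics counter.
fn filter_deleted(rows: Vec<(u32, OpType)>, deleted_rows: &mut usize) -> Vec<(u32, OpType)> {
    let before = rows.len();
    let kept: Vec<_> = rows.into_iter().filter(|(_, op)| *op != OpType::Delete).collect();
    *deleted_rows += before - kept.len();
    kept
}
```

For keys `[1, 1, null, null, 2]`, the mask is `[false, true, false, true]` and the resulting runs are `0..2`, `2..4` and `4..5`.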