Function merge_and_dedup

pub fn merge_and_dedup(
    schema: &SchemaRef,
    append_mode: bool,
    merge_mode: MergeMode,
    field_column_start: usize,
    input_iters: Vec<BoxedRecordBatchIterator>,
) -> Result<BoxedRecordBatchIterator>

Merges multiple record batch iterators and applies deduplication based on the specified mode.

This function is used during the flush process to combine data from multiple memtable ranges into a single stream while handling duplicate records according to the configured merge strategy.

§Arguments

schema - The Arrow schema reference that defines the structure of the record batches
append_mode - When true, no deduplication is performed and all records are preserved. This is used for append-only workloads where duplicate handling is not required.
merge_mode - The strategy used for deduplication when not in append mode:
- MergeMode::LastRow: Keeps the last record for each primary key
- MergeMode::LastNonNull: Keeps the last non-null values for each field
field_column_start - The index of the first field column in the record batch. Used when merge_mode is MergeMode::LastNonNull to distinguish field columns from the primary key columns that precede them.
input_iters - A vector of record batch iterators to be merged and deduplicated

§Returns

Returns a boxed record batch iterator that yields the merged and potentially deduplicated record batches.
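A minimal sketch of consuming such an iterator, assuming it behaves like a boxed Iterator whose items are Result values. Batch, BoxedBatchIter and count_rows are hypothetical stand-ins for illustration, not the crate's own types.

```rust
// Hypothetical stand-ins: a "batch" is just a vector of values here, and
// errors are plain strings. The real record batch and error types differ.
type Batch = Vec<i64>;
type BoxedBatchIter = Box<dyn Iterator<Item = Result<Batch, String>>>;

/// Drains the iterator, returning the total row count or the first error.
fn count_rows(iter: BoxedBatchIter) -> Result<usize, String> {
    let mut total = 0;
    for batch in iter {
        // Each item can fail independently, so propagate the error with `?`.
        total += batch?.len();
    }
    Ok(total)
}
```

Because every item is a Result, an error in any input surfaces while iterating rather than when the merged iterator is constructed, at least in this simplified model.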

§Behavior

Creates a FlatMergeIterator to merge all input iterators in sorted order based on primary key and timestamp
If append_mode is true, returns the merge iterator directly without deduplication
If append_mode is false, wraps the merge iterator with a FlatDedupIterator that applies the specified merge mode:
- LastRow: Removes duplicate rows, keeping only the last one
- LastNonNull: Removes duplicates but preserves the last non-null value for each field
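The steps above can be sketched on a simplified row model. This is an illustration under stated assumptions, not the actual FlatMergeIterator or FlatDedupIterator: Row, Mode and merge_and_dedup_sketch are hypothetical names, rows are plain structs instead of Arrow record batches, and each row carries an assumed seq write counter so that "last" means the highest seq among rows sharing a key and timestamp.

```rust
use std::cmp::Reverse;

// Hypothetical simplified row: one key, one timestamp, nullable fields.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: u32,
    ts: i64,
    seq: u64, // assumed write counter; a larger value means a later write
    fields: Vec<Option<i64>>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    LastRow,
    LastNonNull,
}

/// Merges all inputs into one run ordered by (key, ts), newest write first
/// among duplicates. A real merge would stream; this sketch just sorts.
fn merge_sorted(inputs: Vec<Vec<Row>>) -> Vec<Row> {
    let mut all: Vec<Row> = inputs.into_iter().flatten().collect();
    all.sort_by_key(|r| (r.key, r.ts, Reverse(r.seq)));
    all
}

/// Collapses rows that share (key, ts) according to `mode`.
fn dedup(rows: Vec<Row>, mode: Mode) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        let is_dup = out
            .last()
            .map_or(false, |p| p.key == row.key && p.ts == row.ts);
        if !is_dup {
            // First row of a group is the newest write, so it is kept.
            out.push(row);
        } else if mode == Mode::LastNonNull {
            // Fill nulls in the kept row from this older duplicate.
            if let Some(prev) = out.last_mut() {
                for (dst, src) in prev.fields.iter_mut().zip(row.fields) {
                    if dst.is_none() {
                        *dst = src;
                    }
                }
            }
        }
        // Mode::LastRow: the older duplicate is simply dropped.
    }
    out
}

/// Mirrors the control flow described above: merge, then dedup unless
/// append mode is enabled.
fn merge_and_dedup_sketch(append_mode: bool, mode: Mode, inputs: Vec<Vec<Row>>) -> Vec<Row> {
    let merged = merge_sorted(inputs);
    if append_mode {
        merged
    } else {
        dedup(merged, mode)
    }
}
```

With one input holding fields [1, 2] at seq 1 and another holding [null, 20] at seq 2 for the same key and timestamp, LastRow keeps [null, 20], LastNonNull yields [1, 20], and append mode keeps both rows.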

§Examples

let merged_iter = merge_and_dedup(
    &schema,
    false,  // not append mode, apply dedup
    MergeMode::LastRow,
    2,  // fields start at column 2 after primary key columns
    vec![iter1, iter2, iter3],
)?;