Module scan_util

Utilities for scanners.

Structs

CompareCostReverse 🔒
Wrapper for file metrics that compares by total cost in reverse order. This allows using BinaryHeap as a min-heap for efficient top-K selection.
FileScanMetrics
Per-file scan metrics.
PartitionMetrics
Metrics while reading a partition.
PartitionMetricsInner 🔒
PartitionMetricsList 🔒
List of PartitionMetrics.
ScanMetricsSet 🔒
Verbose scan metrics for a partition.
SeriesDistributorMetrics 🔒
Metrics for the series distributor.
SplitRecordBatchStream 🔒
A stream wrapper that splits record batches from an inner stream.
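
The top-K idea behind CompareCostReverse can be sketched as follows. This is a minimal illustration, not the module's code: `FileCost`, `ReverseByCost`, and `top_k_by_cost` are hypothetical names, and the real wrapper holds file metrics whose fields this page does not show. Because `BinaryHeap` is a max-heap, reversing the comparison puts the cheapest entry at the top, so it can be evicted whenever the heap grows past K.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for per-file metrics; only `total_cost` drives the ordering.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FileCost {
    file_id: u32,
    total_cost: u64,
}

/// Hypothetical wrapper that compares by total cost in reverse order.
#[derive(Debug, PartialEq, Eq)]
struct ReverseByCost(FileCost);

impl Ord for ReverseByCost {
    fn cmp(&self, other: &Self) -> Ordering {
        // Operands are swapped, so a higher cost sorts as "smaller".
        // The file id tie-break keeps `Ord` consistent with the derived `Eq`.
        other
            .0
            .total_cost
            .cmp(&self.0.total_cost)
            .then_with(|| other.0.file_id.cmp(&self.0.file_id))
    }
}

impl PartialOrd for ReverseByCost {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns the `k` most expensive files, most expensive first.
fn top_k_by_cost(files: impl IntoIterator<Item = FileCost>, k: usize) -> Vec<FileCost> {
    if k == 0 {
        return Vec::new();
    }
    let mut heap = BinaryHeap::new();
    for file in files {
        heap.push(ReverseByCost(file));
        if heap.len() > k {
            // The heap's maximum under the reversed order is the cheapest file.
            heap.pop();
        }
    }
    // Ascending under the reversed order means descending by cost.
    heap.into_sorted_vec().into_iter().map(|w| w.0).collect()
}

fn main() {
    let costs = [5u64, 20, 1, 12, 7];
    let files = costs
        .iter()
        .enumerate()
        .map(|(i, &c)| FileCost { file_id: i as u32 + 1, total_cost: c });
    for f in top_k_by_cost(files, 3) {
        println!("file {} cost {}", f.file_id, f.total_cost);
    }
}
```

The heap never holds more than K + 1 entries, so selecting the top K of N files takes O(N log K) time instead of sorting all N. The standard library's `std::cmp::Reverse` gives the same effect for types that already implement `Ord`; a dedicated wrapper is useful when the ordering should look at one field only.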

Constants

BATCH_SIZE_THRESHOLD 🔒
Minimum batch size after splitting. The batch size is less than 60 because a series may only have 60 samples per hour.
NUM_SERIES_THRESHOLD 🔒
Number of series threshold for splitting batches.
SPLIT_ROW_THRESHOLD 🔒
Files with row count greater than this threshold can contribute to the estimation.

Functions

build_file_range_scan_stream
Builds a stream that scans the input FileRanges.
build_flat_file_range_scan_stream
Builds a stream that scans the input FileRanges using the flat reader, which returns RecordBatch.
can_split_series 🔒
maybe_scan_flat_other_ranges 🔒
maybe_scan_other_ranges 🔒
new_filter_metrics 🔒
Creates a new ReaderFilterMetrics with optional apply metrics initialized based on the explain_verbose flag.
scan_file_ranges 🔒
Scans file ranges at index.
scan_flat_file_ranges 🔒
Scans file ranges at index using the flat reader, which returns RecordBatch.
scan_flat_mem_ranges 🔒
Scans memtable ranges at index using the flat format, which returns RecordBatch.
scan_mem_ranges 🔒
Scans memtable ranges at index.
should_split_flat_batches_for_merge 🔒
Returns true if splitting flat record batches may improve merge performance.
split_record_batch 🔒
Splits the batch by timestamps.
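
This page does not say where split_record_batch places its cut points. One plausible reading, assumed here purely for illustration, is that a batch is cut wherever the timestamp goes backwards, so each resulting piece is non-decreasing in time. The sketch below works on a plain slice of `i64` timestamps rather than a RecordBatch, and `split_on_timestamp_decrease` is a hypothetical name.

```rust
use std::ops::Range;

/// Returns row ranges such that timestamps inside each range never decrease.
/// Assumption: a cut is made at every index where the timestamp drops.
fn split_on_timestamp_decrease(timestamps: &[i64]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    if timestamps.is_empty() {
        return ranges;
    }
    let mut start = 0;
    for i in 1..timestamps.len() {
        if timestamps[i] < timestamps[i - 1] {
            // Close the current run and start a new one at the drop.
            ranges.push(start..i);
            start = i;
        }
    }
    // The final run always extends to the end of the input.
    ranges.push(start..timestamps.len());
    ranges
}

fn main() {
    let ts = [1, 2, 3, 1, 2, 0];
    for r in split_on_timestamp_decrease(&ts) {
        println!("rows {}..{}", r.start, r.end);
    }
}
```

With row ranges like these, each piece of an Arrow batch could then be taken as a zero-copy slice by offset and length, which keeps splitting cheap.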