Module format


Format to store in parquet.

We store three internal columns in parquet:

  • __primary_key, the primary key of the row (tags). Type: dictionary(uint32, binary)
  • __sequence, the sequence number of a row. Type: uint64
  • __op_type, the op type of the row. Type: uint8

The schema of a parquet file is:

field 0, field 1, ..., field N, time index, primary key, sequence, op type

We store fields in the same order as RegionMetadata::field_columns().
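The layout above can be sketched as a function that appends the time index and the three internal columns after the field columns. `sst_column_order` is a hypothetical helper written for this illustration; it only assumes the caller passes field names in the order returned by RegionMetadata::field_columns().

```rust
// Names of the internal columns, as listed in the description above.
const PRIMARY_KEY_COLUMN: &str = "__primary_key";
const SEQUENCE_COLUMN: &str = "__sequence";
const OP_TYPE_COLUMN: &str = "__op_type";

/// Hypothetical helper: returns the column order of a parquet file given
/// the field column names (already in region metadata order) and the
/// name of the time index column.
fn sst_column_order(field_columns: &[&str], time_index: &str) -> Vec<String> {
    let mut columns: Vec<String> = field_columns
        .iter()
        .map(|name| name.to_string())
        .collect();
    // The trailing columns always appear in this order after the fields.
    columns.push(time_index.to_string());
    columns.push(PRIMARY_KEY_COLUMN.to_string());
    columns.push(SEQUENCE_COLUMN.to_string());
    columns.push(OP_TYPE_COLUMN.to_string());
    columns
}
```

For fields `cpu` and `memory` with a time index named `ts`, this yields `cpu, memory, ts, __primary_key, __sequence, __op_type`. Because the last four columns are always present in this order, they can be located by counting back from the end of the schema regardless of how many fields there are.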

Structs

FormatProjection 🔒
Helper to compute the projection for the SST.
PrimaryKeyReadFormat
Helper for reading the SST format.
PrimaryKeyWriteFormat 🔒
Helper for writing the SST format with primary key.

Enums

ReadFormat
Helper to read parquet formats.
StatValues
Values of column statistics of the SST.

Constants

FIXED_POS_COLUMN_NUM 🔒
Number of columns that have fixed positions.

Functions

need_override_sequence 🔒
Checks if sequence override is needed based on all row groups' statistics. Returns true if ALL row groups have sequence min-max values of 0.
new_primary_key_array 🔒
Creates a new array for a specific primary key.
parquet_row_group_time_range 🔒
Gets the min/max time index of the row group from the parquet meta. It assumes the parquet is created by the mito engine.
primary_key_offsets 🔒
Compute offsets of different primary keys in the array.
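
Two of the helpers above are easy to picture on plain slices. The sketch below is not the module's implementation: `all_sequences_zero` and `key_run_offsets` are hypothetical names, the inputs are simplified stand-ins for parquet statistics and dictionary keys, and the handling of missing statistics and of empty input is an assumption made for this example.

```rust
/// Sketch of the check described for need_override_sequence.
/// Each entry is the (min, max) sequence statistic of one row group,
/// or None when the row group has no statistics.
/// Assumption: a missing statistic or an empty file yields false.
fn all_sequences_zero(row_group_stats: &[Option<(u64, u64)>]) -> bool {
    !row_group_stats.is_empty()
        && row_group_stats
            .iter()
            .all(|stat| matches!(stat, Some((0, 0))))
}

/// Sketch of the idea behind primary_key_offsets.
/// `keys` are the dictionary keys of the primary key column, one per row.
/// Assumption: rows with the same primary key are contiguous.
/// Returns the start offset of every run plus the total length, so
/// consecutive pairs of offsets delimit the rows of one primary key.
fn key_run_offsets(keys: &[u32]) -> Vec<usize> {
    if keys.is_empty() {
        return Vec::new();
    }
    let mut offsets = vec![0];
    for (i, pair) in keys.windows(2).enumerate() {
        // A new run starts at row i + 1 when the key changes.
        if pair[0] != pair[1] {
            offsets.push(i + 1);
        }
    }
    offsets.push(keys.len());
    offsets
}
```

For keys `[0, 0, 1, 1, 1, 2]`, `key_run_offsets` returns `[0, 2, 5, 6]`: rows 0..2 belong to the first primary key, rows 2..5 to the second, and row 5 to the third.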

Type Aliases

PrimaryKeyArray 🔒
Arrow array type for the primary key dictionary.