Module format

Format to store in parquet.

We store three internal columns in parquet:

  • __primary_key, the primary key of the row (tags). Type: dictionary(uint32, binary)
  • __sequence, the sequence number of a row. Type: uint64
  • __op_type, the op type of the row. Type: uint8

The schema of a parquet file is:

field 0, field 1, ..., field N, time index, primary key, sequence, op type

We store fields in the same order as RegionMetadata::field_columns().
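A minimal sketch of how column positions follow from this layout, assuming the four trailing columns (time index, primary key, sequence, op type) always come last in that order. The names `TRAILING_COLUMNS`, `ColumnPositions`, and `column_positions` are hypothetical illustrations, not part of this module's API.

```rust
/// Hypothetical constant: the four trailing columns in the layout
/// (time index, primary key, sequence, op type).
const TRAILING_COLUMNS: usize = 4;

/// Hypothetical helper type holding the index of each trailing column.
#[derive(Debug, PartialEq)]
struct ColumnPositions {
    time_index: usize,
    primary_key: usize,
    sequence: usize,
    op_type: usize,
}

/// Computes the trailing column indices for a file with `num_fields`
/// field columns, which occupy indices `0..num_fields`.
fn column_positions(num_fields: usize) -> ColumnPositions {
    let total = num_fields + TRAILING_COLUMNS;
    ColumnPositions {
        time_index: total - 4,
        primary_key: total - 3,
        sequence: total - 2,
        op_type: total - 1,
    }
}
```

Counting from the end of the schema means the trailing columns can be located without knowing how many field columns a region defines.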

Structs

ReadFormat
Helper for reading the SST format.
WriteFormat 🔒
Helper for writing the SST format.

Enums

StatValues
Values of column statistics of the SST.

Constants

FIXED_POS_COLUMN_NUM 🔒
Number of columns that have fixed positions.

Functions

need_override_sequence 🔒
Checks whether a sequence override is needed, based on the statistics of all row groups. Returns true if ALL row groups have sequence min and max values of 0.
new_primary_key_array 🔒
Creates a new array for a specific primary_key.
parquet_row_group_time_range 🔒
Gets the min/max time index of the row group from the parquet metadata. It assumes the parquet file was created by the mito engine.
primary_key_offsets 🔒
Computes the offsets of different primary keys in the array.
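Two of the helpers above can be illustrated with plain slices. This is a sketch under stated assumptions, not the module's implementation: the real functions operate on parquet metadata and Arrow dictionary arrays, and their exact signatures and edge-case behavior are not shown on this page. The names `all_sequences_zero` and `key_run_offsets` are hypothetical. The sketch assumes that rows with the same primary key are stored contiguously, that offsets mark the boundaries of each run (ending with the array length), and that a file with no row groups does not need an override.

```rust
/// Hypothetical stand-in for the sequence override check: `stats` holds
/// one (min, max) sequence pair per row group.
/// Assumption: an empty list of row groups yields `false`.
fn all_sequences_zero(stats: &[(u64, u64)]) -> bool {
    !stats.is_empty() && stats.iter().all(|&(min, max)| min == 0 && max == 0)
}

/// Hypothetical stand-in for the offset computation: `keys` are the
/// dictionary keys of the primary key column, one per row.
/// Returns the start index of each run of equal keys, followed by
/// `keys.len()` as the final boundary. Returns an empty vector for
/// empty input (an assumption).
fn key_run_offsets(keys: &[u32]) -> Vec<usize> {
    if keys.is_empty() {
        return Vec::new();
    }
    let mut offsets = vec![0];
    for i in 1..keys.len() {
        // A new run starts wherever the key differs from the previous row.
        if keys[i] != keys[i - 1] {
            offsets.push(i);
        }
    }
    offsets.push(keys.len());
    offsets
}
```

With offsets in this form, each adjacent pair `offsets[i]..offsets[i + 1]` is the row range of one primary key.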

Type Aliases

PrimaryKeyArray 🔒
Arrow array type for the primary key dictionary.