
Module format

Format to store in parquet.

We store three internal columns in parquet:

  • __primary_key, the primary key of the row (tags). Type: dictionary(uint32, binary)
  • __sequence, the sequence number of a row. Type: uint64
  • __op_type, the op type of the row. Type: uint8

The schema of a parquet file is:

field 0, field 1, ..., field N, time index, primary key, sequence, op type

We store fields in the same order as RegionMetadata::field_columns().
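
The column layout described above can be sketched as follows. This is a minimal illustration, not the engine's code: the helper names `sst_column_order` and `time_index_position` are hypothetical, and plain strings stand in for the real column metadata. Only the three internal column names and the ordering come from the description above.

```rust
/// Names of the internal columns, as listed in the module description.
const PRIMARY_KEY_COLUMN: &str = "__primary_key";
const SEQUENCE_COLUMN: &str = "__sequence";
const OP_TYPE_COLUMN: &str = "__op_type";

/// Hypothetical helper: builds the column order of a parquet file as
/// `field 0, ..., field N, time index, primary key, sequence, op type`.
/// `field_columns` is assumed to already follow the region's field order.
fn sst_column_order(field_columns: &[&str], time_index: &str) -> Vec<String> {
    let mut columns: Vec<String> = field_columns.iter().map(|s| s.to_string()).collect();
    columns.push(time_index.to_string());
    columns.push(PRIMARY_KEY_COLUMN.to_string());
    columns.push(SEQUENCE_COLUMN.to_string());
    columns.push(OP_TYPE_COLUMN.to_string());
    columns
}

/// Hypothetical helper: the time index and the three internal columns always
/// sit at the end, so the time index is the fourth column from the end.
/// Returns `None` if the schema is too short to hold those four columns.
fn time_index_position(num_columns: usize) -> Option<usize> {
    num_columns.checked_sub(4)
}
```

With fields `cpu` and `mem` and a time index named `ts`, `sst_column_order(&["cpu", "mem"], "ts")` yields `cpu, mem, ts, __primary_key, __sequence, __op_type`. Because the trailing columns have fixed positions, they can be located by counting from the end regardless of how many fields a region has.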

Structs

FormatProjection 🔒
Helper to compute the projection for the SST.
PrimaryKeyReadFormat
Helper for reading the SST format.
PrimaryKeyWriteFormat 🔒
Helper for writing the SST format with primary key.

Enums

StatValues
Values of column statistics of the SST.

Constants

FIXED_POS_COLUMN_NUM 🔒
Number of columns that have fixed positions.
INTERNAL_COLUMN_NUM 🔒
Number of internal columns.

Functions

column_null_counts 🔒
Returns null counts of specific columns. The column should not be encoded as a part of a primary key.
column_values 🔒
Returns min/max values of specific columns. Returns None if the column does not have statistics. The column should not be encoded as a part of a primary key.
need_override_sequence 🔒
Checks if sequence override is needed based on all row groups' statistics. Returns true if ALL row groups have sequence min-max values of 0.
parquet_row_group_time_range 🔒
Gets the min/max time index of the row group from the parquet meta. It assumes the parquet is created by the mito engine.
primary_key_offsets 🔒
Compute offsets of different primary keys in the array.
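
The descriptions of need_override_sequence and primary_key_offsets above can be illustrated with a small sketch. It is written under stated assumptions and is not the module's implementation: `SequenceStats`, `need_override_sequence_sketch` and `primary_key_offsets_sketch` are hypothetical names, plain slices stand in for parquet metadata and Arrow arrays, and the handling of empty input or missing statistics is a guess.

```rust
/// Hypothetical summary of the `__sequence` column statistics of one row group.
#[derive(Debug, Clone, Copy)]
struct SequenceStats {
    min: u64,
    max: u64,
}

/// Sketch of the documented rule: override is needed only if ALL row groups
/// have sequence min and max equal to 0.
/// Assumption: a row group without statistics (`None`) or a file without
/// row groups does not trigger an override.
fn need_override_sequence_sketch(row_groups: &[Option<SequenceStats>]) -> bool {
    !row_groups.is_empty()
        && row_groups
            .iter()
            .all(|stats| matches!(stats, Some(s) if s.min == 0 && s.max == 0))
}

/// Sketch of computing offsets of different primary keys.
/// Assumption: `keys` holds the dictionary keys of the primary key column in
/// row order, and rows with the same primary key are adjacent.
/// Returns the start index of each run plus a final end offset, so run `i`
/// covers rows `offsets[i]..offsets[i + 1]`.
fn primary_key_offsets_sketch(keys: &[u32]) -> Vec<usize> {
    if keys.is_empty() {
        return Vec::new();
    }
    let mut offsets = vec![0];
    for i in 1..keys.len() {
        // A change in dictionary key marks the start of a new primary key.
        if keys[i] != keys[i - 1] {
            offsets.push(i);
        }
    }
    offsets.push(keys.len());
    offsets
}
```

For dictionary keys `[0, 0, 1, 1, 1, 2]`, the offsets sketch returns `[0, 2, 5, 6]`, which describes three runs of rows sharing a primary key. Including the end offset keeps each run expressible as a half-open range without special-casing the last one.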

Type Aliases

PrimaryKeyArray 🔒
Arrow array type for the primary key dictionary.
PrimaryKeyArrayBuilder 🔒
Builder type for primary key dictionary array.