pub struct ParquetReaderBuilder {
    file_dir: String,
    file_handle: FileHandle,
    object_store: ObjectStore,
    predicate: Option<Predicate>,
    projection: Option<Vec<ColumnId>>,
    cache_strategy: CacheStrategy,
    inverted_index_applier: Option<Arc<InvertedIndexApplier>>,
    bloom_filter_index_applier: Option<Arc<BloomFilterIndexApplier>>,
    fulltext_index_applier: Option<Arc<FulltextIndexApplier>>,
    expected_metadata: Option<RegionMetadataRef>,
}
Parquet SST reader builder.
Fields
file_dir: String
SST directory.
file_handle: FileHandle
object_store: ObjectStore
predicate: Option<Predicate>
Predicate to push down.
projection: Option<Vec<ColumnId>>
Metadata of columns to read. None reads all columns. Due to schema change, the projection can contain columns not in the parquet file.
cache_strategy: CacheStrategy
Strategy to cache SST data.
inverted_index_applier: Option<Arc<InvertedIndexApplier>>
Index appliers.
bloom_filter_index_applier: Option<Arc<BloomFilterIndexApplier>>
fulltext_index_applier: Option<Arc<FulltextIndexApplier>>
expected_metadata: Option<RegionMetadataRef>
Expected metadata of the region while reading the SST. This is usually the latest metadata of the region. The reader uses it to get the correct column id of a column by name.
Implementations
impl ParquetReaderBuilder
pub fn new(
    file_dir: String,
    file_handle: FileHandle,
    object_store: ObjectStore,
) -> ParquetReaderBuilder
Returns a new ParquetReaderBuilder to read a specific SST.
pub fn predicate(self, predicate: Option<Predicate>) -> ParquetReaderBuilder
Attaches the predicate to the builder.
pub fn projection(
    self,
    projection: Option<Vec<ColumnId>>,
) -> ParquetReaderBuilder
Attaches the projection to the builder.
The reader only applies the projection to fields.
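The projection semantics described for the projection field (None reads all columns, and a projection may name columns that are absent from the file) can be illustrated with a small sketch. The function columns_to_read and its file_columns parameter are hypothetical and not part of ParquetReaderBuilder. The sketch simply ignores projected columns that the file does not contain, which is an assumption about file-level behavior, not a description of how the real reader treats missing columns.

```rust
// Stand-in for the real column id type; an assumption made for this sketch.
type ColumnId = u32;

/// Returns the column ids to read from one file.
///
/// `None` selects every column in the file. `Some(ids)` keeps the projected
/// ids that exist in the file, in projection order, and ignores the rest.
fn columns_to_read(projection: Option<&[ColumnId]>, file_columns: &[ColumnId]) -> Vec<ColumnId> {
    match projection {
        None => file_columns.to_vec(),
        Some(ids) => ids
            .iter()
            .copied()
            .filter(|id| file_columns.contains(id))
            .collect(),
    }
}
```

With a file that stores columns 1, 2 and 3, a projection of 3, 9, 1 yields 3 and 1 here, because column 9 was added after the file was written.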
pub fn cache(self, cache: CacheStrategy) -> ParquetReaderBuilder
Attaches the cache to the builder.
pub(crate) fn inverted_index_applier(
    self,
    index_applier: Option<Arc<InvertedIndexApplier>>,
) -> Self
Attaches the inverted index applier to the builder.
pub(crate) fn bloom_filter_index_applier(
    self,
    index_applier: Option<Arc<BloomFilterIndexApplier>>,
) -> Self
Attaches the bloom filter index applier to the builder.
pub(crate) fn fulltext_index_applier(
    self,
    index_applier: Option<Arc<FulltextIndexApplier>>,
) -> Self
Attaches the fulltext index applier to the builder.
pub fn expected_metadata(
    self,
    expected_metadata: Option<RegionMetadataRef>,
) -> Self
Attaches the expected metadata to the builder.
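The setters above all take self by value and return the builder, so they can be chained. The following is a minimal sketch of that consuming-builder shape. ReaderBuilderSketch, its Predicate and CacheStrategy stand-ins, and the defaults chosen in new are hypothetical simplifications for illustration, not the real types.

```rust
// Stand-in types; hypothetical simplifications of the real ones.
type ColumnId = u32;

#[derive(Debug, Clone, PartialEq)]
struct Predicate(String);

#[derive(Debug, Clone, PartialEq)]
enum CacheStrategy {
    Disabled,
    Enabled,
}

#[derive(Debug, Clone, PartialEq)]
struct ReaderBuilderSketch {
    file_dir: String,
    predicate: Option<Predicate>,
    projection: Option<Vec<ColumnId>>,
    cache_strategy: CacheStrategy,
}

impl ReaderBuilderSketch {
    /// Starts with no predicate, no projection, and caching disabled.
    /// These defaults are assumptions of this sketch.
    fn new(file_dir: String) -> Self {
        Self {
            file_dir,
            predicate: None,
            projection: None,
            cache_strategy: CacheStrategy::Disabled,
        }
    }

    /// Each setter consumes the builder and returns it, which allows chaining.
    fn predicate(mut self, predicate: Option<Predicate>) -> Self {
        self.predicate = predicate;
        self
    }

    fn projection(mut self, projection: Option<Vec<ColumnId>>) -> Self {
        self.projection = projection;
        self
    }

    fn cache(mut self, cache: CacheStrategy) -> Self {
        self.cache_strategy = cache;
        self
    }
}
```

A chained call then reads ReaderBuilderSketch::new(dir).predicate(..).projection(..).cache(..), mirroring how the setters documented above compose.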
pub async fn build(&self) -> Result<ParquetReader>
Builds a ParquetReader.
This needs to perform I/O operations.
pub(crate) async fn build_reader_input(
    &self,
    metrics: &mut ReaderMetrics,
) -> Result<(FileRangeContext, BTreeMap<usize, Option<RowSelection>>)>
Builds a FileRangeContext and collects row groups to read.
This needs to perform I/O operations.
fn get_region_metadata(
    file_path: &str,
    key_value_meta: Option<&Vec<KeyValue>>,
) -> Result<RegionMetadata>
Decodes region metadata from the key-value metadata.
async fn read_parquet_metadata(
    &self,
    file_path: &str,
    file_size: u64,
) -> Result<Arc<ParquetMetaData>>
Reads parquet metadata of a specific file.
async fn row_groups_to_read(
    &self,
    read_format: &ReadFormat,
    parquet_meta: &ParquetMetaData,
    metrics: &mut ReaderFilterMetrics,
) -> BTreeMap<usize, Option<RowSelection>>
Computes row groups to read, along with their respective row selections.
async fn prune_row_groups_by_fulltext_index(
    &self,
    row_group_size: usize,
    parquet_meta: &ParquetMetaData,
    output: &mut BTreeMap<usize, Option<RowSelection>>,
    metrics: &mut ReaderFilterMetrics,
) -> bool
Prunes row groups by fulltext index. Returns true if the row groups are pruned.
fn group_row_ids(
    row_ids: BTreeSet<u32>,
    row_group_size: usize,
    num_row_groups: usize,
) -> Vec<(usize, Vec<usize>)>
Groups row IDs into row groups, with each group’s row IDs starting from 0.
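A minimal sketch of this grouping follows. It keeps the documented signature but is not the actual implementation. It assumes every row group except possibly the last holds exactly row_group_size rows, and it uses num_row_groups only as a capacity hint.

```rust
use std::collections::BTreeSet;

/// Groups ascending global row IDs by row group and rebases each ID so that
/// it is relative to the first row of its row group.
fn group_row_ids(
    row_ids: BTreeSet<u32>,
    row_group_size: usize,
    num_row_groups: usize,
) -> Vec<(usize, Vec<usize>)> {
    assert!(row_group_size > 0, "row_group_size must be non-zero");
    // In this sketch `num_row_groups` only pre-sizes the output.
    let mut groups: Vec<(usize, Vec<usize>)> = Vec::with_capacity(num_row_groups);

    // A BTreeSet iterates in ascending order, so all IDs that belong to one
    // row group are visited consecutively.
    for row_id in row_ids {
        let row_id = row_id as usize;
        let group = row_id / row_group_size;
        let offset = row_id % row_group_size;

        let starts_new_group = match groups.last() {
            Some((last_group, _)) => *last_group != group,
            None => true,
        };
        if starts_new_group {
            groups.push((group, Vec::new()));
        }
        // After the push above, the last entry is always the current group.
        if let Some((_, offsets)) = groups.last_mut() {
            offsets.push(offset);
        }
    }
    groups
}
```

For example, with a row group size of 4, the global row IDs 0, 1, 5, 10 and 11 fall into row group 0 (offsets 0 and 1), row group 1 (offset 1) and row group 2 (offsets 2 and 3).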
async fn prune_row_groups_by_inverted_index(
    &self,
    row_group_size: usize,
    parquet_meta: &ParquetMetaData,
    output: &mut BTreeMap<usize, Option<RowSelection>>,
    metrics: &mut ReaderFilterMetrics,
) -> bool
Applies index to prune row groups.
TODO(zhongzc): Devise a mechanism to enforce the non-use of indices as an escape route in case of index issues, and it can be used to test the correctness of the index.
fn prune_row_groups_by_minmax(
    &self,
    read_format: &ReadFormat,
    parquet_meta: &ParquetMetaData,
    output: &mut BTreeMap<usize, Option<RowSelection>>,
    metrics: &mut ReaderFilterMetrics,
) -> bool
Prunes row groups by min-max index.
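The idea behind min-max pruning can be sketched as follows: a row group is dropped when its column statistics show that no row can satisfy the predicate. The MinMax struct, the Selection alias, the equality-only predicate and the meaning of the boolean result are all assumptions of this sketch. The real method works on ReadFormat and ParquetMetaData.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

// Stand-in for a row selection inside one row group (hypothetical).
type Selection = Vec<Range<usize>>;

/// Hypothetical per-row-group statistics for a single integer column.
struct MinMax {
    min: i64,
    max: i64,
}

/// Removes from `output` every row group whose [min, max] range cannot
/// contain `target`. Row groups without statistics are kept, because they
/// cannot be ruled out. Returns true if at least one row group was removed.
fn prune_by_minmax(
    stats: &[MinMax],
    target: i64,
    output: &mut BTreeMap<usize, Option<Selection>>,
) -> bool {
    let before = output.len();
    output.retain(|row_group, _selection| {
        stats
            .get(*row_group)
            .map_or(true, |s| s.min <= target && target <= s.max)
    });
    output.len() < before
}
```

Keeping row groups that have no statistics is the conservative choice: pruning may only discard data that provably cannot match.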
async fn prune_row_groups_by_bloom_filter( &self, parquet_meta: &ParquetMetaData, output: &mut BTreeMap<usize, Option<RowSelection>>, metrics: &mut ReaderFilterMetrics, ) -> bool
fn prune_row_groups_by_rows(
    parquet_meta: &ParquetMetaData,
    rows_in_row_groups: Vec<(usize, Vec<usize>)>,
    output: &mut BTreeMap<usize, Option<RowSelection>>,
    filtered_row_groups: &mut usize,
    filtered_rows: &mut usize,
)
Prunes row groups by rows. The rows_in_row_groups is like a map from row group to a list of row ids to keep.
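One way to work with a list of row ids to keep is to collapse it into contiguous ranges first. The helper below is a hypothetical sketch of that step, not part of the documented API. It assumes the row ids are strictly ascending.

```rust
use std::ops::Range;

/// Collapses strictly ascending row ids into half-open ranges of
/// consecutive rows.
fn rows_to_ranges(rows: &[usize]) -> Vec<Range<usize>> {
    let mut ranges: Vec<Range<usize>> = Vec::new();
    for &row in rows {
        // The row extends the current range only if it directly follows it.
        let extends_last = ranges.last().map_or(false, |last| last.end == row);
        if extends_last {
            if let Some(last) = ranges.last_mut() {
                last.end = row + 1;
            }
        } else {
            ranges.push(row..row + 1);
        }
    }
    ranges
}
```

For the row ids 0, 1, 2, 5, 7 and 8 this produces the ranges 0..3, 5..6 and 7..9.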
fn prune_row_groups_by_ranges(
    parquet_meta: &ParquetMetaData,
    ranges_in_row_groups: impl Iterator<Item = (usize, impl Iterator<Item = Range<usize>>)>,
    output: &mut BTreeMap<usize, Option<RowSelection>>,
    filtered_row_groups: &mut usize,
    filtered_rows: &mut usize,
)
Prunes row groups by ranges. The ranges_in_row_groups is like a map from row group to a list of row ranges to keep.
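Row ranges to keep can be expressed as alternating runs of rows to skip and rows to read, which is the general shape of a row selection. The sketch below uses a hypothetical Selector enum as a stand-in rather than the parquet crate's types. It assumes the ranges are ascending, non-overlapping and within num_rows.

```rust
use std::ops::Range;

/// Stand-in for one run of a row selection (hypothetical type).
#[derive(Debug, PartialEq)]
enum Selector {
    Select(usize),
    Skip(usize),
}

/// Converts ascending, non-overlapping ranges of rows to keep into
/// skip/select runs that together cover `num_rows` rows.
fn ranges_to_selectors(ranges: &[Range<usize>], num_rows: usize) -> Vec<Selector> {
    let mut selectors = Vec::new();
    // First row that no selector covers yet.
    let mut cursor = 0;
    for range in ranges {
        if range.start > cursor {
            selectors.push(Selector::Skip(range.start - cursor));
        }
        if range.end > range.start {
            selectors.push(Selector::Select(range.end - range.start));
        }
        cursor = range.end;
    }
    // Skip whatever remains after the last kept range.
    if cursor < num_rows {
        selectors.push(Selector::Skip(num_rows - cursor));
    }
    selectors
}
```

A row group whose list of ranges is empty collapses to a single skip run covering every row, so it contributes no rows to the read.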
Auto Trait Implementations
impl Freeze for ParquetReaderBuilder
impl !RefUnwindSafe for ParquetReaderBuilder
impl Send for ParquetReaderBuilder
impl Sync for ParquetReaderBuilder
impl Unpin for ParquetReaderBuilder
impl !UnwindSafe for ParquetReaderBuilder
Blanket Implementations
impl<T> AnySync for T
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Conv for T
impl<T, V> Convert<T> for V
where
    V: Into<T>,
fn convert(value: Self) -> T
fn convert_box(value: Box<Self>) -> Box<T>
fn convert_vec(value: Vec<Self>) -> Vec<T>
fn convert_vec_box(value: Vec<Box<Self>>) -> Vec<Box<T>>
fn convert_matrix(value: Vec<Vec<Self>>) -> Vec<Vec<T>>
fn convert_option(value: Option<Self>) -> Option<T>
fn convert_option_box(value: Option<Box<Self>>) -> Option<Box<T>>
fn convert_option_vec(value: Option<Vec<Self>>) -> Option<Vec<T>>
impl<T> Downcast for T
where
    T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>. Box<dyn Any> can then be further downcast into Box<ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>. Rc<Any> can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Converts &mut Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.
impl<T> DowncastSync for T
impl<T> FmtForward for T
fn fmt_binary(self) -> FmtBinary<Self>
where
    Self: Binary,
Causes self to use its Binary implementation when Debug-formatted.
fn fmt_display(self) -> FmtDisplay<Self>
where
    Self: Display,
Causes self to use its Display implementation when Debug-formatted.
fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where
    Self: LowerExp,
Causes self to use its LowerExp implementation when Debug-formatted.
fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where
    Self: LowerHex,
Causes self to use its LowerHex implementation when Debug-formatted.
fn fmt_octal(self) -> FmtOctal<Self>
where
    Self: Octal,
Causes self to use its Octal implementation when Debug-formatted.
fn fmt_pointer(self) -> FmtPointer<Self>
where
    Self: Pointer,
Causes self to use its Pointer implementation when Debug-formatted.
fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where
    Self: UpperExp,
Causes self to use its UpperExp implementation when Debug-formatted.
fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where
    Self: UpperHex,
Causes self to use its UpperHex implementation when Debug-formatted.
fn fmt_list(self) -> FmtList<Self>
where
    &'a Self: for<'a> IntoIterator,
impl<T> FutureExt for T
fn with_context(self, otel_cx: Context) -> WithContext<Self>
fn with_current_context(self) -> WithContext<Self>
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wraps the input message T in a tonic::Request.
impl<T> Pipe for T
where
    T: ?Sized,
fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where
    Self: Sized,
fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where
    R: 'a,
Borrows self and passes that borrow into the pipe function.
fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where
    R: 'a,
Mutably borrows self and passes that borrow into the pipe function.
fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
fn pipe_borrow_mut<'a, B, R>(
    &'a mut self,
    func: impl FnOnce(&'a mut B) -> R,
) -> R
fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
Borrows self, then passes self.as_ref() into the pipe function.
fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
Mutably borrows self, then passes self.as_mut() into the pipe function.
fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
Borrows self, then passes self.deref() into the pipe function.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.
impl<T> Tap for T
fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
Immutable access to the Borrow<B> of a value.
fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
Mutable access to the BorrowMut<B> of a value.
fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
Immutable access to the AsRef<R> view of a value.
fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
Mutable access to the AsMut<R> view of a value.
fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
Immutable access to the Deref::Target of a value.
fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
Mutable access to the Deref::Target of a value.
fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
Calls .tap() only in debug builds, and is erased in release builds.
fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
Calls .tap_mut() only in debug builds, and is erased in release builds.
fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
Calls .tap_borrow() only in debug builds, and is erased in release builds.
fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
Calls .tap_borrow_mut() only in debug builds, and is erased in release builds.
fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
Calls .tap_ref() only in debug builds, and is erased in release builds.
fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
Calls .tap_ref_mut() only in debug builds, and is erased in release builds.
fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
Calls .tap_deref() only in debug builds, and is erased in release builds.