index::inverted_index::create::sort::external_sort

Struct ExternalSorter

Source
pub struct ExternalSorter {
    index_name: String,
    temp_file_provider: Arc<dyn ExternalTempFileProvider>,
    segment_null_bitmap: Bitmap,
    values_buffer: BTreeMap<Bytes, (Bitmap, usize)>,
    total_row_count: usize,
    segment_row_count: NonZeroUsize,
    current_memory_usage: usize,
    current_memory_usage_threshold: Option<usize>,
    global_memory_usage: Arc<AtomicUsize>,
    global_memory_usage_sort_limit: Option<usize>,
}
Expand description

ExternalSorter manages the sorting of data using both in-memory structures and external files. It dumps data to external files when the in-memory buffer crosses a certain memory threshold.

Fields§

§index_name: String

The index name associated with the sorting operation

§temp_file_provider: Arc<dyn ExternalTempFileProvider>

Manages creation and access to external temporary files

§segment_null_bitmap: Bitmap

Bitmap indicating which segments have null values

§values_buffer: BTreeMap<Bytes, (Bitmap, usize)>

In-memory buffer to hold values and their corresponding bitmaps until memory threshold is exceeded

§total_row_count: usize

Count of all rows ingested so far

§segment_row_count: NonZeroUsize

The number of rows per group for bitmap indexing which determines how rows are batched for indexing. It is used to determine which segment a row belongs to.

§current_memory_usage: usize

Tracks memory usage of the buffer

§current_memory_usage_threshold: Option<usize>

The threshold of current memory usage below which the buffer is not dumped, even if the global memory usage exceeds global_memory_usage_sort_limit. This allows for smaller buffers to remain in memory, providing a buffer against unnecessary dumps to external files, which can be costly in terms of performance. None indicates that only the global memory usage threshold is considered for dumping the buffer.

§global_memory_usage: Arc<AtomicUsize>

Tracks the global memory usage of all sorters

§global_memory_usage_sort_limit: Option<usize>

The memory usage limit that, when exceeded by the global memory consumption of all sorters, necessitates a reassessment of buffer retention. Surpassing this limit signals that there is a high overall memory pressure, potentially requiring buffer dumping to external storage for memory relief. None value indicates that no specific global memory usage threshold is established for triggering buffer dumps.

Implementations§

Source§

impl ExternalSorter

Source

pub fn new( index_name: String, temp_file_provider: Arc<dyn ExternalTempFileProvider>, segment_row_count: NonZeroUsize, current_memory_usage_threshold: Option<usize>, global_memory_usage: Arc<AtomicUsize>, global_memory_usage_sort_limit: Option<usize>, ) -> Self

Constructs a new ExternalSorter

Source

pub fn factory( temp_file_provider: Arc<dyn ExternalTempFileProvider>, current_memory_usage_threshold: Option<usize>, global_memory_usage: Arc<AtomicUsize>, global_memory_usage_sort_limit: Option<usize>, ) -> SorterFactory

Generates a factory function that creates new ExternalSorter instances

Source

fn push_not_null( &mut self, value: BytesRef<'_>, segment_index_range: RangeInclusive<usize>, ) -> usize

Pushes the non-null values to the values buffer and sets the bits within the specified range in the given bitmap to true. Returns the memory usage difference of the buffer after the operation.

Source

async fn may_dump_buffer(&mut self, memory_diff: usize) -> Result<()>

Checks if the in-memory buffer exceeds the threshold and offloads it to external storage if necessary

Source

fn segment_index_range(&self, n: usize) -> RangeInclusive<usize>

Determines the segment index range for the row index range [row_begin, row_begin + n - 1]

Source

fn segment_index(&self, row_index: usize) -> usize

Determines the segment index for the given row index

Trait Implementations§

Source§

impl Sorter for ExternalSorter

Source§

fn push_n<'life0, 'life1, 'async_trait>( &'life0 mut self, value: Option<BytesRef<'life1>>, n: usize, ) -> Pin<Box<dyn Future<Output = Result<()>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Pushes n identical values into the sorter, adding them to the in-memory buffer and dumping the buffer to an external file if necessary

Source§

fn output<'life0, 'async_trait>( &'life0 mut self, ) -> Pin<Box<dyn Future<Output = Result<SortOutput>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait,

Finalizes the sorting operation, merging data from both in-memory buffer and external files into a sorted stream

Source§

fn push<'life0, 'life1, 'async_trait>( &'life0 mut self, value: Option<BytesRef<'life1>>, ) -> Pin<Box<dyn Future<Output = Result<()>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Inputs a non-null or null value into the sorter. Should be equivalent to calling push_n with n = 1

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
§

impl<T> Any for T
where T: Any,

§

fn into_any(self: Box<T>) -> Box<dyn Any>

§

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

§

fn type_name(&self) -> &'static str

§

impl<T> AnySync for T
where T: Any + Send + Sync,

§

fn into_any_arc(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
§

impl<T> Conv for T

§

fn conv<T>(self) -> T
where Self: Into<T>,

Converts self into T using Into<T>. Read more
§

impl<T, V> Convert<T> for V
where V: Into<T>,

§

fn convert(value: Self) -> T

§

fn convert_box(value: Box<Self>) -> Box<T>

§

fn convert_vec(value: Vec<Self>) -> Vec<T>

§

fn convert_vec_box(value: Vec<Box<Self>>) -> Vec<Box<T>>

§

fn convert_matrix(value: Vec<Vec<Self>>) -> Vec<Vec<T>>

§

fn convert_option(value: Option<Self>) -> Option<T>

§

fn convert_option_box(value: Option<Box<Self>>) -> Option<Box<T>>

§

fn convert_option_vec(value: Option<Vec<Self>>) -> Option<Vec<T>>

§

impl<T> Downcast for T
where T: Any,

§

fn into_any(self: Box<T>) -> Box<dyn Any>

Convert Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>. Box<dyn Any> can then be further downcast into Box<ConcreteType> where ConcreteType implements Trait.
§

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

Convert Rc<Trait> (where Trait: Downcast) to Rc<Any>. Rc<Any> can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.
§

fn as_any(&self) -> &(dyn Any + 'static)

Convert &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.
§

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

Convert &mut Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.
§

impl<T> DowncastSync for T
where T: Any + Send + Sync,

§

fn into_any_arc(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Convert Arc<Trait> (where Trait: Downcast) to Arc<Any>. Arc<Any> can then be further downcast into Arc<ConcreteType> where ConcreteType implements Trait.
§

impl<T> FmtForward for T

§

fn fmt_binary(self) -> FmtBinary<Self>
where Self: Binary,

Causes self to use its Binary implementation when Debug-formatted.
§

fn fmt_display(self) -> FmtDisplay<Self>
where Self: Display,

Causes self to use its Display implementation when Debug-formatted.
§

fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where Self: LowerExp,

Causes self to use its LowerExp implementation when Debug-formatted.
§

fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where Self: LowerHex,

Causes self to use its LowerHex implementation when Debug-formatted.
§

fn fmt_octal(self) -> FmtOctal<Self>
where Self: Octal,

Causes self to use its Octal implementation when Debug-formatted.
§

fn fmt_pointer(self) -> FmtPointer<Self>
where Self: Pointer,

Causes self to use its Pointer implementation when Debug-formatted.
§

fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where Self: UpperExp,

Causes self to use its UpperExp implementation when Debug-formatted.
§

fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where Self: UpperHex,

Causes self to use its UpperHex implementation when Debug-formatted.
§

fn fmt_list(self) -> FmtList<Self>
where &'a Self: for<'a> IntoIterator,

Formats each item in a sequence. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

§

impl<T> FutureExt for T

§

fn with_context(self, otel_cx: Context) -> WithContext<Self>

Attaches the provided Context to this type, returning a WithContext wrapper. Read more
§

fn with_current_context(self) -> WithContext<Self>

Attaches the current Context to this type, returning a WithContext wrapper. Read more
§

impl<T> Instrument for T

§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided [Span], returning an Instrumented wrapper. Read more
§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> IntoRequest<T> for T

Source§

fn into_request(self) -> Request<T>

Wrap the input message T in a tonic::Request
Source§

impl<T> IntoRequest<T> for T

Source§

fn into_request(self) -> Request<T>

Wrap the input message T in a tonic::Request
§

impl<T> Pipe for T
where T: ?Sized,

§

fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where Self: Sized,

Pipes by value. This is generally the method you want to use. Read more
§

fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where R: 'a,

Borrows self and passes that borrow into the pipe function. Read more
§

fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where R: 'a,

Mutably borrows self and passes that borrow into the pipe function. Read more
§

fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
where Self: Borrow<B>, B: 'a + ?Sized, R: 'a,

Borrows self, then passes self.borrow() into the pipe function. Read more
§

fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R, ) -> R
where Self: BorrowMut<B>, B: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.borrow_mut() into the pipe function. Read more
§

fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
where Self: AsRef<U>, U: 'a + ?Sized, R: 'a,

Borrows self, then passes self.as_ref() into the pipe function.
§

fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
where Self: AsMut<U>, U: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.as_mut() into the pipe function.
§

fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
where Self: Deref<Target = T>, T: 'a + ?Sized, R: 'a,

Borrows self, then passes self.deref() into the pipe function.
§

fn pipe_deref_mut<'a, T, R>( &'a mut self, func: impl FnOnce(&'a mut T) -> R, ) -> R
where Self: DerefMut<Target = T> + Deref, T: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.deref_mut() into the pipe function.
§

impl<T> Pointable for T

§

const ALIGN: usize

The alignment of pointer.
§

type Init = T

The type for initializers.
§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
§

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

§

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset. Read more
§

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).
§

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.
§

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.
§

impl<T> Tap for T

§

fn tap(self, func: impl FnOnce(&Self)) -> Self

Immutable access to a value. Read more
§

fn tap_mut(self, func: impl FnOnce(&mut Self)) -> Self

Mutable access to a value. Read more
§

fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Immutable access to the Borrow<B> of a value. Read more
§

fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Mutable access to the BorrowMut<B> of a value. Read more
§

fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Immutable access to the AsRef<R> view of a value. Read more
§

fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Mutable access to the AsMut<R> view of a value. Read more
§

fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Immutable access to the Deref::Target of a value. Read more
§

fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Mutable access to the Deref::Target of a value. Read more
§

fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self

Calls .tap() only in debug builds, and is erased in release builds.
§

fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self

Calls .tap_mut() only in debug builds, and is erased in release builds.
§

fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Calls .tap_borrow() only in debug builds, and is erased in release builds.
§

fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Calls .tap_borrow_mut() only in debug builds, and is erased in release builds.
§

fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Calls .tap_ref() only in debug builds, and is erased in release builds.
§

fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Calls .tap_ref_mut() only in debug builds, and is erased in release builds.
§

fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Calls .tap_deref() only in debug builds, and is erased in release builds.
§

fn tap_deref_mut_dbg<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Calls .tap_deref_mut() only in debug builds, and is erased in release builds.
§

impl<T> TryConv for T

§

fn try_conv<T>(self) -> Result<T, Self::Error>
where Self: TryInto<T>,

Attempts to convert self into T using TryInto<T>. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

§

fn vzip(self) -> V

§

impl<T> WithSubscriber for T

§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a [WithDispatch] wrapper. Read more
§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a [WithDispatch] wrapper. Read more
§

impl<G1, G2> Within<G2> for G1
where G2: Contains<G1>,

§

fn is_within(&self, b: &G2) -> bool

§

impl<T> Any for T
where T: Any,

§

impl<T> ErasedDestructor for T
where T: 'static,

§

impl<T> Fruit for T
where T: Send + Downcast,

§

impl<T> MaybeSend for T
where T: Send,

§

impl<T> MaybeSendSync for T