Module df_optimizer

DataFusion optimizer for flow plans

Structs

AvgExpandRule 🔒
CheckGroupByRule 🔒
This rule checks all GROUP BY expressions and makes sure they also appear in the SELECT clause of an aggregate query
ExpandAvgRewriter 🔒
Rewrites the avg(<expr>) function into CASE WHEN count(<expr>) != 0 THEN cast(sum(<expr>) AS avg_return_type) / count(<expr>) ELSE 0 END
FindColumn 🔒
Finds all column names in a plan
TumbleExpand
A placeholder for the tumble_start and tumble_end functions, so that DataFusion can recognize them as scalar functions
TumbleExpandRule 🔒
Expands tumble in an aggregate expression into tumble_start and tumble_end, with column names such as window_start
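
As an illustration of what a tumble_start / tumble_end pair typically computes, here is a minimal sketch. It is not taken from this module: it assumes integer timestamps, a positive window length in the same unit, and windows aligned to zero. The function bodies and the alignment choice are assumptions made for the example.

```rust
/// Hypothetical helper: start of the tumbling window that contains `ts`.
/// Assumes `window > 0` and windows aligned to timestamp 0.
fn tumble_start(ts: i64, window: i64) -> i64 {
    // div_euclid rounds toward negative infinity for a positive divisor,
    // so timestamps before 0 still land in the correct window.
    ts.div_euclid(window) * window
}

/// Hypothetical helper: exclusive end of the tumbling window that contains `ts`.
fn tumble_end(ts: i64, window: i64) -> i64 {
    tumble_start(ts, window) + window
}
```

Under these assumptions, every timestamp in [tumble_start, tumble_end) maps to the same window, which is what allows the two values to serve as grouping keys in an aggregate.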

Functions

apply_df_optimizer
check_group_by_analyzer 🔒
Makes sure every expression in the GROUP BY clause also appears in the SELECT clause
expand_avg_analyzer 🔒
Expands the avg(<expr>) function into cast(sum(<expr>) AS f64) / count(<expr>)
expand_tumble_analyzer 🔒
Expands tumble in an aggregate expression into tumble_start and tumble_end, and also expands related aliases and column references
put_aggr_to_proj_analyzer 🔒
Lifts the aggregate's composite aggregate expressions to an outer projection, leaving the aggregate with only simple, direct aggregate expressions, i.e.
sql_to_flow_plan
To reuse the existing SQL parsing code, the SQL is first parsed into a DataFusion logical plan, then converted to a Substrait plan, and finally to a flow plan.
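
The avg expansion described for ExpandAvgRewriter and expand_avg_analyzer can be sketched numerically as follows. This is only an illustration of the rewritten expression's semantics, not the module's code: the function name `avg_expanded` is hypothetical, NULLs are modelled as `None`, and the return type is assumed to be f64.

```rust
/// Hypothetical model of the expanded expression:
/// CASE WHEN count(x) != 0 THEN cast(sum(x) AS f64) / count(x) ELSE 0 END
fn avg_expanded(values: &[Option<f64>]) -> f64 {
    // count(<expr>) only counts non-NULL inputs.
    let count = values.iter().flatten().count();
    // sum(<expr>) likewise skips NULL inputs.
    let sum: f64 = values.iter().flatten().sum();
    if count != 0 {
        sum / count as f64
    } else {
        // The zero-count guard avoids a division by zero.
        0.0
    }
}
```

Splitting avg into sum and count leaves the aggregate with only simple accumulators, while the division happens in an ordinary scalar expression on top.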