flow

Module df_optimizer

DataFusion optimizer for flow plans

Structs

  • AvgExpandRule 🔒
  • CheckGroupByRule 🔒
    This rule checks all GROUP BY expressions and makes sure they also appear in the SELECT clause of an aggregate query
  • ExpandAvgRewriter 🔒
    rewrites avg(<expr>) into CASE WHEN count(<expr>) != 0 THEN cast(sum(<expr>) AS avg_return_type) / count(<expr>) ELSE 0 END
  • FindColumn 🔒
    Finds all column names in a plan
  • A placeholder for the tumble_start and tumble_end functions, so that DataFusion can recognize them as scalar functions
  • TumbleExpandRule 🔒
    expands tumble in an aggregate expression into tumble_start and tumble_end, with column names like window_start
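
The avg rewrite described for ExpandAvgRewriter can be sketched on a toy expression tree. This is a minimal illustration only: the `Expr` enum, `expand_avg`, and `to_sql` below are hypothetical names invented for this sketch and are not DataFusion's `Expr` type or the module's actual rewriter; the cast target type is simply passed in as a string.

```rust
// Toy expression tree for illustration; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Avg(Box<Expr>),
    Sum(Box<Expr>),
    Count(Box<Expr>),
    Cast(Box<Expr>, String),
    Div(Box<Expr>, Box<Expr>),
    NotEq(Box<Expr>, Box<Expr>),
    Case { when: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
}

/// Recursively replace every `avg(e)` with
/// `CASE WHEN count(e) != 0 THEN CAST(sum(e) AS ty) / count(e) ELSE 0 END`.
fn expand_avg(expr: Expr, ty: &str) -> Expr {
    // Helper that rewrites a boxed child expression.
    let rec = |e: Box<Expr>| Box::new(expand_avg(*e, ty));
    match expr {
        Expr::Avg(inner) => {
            let inner = rec(inner);
            let count = Expr::Count(inner.clone());
            let sum = Expr::Cast(Box::new(Expr::Sum(inner)), ty.to_string());
            Expr::Case {
                when: Box::new(Expr::NotEq(
                    Box::new(count.clone()),
                    Box::new(Expr::Literal(0)),
                )),
                then: Box::new(Expr::Div(Box::new(sum), Box::new(count))),
                otherwise: Box::new(Expr::Literal(0)),
            }
        }
        Expr::Sum(e) => Expr::Sum(rec(e)),
        Expr::Count(e) => Expr::Count(rec(e)),
        Expr::Cast(e, t) => Expr::Cast(rec(e), t),
        Expr::Div(l, r) => Expr::Div(rec(l), rec(r)),
        Expr::NotEq(l, r) => Expr::NotEq(rec(l), rec(r)),
        Expr::Case { when, then, otherwise } => Expr::Case {
            when: rec(when),
            then: rec(then),
            otherwise: rec(otherwise),
        },
        // Leaves contain no avg, so they are returned unchanged.
        leaf @ (Expr::Column(_) | Expr::Literal(_)) => leaf,
    }
}

/// Render the toy tree as SQL-like text so the rewrite is easy to inspect.
fn to_sql(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Literal(v) => v.to_string(),
        Expr::Avg(e) => format!("avg({})", to_sql(e)),
        Expr::Sum(e) => format!("sum({})", to_sql(e)),
        Expr::Count(e) => format!("count({})", to_sql(e)),
        Expr::Cast(e, ty) => format!("CAST({} AS {})", to_sql(e), ty),
        Expr::Div(l, r) => format!("{} / {}", to_sql(l), to_sql(r)),
        Expr::NotEq(l, r) => format!("{} != {}", to_sql(l), to_sql(r)),
        Expr::Case { when, then, otherwise } => format!(
            "CASE WHEN {} THEN {} ELSE {} END",
            to_sql(when),
            to_sql(then),
            to_sql(otherwise)
        ),
    }
}
```

Calling `expand_avg` on `Expr::Avg(Box::new(Expr::Column("x".to_string())))` and rendering the result with `to_sql` shows the sum and count aggregates that replace the single avg call; expressions without avg come back unchanged.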

Functions

  • makes sure every expression in the GROUP BY clause also appears in the SELECT list
  • expands avg(<expr>) into cast(sum(<expr>) AS f64) / count(<expr>)
  • expands tumble in an aggregate expression into tumble_start and tumble_end, and also expands related aliases and column references
  • lifts the aggregate's composite aggr_expr to the outer projection, leaving the aggregate with only simple, direct aggregate expressions, i.e.
  • To reuse the existing SQL parsing code, the SQL is first parsed into a DataFusion logical plan, then converted to a Substrait plan, and finally to a flow plan.
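
To make the tumble expansion more concrete, here is a minimal sketch of what a pair of window bounds could look like for a single timestamp. It assumes the usual tumbling-window semantics (fixed-size, non-overlapping windows aligned to an origin); this page does not state the alignment or units the module actually uses, so the `origin` parameter, the integer timestamps, and both function bodies are assumptions made for illustration. Only the names tumble_start and tumble_end come from the descriptions above.

```rust
/// Start of the tumbling window that contains `ts`.
/// Assumption: windows are `size` units wide and aligned to `origin`.
fn tumble_start(ts: i64, size: i64, origin: i64) -> i64 {
    assert!(size > 0, "window size must be positive");
    // rem_euclid keeps the offset non-negative, so timestamps before
    // the origin still fall into the correct window.
    ts - (ts - origin).rem_euclid(size)
}

/// Exclusive end of the tumbling window that contains `ts`.
fn tumble_end(ts: i64, size: i64, origin: i64) -> i64 {
    tumble_start(ts, size, origin) + size
}
```

Under these assumptions, a timestamp of 125 with a window size of 60 and origin 0 falls in the window starting at 120 and ending at 180. Every row in that range maps to the same pair of bounds, which is what lets the bounds serve as grouping columns such as window_start.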