Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unparsing LogicalPlan::Extension to SQL tesxt #13753

Closed
goldmedal opened this issue Dec 13, 2024 · 9 comments · Fixed by #13880
Closed

Support unparsing LogicalPlan::Extension to SQL tesxt #13753

goldmedal opened this issue Dec 13, 2024 · 9 comments · Fixed by #13880
Assignees
Labels
enhancement New feature or request

Comments

@goldmedal
Copy link
Contributor

Is your feature request related to a problem or challenge?

LogicalPlan::Extension allows the user to implement their custom logical plan. I think it makes sense to allow to define custom unparsing behavior for it.

Describe the solution you'd like

I would like to introduce a new method for UserDefinedLogicalNode

    /// User-defined nodes can override this method to provide a custom
    /// implementation for the unparser.
    fn unparse(
        &self,
        unparser: &Unparser,
        query: &mut Option<QueryBuilder>,
        select: &mut Option<SelectBuilder>,
        relation: &mut Option<RelationBuilder>,
        table_with_joins: &mut Option<TableWithJoinsBuilder>,
    ) -> Result<()> {
        not_impl_err!("custom unparsing not implemented")
    }

Then, we can handle LogicalPlan::Extension in the unparser like

    LogicalPlan::Extension(extension) => {
        extension.node.unparse(self, plan, query, Some(select), Some(relation), None)
    },

However, the required builders haven't been made public yet. UserDefinedLogicalNode can't access the builders.
As per the comment in ast.rs, we planned to move builders to sqlparser-rs.

//! This file contains builders to create SQL ASTs. They are purposefully
//! not exported as they will eventually be move to the SQLparser package.
//!
//!
//! See <https://github.com/apache/datafusion/issues/8661>

I'm contemplating whether this move is required. When implementing a new unparsing feature, we may need to add some additional helper functions to the builder (e.g., SelectBuilder::already_projection). If we move it to the upstream crate, it would be difficult to add helper functions when required.

I prefer to move builders to datafusion-common and make them public for the user and datafusion-expr, where the UserDefinedLogicalNode is located.

Describe alternatives you've considered

No response

Additional context

No response

@goldmedal goldmedal added the enhancement New feature or request label Dec 13, 2024
@goldmedal
Copy link
Contributor Author

@alamb @phillipleblanc @sgrebnov @jayzhan211 @findepi
What do you think?

@jayzhan211
Copy link
Contributor

I would said we made it public in datafusion-sql (or upstream to sql-parser) but not moved to datafusion-common. I think we should eliminate dependency to sql-parser in datafusion-common, we need to move those functions or structs that have dependency to sql-parser out of datafusion-common to datafusion-sql

@alamb
Copy link
Contributor

alamb commented Dec 13, 2024

I would said we made it public in datafusion-sql (or upstream to sql-parser) but not moved to datafusion-common. I think we should eliminate dependency to sql-parser in datafusion-common, we need to move those functions or structs that have dependency to sql-parser out of datafusion-common to datafusion-sql

I agree that making the builders / etc public in datafusion-sql is better than dumping more stuff in datafusion-common

Another potential thought might be to move the unparsing code into its own crate now given its non trivial complexity now. For example datafusion-unparser

Another potential approach that would avoid adding a dependency on UserDefinedLogicalNode(which is in datafusion-expr) on the unparser / sqlparser might be:

  1. Create a new Unparseable trait
  2. Have the unparser try and downcast the UserDefinedLogicalNode

So something like

pub trait Unparseable {
    /// User-defined nodes can override this method to provide a custom
    /// implementation for the unparser.
    fn unparse(
        &self,
        unparser: &Unparser,
        query: &mut Option<QueryBuilder>,
        select: &mut Option<SelectBuilder>,
        relation: &mut Option<RelationBuilder>,
        table_with_joins: &mut Option<TableWithJoinsBuilder>,
    ) -> Result<()> {
        not_impl_err!("custom unparsing not implemented")
    }
}

And then the unparser could do something like:

let user_defined_local_node: dyn &UserDefinedLogicalNode = ...;
// is the user defined node unparseable?
let Some(unparseable) = user_defined_local_node
  .as_any()
  .downcast_ref::<Unparseable>() else {
  return plan_err!("Node type {} does not implement Unparseable", user_defined_local_node.name())
}

let sql_nodes = unparseable.unparse(unparser, ....)

🤔

@phillipleblanc
Copy link
Contributor

phillipleblanc commented Dec 13, 2024

AFAIK Rust doesn't allow downcasting from a dyn object to a trait (i.e. Unparseable) - it needs to be the concrete type.

Creating a new datafusion-unparser crate makes sense to me, we could make the builders public there. However, having a dependency from datafusion-expr to datafusion-unparser still feels a bit weird (and also not possible due to circular dependencies?). Perhaps instead of adding an unparse function on UserDefinedLogicalNode, we could add a registry on the Unparser object that takes UserDefinedLogicalNode and tries to unparse it, similar to how the optimizer rules work.

i.e.

pub trait UserDefinedLogicalNodeUnparser {
    fn unparse(
        &self,
        node: &dyn UserDefinedLogicalNode,
        query: &mut Option<QueryBuilder>,
        select: &mut Option<SelectBuilder>,
        relation: &mut Option<RelationBuilder>,
        table_with_joins: &mut Option<TableWithJoinsBuilder>,
    ) -> Result<()>;
}

And then the Unparser has a Vec<Arc<dyn UserDefinedLogicalNodeUnparser>> that it uses to try to unparse UserDefinedLogicalNode

@alamb
Copy link
Contributor

alamb commented Dec 13, 2024

AFAIK Rust doesn't allow downcasting from a dyn object to a trait (i.e. Unparseable) - it needs to be the concrete type.

I think it my example unparseable would be &dyn Unparseable 👍

let user_defined_local_node: dyn &UserDefinedLogicalNode = ...;
// is the user defined node unparseable?
let Some(unparseable) = user_defined_local_node. // <---- this is `&dyn Unparseable` I think 
  .as_any()
  .downcast_ref::<Unparseable>() else {
  return plan_err!("Node type {} does not implement Unparseable", user_defined_local_node.name())
}

let sql_nodes = unparseable.unparse(unparser, ....)

@phillipleblanc
Copy link
Contributor

I tried playing around with this in the Rust playground and wasn't able to get it to downcast properly: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=96cc916e29db395f4ec61374d9a2ce33

use std::any::Any;

trait UserDefinedLogicalNode: Any {
    fn as_any(&self) -> &dyn Any;
}

trait Unparseable: Any {
    fn as_any(&self) -> &dyn Any;
}

struct MyCustomNode {}

impl UserDefinedLogicalNode for MyCustomNode {
    fn as_any(&self) -> &dyn Any { self }
}

impl Unparseable for MyCustomNode {
    fn as_any(&self) -> &dyn Any { self }
}

fn main() {
    let s = MyCustomNode {};
    let user_defined_local_node: &dyn UserDefinedLogicalNode = &s;
    
    // This fails to compile - "error[E0782]: expected a type, found a trait"
    // replacing with `.downcast_ref::<dyn Unparseable>()` fails to compile due to it being unsized
    // and replacing with `.downcast_ref::<&dyn Unparseable>()` compiles but doesn't work
    if let Some(_) = user_defined_local_node.as_any().downcast_ref::<Unparseable>() {
        println!("Downcast worked");
    } else {
        println!("Downcast didn't work")
    }
}

@goldmedal
Copy link
Contributor Author

I summarized the proposal dependency, it will be like this
graph

As @phillipleblanc said, Rust can't downcast to a dyn trait. I feel his proposal is good if we don't want to add the method to UserDefinedLogicalNode directly.

And then the Unparser has a Vec<Arc> that it uses to try to unparse UserDefinedLogicalNode

I guess this idea is similar to ExprPlanner for the logical planner.

/// This trait allows users to customize the behavior of the SQL planner
pub trait ExprPlanner: Debug + Send + Sync {

So, datafusion-unparser will provide the builders and the trait, UserDefinedLogicalNodeUnparser.

@alamb
Copy link
Contributor

alamb commented Dec 16, 2024

Sounds like we have a plan!

@goldmedal
Copy link
Contributor Author

I guess this idea is similar to ExprPlanner for the logical planner.

/// This trait allows users to customize the behavior of the SQL planner
pub trait ExprPlanner: Debug + Send + Sync {

So, datafusion-unparser will provide the builders and the trait, UserDefinedLogicalNodeUnparser.

I followed the design proposed by @phillipleblanc in #13880. I found we don't need to expose the builders for datafusion-expr. We can just public them in datafusion-sql for the user directly. So datafusion-unparser isn't required anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants