substrait generated by Apache Calcite does not run in DataFusion #14831

Open
alamb opened this issue Feb 22, 2025 · 4 comments
Labels
bug Something isn't working

alamb commented Feb 22, 2025

Describe the bug

This is a report from @lmwnshn, one of a series of issues discovered at CMU while working with DataFusion.

In short, they found that DataFusion could not run Substrait plans generated by Apache Calcite.

Quoting the readme: https://github.com/lmwnshn/15799-s25-project1-remnants/blob/main/README.md

The idea behind this project was to:

  1. Convert SQL to a Substrait plan with Calcite.

  2. Execute the Substrait plan on different systems (e.g., DuckDB, DataFusion).

Students would then experiment with Calcite's different rules. However, this project idea didn't quite work for the following reasons:

  1. substrait-java does not run optimizations - while in theory this can be enabled, I recall running into some issues.

  2. More critically, at the time of writing, both DuckDB and DataFusion have limited support for executing the Substrait generated by Calcite.

To Reproduce

See the code at https://github.com/lmwnshn/15799-s25-project1-remnants

Note that I haven't looked at any of this myself, so I won't be able to help with specific questions.

Expected behavior

DataFusion should work for all the plans.

Additional context

No response

lmwnshn commented Feb 23, 2025

I think people can look directly at Substrait's consumer-testing repo for DataFusion if they want to fix this :) My repo is just a stripped-down rewrite for students.

Some relevant links:

alamb commented Feb 23, 2025

FYI @vbarua and @Blizzara: I think this may be relevant to your use case.

alamb commented Feb 23, 2025

@niebayes said they may be interested in this:

@niebayes

Query federation is amazing. I will first look at Substrait's consumer-testing repo for DataFusion.
