Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(substrait): remove dependency on datafusion default features #13594

Merged
merged 2 commits into from
Dec 4, 2024

Conversation

notfilippo
Copy link
Contributor

Which issue does this PR close?

Closes #13593

What changes are included in this PR?

  • physical feature, which enables production and consumption of physical substrait plans.

Are these changes tested?

Yes, running cargo tree -p datafusion-substrait with and without the --no-default-features shows that parquet and many other dependencies are not included.

Are there any user-facing changes?

No, the new feature is enabled by default. Users who might want to benefit from this change will have to use default-features = false when including the datafusion-substrait crate.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @notfilippo -- this looks like a good change to me.

I wonder if there is some way we can add a CI test for this (to avoid breaking it in the future) -- perhaps something like

run: cargo check --all-targets --no-default-features -p datafusion-common

@alamb
Copy link
Contributor

alamb commented Dec 2, 2024

I think we need to

  1. Add this to CI (suggested diff below)
  2. Make sure it actually compiles
index 50bebc5b4..0b2338cd3 100644
--- a/.github/workflows/rust.yml
+++ b/.github/workflows/rust.yml
@@ -80,9 +80,12 @@ jobs:
       - name: Check datafusion-common without default features
         run: cargo check --all-targets --no-default-features -p datafusion-common

-      - name: Check datafusion-functions
+      - name: Check datafusion-functions without default features
         run: cargo check --all-targets --no-default-features -p datafusion-functions

+      - name: Check datafusion-substrait without default features
+        run: cargo check --all-targets --no-default-features -p datafusion-substrait
+
       - name: Check workspace in debug mode
         run: cargo check --all-targets --workspace

I tested with cargo check --all-targets --no-default-features -p datafusion-substrait and the crate doesn't compile:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo check --all-targets --no-default-features -p datafusion-substrait
warning: function `coerce_file_schema_to_view_type` is never used
   --> datafusion/core/src/datasource/file_format/mod.rs:428:15
    |
428 | pub(crate) fn coerce_file_schema_to_view_type(
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

warning: function `coerce_file_schema_to_string_type` is never used
   --> datafusion/core/src/datasource/file_format/mod.rs:492:15
    |
492 | pub(crate) fn coerce_file_schema_to_string_type(
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: `datafusion` (lib) generated 2 warnings
    Checking datafusion-substrait v43.0.0 (/Users/andrewlamb/Software/datafusion/datafusion/substrait)
error[E0432]: unresolved import `datafusion::datasource::physical_plan::ParquetExec`
  --> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:25:61
   |
25 | use datafusion::datasource::physical_plan::{FileScanConfig, ParquetExec};
   |                                                             ^^^^^^^^^^^ no `ParquetExec` in `datasource::physical_plan`
   |
note: found an item that was configured out
  --> /Users/andrewlamb/Software/datafusion/datafusion/core/src/datasource/physical_plan/mod.rs:34:25
   |
34 | pub use self::parquet::{ParquetExec, ParquetFileMetrics, ParquetFileReaderFactory};
   |                         ^^^^^^^^^^^
note: the item is gated behind the `parquet` feature
  --> /Users/andrewlamb/Software/datafusion/datafusion/core/src/datasource/physical_plan/mod.rs:33:7
   |
33 | #[cfg(feature = "parquet")]
   |       ^^^^^^^^^^^^^^^^^^^

error[E0432]: unresolved import `datafusion_substrait::physical_plan`
  --> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:29:27
   |
29 | use datafusion_substrait::physical_plan::{consumer, producer};
   |                           ^^^^^^^^^^^^^ could not find `physical_plan` in `datafusion_substrait`
   |
note: found an item that was configured out
  --> /Users/andrewlamb/Software/datafusion/datafusion/substrait/src/lib.rs:79:9
   |
79 | pub mod physical_plan;
   |         ^^^^^^^^^^^^^
note: the item is gated behind the `physical` feature
  --> /Users/andrewlamb/Software/datafusion/datafusion/substrait/src/lib.rs:78:7
   |
78 | #[cfg(feature = "physical")]
   |       ^^^^^^^^^^^^^^^^^^^^

error[E0425]: cannot find function `parquet_test_data` in module `datafusion::test_util`
   --> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:154:43
    |
154 |     let testdata = datafusion::test_util::parquet_test_data();
    |                                           ^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `arrow_test_data`
    |
   ::: /Users/andrewlamb/Software/datafusion/datafusion/common/src/test_util.rs:201:1
    |
201 | pub fn arrow_test_data() -> String {
    | ---------------------------------- similarly named function `arrow_test_data` defined here
    |
note: found an item that was configured out
   --> /Users/andrewlamb/Software/datafusion/datafusion/core/src/test_util/mod.rs:60:39
    |
60  | pub use datafusion_common::test_util::parquet_test_data;
    |                                       ^^^^^^^^^^^^^^^^^
note: the item is gated behind the `parquet` feature
   --> /Users/andrewlamb/Software/datafusion/datafusion/core/src/test_util/mod.rs:59:7
    |
59  | #[cfg(feature = "parquet")]
    |       ^^^^^^^^^^^^^^^^^^^

error[E0599]: no method named `register_parquet` found for struct `datafusion::prelude::SessionContext` in the current scope
   --> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:145:9
    |
145 |     ctx.register_parquet("data", "tests/testdata/data.parquet", explicit_options)
    |         ^^^^^^^^^^^^^^^^
    |
help: there is a method `register_arrow` with a similar name
    |
145 |     ctx.register_arrow("data", "tests/testdata/data.parquet", explicit_options)
    |         ~~~~~~~~~~~~~~

error[E0599]: no method named `register_parquet` found for struct `datafusion::prelude::SessionContext` in the current scope
   --> datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:155:9
    |
155 |     ctx.register_parquet(
    |     ----^^^^^^^^^^^^^^^^
    |
help: there is a method `register_arrow` with a similar name
    |
155 |     ctx.register_arrow(
    |         ~~~~~~~~~~~~~~

Some errors have detailed explanations: E0425, E0432, E0599.
For more information about an error, try `rustc --explain E0425`.
error: could not compile `datafusion-substrait` (test "substrait_integration") due to 5 previous errors
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$

@alamb alamb marked this pull request as draft December 2, 2024 18:23
@alamb
Copy link
Contributor

alamb commented Dec 2, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@notfilippo notfilippo marked this pull request as ready for review December 3, 2024 13:30
@github-actions github-actions bot added the development-process Related to development process of DataFusion label Dec 3, 2024
@notfilippo
Copy link
Contributor Author

notfilippo commented Dec 3, 2024

Thanks @alamb – Weirdly enough I didn't receive any pings from GitHub about activity on this issue / PR... Anyway thanks for pointing out the issue with the roundtrip test. The fix I posted should be enough.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @notfilippo -- makes sense to me

@alamb alamb merged commit 11d49b4 into apache:main Dec 4, 2024
27 checks passed
@alamb
Copy link
Contributor

alamb commented Dec 4, 2024

Thanks again @notfilippo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development-process Related to development process of DataFusion substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[substrait] make dependency on parquet optional
2 participants