Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods #4721

Closed
wants to merge 1 commit into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 23, 2022

Which issue does this PR close?

Closes #4720

Part of #4617

Rationale for this change

More details on ticket #4720

Basically this moves the planning / execution in DataFusion to be based on DataFrame rather than some sort of mx of SessionContext / SessionState / DataFrame. Started by @tustvold in #4679

What changes are included in this PR?

  1. Change APIs to allow query planning without DDL execution: SessionContext::dataframe_without_ddl
  2. Add doc comments about the usecases
  3. Deprecate SessionContext::optimize and SessionContext::create_physical_plan
  4. Update errors in physical planner to return DataFusion::Error someone attempts to plan a DDL it is unsupported

Are these changes tested?

Yes, existing tests

Are there any user-facing changes?

  1. API is slightly different (different names)
  2. Error messages are more precise

@github-actions github-actions bot added the core Core DataFusion crate label Dec 23, 2022
@alamb alamb changed the title Non-deprecated support for planning SQL without DDL Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods Dec 23, 2022
@alamb alamb force-pushed the alamb/clarify_interface branch from beef643 to e733070 Compare December 23, 2022 20:47
@alamb alamb added the api change Changes the API exposed to users of the crate label Dec 23, 2022
@alamb alamb force-pushed the alamb/clarify_interface branch from e733070 to c6291ff Compare December 23, 2022 20:55

if debug {
println!("=== Logical plan ===\n{:?}\n", plan);
}

let plan = ctx.optimize(&plan)?;
let plan = ctx.dataframe(plan).await?.into_optimized_plan()?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the new pattern to get an optimized plan (rather than calling ctx.optimize directly

/// query support (no way to create external tables, for example)
///
/// This method is `async` because queries of type `CREATE
/// EXTERNAL TABLE` might require the schema to be inferred.
pub async fn sql(&self, sql: &str) -> Result<DataFrame> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the core SessionContext::sql API does not change


/// Creates a [`DataFrame`] that will execute the specified
/// LogicalPlan, including DDL such as (such as `CREATE TABLE`).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// LogicalPlan, including DDL such as (such as `CREATE TABLE`).
/// LogicalPlan, including DDL (such as `CREATE TABLE`).

/// `CREATE TABLE`
///
/// Use [`Self::dataframe`] to run plans with DDL
pub fn dataframe_without_ddl(&self, plan: LogicalPlan) -> Result<DataFrame> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the API to support #4720

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to just be DataFrame new, do we really need this?

@@ -1097,15 +1097,15 @@ impl DefaultPhysicalPlanner {
// TABLE" -- it must be handled at a higher level (so
// that the appropriate table can be registered with
// the context)
Err(DataFusionError::Internal(
Err(DataFusionError::Plan(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not really "internal errors" as they can be triggered by trying to run a sql query that contains DDL

Copy link
Contributor

@tustvold tustvold Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo this is a footgun we should aim to remove, I have a plan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4721 (comment) are the key usecases -- as long as they are possible / easy / well documented it will be great

@alamb alamb requested a review from tustvold December 23, 2022 20:56
@alamb alamb force-pushed the alamb/clarify_interface branch from c6291ff to 052caeb Compare December 23, 2022 21:14
@alamb alamb force-pushed the alamb/clarify_interface branch from 052caeb to b9ff071 Compare December 23, 2022 21:45
@alamb alamb marked this pull request as ready for review December 23, 2022 22:48
@alamb alamb requested a review from andygrove December 23, 2022 22:48
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not a fan of introducing more complexity into this interface, part of the current mess is from there being 14 different ways to do everything, this appears to be putting back a quirk I'm actively trying to remove.

I think IOx should use the lower-level APIs, such as SessionState and DFParser, longer-term I don't think it should be using the interior mutable SessionContext at all

pub fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan> {

/// Creates a [`LogicalPlan`] from a SQL query.
pub fn plan_sql(&self, sql: &str) -> Result<LogicalPlan> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this appears to just call DFParser followed by the query planner

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also problematic for the same reason that create_logical_plan is problematic - it returns a LogicalPlan without any mechanism to optimise/execute against the same state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this appears to just call DFParser followed by the query planner

Yes that is exactly what it does.

There needs to be some way for users to create a LogicalPlan and get datafusion to optimize and run it properly it (e.g. if the user makes the LogicalPlan directly from their own query language such as influxrpc or VegaFusion)

@alamb
Copy link
Contributor Author

alamb commented Dec 24, 2022

I think some key capabilities for all users of DataFusion (including IOx) are:

  1. Create a LogicalPlan from some source (e.g your own frontend) and have datafusion plan / run that for you
  2. Read only SQL (no DDL / session state modifications via SQL)

As long as those are possible and well documented I do not have strong opinions on the API

@alamb alamb marked this pull request as draft December 24, 2022 12:33
@alamb
Copy link
Contributor Author

alamb commented Dec 24, 2022

Marking as a Draft as I think @tustvold plans an alternate proposal and we can continue to us the deprecated APIs in IOx for the time being

@tustvold
Copy link
Contributor

I will work on this after christmas (27th). I think the key thing is to not be mixing high-level APIs i.e. SessionContext with low-level concepts LogicalPlan. The key thing is to ensure that planning, optimisation and execution take place against the same SessionState. The interior mutability of SessionContext makes this impossible

@tustvold
Copy link
Contributor

POC in https://github.com/influxdata/influxdb_iox/pull/6469 - the TLDR is to use SessionState directly

@alamb
Copy link
Contributor Author

alamb commented Dec 28, 2022

This PR has been rendered obsolete via #4750

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-deprecated support for planning SQL without DDL
2 participants