Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods #4721
  if debug {
      println!("=== Logical plan ===\n{:?}\n", plan);
  }

- let plan = ctx.optimize(&plan)?;
+ let plan = ctx.dataframe(plan).await?.into_optimized_plan()?;
This is the new pattern to get an optimized plan (rather than calling ctx.optimize directly).
/// query support (no way to create external tables, for example)
///
/// This method is `async` because queries of type `CREATE
/// EXTERNAL TABLE` might require the schema to be inferred.
pub async fn sql(&self, sql: &str) -> Result<DataFrame> {
the core SessionContext::sql API does not change
/// Creates a [`DataFrame`] that will execute the specified
/// LogicalPlan, including DDL such as (such as `CREATE TABLE`).
Suggested change:
- /// LogicalPlan, including DDL such as (such as `CREATE TABLE`).
+ /// LogicalPlan, including DDL (such as `CREATE TABLE`).
/// `CREATE TABLE`
///
/// Use [`Self::dataframe`] to run plans with DDL
pub fn dataframe_without_ddl(&self, plan: LogicalPlan) -> Result<DataFrame> {
This is the API to support #4720
This appears to just be DataFrame::new; do we really need this?
@@ -1097,15 +1097,15 @@ impl DefaultPhysicalPlanner {
  // TABLE" -- it must be handled at a higher level (so
  // that the appropriate table can be registered with
  // the context)
- Err(DataFusionError::Internal(
+ Err(DataFusionError::Plan(
These are not really "internal errors" as they can be triggered by trying to run a sql query that contains DDL
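The distinction drawn in this comment can be illustrated with a small, self-contained sketch. It does not use DataFusion: `ToyPlan`, `ToyError`, `plan_without_ddl`, and `is_user_error` are hypothetical names invented for illustration. The assumption is only that a user-triggerable condition (a plan containing DDL) should surface as a planning error, while an internal error is reserved for engine bugs.

```rust
/// Hypothetical, simplified plan nodes; these are not DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum ToyPlan {
    Scan { table: String },
    CreateTable { name: String },
}

/// Hypothetical error type mirroring the Plan / Internal split.
#[derive(Debug, PartialEq)]
enum ToyError {
    /// The caller asked for something unsupported (user-triggerable).
    Plan(String),
    /// An engine invariant was violated (a bug, not a user mistake).
    Internal(String),
}

/// Rejects DDL with a planning error, because a user can reach this
/// path simply by submitting a query that contains DDL.
fn plan_without_ddl(plan: ToyPlan) -> Result<ToyPlan, ToyError> {
    match plan {
        ToyPlan::CreateTable { name } => Err(ToyError::Plan(format!(
            "DDL is not supported here: CREATE TABLE {name}"
        ))),
        other => Ok(other),
    }
}

/// Returns true for errors the user can fix by changing their query.
fn is_user_error(err: &ToyError) -> bool {
    matches!(err, ToyError::Plan(_))
}

fn main() {
    let scan = ToyPlan::Scan { table: "t".to_string() };
    assert_eq!(plan_without_ddl(scan.clone()), Ok(scan));

    let ddl = ToyPlan::CreateTable { name: "t".to_string() };
    let err = plan_without_ddl(ddl).unwrap_err();
    assert!(is_user_error(&err));

    // An invariant violation is classified differently.
    assert!(!is_user_error(&ToyError::Internal("unreachable state".to_string())));
}
```

Callers of such an API can then report planning errors back to the user and treat internal errors as bugs worth filing.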
Imo this is a footgun we should aim to remove, I have a plan
#4721 (comment) lists the key use cases -- as long as they are possible / easy / well documented it will be great
I'm really not a fan of introducing more complexity into this interface. Part of the current mess comes from there being 14 different ways to do everything, and this appears to be putting back a quirk I'm actively trying to remove.
I think IOx should use the lower-level APIs, such as SessionState and DFParser. Longer-term, I don't think it should be using the interior-mutable SessionContext at all.
pub fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan> {

/// Creates a [`LogicalPlan`] from a SQL query.
pub fn plan_sql(&self, sql: &str) -> Result<LogicalPlan> {
Again this appears to just call DFParser followed by the query planner
It is also problematic for the same reason that create_logical_plan is problematic - it returns a LogicalPlan without any mechanism to optimise/execute against the same state
> Again this appears to just call DFParser followed by the query planner
Yes that is exactly what it does.
There needs to be some way for users to create a LogicalPlan and get DataFusion to optimize and run it properly (e.g. if the user makes the LogicalPlan directly from their own query language such as influxrpc or VegaFusion).
I think some key capabilities for all users of DataFusion (including IOx) are:
As long as those are possible and well documented I do not have strong opinions on the API.
Marking as a Draft as I think @tustvold plans an alternate proposal and we can continue to use the deprecated APIs in IOx for the time being.
I will work on this after Christmas (27th). I think the key thing is not to mix high-level APIs (i.e. SessionContext) with low-level concepts (LogicalPlan), and to ensure that planning, optimisation and execution take place against the same SessionState. The interior mutability of SessionContext makes this impossible.
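The concern about planning, optimising, and executing against the same state can be sketched without DataFusion. In the toy model below, `ToyState` and `ToyContext` are hypothetical stand-ins (not the real SessionState or SessionContext), and the catalog is just a set of table names. It assumes only that the context is interior-mutable, so a plan produced from it can later be run against a catalog that has changed, whereas a snapshot taken once stays consistent.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, RwLock};

/// Hypothetical immutable snapshot of session state.
#[derive(Debug, Clone, Default)]
struct ToyState {
    tables: BTreeSet<String>,
}

impl ToyState {
    /// Plans a scan, failing if the table is unknown to this snapshot.
    fn plan(&self, table: &str) -> Result<String, String> {
        if self.tables.contains(table) {
            Ok(format!("Scan({table})"))
        } else {
            Err(format!("table not found: {table}"))
        }
    }

    /// "Executes" a plan against this snapshot's catalog.
    fn execute(&self, plan: &str) -> Result<String, String> {
        let table = plan
            .strip_prefix("Scan(")
            .and_then(|rest| rest.strip_suffix(')'))
            .ok_or_else(|| format!("unsupported plan: {plan}"))?;
        if self.tables.contains(table) {
            Ok(format!("rows from {table}"))
        } else {
            Err(format!("table missing at execution time: {table}"))
        }
    }
}

/// Hypothetical interior-mutable context that shares one mutable state.
#[derive(Clone, Default)]
struct ToyContext {
    state: Arc<RwLock<ToyState>>,
}

impl ToyContext {
    fn register(&self, table: &str) {
        self.state.write().unwrap().tables.insert(table.to_string());
    }

    fn deregister(&self, table: &str) {
        self.state.write().unwrap().tables.remove(table);
    }

    /// Copies the current state so later steps all see the same catalog.
    fn snapshot(&self) -> ToyState {
        self.state.read().unwrap().clone()
    }
}

fn main() {
    let ctx = ToyContext::default();
    ctx.register("t");

    // Plan and execute against one snapshot: a concurrent change to the
    // context cannot invalidate the plan between the two steps.
    let state = ctx.snapshot();
    let plan = state.plan("t").unwrap();
    ctx.deregister("t");
    assert_eq!(state.execute(&plan).unwrap(), "rows from t");

    // Going back through the mutable context sees a different catalog.
    assert!(ctx.snapshot().execute(&plan).is_err());
}
```

The design point is that the snapshot, not the shared context, is what travels with the plan through each stage.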
POC in https://github.com/influxdata/influxdb_iox/pull/6469 - the TLDR is to use SessionState directly
This PR has been rendered obsolete via #4750
Which issue does this PR close?
Closes #4720
Part of #4617
Rationale for this change
More details on ticket #4720
Basically, this moves planning and execution in DataFusion to be based on DataFrame rather than some sort of mix of SessionContext / SessionState / DataFrame. Started by @tustvold in #4679.
What changes are included in this PR?
SessionContext::dataframe_without_ddl
SessionContext::optimize
and SessionContext::create_physical_plan
Are these changes tested?
Yes, existing tests
Are there any user-facing changes?