SQL-2620: Add support for UNION distinct #40

Open, wants to merge 2 commits into base: main
8 changes: 0 additions & 8 deletions errors.md
@@ -45,7 +45,6 @@ The following errors occur when something goes wrong while converting the SQL qu
 |---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [Error 3002](#error-3002) | A SELECT list with multiple values cannot contain a non-namespaced `*` (i.e., `SELECT a, *, b FROM myTable` is not supported). A non-namespaced `*` must be used by itself. |
 | [Error 3004](#error-3004) | The array data source contains an identifier. Array data sources must be constant. |
-| [Error 3006](#error-3006) | Distinct UNION is not allowed. |
 | [Error 3007](#error-3007) | A data source referenced in the SELECT list could not be found. |
 | [Error 3008](#error-3008) | A field could not be found in any data source. |
 | [Error 3009](#error-3009) | A field exists in multiple data sources and is ambiguous. |
@@ -211,13 +210,6 @@ The following errors occur when something goes wrong while using the excludeName
 - **Common Causes:** Accessing a field in an array data source as shown by this query: `SELECT * FROM [{'a': foo.a}] AS arr`.
 - **Resolution Steps:** Modify your array data source to only contain constants. Corrected example query: `SELECT * FROM [{'a': 34}] AS arr`.

-### Error 3006
-
-- **Description:** Distinct UNION is not allowed. You can only do `UNION ALL` (i.e., duplicate values always have to be allowed).
-- **Common Causes:** Using `UNION` instead of `UNION ALL`. For example, the query `SELECT a FROM foo AS foo UNION SELECT b, c FROM bar AS bar`
-causes this error.
-- **Resolution Steps:** Only use `UNION ALL` when doing unions. Corrected example query: `SELECT a FROM foo AS foo UNION ALL SELECT b, c FROM bar AS bar`.
-
 ### Error 3007

 - **Description:** A data source referenced in the SELECT list could not be found.
63 changes: 62 additions & 1 deletion mongosql/src/algebrizer/definitions.rs
@@ -366,7 +366,68 @@ impl<'a> Algebrizer<'a> {

     pub fn algebrize_set_query(&self, ast_node: ast::SetQuery) -> Result<mir::Stage> {
         match ast_node.op {
-            ast::SetOperator::Union => Err(Error::DistinctUnion),
+            ast::SetOperator::Union => {
+                let union_all_stage = mir::Stage::Set(mir::Set {
+                    operation: mir::SetOperation::UnionAll,
+                    left: Box::new(self.algebrize_query(*ast_node.left)?),
+                    right: Box::new(self.algebrize_query(*ast_node.right)?),
+                    cache: SchemaCache::new(),
+                });
+
+                let union_result_set = union_all_stage.schema(&self.schema_inference_state())?;
+                let datasources: BTreeSet<Key> = union_result_set
+                    .schema_env
+                    .keys()
+                    .filter(|key| key.scope == self.scope_level)
+                    .cloned()
+                    .collect();
+
+                // Create group keys for each datasource
+                let group_keys: Vec<OptionallyAliasedExpr> = datasources
+                    .iter()
+                    .enumerate()
+                    .map(|(i, key)| {
+                        OptionallyAliasedExpr::Aliased(AliasedExpr {
+                            alias: format!("__groupKey{}", i),
+                            expr: Expression::Reference(ReferenceExpr { key: key.clone() }),
+                        })
+                    })
+                    .collect();
+
+                // Adding group stage to union all to remove duplicates
+                let group_stage = mir::Stage::Group(mir::Group {
+                    source: Box::new(union_all_stage),
+                    keys: group_keys,
+                    aggregations: vec![],
+                    cache: SchemaCache::new(),
+                    scope: self.scope_level,
+                });
+
+                let mut project_expression = BindingTuple::new();
+                for (i, key) in datasources.iter().enumerate() {
+                    project_expression.insert(
+                        key.clone(),
+                        Expression::FieldAccess(FieldAccess {
+                            expr: Box::new(Expression::Reference(ReferenceExpr {
+                                key: Key::bot(self.scope_level),
+                            })),
+                            field: format!("__groupKey{}", i),
+                            // Setting is_nullable to true because the result set coming into the
+                            // UNION clause could be empty.
+                            is_nullable: true,
+                        }),
+                    );
+                }
+
+                let project_stage = mir::Stage::Project(mir::Project {
+                    source: Box::new(group_stage),
+                    expression: project_expression,
+                    is_add_fields: false,
+                    cache: SchemaCache::new(),
+                });
+
+                schema_check_return!(self, project_stage)
+            }
             ast::SetOperator::UnionAll => schema_check_return!(
                 self,
                 mir::Stage::Set(mir::Set {
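
Note on the `definitions.rs` hunk above: the new `Union` arm appears to build distinct `UNION` in three steps: a `UnionAll` set stage, a `Group` stage keyed on every datasource at the current scope (aliased `__groupKey{i}`, no aggregations), and a `Project` stage that maps each `__groupKey{i}` back to its datasource. The following is a minimal sketch of that idea over plain in-memory rows, not the MIR itself. `Row`, `union_all`, and `union_distinct` are hypothetical names introduced here for illustration, and modeling each datasource as a single `Option<i64>` value is an assumption made to keep the example small.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a binding tuple: datasource name -> value.
/// (Assumption: one nullable integer per datasource, for brevity.)
type Row = BTreeMap<String, Option<i64>>;

/// UNION ALL: concatenate both inputs and keep duplicates.
fn union_all(left: &[Row], right: &[Row]) -> Vec<Row> {
    left.iter().chain(right.iter()).cloned().collect()
}

/// Distinct UNION expressed as UNION ALL, then group, then project.
fn union_distinct(left: &[Row], right: &[Row]) -> Vec<Row> {
    let all = union_all(left, right);

    // Every datasource that appears in the combined result, in sorted order.
    let datasources: BTreeSet<String> =
        all.iter().flat_map(|row| row.keys().cloned()).collect();

    // Group step: one key per datasource, aliased `__groupKey{i}`, with no
    // aggregations. Collecting into a set keeps one entry per distinct key
    // tuple. A datasource missing from a row is treated as None here.
    let groups: BTreeSet<Row> = all
        .iter()
        .map(|row| {
            datasources
                .iter()
                .enumerate()
                .map(|(i, ds)| (format!("__groupKey{}", i), row.get(ds).cloned().flatten()))
                .collect::<Row>()
        })
        .collect();

    // Project step: map each `__groupKey{i}` back to its datasource name.
    groups
        .into_iter()
        .map(|group| {
            datasources
                .iter()
                .enumerate()
                .map(|(i, ds)| {
                    let value = group.get(&format!("__groupKey{}", i)).cloned().flatten();
                    (ds.clone(), value)
                })
                .collect::<Row>()
        })
        .collect()
}
```

Calling `union_distinct` on two inputs that share a row returns that row once, while `union_all` returns it once per occurrence. The sketch returns rows in the set's sorted order only because it uses a `BTreeSet`; nothing in the diff suggests the real pipeline guarantees any ordering.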
4 changes: 0 additions & 4 deletions mongosql/src/algebrizer/errors.rs
@@ -14,7 +14,6 @@ use std::collections::HashSet;
 pub enum Error {
     NonStarStandardSelectBody,
     ArrayDatasourceMustBeLiteral,
-    DistinctUnion,
     NoSuchDatasource(DatasourceName),
     FieldNotFound(String, Option<Vec<String>>, ClauseType, u16),
     AmbiguousField(String, ClauseType, u16),
@@ -64,7 +63,6 @@ impl UserError for Error {
         match self {
             Error::NonStarStandardSelectBody => 3002,
             Error::ArrayDatasourceMustBeLiteral => 3004,
-            Error::DistinctUnion => 3006,
             Error::NoSuchDatasource(_) => 3007,
             Error::FieldNotFound(_, _, _, _) => 3008,
             Error::AmbiguousField(_, _, _) => 3009,
@@ -95,7 +93,6 @@ impl UserError for Error {
         match self {
             Error::NonStarStandardSelectBody => None,
             Error::ArrayDatasourceMustBeLiteral => None,
-            Error::DistinctUnion => None,
             Error::NoSuchDatasource(_) => None,
             Error::FieldNotFound(field, found_fields, clause_type, scope_level) => {
                 if let Some(possible_fields) = found_fields {
@@ -183,7 +180,6 @@ impl UserError for Error {
         match self{
             Error::NonStarStandardSelectBody => "standard SELECT expressions can only contain *".to_string(),
             Error::ArrayDatasourceMustBeLiteral => "array datasource must be constant".to_string(),
-            Error::DistinctUnion => "UNION DISTINCT not allowed".to_string(),
             Error::NoSuchDatasource(datasource_name) => format!("no such datasource: {0:?}", datasource_name),
             Error::FieldNotFound(field, _, clause_type, scope_level) => format!("field `{}` in the `{}` clause at the {} scope level cannot be resolved to any datasource", field, clause_type, scope_level),
             Error::AmbiguousField(field, clause_type, scope_level) => format!("ambiguous field `{}` in the `{}` clause at the {} scope level", field, clause_type, scope_level),