diff --git a/benchmarks/src/main/java/io/druid/benchmark/query/SqlBenchmark.java b/benchmarks/src/main/java/io/druid/benchmark/query/SqlBenchmark.java
index 53fa05f1c764..e9ea84fa8bd0 100644
--- a/benchmarks/src/main/java/io/druid/benchmark/query/SqlBenchmark.java
+++ b/benchmarks/src/main/java/io/druid/benchmark/query/SqlBenchmark.java
@@ -45,6 +45,7 @@
 import io.druid.segment.serde.ComplexMetrics;
 import io.druid.sql.calcite.planner.Calcites;
 import io.druid.sql.calcite.planner.PlannerConfig;
+import io.druid.sql.calcite.rel.QueryMaker;
 import io.druid.sql.calcite.table.DruidTable;
 import io.druid.sql.calcite.util.CalciteTests;
 import io.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
@@ -133,9 +134,8 @@ public void setup() throws Exception
     final Map tableMap = ImmutableMap.of(
         "foo", new DruidTable(
-            walker,
+            new QueryMaker(walker, plannerConfig),
             new TableDataSource("foo"),
-            plannerConfig,
             ImmutableMap.of(
                 "__time", ValueType.LONG,
                 "dimSequential", ValueType.STRING,
diff --git a/docs/content/configuration/broker.md b/docs/content/configuration/broker.md
index 079aba4d3d7c..a70083473622 100644
--- a/docs/content/configuration/broker.md
+++ b/docs/content/configuration/broker.md
@@ -101,6 +101,7 @@ The broker's [SQL planner](../querying/sql.html) can be configured through the f
 |Property|Description|Default|
 |--------|-----------|-------|
+|`druid.sql.planner.maxQueryCount`|Maximum number of queries to issue, including nested queries. Set to 1 to disable sub-queries, or set to 0 for unlimited.|8|
 |`druid.sql.planner.maxSemiJoinRowsInMemory`|Maximum number of rows to keep in memory for executing two-stage semi-join queries like `SELECT * FROM Employee WHERE DeptName IN (SELECT DeptName FROM Dept)`.|100000|
 |`druid.sql.planner.maxTopNLimit`|Maximum threshold for a [TopN query](../querying/topnquery.html). Higher limits will be planned as [GroupBy queries](../querying/groupbyquery.html) instead.|100000|
 |`druid.sql.planner.metadataRefreshPeriod`|Throttle for metadata refreshes.|PT1M|
diff --git a/docs/content/querying/groupbyquery.md b/docs/content/querying/groupbyquery.md
index c1cc7ebf7942..a33da89f3219 100644
--- a/docs/content/querying/groupbyquery.md
+++ b/docs/content/querying/groupbyquery.md
@@ -156,6 +156,12 @@ indexing mechanism, and runs the outer query on these materialized results. "v2"
 inner query's results stream with off-heap fact map and on-heap string dictionary that can spill to disk. Both strategy
 perform the outer query on the broker in a single-threaded fashion.
+Note that groupBys require a separate merge buffer on the broker for each layer beyond the first layer of the groupBy.
+With the v2 groupBy strategy, this can potentially lead to deadlocks for groupBys nested beyond two layers, since the
+merge buffers are limited in number and are acquired one-by-one and not as a complete set. At this time we recommend
+that you avoid deeply-nested groupBys with the v2 strategy. Doubly-nested groupBys (groupBy -> groupBy -> table) are
+safe and do not suffer from this issue.
+
 #### Server configuration
 When using the "v1" strategy, the following runtime properties apply:
diff --git a/docs/content/querying/sql.md b/docs/content/querying/sql.md
index 98da2be86aab..1e56b58fe856 100644
--- a/docs/content/querying/sql.md
+++ b/docs/content/querying/sql.md
@@ -77,6 +77,20 @@ If `druid.sql.planner.useFallback` is enabled, full SQL is possible on metadata
 recommended in production since it can generate unscalable query plans. The JDBC driver allows accessing table and column metadata through `connection.getMetaData()` even if useFallback is off.
+### Approximate queries
+
+The following SQL queries and features may be executed using approximate algorithms:
+
+- `COUNT(DISTINCT col)` aggregations use [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf), a
+fast approximate distinct counting algorithm. If you need exact distinct counts, you can instead use
+`SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)`, which will use a slower and more resource intensive exact
+algorithm.
+- TopN-style queries with a single grouping column, like
+`SELECT col1, SUM(col2) FROM druid.foo GROUP BY col1 ORDER BY SUM(col2) DESC LIMIT 100`, by default will be executed
+as [TopN queries](topnquery.html), which use an approximate algorithm. To disable this behavior, and use exact
+algorithms for topN-style queries, set
+[druid.sql.planner.useApproximateTopN](../configuration/broker.html#sql-planner-configuration) to "false".
+
 ### Time functions
 Druid's SQL language supports a number of time operations, including:
@@ -85,12 +99,33 @@ Druid's SQL language supports a number of time operations, including:
 - `EXTRACT( FROM __time)` for grouping or filtering on time parts, like
 `SELECT EXTRACT(HOUR FROM __time), SUM(cnt) FROM druid.foo GROUP BY EXTRACT(HOUR FROM __time)`
 - Comparisons to `TIMESTAMP '