Add docs about filtering and indexes on numeric columns. (apache#4035)
gianm authored and fjy committed Mar 10, 2017
1 parent adbe89e commit cab2e2f
Showing 2 changed files with 47 additions and 9 deletions.
5 changes: 4 additions & 1 deletion docs/content/ingestion/schema-design.md
@@ -24,8 +24,11 @@ Below, we outline some best practices with schema design:

If the user wishes to ingest a column as a numeric-typed dimension (Long or Float), it is necessary to specify the type of the column in the `dimensions` section of the `dimensionsSpec`. If the type is omitted, Druid will ingest a column as the default String type.

See [Dimension Schema](../ingestion/index.html#dimension-schema) for more information.
There are performance tradeoffs between string and numeric columns. Numeric columns are generally faster to group on
than string columns. But unlike string columns, numeric columns don't have indexes, so they are generally slower to
filter on.

See [Dimension Schema](../ingestion/index.html#dimension-schema) for more information.
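
As a rough sketch of the typed-dimension declaration described above, a `dimensionsSpec` might mix a default string dimension with a numeric one as shown here. The column names `page` and `userId` are hypothetical and used only for illustration.

```json
"dimensionsSpec": {
  "dimensions": [
    "page",
    { "type": "long", "name": "userId" }
  ]
}
```

In this sketch `page` is ingested as the default String type, while `userId` is ingested as a Long. Per the tradeoff above, `userId` would generally be faster to group on but slower to filter on.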

## High cardinality dimensions (e.g. unique IDs)

51 changes: 43 additions & 8 deletions docs/content/querying/filters.md
@@ -392,20 +392,53 @@ The following matches dimension values in `[product_1, product_3, product_5]` fo
}
```

### Filtering on the Timestamp Column
Filters can also be applied to the timestamp column. The timestamp column has long millisecond values.
## Column types

To refer to the timestamp column, use the string `__time` as the dimension name.
Druid supports filtering on timestamp, string, long, and float columns.

The filter parameters (e.g., the selector value for the SelectorFilter) should be provided as Strings.
Note that only string columns have bitmap indexes. Therefore, queries that filter on other column types will need to
scan those columns.

If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.html#time-format-extraction-function) is useful.
### Filtering on numeric columns

Note that the timestamp column does not have a bitmap index. Thus, filtering on timestamp in a query requires a scan of the column, and performance will be affected accordingly. If possible, excluding time ranges by specifying the query interval will be faster.
When filtering on numeric columns, you can write filters as if they were strings. In most cases, your filter will be
converted into a numeric predicate and will be applied to the numeric column values directly. In some cases (such as
the "regex" filter) the numeric column values will be converted to strings during the scan.

**Example**
For example, filtering on a specific value, `myFloatColumn = 10.1`:

```json
"filter": {
"type": "selector",
"dimension": "myFloatColumn",
"value": "10.1"
}
```

Filtering on a range of values, `10 <= myFloatColumn < 20`:

```json
"filter": {
"type": "bound",
"dimension": "myFloatColumn",
"ordering": "numeric",
"lowerBound": "10",
"lowerStrict": false,
"upperBound": "20",
"upperStrict": true
}
```
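
To illustrate the string-conversion case mentioned above, a "regex" filter on a numeric column might look like the following sketch. The column name `myLongColumn` and the pattern are hypothetical; the assumption is that each numeric value is converted to its string form during the scan and then matched against the pattern.

```json
"filter": {
  "type": "regex",
  "dimension": "myLongColumn",
  "pattern": "^10.*"
}
```

Under that assumption, this would match values whose string form begins with `10`, such as `10`, `100`, or `1024`.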

### Filtering on the Timestamp Column

Query filters can also be applied to the timestamp column. The timestamp column has long millisecond values. To refer
to the timestamp column, use the string `__time` as the dimension name. Like numeric dimensions, timestamp filters
should be specified as if the timestamp values were strings.

If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.html#time-format-extraction-function) is useful.

For example, filtering on a long timestamp value:

Filtering on a long timestamp value:
```json
"filter": {
"type": "selector",
@@ -415,6 +448,7 @@ Filtering on a long timestamp value:
```
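
A range of timestamps can presumably be expressed in the same style with a "bound" filter, again writing the millisecond values as strings. The following is a sketch only; the bounds are illustrative values that correspond to 2017-03-01T00:00:00Z (inclusive) and 2017-03-02T00:00:00Z (exclusive).

```json
"filter": {
  "type": "bound",
  "dimension": "__time",
  "ordering": "numeric",
  "lowerBound": "1488326400000",
  "lowerStrict": false,
  "upperBound": "1488412800000",
  "upperStrict": true
}
```

Because only string columns have bitmap indexes, a filter like this requires a scan of the timestamp column.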

Filtering on day of week:

```json
"filter": {
"type": "selector",
@@ -430,6 +464,7 @@ Filtering on day of week:
```

Filtering on a set of ISO 8601 intervals:

```json
{
"type" : "interval",