Merge pull request apache#2690 from jon-wei/filter_support
Allow filters to use extraction functions
fjy committed Apr 5, 2016
2 parents aaf40e6 + 0e481d6 commit 289bb6f
Showing 48 changed files with 1,762 additions and 448 deletions.
@@ -74,7 +74,8 @@ public class BoundFilterBenchmark
 String.valueOf(START_INT),
 true,
 false,
-false
+false,
+null
 )
 );

@@ -85,7 +86,8 @@ public class BoundFilterBenchmark
 String.valueOf(END_INT),
 false,
 false,
-false
+false,
+null
 )
 );

@@ -96,7 +98,8 @@ public class BoundFilterBenchmark
 String.valueOf(END_INT),
 false,
 false,
-false
+false,
+null
 )
 );

@@ -107,7 +110,8 @@ public class BoundFilterBenchmark
 String.valueOf(START_INT),
 true,
 false,
-true
+true,
+null
 )
 );

@@ -118,7 +122,8 @@ public class BoundFilterBenchmark
 String.valueOf(END_INT),
 false,
 false,
-true
+true,
+null
 )
 );

@@ -129,7 +134,8 @@ public class BoundFilterBenchmark
 String.valueOf(END_INT),
 false,
 false,
-true
+true,
+null
 )
 );

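The only change in these benchmark hunks is the new trailing `extractionFn` argument, passed as `null` (meaning no extraction). The sketch below gives a rough, self-contained picture of what a bound check does when an optional extraction function is present. It is not Druid's `BoundFilter`. `BoundSketch` is a hypothetical name, `java.util.function.UnaryOperator<String>` stands in for `ExtractionFn`, and only the default case-sensitive `String.compareTo` ordering is modelled (no `alphaNumeric`). Treating a `null` extracted value as a non-match is also an assumption.

```java
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of a bound check with an optional extraction function.
 * Not Druid's BoundFilter: UnaryOperator<String> stands in for ExtractionFn and
 * only case-sensitive String.compareTo ordering is modelled (no alphaNumeric).
 */
final class BoundSketch {
    private final String lower;        // null: no lower bound
    private final String upper;        // null: no upper bound
    private final boolean lowerStrict; // true: exclude the lower endpoint
    private final boolean upperStrict; // true: exclude the upper endpoint
    private final UnaryOperator<String> extractionFn; // null: compare the raw value

    BoundSketch(String lower, String upper, boolean lowerStrict, boolean upperStrict,
                UnaryOperator<String> extractionFn) {
        this.lower = lower;
        this.upper = upper;
        this.lowerStrict = lowerStrict;
        this.upperStrict = upperStrict;
        this.extractionFn = extractionFn;
    }

    boolean matches(String rawValue) {
        // The extraction function (if any) runs before the bounds are checked.
        String value = extractionFn == null ? rawValue : extractionFn.apply(rawValue);
        if (value == null) {
            return false; // assumption: a null value never matches in this sketch
        }
        if (lower != null) {
            int cmp = value.compareTo(lower);
            if (lowerStrict ? cmp <= 0 : cmp < 0) {
                return false;
            }
        }
        if (upper != null) {
            int cmp = value.compareTo(upper);
            if (upperStrict ? cmp >= 0 : cmp > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BoundSketch raw = new BoundSketch("b", "d", false, false, null);
        System.out.println(raw.matches("b"));  // true: the lower bound is inclusive by default
        System.out.println(raw.matches("dz")); // false: "dz" sorts after "d"

        // Same bounds, but only the first character of each value is compared.
        BoundSketch firstChar =
            new BoundSketch("b", "d", false, false, s -> s.isEmpty() ? s : s.substring(0, 1));
        System.out.println(firstChar.matches("dz")); // true: "dz" is reduced to "d"
    }
}
```

Passing `null` as the last argument, as the benchmark does, leaves the comparison on the raw dimension values.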
2 changes: 1 addition & 1 deletion docs/content/querying/dimensionspecs.md
@@ -259,7 +259,7 @@ For instance the following filter
 ```json
 {
 "filter": {
-"type": "extraction",
+"type": "selector",
 "dimension": "product",
 "value": "bar_1",
 "extractionFn": {
62 changes: 61 additions & 1 deletion docs/content/querying/filters.md
@@ -16,6 +16,8 @@ The grammar for a SELECTOR filter is as follows:

 This is the equivalent of `WHERE <dimension_string> = '<dimension_value_string>'`.

+The selector filter supports the use of extraction functions; see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+
 ### Regular expression filter

 The regular expression filter is similar to the selector filter, but using regular expressions. It matches the specified dimension with the given pattern. The pattern can be any standard [Java regular expression](http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html).
@@ -24,6 +26,9 @@ The regular expression filter is similar to the selector filter, but using regul
 "filter": { "type": "regex", "dimension": <dimension_string>, "pattern": <pattern_string> }
 ```

+The regex filter supports the use of extraction functions; see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+
+
 ### Logical expression filters

 #### AND
@@ -81,11 +86,19 @@ The following matches any dimension values for the dimension `name` between `'ba
 }
 ```

+The JavaScript filter supports the use of extraction functions; see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+
+
 ### Extraction filter

+<div class="note caution">
+The extraction filter is now deprecated. The selector filter with an extraction function specified
+provides identical functionality and should be used instead.
+</div>
+
 Extraction filter matches a dimension using some specific [Extraction function](./dimensionspecs.html#extraction-functions).
 The following filter matches the values for which the extraction function has transformation entry `input_key=output_value` where
-`output_value` is equal to the filter `value` and `input_key` is present as dimension.
+`output_value` is equal to the filter `value` and `input_key` is present as dimension.

 **Example**
 The following matches dimension values in `[product_1, product_3, product_5]` for the column `product`
@@ -110,6 +123,7 @@ The following matches dimension values in `[product_1, product_3, product_5]` fo
 }
 }
 ```
+
 ### Search filter

 Search filters can be used to filter on partial string matches.
@@ -132,6 +146,10 @@ Search filters can be used to filter on partial string matches.
 |type|This String should always be "search".|yes|
 |dimension|The dimension to perform the search over.|yes|
 |query|A JSON object for the type of search. See below for more information.|yes|
+|extractionFn|[Extraction function](#filtering-with-extraction-functions) to apply to the dimension|no|
+
+The search filter supports the use of extraction functions; see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+

 ### In filter

@@ -151,13 +169,18 @@ The grammar for a IN filter is as follows:
 }
 ```

+The IN filter supports the use of extraction functions; see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+
+
 ### Bound filter

 The bound filter can be used to filter by comparing dimension values to an upper value and/or a lower value.
 By default, comparison is string based and **case sensitive**.
 To use numeric comparison you can set `alphaNumeric` to `true`.
 By default the bound filter is not a strict inclusion: `inputString <= upper && inputString >= lower`.

+The bound filter supports the use of extraction functions; see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
+
 The grammar for a bound filter is as follows:

 ```json
@@ -246,6 +269,8 @@ For instance suppose lower bound is `100` and value is `10K` the filter will mat
 Now suppose that the lower bound is `110` the filter will not match (`110 < 10K` returns `false`)


+
+
 #### Search Query Spec

 ##### Insensitive Contains
@@ -270,3 +295,38 @@ Now suppose that the lower bound is `110` the filter will not match (`110 < 10K`
 |type|This String should always be "contains".|yes|
 |value|A String value to run the search over.|yes|
 |caseSensitive|Whether the two strings should be compared case-sensitively or not|yes|
+
+
+### Filtering with Extraction Functions
+Some filters optionally support the use of extraction functions.
+An extraction function is defined by setting the "extractionFn" field on a filter.
+See [Extraction function](./dimensionspecs.html#extraction-functions) for more details on extraction functions.
+
+If specified, the extraction function will be used to transform input values before the filter is applied.
+The example below shows a selector filter combined with an extraction function. This filter will transform input values
+according to the values defined in the lookup map; transformed values will then be matched with the string "bar_1".
+
+
+**Example**
+The following matches dimension values in `[product_1, product_3, product_5]` for the column `product`
+
+```json
+{
+"filter": {
+"type": "selector",
+"dimension": "product",
+"value": "bar_1",
+"extractionFn": {
+"type": "lookup",
+"lookup": {
+"type": "map",
+"map": {
+"product_1": "bar_1",
+"product_5": "bar_1",
+"product_3": "bar_1"
+}
+}
+}
+}
+}
+```
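The new "Filtering with Extraction Functions" section says the extraction function transforms each input value before the filter sees it. The sketch below models that ordering in plain Java for the lookup-map example. It applies the same transformed value to a selector-style match and a regex-style match. `ExtractionFilterSketch`, `withExtraction` and `mapLookup` are hypothetical names. Passing values that are missing from the map through unchanged is an assumption that may not match Druid's lookup defaults.

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: an optional extraction function transforms each input
 * value, and only the transformed value is handed to the filter predicate.
 */
final class ExtractionFilterSketch {
    private ExtractionFilterSketch() {}

    /** Applies extractionFn (if any) before testing the filter predicate. */
    static Predicate<String> withExtraction(Predicate<String> filter, UnaryOperator<String> extractionFn) {
        if (extractionFn == null) {
            return filter; // no extraction: the filter sees raw values
        }
        return value -> filter.test(extractionFn.apply(value));
    }

    /** Map lookup; assumption: values absent from the map pass through unchanged. */
    static UnaryOperator<String> mapLookup(Map<String, String> map) {
        return value -> map.getOrDefault(value, value);
    }

    public static void main(String[] args) {
        UnaryOperator<String> lookup = mapLookup(Map.of(
            "product_1", "bar_1",
            "product_3", "bar_1",
            "product_5", "bar_1"));

        // Selector-style: the transformed value must equal "bar_1".
        Predicate<String> selector = withExtraction("bar_1"::equals, lookup);
        // Regex-style: the transformed value must start with "bar_".
        Predicate<String> regex = withExtraction(Pattern.compile("^bar_").asPredicate(), lookup);

        for (String product : new String[] {"product_1", "product_2", "product_5"}) {
            System.out.println(product + " selector=" + selector.test(product)
                + " regex=" + regex.test(product));
        }
    }
}
```

Because the predicate only ever sees the transformed value, the same wrapper works for any of the filter types listed above.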
@@ -48,7 +48,7 @@ public void testSingleIntervalSerde() throws Exception
 interval,
 null,
 null,
-new SelectorDimFilter("dim", "value"),
+new SelectorDimFilter("dim", "value", null),
 QueryGranularity.DAY,
 Lists.newArrayList("d1", "d2"),
 Lists.newArrayList("m1", "m2", "m3"),
@@ -132,7 +132,7 @@ public void testMultiIntervalSerde() throws Exception
 128
 )
 ),
-new SelectorDimFilter("dim", "value"),
+new SelectorDimFilter("dim", "value", null),
 QueryGranularity.DAY,
 Lists.newArrayList("d1", "d2"),
 Lists.newArrayList("m1", "m2", "m3"),
@@ -305,7 +305,7 @@ public List<StorageLocationConfig> getLocations()
 new IngestSegmentFirehoseFactory(
 DATA_SOURCE_NAME,
 FOREVER,
-new SelectorDimFilter(DIM_NAME, DIM_VALUE),
+new SelectorDimFilter(DIM_NAME, DIM_VALUE, null),
 dim_names,
 metric_names,
 Guice.createInjector(
18 changes: 9 additions & 9 deletions processing/src/main/java/io/druid/query/Druids.java
@@ -163,9 +163,9 @@ public OrDimFilterBuilder copy(OrDimFilterBuilder builder)

 public OrDimFilterBuilder fields(String dimensionName, String value, String... values)
 {
-fields = Lists.<DimFilter>newArrayList(new SelectorDimFilter(dimensionName, value));
+fields = Lists.<DimFilter>newArrayList(new SelectorDimFilter(dimensionName, value, null));
 for (String val : values) {
-fields.add(new SelectorDimFilter(dimensionName, val));
+fields.add(new SelectorDimFilter(dimensionName, val, null));
 }
 return this;
 }
@@ -256,7 +256,7 @@ public SelectorDimFilterBuilder()

 public SelectorDimFilter build()
 {
-return new SelectorDimFilter(dimension, value);
+return new SelectorDimFilter(dimension, value, null);
 }

 public SelectorDimFilterBuilder copy(SelectorDimFilterBuilder builder)
@@ -459,13 +459,13 @@ public TimeseriesQueryBuilder intervals(List<Interval> l)

 public TimeseriesQueryBuilder filters(String dimensionName, String value)
 {
-dimFilter = new SelectorDimFilter(dimensionName, value);
+dimFilter = new SelectorDimFilter(dimensionName, value, null);
 return this;
 }

 public TimeseriesQueryBuilder filters(String dimensionName, String value, String... values)
 {
-dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values));
+dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values), null);
 return this;
 }

@@ -615,13 +615,13 @@ public SearchQueryBuilder dataSource(DataSource d)

 public SearchQueryBuilder filters(String dimensionName, String value)
 {
-dimFilter = new SelectorDimFilter(dimensionName, value);
+dimFilter = new SelectorDimFilter(dimensionName, value, null);
 return this;
 }

 public SearchQueryBuilder filters(String dimensionName, String value, String... values)
 {
-dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values));
+dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values), null);
 return this;
 }

@@ -1159,13 +1159,13 @@ public SelectQueryBuilder context(Map<String, Object> c)

 public SelectQueryBuilder filters(String dimensionName, String value)
 {
-dimFilter = new SelectorDimFilter(dimensionName, value);
+dimFilter = new SelectorDimFilter(dimensionName, value, null);
 return this;
 }

 public SelectQueryBuilder filters(String dimensionName, String value, String... values)
 {
-dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values));
+dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values), null);
 return this;
 }

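The varargs `filters(dimensionName, value, values...)` builders use Guava's `Lists.asList` to fold the first value and the remaining values into one list. They now pass `null` for the extraction function. The hypothetical `InFilterSketch` below mirrors that shape using only the JDK: a `HashSet` replaces `Lists.asList`, and `UnaryOperator<String>` replaces `ExtractionFn`. It shows how attaching an extraction function changes which raw values match. It is an illustration, not `InDimFilter`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Hypothetical IN-filter sketch. The factory mirrors filters(dimension, value, values...)
 * by folding the first value and the varargs into one set; an optional extraction
 * function is applied to each raw value before the membership test.
 */
final class InFilterSketch {
    private final String dimension;
    private final Set<String> values;
    private final UnaryOperator<String> extractionFn; // null: no extraction

    InFilterSketch(String dimension, Set<String> values, UnaryOperator<String> extractionFn) {
        this.dimension = dimension;
        this.values = values;
        this.extractionFn = extractionFn;
    }

    /** Like the builder overload: no extraction function (the null argument in the diff). */
    static InFilterSketch of(String dimension, String value, String... values) {
        Set<String> all = new HashSet<>();
        all.add(value);
        all.addAll(Arrays.asList(values));
        return new InFilterSketch(dimension, all, null);
    }

    /** Returns a copy with the given extraction function attached. */
    InFilterSketch withExtractionFn(UnaryOperator<String> fn) {
        return new InFilterSketch(dimension, values, fn);
    }

    String dimension() {
        return dimension;
    }

    boolean matches(String rawValue) {
        String value = extractionFn == null ? rawValue : extractionFn.apply(rawValue);
        return values.contains(value);
    }

    public static void main(String[] args) {
        InFilterSketch plain = InFilterSketch.of("product", "a", "b");
        System.out.println(plain.matches("A"));   // false: exact comparison on the raw value
        InFilterSketch lowered = plain.withExtractionFn(String::toLowerCase);
        System.out.println(lowered.matches("A")); // true: "A" becomes "a" before the lookup
    }
}
```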
31 changes: 26 additions & 5 deletions processing/src/main/java/io/druid/query/filter/BoundDimFilter.java
@@ -23,6 +23,7 @@
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
 import com.metamx.common.StringUtils;
+import io.druid.query.extraction.ExtractionFn;
 import io.druid.segment.filter.BoundFilter;

 import java.nio.ByteBuffer;
@@ -35,6 +36,7 @@ public class BoundDimFilter implements DimFilter
 private final boolean lowerStrict;
 private final boolean upperStrict;
 private final boolean alphaNumeric;
+private final ExtractionFn extractionFn;

 @JsonCreator
 public BoundDimFilter(
@@ -43,7 +45,8 @@ public BoundDimFilter(
 @JsonProperty("upper") String upper,
 @JsonProperty("lowerStrict") Boolean lowerStrict,
 @JsonProperty("upperStrict") Boolean upperStrict,
-@JsonProperty("alphaNumeric") Boolean alphaNumeric
+@JsonProperty("alphaNumeric") Boolean alphaNumeric,
+@JsonProperty("extractionFn") ExtractionFn extractionFn
 )
 {
 this.dimension = Preconditions.checkNotNull(dimension, "dimension can not be null");
@@ -53,6 +56,7 @@ public BoundDimFilter(
 this.lowerStrict = (lowerStrict == null) ? false : lowerStrict;
 this.upperStrict = (upperStrict == null) ? false : upperStrict;
 this.alphaNumeric = (alphaNumeric == null) ? false : alphaNumeric;
+this.extractionFn = extractionFn;
 }

 @JsonProperty
@@ -101,6 +105,12 @@ public boolean hasUpperBound()
 return upper != null;
 }

+@JsonProperty
+public ExtractionFn getExtractionFn()
+{
+return extractionFn;
+}
+
 @Override
 public byte[] getCacheKey()
 {
@@ -118,11 +128,14 @@ public byte[] getCacheKey()
 byte upperStrictByte = (this.isUpperStrict() == false) ? 0x0 : (byte) 1;
 byte AlphaNumericByte = (this.isAlphaNumeric() == false) ? 0x0 : (byte) 1;

+byte[] extractionFnBytes = extractionFn == null ? new byte[0] : extractionFn.getCacheKey();
+
 ByteBuffer boundCacheBuffer = ByteBuffer.allocate(
-    8
+    9
     + dimensionBytes.length
     + upperBytes.length
     + lowerBytes.length
+    + extractionFnBytes.length
 );
 boundCacheBuffer.put(DimFilterCacheHelper.BOUND_CACHE_ID)
 .put(boundType)
@@ -134,7 +147,9 @@ public byte[] getCacheKey()
 .put(DimFilterCacheHelper.STRING_SEPARATOR)
 .put(upperBytes)
 .put(DimFilterCacheHelper.STRING_SEPARATOR)
-.put(lowerBytes);
+.put(lowerBytes)
+.put(DimFilterCacheHelper.STRING_SEPARATOR)
+.put(extractionFnBytes);
 return boundCacheBuffer.array();
 }

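The constant in `ByteBuffer.allocate` grows from `8` to `9` because the key now ends with one more `STRING_SEPARATOR` byte, followed by the extraction function's cache key. That key is an empty array when no function is set. The hunks hide part of the `put` chain, so the layout in this sketch is an assumption. It uses a five-byte header (an id, a type and three flag bytes) and places a one-byte separator before each variable-length field. Under that assumption three fields give the old fixed overhead of 8 and four give 9. `CacheKeySketch` and its constant values are placeholders, not `DimFilterCacheHelper`'s.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical cache-key layout: single-byte header fields, then each
 * variable-length field preceded by a one-byte separator.
 * The constants are placeholders, not Druid's DimFilterCacheHelper values.
 */
final class CacheKeySketch {
    static final byte CACHE_ID = 0x0A;          // placeholder id
    static final byte SEPARATOR = (byte) 0xFF;  // placeholder separator

    static byte[] cacheKey(byte[] header, byte[]... fields) {
        int size = header.length;
        for (byte[] field : fields) {
            size += 1 + field.length; // one separator byte plus the payload
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.put(header);
        for (byte[] field : fields) {
            buffer.put(SEPARATOR).put(field);
        }
        return buffer.array();
    }

    static byte[] utf8(String s) {
        return s == null ? new byte[0] : s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] header = {CACHE_ID, 0x0, 0x0, 0x0, 0x0}; // id, type and three flags
        byte[] withoutFn = cacheKey(header, utf8("dim"), utf8("z"), utf8("a"));
        // An absent extraction function is an empty array but still costs one separator.
        byte[] withFn = cacheKey(header, utf8("dim"), utf8("z"), utf8("a"), new byte[0]);
        System.out.println(withoutFn.length + " vs " + withFn.length);
    }
}
```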
@@ -156,7 +171,7 @@ public boolean equals(Object o)
 if (this == o) {
 return true;
 }
-if (!(o instanceof BoundDimFilter)) {
+if (o == null || getClass() != o.getClass()) {
 return false;
 }

@@ -177,7 +192,12 @@ public boolean equals(Object o)
 if (getUpper() != null ? !getUpper().equals(that.getUpper()) : that.getUpper() != null) {
 return false;
 }
-return !(getLower() != null ? !getLower().equals(that.getLower()) : that.getLower() != null);
+if (getLower() != null ? !getLower().equals(that.getLower()) : that.getLower() != null) {
+return false;
+}
+return getExtractionFn() != null
+? getExtractionFn().equals(that.getExtractionFn())
+: that.getExtractionFn() == null;

 }

@@ -190,6 +210,7 @@ public int hashCode()
 result = 31 * result + (isLowerStrict() ? 1 : 0);
 result = 31 * result + (isUpperStrict() ? 1 : 0);
 result = 31 * result + (isAlphaNumeric() ? 1 : 0);
+result = 31 * result + (getExtractionFn() != null ? getExtractionFn().hashCode() : 0);
 return result;
 }
 }
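The rewritten `equals` ends with a null-safe comparison of `extractionFn`, and `hashCode` folds it in as `0` when it is absent. The JDK's `java.util.Objects.equals` and `Objects.hashCode` express the same null handling. The hypothetical two-field `FilterKey` below shows the equivalence; it is not `BoundDimFilter` itself. It also keeps the new `getClass() != o.getClass()` check. Unlike the old `instanceof` test, that check never treats objects of different runtime classes as equal.

```java
import java.util.Objects;

/** Hypothetical two-field value type showing null-safe equals/hashCode. */
final class FilterKey {
    private final String dimension;    // required
    private final Object extractionFn; // optional; stands in for an ExtractionFn

    FilterKey(String dimension, Object extractionFn) {
        this.dimension = Objects.requireNonNull(dimension, "dimension can not be null");
        this.extractionFn = extractionFn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        // Exact class match, as in the new code: other runtime classes are never equal.
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        FilterKey that = (FilterKey) o;
        // Objects.equals is true when both are null and false when exactly one is null.
        return dimension.equals(that.dimension) && Objects.equals(extractionFn, that.extractionFn);
    }

    @Override
    public int hashCode() {
        int result = dimension.hashCode();
        // Objects.hashCode(null) is 0, matching "x != null ? x.hashCode() : 0".
        result = 31 * result + Objects.hashCode(extractionFn);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new FilterKey("dim", null).equals(new FilterKey("dim", null)));   // true
        System.out.println(new FilterKey("dim", null).equals(new FilterKey("dim", "fn")));   // false
        System.out.println(new FilterKey("dim", null).hashCode() == 31 * "dim".hashCode()); // true
    }
}
```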