Document how to use roaring bitmaps (apache#2824)
* Document how to use roaring bitmaps

This fixes apache#2408.
Not every indexSpec property is documented, but this change does explain how to turn on roaring bitmaps.

* fix

* fix

* fix

* fix
fjy committed Apr 13, 2016
1 parent db35dd7 commit abd951d
Showing 3 changed files with 20 additions and 2 deletions.
9 changes: 9 additions & 0 deletions docs/content/ingestion/batch-ingestion.md
@@ -162,6 +162,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
|combineText|Boolean|Use CombineTextInputFormat to combine multiple files into a file split. This can speed up Hadoop jobs when processing a large number of small files.|no (default == false)|
|useCombiner|Boolean|Use Hadoop combiner to merge rows at mapper if possible.|no (default == false)|
|jobProperties|Object|A map of properties to add to the Hadoop job configuration, see below for details.|no (default == null)|
|indexSpec|Object|Tune how data is indexed. See below for more information.|no|
|buildV9Directly|Boolean|Build v9 index directly instead of building v8 index and converting it to v9 format.|no (default = false)|
|numBackgroundPersistThreads|Integer|The number of new background threads to use for incremental persists. Using this feature causes a notable increase in memory pressure and CPU usage but will make the job finish more quickly. If changing from the default of 0 (use current thread for persists), we recommend setting it to 1.|no (default == 0)|

@@ -186,6 +187,14 @@ The following properties can be used to tune how the MapReduce job is configured

**Please note that using `mapreduce.job.user.classpath.first` is an expert feature and should not be used without a deep understanding of Hadoop and Java class loading mechanism.**

#### IndexSpec

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|bitmap|Object|The type of bitmap index to create, given as a JSON object with a `type` field: `{"type": "roaring"}` or `{"type": "concise"}`. Omit it or set it to null to use the default (`concise`).|No|
|dimensionCompression|String|Compression format for dimension columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
|metricCompression|String|Compression format for metric columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
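
For example, a Hadoop batch `tuningConfig` that turns on roaring bitmaps might look like the minimal sketch below. It assumes the `tuningConfig` type is `hadoop` and that `bitmap` is written as a JSON object with a `type` field rather than a bare string, which is how later Druid documentation describes it. The two compression fields only restate their documented defaults and could be left out.

```json
{
  "type": "hadoop",
  "indexSpec": {
    "bitmap": { "type": "roaring" },
    "dimensionCompression": "LZ4",
    "metricCompression": "LZ4"
  }
}
```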

### Partitioning specification

Segments are always partitioned based on timestamp (according to the granularitySpec) and may be further partitioned in
10 changes: 9 additions & 1 deletion docs/content/ingestion/stream-pull.md
@@ -93,7 +93,7 @@ The property `druid.realtime.specFile` has the path of a file (absolute or relat
},
"tuningConfig": {
"type" : "realtime",
-"maxRowsInMemory": 500000,
+"maxRowsInMemory": 75000,
"intermediatePersistPeriod": "PT10m",
"windowPeriod": "PT10m",
"basePersistDirectory": "\/tmp\/realtime\/basePersist",
@@ -155,6 +155,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
|mergeThreadPriority|int|If `-XX:+UseThreadPriorities` is properly enabled, this will set the thread priority of the merging thread to `Thread.NORM_PRIORITY` plus this value within the bounds of `Thread.MIN_PRIORITY` and `Thread.MAX_PRIORITY`. A value of 0 indicates to not change the thread priority.|no (default = 0; inherit and do not override)|
|reportParseExceptions|Boolean|If true, exceptions encountered during parsing will be thrown and will halt ingestion. If false, unparseable rows and fields will be skipped. If an entire row is skipped, the "unparseable" counter will be incremented. If some fields in a row were parseable and some were not, the parseable fields will be indexed and the "unparseable" counter will not be incremented.|false|
|handoffConditionTimeout|long|Milliseconds to wait for segment handoff. Must be >= 0, where 0 means wait forever.|0|
|indexSpec|Object|Tune how data is indexed. See below for more information.|no|

Before enabling thread priority settings, users are highly encouraged to read the [original pull request](https://github.com/druid-io/druid/pull/984) and other documentation about proper use of `-XX:+UseThreadPriorities`.

@@ -166,6 +167,13 @@ The following policies are available:
* `messageTime` – Can be used for data whose timestamps do not track the current time, as long as the data arrives roughly in timestamp order. Events are rejected if their timestamp is more than `windowPeriod` earlier than the latest timestamp seen so far. Handoff only occurs once an event is seen with a timestamp past the end of the segment's granularity interval plus `windowPeriod`, so handoff will not happen periodically unless data arrives continuously.
* `none` – All events are accepted. Never hands off data unless shutdown() is called on the configured firehose.

#### IndexSpec

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|bitmap|Object|The type of bitmap index to create, given as a JSON object with a `type` field: `{"type": "roaring"}` or `{"type": "concise"}`. Omit it or set it to null to use the default (`concise`).|No|
|dimensionCompression|String|Compression format for dimension columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
|metricCompression|String|Compression format for metric columns. Choose from `LZ4`, `LZF`, or `uncompressed`. The default is `LZ4`.|No|
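
As a sketch for realtime ingestion, the example `tuningConfig` from the spec in this file could select roaring bitmaps by adding an `indexSpec`, as below. As in the batch case, this assumes `bitmap` takes a JSON object with a `type` field; the remaining values are copied from that example, and the compression fields are left out so they keep their `LZ4` defaults.

```json
{
  "type": "realtime",
  "maxRowsInMemory": 75000,
  "intermediatePersistPeriod": "PT10m",
  "windowPeriod": "PT10m",
  "basePersistDirectory": "/tmp/realtime/basePersist",
  "indexSpec": {
    "bitmap": { "type": "roaring" }
  }
}
```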

#### Sharding

3 changes: 2 additions & 1 deletion processing/src/main/java/io/druid/segment/IndexSpec.java
@@ -81,7 +81,8 @@ public IndexSpec()
* Defaults to the bitmap type specified by the (deprecated) "druid.processing.bitmap.type"
* setting, or, if none was set, uses the default {@link BitmapSerde.DefaultBitmapSerdeFactory}
*
-* @param dimensionCompression compression format for dimension columns. The default, null, means no compression
+* @param dimensionCompression compression format for dimension columns, null to use the default.
+* Defaults to {@link CompressedObjectStrategy#DEFAULT_COMPRESSION_STRATEGY}
*
* @param metricCompression compression format for metric columns, null to use the default.
* Defaults to {@link CompressedObjectStrategy#DEFAULT_COMPRESSION_STRATEGY}