Merge branch 'master' into router-node
fjy committed Mar 24, 2014
2 parents a96dfc7 + c97caa3 commit fcd7522
Showing 69 changed files with 1,919 additions and 142 deletions.
2 changes: 1 addition & 1 deletion build.sh
@@ -30,4 +30,4 @@ echo "For examples, see: "
echo " "
ls -1 examples/*/*sh
echo " "
echo "See also http://druid.io/docs/0.6.72"
echo "See also http://druid.io/docs/0.6.73"
2 changes: 1 addition & 1 deletion cassandra-storage/pom.xml
@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.73-SNAPSHOT</version>
<version>0.6.74-SNAPSHOT</version>
</parent>

<dependencies>
2 changes: 1 addition & 1 deletion common/pom.xml
@@ -28,7 +28,7 @@
<parent>
<groupId>io.druid</groupId>
<artifactId>druid</artifactId>
<version>0.6.73-SNAPSHOT</version>
<version>0.6.74-SNAPSHOT</version>
</parent>

<dependencies>
4 changes: 2 additions & 2 deletions docs/content/Coordinator.md
@@ -20,7 +20,7 @@ io.druid.cli.Main server coordinator
Rules
-----

Segments are loaded and dropped from the cluster based on a set of rules. Rules indicate how segments should be assigned to different historical node tiers and how many replicants of a segment should exist in each tier. Rules may also indicate when segments should be dropped entirely from the cluster. The coordinator loads a set of rules from the database. Rules may be specific to a certain datasource and/or a default set of rules can be configured. Rules are read in order and hence the ordering of rules is important. The coordinator will cycle through all available segments and match each segment with the first rule that applies. Each segment may only match a single rule
Segments are loaded and dropped from the cluster based on a set of rules. Rules indicate how segments should be assigned to different historical node tiers and how many replicants of a segment should exist in each tier. Rules may also indicate when segments should be dropped entirely from the cluster. The coordinator loads a set of rules from the database. Rules may be specific to a certain datasource and/or a default set of rules can be configured. Rules are read in order and hence the ordering of rules is important. The coordinator will cycle through all available segments and match each segment with the first rule that applies. Each segment may only match a single rule.

For more information on rules, see [Rule Configuration](Rule-Configuration.html).
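
The first-match behavior described in the Rules paragraph can be sketched as follows. This is only an illustration in Python, not the coordinator's implementation: the `Segment` and `Rule` types, the `age_days` attribute, and the rule names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Segment:
    datasource: str
    age_days: int  # hypothetical attribute, used only for this illustration


@dataclass
class Rule:
    name: str
    applies: Callable[[Segment], bool]


def first_matching_rule(segment: Segment, rules: List[Rule]) -> Optional[Rule]:
    """Return the first rule, in list order, that applies to the segment."""
    for rule in rules:
        if rule.applies(segment):
            return rule
    return None


# Order matters: a catch-all rule placed first would shadow every later rule.
rules = [
    Rule("load-recent", lambda s: s.age_days <= 30),
    Rule("drop-everything-else", lambda s: True),
]

print(first_matching_rule(Segment("sample_datasource", 7), rules).name)
print(first_matching_rule(Segment("sample_datasource", 400), rules).name)
```

Because each segment stops at its first match, swapping the two rules above would send every segment to the catch-all rule.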

@@ -136,4 +136,4 @@ FAQ

No. If the Druid coordinator is not started up, no new segments will be loaded in the cluster and outdated segments will not be dropped. However, the coordinator node can be started up at any time, and after a configurable delay, will start running coordinator tasks.

This also means that if you have a working cluster and all of your coordinators die, the cluster will continue to function, it just won’t experience any changes to its data topology.
This also means that if you have a working cluster and all of your coordinators die, the cluster will continue to function, it just won’t experience any changes to its data topology.
4 changes: 2 additions & 2 deletions docs/content/Examples.md
@@ -19,13 +19,13 @@ Clone Druid and build it:
git clone https://github.com/metamx/druid.git druid
cd druid
git fetch --tags
git checkout druid-0.6.72
git checkout druid-0.6.73
./build.sh
```

### Downloading the DSK (Druid Standalone Kit)

[Download](http://static.druid.io/artifacts/releases/druid-services-0.6.72-bin.tar.gz) a stand-alone tarball and run it:
[Download](http://static.druid.io/artifacts/releases/druid-services-0.6.73-bin.tar.gz) a stand-alone tarball and run it:

``` bash
tar -xzf druid-services-0.X.X-bin.tar.gz
20 changes: 8 additions & 12 deletions docs/content/Granularities.md
@@ -2,13 +2,13 @@
layout: doc_page
---
# Aggregation Granularity
The granularity field determines how data gets bucketed across the time dimension, i.e how it gets aggregated by hour, day, minute, etc.
The granularity field determines how data gets bucketed across the time dimension, or how it gets aggregated by hour, day, minute, etc.

It can be specified either as a string for simple granularities or as an object for arbitrary granularities.

### Simple Granularities

Simple granularities are specified as a string and bucket timestamps by their UTC time (i.e. days start at 00:00 UTC).
Simple granularities are specified as a string and bucket timestamps by their UTC time (e.g., days start at 00:00 UTC).

Supported granularity strings are: `all`, `none`, `minute`, `fifteen_minute`, `thirty_minute`, `hour` and `day`
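
As a rough illustration of UTC bucketing for the fixed-width simple granularities, the sketch below floors a timestamp to the start of its bucket, counting from the Unix epoch. It is written in Python for brevity and is not Druid's code; `bucket_start` and `GRANULARITY_MILLIS` are hypothetical names, and `all` and `none` are left out because they are not fixed-width buckets.

```python
from datetime import datetime, timedelta, timezone

# Bucket widths in milliseconds for the fixed-width simple granularities.
GRANULARITY_MILLIS = {
    "minute": 60_000,
    "fifteen_minute": 15 * 60_000,
    "thirty_minute": 30 * 60_000,
    "hour": 3_600_000,
    "day": 86_400_000,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Return the UTC start of the bucket that contains a timezone-aware ts."""
    width = timedelta(milliseconds=GRANULARITY_MILLIS[granularity])
    # timedelta // timedelta yields a whole number of buckets since the epoch.
    buckets_since_epoch = (ts - EPOCH) // width
    return EPOCH + buckets_since_epoch * width


ts = datetime(2014, 3, 24, 13, 47, 12, tzinfo=timezone.utc)
print(bucket_start(ts, "fifteen_minute").isoformat())
print(bucket_start(ts, "day").isoformat())
```

Since the epoch itself falls on 00:00 UTC, flooring from it puts day buckets on 00:00 UTC boundaries, which matches the description above.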

@@ -35,25 +35,21 @@ This chunks up every hour on the half-hour.

### Period Granularities

Period granularities are specified as arbitrary period combinations of years, months, weeks, hours, minutes and seconds (e.g. P2W, P3M, PT1H30M, PT0.750S) in ISO8601 format.
Period granularities are specified as arbitrary period combinations of years, months, weeks, hours, minutes and seconds (e.g. P2W, P3M, PT1H30M, PT0.750S) in ISO8601 format. They support specifying a time zone which determines where period boundaries start as well as the timezone of the returned timestamps. By default, years start on the first of January, months start on the first of the month and weeks start on Mondays unless an origin is specified.

They support specifying a time zone which determines where period boundaries start and also determines the timezone of the returned timestamps.

By default years start on the first of January, months start on the first of the month and weeks start on Mondays unless an origin is specified.

Time zone is optional (defaults to UTC)
Origin is optional (defaults to 1970-01-01T00:00:00 in the given time zone)
Time zone is optional (defaults to UTC). Origin is optional (defaults to 1970-01-01T00:00:00 in the given time zone).

```
{"type": "period", "period": "P2D", "timeZone": "America/Los_Angeles"}
```

This will bucket by two day chunks in the Pacific timezone.
This will bucket by two-day chunks in the Pacific timezone.

```
{"type": "period", "period": "P3M", "timeZone": "America/Los_Angeles", "origin": "2012-02-01T00:00:00-08:00"}
```

This will bucket by 3 month chunks in the Pacific timezone where the three-month quarters are defined as starting from February.
This will bucket by 3-month chunks in the Pacific timezone where the three-month quarters are defined as starting from February.

Supported time zones: timezone support is provided by the [Joda Time library](http://www.joda.org), which uses the standard IANA time zones. [Joda Time supported timezones](http://joda-time.sourceforge.net/timezones.html)
#### Supported Time Zones
Timezone support is provided by the [Joda Time library](http://www.joda.org), which uses the standard IANA time zones. See the [Joda Time supported timezones](http://joda-time.sourceforge.net/timezones.html).
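
For clients that assemble queries programmatically, the period granularity objects shown above could be built with a small helper such as the one below. This is a sketch under assumptions: `period_granularity` is a hypothetical function, and only the `type`, `period`, `timeZone` and `origin` keys are taken from the examples in this file. The optional keys are omitted when unset so that the documented defaults apply.

```python
import json
from typing import Optional


def period_granularity(
    period: str,
    time_zone: Optional[str] = None,
    origin: Optional[str] = None,
) -> dict:
    """Build a period granularity object, leaving out unset optional fields."""
    spec = {"type": "period", "period": period}
    if time_zone is not None:
        spec["timeZone"] = time_zone
    if origin is not None:
        spec["origin"] = origin
    return spec


two_day = period_granularity("P2D", time_zone="America/Los_Angeles")
quarters = period_granularity(
    "P3M", "America/Los_Angeles", "2012-02-01T00:00:00-08:00"
)

print(json.dumps(two_day))
print(json.dumps(quarters))
```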
4 changes: 2 additions & 2 deletions docs/content/GroupByQuery.md
@@ -2,7 +2,7 @@
layout: doc_page
---
# groupBy Queries
These types of queries take a groupBy query object and return an array of JSON objects where each object represents a grouping asked for by the query. Note: If you only want to do straight aggreagates for some time range, we highly recommend using [TimeseriesQueries](TimeseriesQuery.html) instead. The performance will be substantially better.
These types of queries take a groupBy query object and return an array of JSON objects where each object represents a grouping asked for by the query. Note: If you only want to do straight aggregates for some time range, we highly recommend using [TimeseriesQueries](TimeseriesQuery.html) instead. The performance will be substantially better.
An example groupBy query object is shown below:

``` json
@@ -87,4 +87,4 @@ To pull it all together, the above query would return *n\*m* data points, up to
},
...
]
```
```
4 changes: 2 additions & 2 deletions docs/content/Indexing-Service-Config.md
@@ -66,7 +66,7 @@ druid.host=#{IP_ADDR}:8080
druid.port=8080
druid.service=druid/prod/indexer
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.72"]
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.73"]
druid.zk.service.host=#{ZK_IPs}
druid.zk.paths.base=/druid/prod
@@ -115,7 +115,7 @@ druid.host=#{IP_ADDR}:8080
druid.port=8080
druid.service=druid/prod/worker
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.72","io.druid.extensions:druid-kafka-seven:0.6.72"]
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.73","io.druid.extensions:druid-kafka-seven:0.6.73"]
druid.zk.service.host=#{ZK_IPs}
druid.zk.paths.base=/druid/prod
4 changes: 2 additions & 2 deletions docs/content/Ingestion-FAQ.md
@@ -22,7 +22,7 @@ druid.storage.baseKey=sample
```

## I don't see my Druid segments on my historical nodes
You can check the coordinator console located at <COORDINATOR_IP>:<PORT>/cluster.html. Make sure that your segments have actually loaded on [historical nodes](Historical.html). If your segments are not present, check the coordinator logs for messages about capacity of replication errors. One reason that segments are not downloaded is because historical nodes have maxSizes that are too small, making them incapable of downloading more data. You can change that with (for example):
You can check the coordinator console located at `<COORDINATOR_IP>:<PORT>/cluster.html`. Make sure that your segments have actually loaded on [historical nodes](Historical.html). If your segments are not present, check the coordinator logs for messages about capacity or replication errors. One reason that segments are not downloaded is that historical nodes have maxSizes that are too small, making them incapable of downloading more data. You can change that with (for example):

```
-Ddruid.segmentCache.locations=[{"path":"/tmp/druid/storageLocation","maxSize":"500000000000"}]
@@ -31,7 +31,7 @@ You can check the coordinator console located at <COORDINATOR_IP>:<PORT>/cluster

## My queries are returning empty results

You can check <BROKER_IP>:<PORT>/druid/v2/datasources/<YOUR_DATASOURCE> for the dimensions and metrics that have been created for your datasource. Make sure that the name of the aggregators you use in your query match one of these metrics. Also make sure that the query interval you specify match a valid time range where data exists.
You can check `<BROKER_IP>:<PORT>/druid/v2/datasources/<YOUR_DATASOURCE>` for the dimensions and metrics that have been created for your datasource. Make sure that the names of the aggregators you use in your query match these metrics. Also make sure that the query interval you specify matches a valid time range where data exists.

## More information

4 changes: 2 additions & 2 deletions docs/content/Realtime-Config.md
@@ -27,7 +27,7 @@ druid.host=localhost
druid.service=realtime
druid.port=8083
druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.72"]
druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.73"]
druid.zk.service.host=localhost
@@ -76,7 +76,7 @@ druid.host=#{IP_ADDR}:8080
druid.port=8080
druid.service=druid/prod/realtime
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.72","io.druid.extensions:druid-kafka-seven:0.6.72"]
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.73","io.druid.extensions:druid-kafka-seven:0.6.73"]
druid.zk.service.host=#{ZK_IPs}
druid.zk.paths.base=/druid/prod
2 changes: 1 addition & 1 deletion docs/content/SearchQuery.md
@@ -37,7 +37,7 @@ There are several main parts to a search query:
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|searchDimensions|The dimensions to run the search over. Excluding this means the search is run over all dimensions.|no|
|query|See [SearchQuerySpec](SearchQuerySpec.html).|yes|
|sort|How the results of the search should sorted. Two possible types here are "lexicographic" and "strlen".|yes|
|sort|How the results of the search should be sorted. Two possible types here are "lexicographic" and "strlen".|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|

The format of the result is:
2 changes: 1 addition & 1 deletion docs/content/SegmentMetadataQuery.md
@@ -15,7 +15,7 @@ Segment metadata queries return per segment information about:
{
"queryType":"segmentMetadata",
"dataSource":"sample_datasource",
"intervals":["2013-01-01/2014-01-01"],
"intervals":["2013-01-01/2014-01-01"]
}
```
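
The trailing comma removed in this hunk matters because strict JSON parsers reject it. As a quick check, the sketch below feeds both variants to Python's standard `json` module; `is_valid_json` is a hypothetical helper, and how Druid's own parser treats the trailing comma is not shown here.

```python
import json

with_trailing_comma = """{
  "queryType": "segmentMetadata",
  "dataSource": "sample_datasource",
  "intervals": ["2013-01-01/2014-01-01"],
}"""

corrected = """{
  "queryType": "segmentMetadata",
  "dataSource": "sample_datasource",
  "intervals": ["2013-01-01/2014-01-01"]
}"""


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as strict JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(with_trailing_comma))
print(is_valid_json(corrected))
```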

26 changes: 13 additions & 13 deletions docs/content/Tasks.md
@@ -9,7 +9,7 @@ There are several different types of tasks.
Segment Creation Tasks
----------------------

#### Index Task
### Index Task

The Index Task is a simpler variation of the Index Hadoop task that is designed to be used for smaller data sets. The task executes within the indexing service and does not require an external Hadoop setup to use. The grammar of the index task is as follows:

@@ -51,15 +51,15 @@ The Index Task is a simpler variation of the Index Hadoop task that is designed
|--------|-----------|---------|
|type|The task type, this should always be "index".|yes|
|id|The task ID.|no|
|granularitySpec|See [granularitySpec](Tasks.html)|yes|
|spatialDimensions|Dimensions to build spatial indexes over. See [Spatial-Indexing](Spatial-Indexing.html)|no|
|granularitySpec|Specifies the segment chunks that the task will process. `type` is always "uniform"; `gran` sets the granularity of the chunks ("DAY" means all segments containing timestamps in the same day), while `intervals` sets the interval that the chunks will cover.|yes|
|spatialDimensions|Dimensions to build spatial indexes over. See [Geographic Queries](GeographicQueries.html).|no|
|aggregators|The metrics to aggregate in the data set. For more info, see [Aggregations](Aggregations.html)|yes|
|indexGranularity|The rollup granularity for timestamps.|no|
|targetPartitionSize|Used in sharding. Determines how many rows are in each segment.|no|
|firehose|The input source of data. For more info, see [Firehose](Firehose.html)|yes|
|rowFlushBoundary|Used in determining when intermediate persist should occur to disk.|no|
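
To make the `granularitySpec` row more concrete, the sketch below splits an interval into day-sized chunks, which is roughly what a "DAY" granularity implies. It is an illustration in Python rather than Druid's logic: `day_chunks` is a hypothetical helper, the interval value is made up, the end of the interval is assumed to be exclusive, and only the `type`, `gran` and `intervals` field names come from the table.

```python
from datetime import date, timedelta
from typing import List


def day_chunks(interval: str) -> List[str]:
    """Split an ISO-8601 date interval 'start/end' into one-day chunks.

    The end date is treated as exclusive (an assumption of this sketch).
    """
    start_text, end_text = interval.split("/")
    current = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    chunks = []
    while current < end:
        next_day = current + timedelta(days=1)
        chunks.append(f"{current.isoformat()}/{next_day.isoformat()}")
        current = next_day
    return chunks


# Hypothetical spec using the field names from the table above.
granularity_spec = {
    "type": "uniform",
    "gran": "DAY",
    "intervals": ["2014-03-01/2014-03-04"],
}

for chunk in day_chunks(granularity_spec["intervals"][0]):
    print(chunk)
```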

#### Index Hadoop Task
### Index Hadoop Task

The Hadoop Index Task is used to index larger data sets that require the parallelization and processing power of a Hadoop cluster.

@@ -79,19 +79,19 @@ The Hadoop Index Task is used to index larger data sets that require the paralle

The Hadoop Index Config submitted as part of a Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopBatchIndexer` except that three fields must be omitted: `segmentOutputPath`, `workingPath`, `updaterJobSpec`. The Indexing Service takes care of setting these fields internally.

##### Using your own Hadoop distribution
#### Using your own Hadoop distribution

Druid is compiled against Apache hadoop-core 1.0.3. However, if you happen to use a different flavor of hadoop that is API compatible with hadoop-core 1.0.3, you should only have to change the hadoopCoordinates property to point to the maven artifact used by your distribution.

##### Resolving dependency conflicts running HadoopIndexTask
#### Resolving dependency conflicts running HadoopIndexTask

Currently, the HadoopIndexTask creates a single classpath to run the HadoopDruidIndexerJob, which can lead to version conflicts between various dependencies of Druid, extension modules, and Hadoop's own dependencies.

The Hadoop index task will put Druid's dependencies first on the classpath, followed by any extensions dependencies, and any Hadoop dependencies last.

If you are having trouble with any extensions in HadoopIndexTask, it may be the case that Druid, or one of its dependencies, depends on a different version of a library than what you are using as part of your extensions, but Druid's version overrides the one in your extension. In that case you probably want to build your own Druid version and override the offending library by adding an explicit dependency to the pom.xml of each druid sub-module that depends on it.
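
The effect of the classpath ordering described above can be sketched as follows. This is a simplified model in Python, not how the JVM or Druid resolves classes internally: it assumes the earliest classpath entry providing a library wins, and the library names and versions are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_versions(
    classpath: List[Tuple[str, Dict[str, str]]]
) -> Dict[str, Tuple[str, str]]:
    """Map each library to the (version, source) of the first entry providing it."""
    resolved: Dict[str, Tuple[str, str]] = {}
    for source, libraries in classpath:
        for library, version in libraries.items():
            # setdefault keeps the earlier entry, so later duplicates lose.
            resolved.setdefault(library, (version, source))
    return resolved


# Ordering from the text: Druid first, then extensions, then Hadoop.
classpath = [
    ("druid", {"example-lib": "1.0"}),
    ("extensions", {"example-lib": "2.0", "extension-only-lib": "0.3"}),
    ("hadoop", {"example-lib": "0.9", "hadoop-only-lib": "1.0.3"}),
]

for library, (version, source) in resolve_versions(classpath).items():
    print(f"{library}: {version} (from {source})")
```

In this model the extension's `example-lib` 2.0 never takes effect, which is the kind of conflict the paragraph above suggests resolving by overriding the library in Druid's own build.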

#### Realtime Index Task
### Realtime Index Task

The indexing service can also run real-time tasks. These tasks effectively transform a middle manager into a real-time node. We introduced real-time tasks as a way to programmatically add new real-time data sources without needing to manually add nodes. The grammar for the real-time task is as follows:

@@ -169,7 +169,7 @@ For schema, fireDepartmentConfig, windowPeriod, segmentGranularity, and rejectio
Segment Merging Tasks
---------------------

#### Append Task
### Append Task

Append tasks append a list of segments together into a single segment (one after the other). The grammar is:

@@ -181,7 +181,7 @@ Append tasks append a list of segments together into a single segment (one after
}
```

#### Merge Task
### Merge Task

Merge tasks merge a list of segments together. Any common timestamps are merged. The grammar is:

@@ -196,7 +196,7 @@ Merge tasks merge a list of segments together. Any common timestamps are merged.
Segment Destroying Tasks
------------------------

#### Delete Task
### Delete Task

Delete tasks create empty segments with no data. The grammar is:

@@ -208,7 +208,7 @@ Delete tasks create empty segments with no data. The grammar is:
}
```

#### Kill Task
### Kill Task

Kill tasks delete all information about a segment and remove it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:

@@ -223,7 +223,7 @@ Kill tasks delete all information about a segment and removes it from deep stora
Misc. Tasks
-----------

#### Version Converter Task
### Version Converter Task

These tasks convert segments from an existing older index version to the latest index version. The available grammar is:

@@ -237,7 +237,7 @@ These tasks convert segments from an existing older index version to the latest
}
```

#### Noop Task
### Noop Task

These tasks start, sleep for a time and are used only for testing. The available grammar is:

1 change: 1 addition & 0 deletions docs/content/TopNQuery.md
@@ -9,6 +9,7 @@ TopN queries return a sorted set of results for the values in a given dimension
A topN query object looks like:

```json
{
"queryType": "topN",
"dataSource": "sample_data",
"dimension": "sample_dim",
4 changes: 2 additions & 2 deletions docs/content/Tutorial:-A-First-Look-at-Druid.md
@@ -49,7 +49,7 @@ There are two ways to setup Druid: download a tarball, or [Build From Source](Bu

### Download a Tarball

We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.72-bin.tar.gz). Download this file to a directory of your choosing.
We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.73-bin.tar.gz). Download this file to a directory of your choosing.

You can extract the awesomeness within by issuing:

@@ -60,7 +60,7 @@ tar -zxvf druid-services-*-bin.tar.gz
Not too lost so far right? That's great! If you cd into the directory:

```
cd druid-services-0.6.72
cd druid-services-0.6.73
```

You should see a bunch of files:
6 changes: 3 additions & 3 deletions docs/content/Tutorial:-The-Druid-Cluster.md
@@ -13,7 +13,7 @@ In this tutorial, we will set up other types of Druid nodes and external depende

If you followed the first tutorial, you should already have Druid downloaded. If not, let's go back and do that first.

You can download the latest version of druid [here](http://static.druid.io/artifacts/releases/druid-services-0.6.72-bin.tar.gz)
You can download the latest version of druid [here](http://static.druid.io/artifacts/releases/druid-services-0.6.73-bin.tar.gz)

and untar the contents within by issuing:

@@ -149,7 +149,7 @@ druid.port=8081
druid.zk.service.host=localhost
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.72"]
druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.73"]
# Dummy read only AWS account (used to download example data)
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
@@ -240,7 +240,7 @@ druid.port=8083
druid.zk.service.host=localhost
druid.extensions.coordinates=["io.druid.extensions:druid-examples:0.6.72","io.druid.extensions:druid-kafka-seven:0.6.72"]
druid.extensions.coordinates=["io.druid.extensions:druid-examples:0.6.73","io.druid.extensions:druid-kafka-seven:0.6.73"]
# Change this config to db to hand off to the rest of the Druid cluster
druid.publish.type=noop