Merge pull request apache#2537 from druid-io/refactor-ext
refactor extensions into core and contrib
drcrallen committed Mar 9, 2016
2 parents 94da1f8 + e3e932a commit 4c3a3f8
Showing 271 changed files with 652 additions and 550 deletions.
10 changes: 0 additions & 10 deletions distribution/pom.xml
@@ -71,10 +71,6 @@
<argument>-c</argument>
<argument>io.druid.extensions:druid-examples</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-azure-extensions</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-cassandra-storage</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-datasketches</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-hdfs-storage</argument>
@@ -83,8 +79,6 @@
<argument>-c</argument>
<argument>io.druid.extensions:druid-kafka-eight</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-kafka-eight-simple-consumer</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-kafka-extraction-namespace</argument>
<argument>-c</argument>
<argument>io.druid.extensions:mysql-metadata-storage</argument>
@@ -93,11 +87,7 @@
<argument>-c</argument>
<argument>io.druid.extensions:postgresql-metadata-storage</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-rabbitmq</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-s3-extensions</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-cloudfiles-extensions</argument>
</arguments>
</configuration>
</execution>
51 changes: 3 additions & 48 deletions docs/content/dependencies/deep-storage.md
@@ -4,9 +4,7 @@ layout: doc_page
# Deep Storage
Deep storage is where segments are stored. It is a storage mechanism that Druid does not provide. This deep storage infrastructure defines the level of durability of your data: as long as Druid nodes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.

## Production Tested Deep Stores

### Local Mount
## Local Mount

A local mount can be used for storage of segments as well. This allows you to use just your local file system or anything else that can be mounted locally like NFS, Ceph, etc. This is the default deep storage implementation.

@@ -21,22 +19,20 @@ Note that you should generally set `druid.storage.storageDirectory` to something

If you are using the Hadoop indexer in local mode, then just give it a local file as your output directory and it will work.


### S3-compatible
## S3-compatible

S3-compatible deep storage is either S3 itself or a service such as Google Storage that exposes the same API as S3.

S3 configuration parameters are


|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.s3.accessKey`||S3 access key.|Must be set.|
|`druid.s3.secretKey`||S3 secret key.|Must be set.|
|`druid.storage.bucket`||Bucket to store in.|Must be set.|
|`druid.storage.baseKey`||Base key prefix to use, i.e. what directory.|Must be set.|

### HDFS
## HDFS

In order to use HDFS for deep storage, you need to set the following configuration in your common configs.

@@ -46,44 +42,3 @@ In order to use hdfs for deep storage, you need to set the following configurati
|`druid.storage.storageDirectory`||Directory for storing segments.|Must be set.|

If you are using the Hadoop indexer, set your output directory to be a location on Hadoop and it will work.

## Community Contributed Deep Stores

### Cassandra

[Apache Cassandra](http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra) can also be leveraged for deep storage. This requires some additional druid configuration as well as setting up the necessary schema within a Cassandra keystore.

Please note that this is a community contributed module and does not support Cassandra 2.x or hadoop-based batch indexing. For more information on using Cassandra as deep storage, see [Cassandra Deep Storage](../dependencies/cassandra-deep-storage.html).

## Azure

[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional druid configuration.

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.storage.type`|azure||Must be set.|
|`druid.azure.account`||Azure Storage account name.|Must be set.|
|`druid.azure.key`||Azure Storage account key.|Must be set.|
|`druid.azure.container`||Azure Storage container name.|Must be set.|
|`druid.azure.protocol`|http or https||https|
|`druid.azure.maxTries`||Number of attempts before an Azure operation is cancelled.|3|

Please note that this is a community contributed module. See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information.

### Rackspace

[Rackspace Cloud Files](http://www.rackspace.com/cloud/files/) is another option for deep storage. This requires some additional druid configuration.

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.storage.type`|cloudfiles||Must be set.|
|`druid.storage.region`||Rackspace Cloud Files region.|Must be set.|
|`druid.storage.container`||Rackspace Cloud Files container name.|Must be set.|
|`druid.storage.basePath`||Rackspace Cloud Files base path to use in the container.|Must be set.|
|`druid.storage.operationMaxRetries`||Number of attempts before a Rackspace operation is cancelled.|10|
|`druid.cloudfiles.userName`||Rackspace Cloud username.|Must be set.|
|`druid.cloudfiles.apiKey`||Rackspace Cloud API key.|Must be set.|
|`druid.cloudfiles.provider`|rackspace-cloudfiles-us,rackspace-cloudfiles-uk|Name of the provider depending on the region.|Must be set.|
|`druid.cloudfiles.useServiceNet`|true,false|Whether to use the internal service net.|true|

Please note that this is a community contributed module.
62 changes: 62 additions & 0 deletions docs/content/development/community-extensions/azure.md
@@ -0,0 +1,62 @@
---
layout: doc_page
---

# Microsoft Azure

## Deep Storage

[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional druid configuration.

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.storage.type`|azure||Must be set.|
|`druid.azure.account`||Azure Storage account name.|Must be set.|
|`druid.azure.key`||Azure Storage account key.|Must be set.|
|`druid.azure.container`||Azure Storage container name.|Must be set.|
|`druid.azure.protocol`|http or https||https|
|`druid.azure.maxTries`||Number of attempts before an Azure operation is cancelled.|3|

See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information.
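
As a rough illustration of the table above, the following Python sketch checks a set of properties for the keys marked "Must be set." and fills in the two documented defaults. The helper name `check_azure_properties` and the sample values are hypothetical; this is not part of Druid, only a way to sanity-check a configuration before deploying it.

```python
# Hypothetical helper, not part of Druid: validates Azure deep storage properties
# against the table above and applies the documented defaults.
REQUIRED = (
    "druid.storage.type",
    "druid.azure.account",
    "druid.azure.key",
    "druid.azure.container",
)
DEFAULTS = {
    "druid.azure.protocol": "https",
    "druid.azure.maxTries": "3",
}


def check_azure_properties(props):
    """Return a copy of props with defaults applied, or raise ValueError."""
    missing = [key for key in REQUIRED if not props.get(key)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    if props["druid.storage.type"] != "azure":
        raise ValueError("druid.storage.type must be 'azure'")
    merged = dict(DEFAULTS)
    merged.update(props)
    if merged["druid.azure.protocol"] not in ("http", "https"):
        raise ValueError("druid.azure.protocol must be http or https")
    return merged


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = {
        "druid.storage.type": "azure",
        "druid.azure.account": "myaccount",
        "druid.azure.key": "mykey",
        "druid.azure.container": "druid-segments",
    }
    for key, value in sorted(check_azure_properties(sample).items()):
        print(key + "=" + value)
```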

## Firehose

#### StaticAzureBlobStoreFirehose

This firehose ingests events, similar to the StaticS3Firehose, but from an Azure Blob Store.

Data is newline delimited, with one JSON object per line and parsed as per the `InputRowParser` configuration.

The storage account is shared with the one used for Azure deep storage functionality, but blobs can be in a different container.

As with the S3 blobstore, a blob is assumed to be gzipped if its name ends in `.gz`.
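
The input format described above (newline-delimited JSON, gzipped when the name ends in `.gz`) can be sketched in a few lines of Python. This is only an illustration of the data layout, not the firehose's implementation: in Druid the rows are parsed according to the `InputRowParser` configuration, and the function name `parse_blob` is hypothetical.

```python
import gzip
import json


def parse_blob(name, data):
    """Decode a blob's raw bytes into a list of JSON events.

    The blob is gunzipped first when its name ends in ".gz",
    mirroring the naming rule described above.
    """
    if name.endswith(".gz"):
        data = gzip.decompress(data)
    text = data.decode("utf-8")
    # One JSON object per line; blank lines are skipped.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, `parse_blob("events.json.gz", gzip.compress(b'{"page": "a"}\n'))` yields a single-element list containing that event.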

Sample spec:

```json
"firehose" : {
"type" : "static-azure-blobstore",
"blobs": [
{
"container": "container",
"path": "/path/to/your/file.json"
},
{
"container": "anothercontainer",
"path": "/another/path.json"
}
]
}
```

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be "static-azure-blobstore".|N/A|yes|
|blobs|JSON array of [Azure blobs](https://msdn.microsoft.com/en-us/library/azure/ee691964.aspx).|N/A|yes|

Azure Blobs:

|property|description|default|required?|
|--------|-----------|-------|---------|
|container|Name of the Azure [container](https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/#create-a-container)|N/A|yes|
|path|The path where data is located.|N/A|yes|
9 changes: 9 additions & 0 deletions docs/content/development/community-extensions/cassandra.md
@@ -0,0 +1,9 @@
---
layout: doc_page
---

# Apache Cassandra

[Apache Cassandra](http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra) can also
be leveraged for deep storage. This requires some additional druid configuration as well as setting up the necessary
schema within a Cassandra keystore.
65 changes: 65 additions & 0 deletions docs/content/development/community-extensions/cloudfiles.md
@@ -0,0 +1,65 @@
---
layout: doc_page
---

# Rackspace Cloud Files

## Deep Storage

[Rackspace Cloud Files](http://www.rackspace.com/cloud/files/) is another option for deep storage. This requires some additional druid configuration.

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.storage.type`|cloudfiles||Must be set.|
|`druid.storage.region`||Rackspace Cloud Files region.|Must be set.|
|`druid.storage.container`||Rackspace Cloud Files container name.|Must be set.|
|`druid.storage.basePath`||Rackspace Cloud Files base path to use in the container.|Must be set.|
|`druid.storage.operationMaxRetries`||Number of attempts before a Rackspace operation is cancelled.|10|
|`druid.cloudfiles.userName`||Rackspace Cloud username.|Must be set.|
|`druid.cloudfiles.apiKey`||Rackspace Cloud API key.|Must be set.|
|`druid.cloudfiles.provider`|rackspace-cloudfiles-us,rackspace-cloudfiles-uk|Name of the provider depending on the region.|Must be set.|
|`druid.cloudfiles.useServiceNet`|true,false|Whether to use the internal service net.|true|

## Firehose

#### StaticCloudFilesFirehose

This firehose ingests events, similar to the StaticAzureBlobStoreFirehose, but from Rackspace's Cloud Files.

Data is newline delimited, with one JSON object per line and parsed as per the `InputRowParser` configuration.

The storage account is shared with the one used for Rackspace's Cloud Files deep storage functionality, but blobs can be in a different region and container.

As with the Azure blobstore, a blob is assumed to be gzipped if its name ends in `.gz`.

Sample spec:

```json
"firehose" : {
"type" : "static-cloudfiles",
"blobs": [
{
"region": "DFW",
"container": "container",
"path": "/path/to/your/file.json"
},
{
"region": "ORD",
"container": "anothercontainer",
"path": "/another/path.json"
}
]
}
```

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be "static-cloudfiles".|N/A|yes|
|blobs|JSON array of Cloud Files blobs.|N/A|yes|

Cloud Files Blobs:

|property|description|default|required?|
|--------|-----------|-------|---------|
|container|Name of the Cloud Files container|N/A|yes|
|path|The path where data is located.|N/A|yes|
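
Hand-written specs are easy to get subtly wrong (a missing comma is enough), so one option is to generate the firehose section and let a JSON serializer handle the syntax. The Python sketch below assumes each blob carries the `region`, `container`, and `path` fields used in the sample spec; the function name `cloudfiles_firehose` is hypothetical and not part of Druid.

```python
import json


def cloudfiles_firehose(blobs):
    """Build a static-cloudfiles firehose section.

    blobs is an iterable of (region, container, path) tuples.
    """
    return {
        "type": "static-cloudfiles",
        "blobs": [
            {"region": region, "container": container, "path": path}
            for region, container, path in blobs
        ],
    }


# Values taken from the sample spec in this document.
spec = {
    "firehose": cloudfiles_firehose([
        ("DFW", "container", "/path/to/your/file.json"),
        ("ORD", "anothercontainer", "/another/path.json"),
    ])
}

if __name__ == "__main__":
    print(json.dumps(spec, indent=2))
```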
@@ -1,9 +1,15 @@
## introduction
---
layout: doc_page
---

# Graphite Emitter

## Introduction

This extension emits druid metrics to a graphite carbon server.
Events are sent after being [pickled](http://graphite.readthedocs.org/en/latest/feeding-carbon.html#the-pickle-protocol); the size of the batch is configurable.
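
The pickle protocol referenced above frames each batch as a 4-byte big-endian length header followed by a pickled list of `(path, (timestamp, value))` tuples. The sketch below is written in Python rather than the emitter's own Java and illustrates only that framing; the function name and the metric path are hypothetical and are not taken from the extension.

```python
import pickle
import struct


def build_pickle_message(metrics):
    """Frame a batch of metrics for carbon's pickle receiver.

    metrics is a list of (path, timestamp, value) triples.
    """
    tuples = [(path, (timestamp, value)) for path, timestamp, value in metrics]
    payload = pickle.dumps(tuples, protocol=2)
    # Unsigned 32-bit length in network byte order, then the payload.
    header = struct.pack("!L", len(payload))
    return header + payload


if __name__ == "__main__":
    # Hypothetical metric name and values, for illustration only.
    message = build_pickle_message([("druid.example.query.time", 1457481600, 42.0)])
    print(len(message), "bytes")
```

Batching several metrics into one message amortizes the header and connection overhead, which is why the batch size is exposed as a setting.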

## configuration
## Configuration

All the configuration parameters for graphite emitter are under `druid.emitter.graphite`.

@@ -69,4 +75,4 @@ druid.emitter.graphite.eventConverter={"type":"whiteList", "namespacePrefix": "d

```

**Druid emits a huge number of metrics, so we highly recommend using the `whiteList` converter.**
**Druid emits a huge number of metrics, so we highly recommend using the `whiteList` converter.**
@@ -1,19 +1,23 @@
---
layout: doc_page
---
# KafkaSimpleConsumerFirehose

# Kafka Simple Consumer

## Firehose

This is an experimental firehose that ingests data from Kafka using the Kafka simple consumer API. Currently, this firehose only works inside standalone realtime nodes.
The configuration for KafkaSimpleConsumerFirehose is similar to the KafkaFirehose [Kafka firehose example](../ingestion/stream-pull.html#realtime-specfile), except `firehose` should be replaced with `firehoseV2` like this:
The configuration for KafkaSimpleConsumerFirehose is similar to that of the Kafka Eight Firehose, except that `firehose` should be replaced with `firehoseV2`, like this:

```json
"firehoseV2": {
"type" : "kafka-0.8-v2",
"brokerList" : ["localhost:4443"],
"queueBufferLength":10001,
"resetOffsetToEarliest":"true",
"partitionIdList" : ["0"],
"clientId" : "localclient",
"feed": "wikipedia"
"type" : "kafka-0.8-v2",
"brokerList" : ["localhost:4443"],
"queueBufferLength":10001,
"resetOffsetToEarliest":"true",
"partitionIdList" : ["0"],
"clientId" : "localclient",
"feed": "wikipedia"
}
```

59 changes: 59 additions & 0 deletions docs/content/development/community-extensions/rabbitmq.md
@@ -0,0 +1,59 @@
---
layout: doc_page
---

# RabbitMQ

## Firehose

#### RabbitMQFirehose

This firehose ingests events from a defined RabbitMQ queue.

**Note:** Add **amqp-client-3.2.1.jar** to lib directory of druid to use this firehose.

A sample spec for rabbitmq firehose:

```json
"firehose" : {
"type" : "rabbitmq",
"connection" : {
"host": "localhost",
"port": "5672",
"username": "test-dude",
"password": "test-word",
"virtualHost": "test-vhost",
"uri": "amqp://mqserver:1234/vhost"
},
"config" : {
"exchange": "test-exchange",
"queue" : "druidtest",
"routingKey": "#",
"durable": "true",
"exclusive": "false",
"autoDelete": "false",
"maxRetries": "10",
"retryIntervalSeconds": "1",
"maxDurationSeconds": "300"
}
}
```

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be "rabbitmq"|N/A|yes|
|host|The hostname of the RabbitMQ broker to connect to|localhost|no|
|port|The port number to connect to on the RabbitMQ broker|5672|no|
|username|The username to use to connect to RabbitMQ|guest|no|
|password|The password to use to connect to RabbitMQ|guest|no|
|virtualHost|The virtual host to connect to|/|no|
|uri|The URI string to use to connect to RabbitMQ| |no|
|exchange|The exchange to connect to| |yes|
|queue|The queue to connect to or create| |yes|
|routingKey|The routing key to use to bind the queue to the exchange| |yes|
|durable|Whether the queue should be durable|false|no|
|exclusive|Whether the queue should be exclusive|false|no|
|autoDelete|Whether the queue should auto-delete on disconnect|false|no|
|maxRetries|The max number of reconnection retry attempts| |yes|
|retryIntervalSeconds|The reconnection interval| |yes|
|maxDurationSeconds|The max duration of trying to reconnect| |yes|
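
To make the defaults and required fields in the table concrete, here is a small Python sketch that assembles a `rabbitmq` firehose section, fills in the documented defaults, and rejects a config that omits a required field. The split between `connection` and `config` follows the sample spec, and values are kept as strings as in that sample; the function name `rabbitmq_firehose` is hypothetical and not part of Druid.

```python
# Defaults and required fields copied from the table above.
CONNECTION_DEFAULTS = {
    "host": "localhost",
    "port": "5672",
    "username": "guest",
    "password": "guest",
    "virtualHost": "/",
}
CONFIG_DEFAULTS = {
    "durable": "false",
    "exclusive": "false",
    "autoDelete": "false",
}
REQUIRED_CONFIG = (
    "exchange",
    "queue",
    "routingKey",
    "maxRetries",
    "retryIntervalSeconds",
    "maxDurationSeconds",
)


def rabbitmq_firehose(connection, config):
    """Return a firehose section with defaults applied, or raise ValueError."""
    missing = [key for key in REQUIRED_CONFIG if key not in config]
    if missing:
        raise ValueError("missing required config fields: " + ", ".join(missing))
    return {
        "type": "rabbitmq",
        "connection": {**CONNECTION_DEFAULTS, **connection},
        "config": {**CONFIG_DEFAULTS, **config},
    }
```

Writing the defaults out explicitly makes the resulting spec longer than necessary, but it keeps the effective settings visible in one place.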
7 changes: 7 additions & 0 deletions docs/content/development/community-extensions/rocketmq.md
@@ -0,0 +1,7 @@
---
layout: doc_page
---

# RocketMQ

Original author: [https://github.com/lizhanhui](https://github.com/lizhanhui).