forked from apache/druid
* move ProtoBufInputRowParser from processing module to protobuf extensions
* Ported PR apache#3509
* add DynamicMessage
* fix local test stuff that slipped in
* add license header
* removed redundant type name
* removed commented code
* fix code style
* rename ProtoBuf -> Protobuf
* pom.xml: shade protobuf classes, handle .desc resource file as binary file
* clean up error messages
* pick first message type from descriptor if not specified
* fix protoMessageType null check. add test case
* move protobuf-extension from contrib to core
* document: add new configuration keys, and descriptions
* update document. add examples
* move protobuf-extension from contrib to core (2nd try)
* touch
* include protobuf extensions in the distribution
* fix whitespace
* include protobuf example in the distribution
* example: create new pb obj everytime
* document: use properly quoted json
* fix whitespace
* bump parent version to 0.10.1-SNAPSHOT
* ignore Override check
* touch
Showing 26 changed files with 4,230 additions and 1,412 deletions.
---
layout: doc_page
---

# Protobuf

This extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../operations/including-extensions.html) `druid-protobuf-extensions` as an extension.

## Protobuf Parser

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | This should say `protobuf`. | no |
| descriptor | String | Protobuf descriptor file name in the classpath or URL. | yes |
| protoMessageType | String | Protobuf message type in the descriptor. Both the short name and the fully qualified name are accepted. The parser uses the first message type found in the descriptor if not specified. | no |
| parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. The `format` must be `json`. See [JSON ParseSpec](../../ingestion/index.html) for more configuration options. Note that the `timeAndDims` parseSpec is no longer supported. | yes |
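
Putting the fields in the table together, a minimal `parser` block could look like the sketch below (the descriptor path and message type are placeholders; substitute your own):

```json
{
  "type": "protobuf",
  "descriptor": "file:///tmp/metrics.desc",
  "protoMessageType": "Metrics",
  "parseSpec": {
    "format": "json",
    "timestampSpec": {
      "column": "timestamp",
      "format": "auto"
    },
    "dimensionsSpec": {
      "dimensions": []
    }
  }
}
```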

## Example: Load Protobuf messages from Kafka

This example demonstrates how to load Protobuf messages from Kafka. Please read the [Load from Kafka tutorial](../../tutorial/tutorial-kafka.html) first. This example uses the same "metrics" dataset.

Files used in this example are found at `./examples/quickstart/protobuf` in your Druid directory.

- We will use the [Kafka Indexing Service](./kafka-ingestion.html) instead of Tranquility.
- The Kafka broker host is `localhost:9092`.
- The Kafka topic is `metrics_pb` instead of `metrics`.
- The datasource name is `metrics-kafka-pb` instead of `metrics-kafka` to avoid confusion.

Here is an example metrics record in JSON:
```json
{
  "unit": "milliseconds",
  "http_method": "GET",
  "value": 44,
  "timestamp": "2017-04-06T02:36:22Z",
  "http_code": "200",
  "page": "/",
  "metricType": "request/latency",
  "server": "www1.example.com"
}
```
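
If you do not have the quickstart's `generate-example-metrics` script handy, rows of this shape can be produced with a short stdlib-only sketch (the helper name and the sampled values are ours, for illustration only):

```python
import json
import random
from datetime import datetime, timezone

def example_metric_row():
    """Build one metrics row matching the JSON shape shown above."""
    return {
        "unit": "milliseconds",
        "http_method": random.choice(["GET", "POST"]),
        "value": random.randint(1, 500),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "http_code": "200",
        "page": random.choice(["/", "/list"]),
        "metricType": "request/latency",
        "server": "www1.example.com",
    }

# Emit a few rows, one JSON object per line, like the quickstart script does.
for _ in range(3):
    print(json.dumps(example_metric_row()))
```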

### Proto file

The proto file should look like this. Save it as `metrics.proto`.

```
syntax = "proto3";
message Metrics {
  string unit = 1;
  string http_method = 2;
  int32 value = 3;
  string timestamp = 4;
  string http_code = 5;
  string page = 6;
  string metricType = 7;
  string server = 8;
}
```

### Descriptor file

Use the `protoc` Protobuf compiler to generate the descriptor file. Save the resulting `metrics.desc` file either on the classpath or somewhere reachable by URL. In this example the descriptor file is saved at `/tmp/metrics.desc`.

```
protoc -o /tmp/metrics.desc metrics.proto
```

### Supervisor spec JSON

Below is the complete supervisor spec JSON to be submitted to the Overlord.
Make sure these keys are properly configured for successful ingestion:

- `descriptor` is the descriptor file URL.
- `protoMessageType` is the message type from the proto definition.
- The parseSpec `format` must be `json`.
- `topic` is the Kafka topic to subscribe to; here it is `metrics_pb` instead of `metrics`.
- `bootstrap.servers` is the Kafka broker host.
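
The checklist above can be verified mechanically before submitting a spec. This stdlib-only sketch (the helper name and rules are ours, not part of Druid) checks exactly the keys called out above:

```python
def check_supervisor_spec(spec):
    """Return a list of problems with the keys called out above (empty list = ok)."""
    problems = []
    parser = spec.get("dataSchema", {}).get("parser", {})
    if parser.get("type") != "protobuf":
        problems.append("parser type should be 'protobuf'")
    if not parser.get("descriptor"):
        problems.append("descriptor is missing")
    if parser.get("parseSpec", {}).get("format") != "json":
        problems.append("parseSpec format must be 'json'")
    io_config = spec.get("ioConfig", {})
    if io_config.get("topic") != "metrics_pb":
        problems.append("topic should be 'metrics_pb'")
    if "bootstrap.servers" not in io_config.get("consumerProperties", {}):
        problems.append("bootstrap.servers is missing")
    return problems

print(check_supervisor_spec({}))  # an empty spec fails every check
```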

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "metrics-kafka-pb",
    "parser": {
      "type": "protobuf",
      "descriptor": "file:///tmp/metrics.desc",
      "protoMessageType": "Metrics",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            "unit",
            "http_method",
            "http_code",
            "page",
            "metricType",
            "server"
          ],
          "dimensionExclusions": [
            "timestamp",
            "value"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      },
      {
        "name": "value_sum",
        "fieldName": "value",
        "type": "doubleSum"
      },
      {
        "name": "value_min",
        "fieldName": "value",
        "type": "doubleMin"
      },
      {
        "name": "value_max",
        "fieldName": "value",
        "type": "doubleMax"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "metrics_pb",
    "consumerProperties": {
      "bootstrap.servers": "localhost:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
```
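
Assuming the spec above is saved as `metrics-kafka.json` (the filename is arbitrary) and the Overlord runs on `localhost:8090` (adjust for your setup), it can be submitted with:

```
curl -X POST -H 'Content-Type: application/json' \
  -d @metrics-kafka.json \
  http://localhost:8090/druid/indexer/v1/supervisor
```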

## Kafka Producer

Here is a sample script that publishes the metrics to Kafka in Protobuf format.

1. Run `protoc` again, this time with the Python binding option. This command generates the `metrics_pb2.py` file.

```
protoc -o metrics.desc metrics.proto --python_out=.
```

2. Create the Kafka producer script.

This script requires the `protobuf` and `kafka-python` modules. Save it as `pb_publisher.py`.

```python
#!/usr/bin/env python

import sys
import json

from kafka import KafkaProducer
from metrics_pb2 import Metrics

producer = KafkaProducer(bootstrap_servers='localhost:9092')
topic = 'metrics_pb'

for row in iter(sys.stdin):
    d = json.loads(row)
    metrics = Metrics()  # fresh message per row so fields never carry over
    for k, v in d.items():
        setattr(metrics, k, v)
    producer.send(topic, metrics.SerializeToString())

producer.flush()  # make sure buffered messages are delivered before exit
```

3. Run the producer:

```
./bin/generate-example-metrics | ./pb_publisher.py
```

4. Verify that messages are arriving on the topic:

```
kafka-console-consumer --zookeeper localhost --topic metrics_pb
```

It should print raw Protobuf messages like this:

> millisecondsGETR"2017-04-06T03:23:56Z*2002/list:request/latencyBwww1.example.com
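
The garbled-looking output is expected: the console consumer prints raw Protobuf wire-format bytes, in which string field values appear verbatim between binary tag and length bytes. A stdlib-only sketch (hand-rolling the encoding for two fields, and assuming field numbers below 16 and values shorter than 128 bytes so tag and length each fit in one byte) shows why fragments like `milliseconds` and `GET` remain readable:

```python
def encode_string_field(field_number, value):
    """Encode one proto3 string field: tag byte, length byte, then UTF-8 bytes.

    Assumes field_number < 16 and len(value) < 128 so each fits in a single byte.
    """
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    data = value.encode("utf-8")
    return bytes([tag, len(data)]) + data

# Fields 1 and 2 of the Metrics message above.
payload = encode_string_field(1, "milliseconds") + encode_string_field(2, "GET")
print(payload)  # b'\n\x0cmilliseconds\x12\x03GET'
```

The string values are plainly visible between the framing bytes, which is exactly what the console consumer renders.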