New extension loading mechanism
1) Remove the Maven client that downloaded extensions at runtime.
2) Provide a way to load Druid extensions and Hadoop dependencies through the file system.
3) Refactor pull-deps so that it can download extensions into extension directories.
4) Add documentation on how to use the new extension loading mechanism.
5) Change the way the Druid tarball is generated: all extensions plus hadoop-client 2.3.0
are now packaged within the Druid tarball.
Bingkun Guo committed Oct 21, 2015
1 parent b7c68ec commit 4914925
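The refactored pull-deps tool described above is driven by the distribution build (see the exec-maven-plugin arguments in the pom diff below). The same invocation can be sketched from a shell; the classpath, version, and extension coordinate here are placeholders, not values from this commit:

```shell
# Assemble the pull-deps command the way the distribution pom does
# (flags taken from the exec-maven-plugin arguments below; classpath,
# version, and coordinate are illustrative).
DRUID_VERSION="0.8.2"
PULL_DEPS="java -classpath lib/* -Ddruid.extensions.loadList=[] \
io.druid.cli.Main tools pull-deps --clean \
--defaultVersion ${DRUID_VERSION} \
-c io.druid.extensions:mysql-metadata-storage"
# Echo rather than execute, since this is only a sketch.
echo "${PULL_DEPS}"
```

Each `-c` argument names one extension coordinate to download into the extensions directory.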
Showing 27 changed files with 1,320 additions and 558 deletions.
89 changes: 81 additions & 8 deletions distribution/pom.xml
@@ -38,24 +38,69 @@
<artifactId>druid-services</artifactId>
<version>${project.parent.version}</version>
</dependency>
<dependency>
<groupId>io.druid</groupId>
<artifactId>extensions-distribution</artifactId>
<version>${project.parent.version}</version>
<classifier>extensions-repo</classifier>
<type>zip</type>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<executions>
<execution>
<phase>install</phase>
<goals>
<goal>exec</goal>
</goals>
<configuration>
<executable>java</executable>
<arguments>
<argument>-classpath</argument>
<classpath/>
<argument>-Ddruid.extensions.loadList=[]</argument>
<argument>io.druid.cli.Main</argument>
<argument>tools</argument>
<argument>pull-deps</argument>
<argument>--clean</argument>
<argument>--defaultVersion</argument>
<argument>${project.parent.version}</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-examples</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-azure-extensions</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-cassandra-storage</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-hdfs-storage</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-histogram</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-kafka-eight</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-kafka-eight-simple-consumer</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-kafka-extraction-namespace</argument>
<argument>-c</argument>
<argument>io.druid.extensions:mysql-metadata-storage</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-namespace-lookup</argument>
<argument>-c</argument>
<argument>io.druid.extensions:postgresql-metadata-storage</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-rabbitmq</argument>
<argument>-c</argument>
<argument>io.druid.extensions:druid-s3-extensions</argument>
</arguments>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<id>distro-assembly</id>
<phase>package</phase>
<phase>install</phase>
<goals>
<goal>single</goal>
</goals>
@@ -67,6 +112,20 @@
</descriptors>
</configuration>
</execution>
<execution>
<id>mysql-distro-assembly</id>
<phase>install</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<finalName>mysql-metadata-storage</finalName>
<tarLongFileMode>posix</tarLongFileMode>
<descriptors>
<descriptor>src/assembly/mysql_assembly.xml</descriptor>
</descriptors>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
@@ -81,6 +140,20 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-clean-plugin</artifactId>
<configuration>
<filesets>
<fileset>
<directory>${project.basedir}/druid_extensions</directory>
</fileset>
<fileset>
<directory>${project.basedir}/hadoop_druid_dependencies</directory>
</fileset>
</filesets>
</configuration>
</plugin>
</plugins>
</build>
</project>
17 changes: 17 additions & 0 deletions distribution/src/assembly/assembly.xml
@@ -24,6 +24,23 @@
<format>tar.gz</format>
</formats>
<fileSets>
<fileSet>
<directory>druid_extensions</directory>
<includes>
<include>*/*</include>
</includes>
<excludes>
<exclude>mysql-metadata-storage/**</exclude>
</excludes>
<outputDirectory>druid_extensions</outputDirectory>
</fileSet>
<fileSet>
<directory>hadoop_druid_dependencies</directory>
<includes>
<include>*/*/*</include>
</includes>
<outputDirectory>hadoop_druid_dependencies</outputDirectory>
</fileSet>
<fileSet>
<directory>../examples/config</directory>
<includes>
35 changes: 35 additions & 0 deletions distribution/src/assembly/mysql_assembly.xml
@@ -0,0 +1,35 @@
<?xml version="1.0"?>
<!--
~ Druid - a distributed column store.
~ Copyright 2012 - 2015 Metamarkets Group Inc.
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3 http://maven.apache.org/xsd/assembly-1.1.3.xsd">
<id>bin</id>
<formats>
<format>tar.gz</format>
</formats>
<fileSets>
<fileSet>
<directory>druid_extensions/mysql-metadata-storage</directory>
<includes>
<include>*</include>
</includes>
<outputDirectory>./</outputDirectory>
</fileSet>
</fileSets>
</assembly>
7 changes: 3 additions & 4 deletions docs/content/configuration/index.md
@@ -21,10 +21,9 @@ Many of Druid's external dependencies can be plugged in as modules. Extensions c

|Property|Description|Default|
|--------|-----------|-------|
|`druid.extensions.remoteRepositories`|This is a JSON Array list of remote repositories to load dependencies from. If this is not set to '[]', Druid will try to download extensions at the specified remote repository.|["http://repo1.maven.org/maven2/", "https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local"]|
|`druid.extensions.localRepository`|. The way maven gets dependencies is that it downloads them to a "local repository" on your local disk and then collects the paths to each of the jars. This specifies the directory to consider the "local repository". If this is set, remoteRepositories is not required.|`~/.m2/repository`|
|`druid.extensions.coordinates`|This is a JSON array of "groupId:artifactId[:version]" maven coordinates. For artifacts without version specified, Druid will append the default version. Notice: extensions explicitly specified in this property will have precedence over ones included in the classpath when Druid loads extensions. If there are duplicate extensions, Druid will only load ones explicitly specified here|[]|
|`druid.extensions.defaultVersion`|Version to use for extension artifacts without version information.|`druid-server` artifact version.|
|`druid.extensions.directory`|The root extension directory where users can put extension-related files. Druid will load extensions stored under this directory.|`druid_extensions` (This is a relative path to Druid's working directory)|
|`druid.extensions.hadoopDependenciesDir`|The root Hadoop dependencies directory where users can put Hadoop-related dependency files. Druid will load the dependencies based on the Hadoop coordinates specified in the Hadoop index task.|`hadoop_druid_dependencies` (This is a relative path to Druid's working directory)|
|`druid.extensions.loadList`|A JSON array of extensions Druid will load from the extension directories. If it is not specified, its value will be `null` and Druid will load all the extensions under `druid.extensions.directory`. If its value is the empty list `[]`, then no extensions will be loaded at all.|null|
|`druid.extensions.searchCurrentClassloader`|This is a boolean flag that determines if Druid will search the main classloader for extensions. It defaults to true but can be turned off if you have reason to not automatically add all modules on the classpath.|true|
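
Taken together, the new filesystem-based properties above might appear in a runtime properties file like this (a sketch only; the directory paths are the defaults and the extension names are illustrative):

```properties
druid.extensions.directory=druid_extensions
druid.extensions.hadoopDependenciesDir=hadoop_druid_dependencies
druid.extensions.loadList=["mysql-metadata-storage", "druid-hdfs-storage"]
```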

### Zookeeper
18 changes: 14 additions & 4 deletions docs/content/dependencies/metadata-storage.md
@@ -15,8 +15,13 @@ The following metadata storage engines are supported:
* MySQL (io.druid.extensions:mysql-metadata-storage)
* PostgreSQL (io.druid.extensions:postgresql-metadata-storage)

To choose a metadata storage, set the `druid.extensions` configuration to
include the extension for the metadata storage you plan to use.
To choose a metadata storage,

1. Make sure Druid can pick up the extension files from either the classpath or the
extensions directory; see [Including Extensions](../operations/including-extensions.html) for more information.

2. Set the `druid.extensions` configuration to include the extension for the
metadata storage you plan to use. See below.


## Setting up MySQL
@@ -55,13 +60,18 @@
with the hostname of the database.

```properties
druid.extensions.coordinates=[\"io.druid.extensions:mysql-metadata-storage"]
druid.extensions.loadList=["mysql-metadata-storage"]
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://<host>/druid_test
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
```

Note: the MySQL metadata storage extension is not packaged within the main Druid tarball; it is
packaged in a separate tarball that can be downloaded from [here](http://druid.io/downloads.html).
Alternatively, you can get it using [pull-deps](../pull-deps.html), or build it
from source; see [Build from Source](../development/build.html).

## Setting up PostgreSQL

1. Install PostgreSQL
@@ -97,7 +107,7 @@ include the extension for the metadata storage you plan to use.
with the hostname of the database.

```properties
druid.extensions.coordinates=[\"io.druid.extensions:postgresql-metadata-storage"]
druid.extensions.loadList=["postgresql-metadata-storage"]
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://<host>/druid_test
druid.metadata.storage.connector.user=druid
7 changes: 5 additions & 2 deletions docs/content/development/build.md
@@ -16,11 +16,14 @@ To do so, run these commands:
```
git clone [email protected]:druid-io/druid.git
cd druid
mvn clean package
mvn clean install
```

This will compile the project and create the Druid binary distribution tar under
`services/target/druid-VERSION-bin.tar.gz`.
`distribution/target/druid-VERSION-bin.tar.gz`.

This will also create a tarball containing the `mysql-metadata-storage` extension, under
`distribution/target/mysql-metadata-storage-bin.tar.gz`. If you want Druid to load `mysql-metadata-storage`, first untar `druid-VERSION-bin.tar.gz`, then go to ```druid-<version>/druid_extensions``` and untar `mysql-metadata-storage-bin.tar.gz` there. Then specify `mysql-metadata-storage` in `druid.extensions.loadList` so that Druid will pick it up. See [Including Extensions](../operations/including-extensions.html) for more information.

You can find the example executables in the examples/bin directory:

2 changes: 1 addition & 1 deletion docs/content/ingestion/batch-ingestion.md
@@ -413,7 +413,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon

### Running the Task

The Hadoop Index Config submitted as part of an Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopDruidIndexer` except that three fields must be omitted: `segmentOutputPath`, `workingPath`, `updaterJobSpec`. The Indexing Service takes care of setting these fields internally.
The Hadoop Index Config submitted as part of an Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopDruidIndexer` except that three fields must be omitted: `segmentOutputPath`, `workingPath`, `metadataUpdateSpec`. The Indexing Service takes care of setting these fields internally.

To run the task:

11 changes: 7 additions & 4 deletions docs/content/misc/tasks.md
@@ -123,7 +123,7 @@ The indexSpec is optional and default parameters will be used if not specified.
|dimensionCompression|compression format for dimension columns (currently only affects single-value dimensions, multi-value dimensions are always uncompressed)|`"uncompressed"`, `"lz4"`, `"lzf"`|`"lz4"`|no|
|metricCompression|compression format for metric columns, defaults to LZ4|`"lz4"`, `"lzf"`|`"lz4"`|no|

### Index Hadoop Task
### Hadoop Index Task

The Hadoop Index Task is used to index larger data sets that require the parallelization and processing power of a Hadoop cluster.

@@ -138,14 +138,17 @@ The Hadoop Index Task is used to index larger data sets that require the paralle
|--------|-----------|---------|
|type|The task type, this should always be "index_hadoop".|yes|
|spec|A Hadoop Index Spec. See [Batch Ingestion](../ingestion/batch-ingestion.html)|yes|
|hadoopCoordinates|The Maven \<groupId\>:\<artifactId\>:\<version\> of Hadoop to use. The default is "org.apache.hadoop:hadoop-client:2.3.0".|no|
|hadoopDependencyCoordinates|A JSON array of Hadoop dependency coordinates that Druid will use; this property overrides the default Hadoop coordinates. Once specified, Druid will look for those Hadoop dependencies in the location specified by `druid.extensions.hadoopDependenciesDir`.|no|
|classpathPrefix|Classpath that will be prepended for the peon process.|no|

The Hadoop Index Config submitted as part of an Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopDruidIndexer` except that three fields must be omitted: `segmentOutputPath`, `workingPath`, `metadataUpdateSpec`. The Indexing Service takes care of setting these fields internally.

The Hadoop Index Config submitted as part of an Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopDruidIndexer` except that three fields must be omitted: `segmentOutputPath`, `workingPath`, `updaterJobSpec`. The Indexing Service takes care of setting these fields internally.
Note: Before using the Hadoop Index Task, please make sure to include the Hadoop dependencies so that Druid knows where to pick them up at runtime; see [Include Hadoop Dependencies](../operations/other-hadoop.html).
Druid uses hadoop-client 2.3.0 as the default Hadoop version. You can get it from the released Druid tarball (under the folder ```hadoop_druid_dependencies```) or via [pull-deps](../pull-deps.html).
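
For illustration, an `index_hadoop` task using the `hadoopDependencyCoordinates` property might look like the following sketch (the coordinate shown is the default; the spec body is left empty here and must contain a Hadoop Index Spec as described in [Batch Ingestion](../ingestion/batch-ingestion.html)):

```json
{
  "type": "index_hadoop",
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.3.0"],
  "spec": { }
}
```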

#### Using your own Hadoop distribution

Druid is compiled against Apache hadoop-client 2.3.0. However, if you happen to use a different flavor of hadoop that is API compatible with hadoop-client 2.3.0, you should only have to change the hadoopCoordinates property to point to the maven artifact used by your distribution. For non-API compatible versions, please see [here](../operations/other-hadoop.html).
Druid is compiled against Apache hadoop-client 2.3.0. However, if you happen to use a different flavor of Hadoop that is API compatible with hadoop-client 2.3.0, first make sure Druid knows where to pick it up, then change the `hadoopDependencyCoordinates` property to point to the list of Maven artifacts used by your distribution. For non-API-compatible versions and more information, please see [here](../operations/other-hadoop.html).

#### Resolving dependency conflicts running HadoopIndexTask


