This is a demo application showing you how to integrate Hive with Cascading. The application creates a Hive table, then populates it with data from the local file system. It then uses that table to bootstrap a second table, which is read by a pure Cascading flow and written to a third table. Finally, the data from the third table is read back via Hive's JDBC support to show the seamless integration between the two.

To build and run the demo:

> cd demo
> gradle jar
> yarn jar build/libs/cascading-hive-demo-1.0.jar cascading.hive.HiveDemo
If you run the application against a local MetaStore, it will create some files and directories that you should remove before running the app again. In production deployments you will typically have a remote metastore, so this does not happen there.
> rm -rf metastore_db/ derby.log TempStatsStore/
This demo shows how to create a partitioned Hive table from a Cascading flow.
> yarn jar build/libs/cascading-hive-demo-1.0.jar cascading.hive.HivePartitionDemo
This demo will only work if you are using a hosted HiveMetaStore since the Cascading flow has to be able to register partitions in the MetaStore as they are created.
This demo builds on top of the HivePartitionDemo: it creates a view via a HiveFlow and then selects data from that view via JDBC.
> yarn jar build/libs/cascading-hive-demo-1.0.jar cascading.hive.HiveViewDemo
This demo uses the corc library to read records from a transactional Hive table that is backed by an ORC ACID dataset. It also obtains a shared read lock to ensure consistent reads while other clients may be mutating the table or the system is performing compactions. Be sure that your installation supports ACID and has been configured as described on the Apache Hive wiki.
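For reference, the commonly documented minimum `hive-site.xml` settings for ACID tables look like the sketch below; exact names and defaults vary by Hive version, so consult the Hive wiki for your installation:

```xml
<!-- hive-site.xml: settings commonly required for transactional (ACID) tables -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```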
> export YARN_OPTS="-Dhive.server.url=jdbc:hive2://localhost:10000/default \
-Dhive.server.user=root \
-Dhive.server.password=hadoop \
-Dhive.metastore.uris=thrift://sandbox.hortonworks.com:9083"
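The `-D` flags above become JVM system properties that the demo reads at startup. If you want to point the demo at a different HiveServer2 or metastore host, a small sketch like this (hypothetical variable names) keeps the settings in one place:

```shell
# Assemble YARN_OPTS from individual settings; HIVE_HOST, HIVE_PORT and
# METASTORE_URI are placeholders for your own environment.
HIVE_HOST="localhost"
HIVE_PORT="10000"
METASTORE_URI="thrift://sandbox.hortonworks.com:9083"
export YARN_OPTS="-Dhive.server.url=jdbc:hive2://${HIVE_HOST}:${HIVE_PORT}/default \
-Dhive.server.user=root \
-Dhive.server.password=hadoop \
-Dhive.metastore.uris=${METASTORE_URI}"
echo "$YARN_OPTS"
```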
> yarn jar build/libs/cascading-hive-demo-1.0.jar cascading.hive.TransactionalTableDemo
This demo copies `access.log` into HDFS and makes it available in Hive as the table `default.access_log`. It uses a sink `HCatTap` to copy the data from the file to the table. Then `default.access_log` is read with a source `HCatTap` that selects the ASIA and EU partitions and counts the number of records for each. The output is stored in `hdfs:/tmp/filtered_access.log/`.
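The per-partition counting the flow performs is conceptually the same as this small shell sketch over a local sample file (the tab-separated column layout here is an assumption for illustration, not the demo's actual schema):

```shell
# Hypothetical sample: one "<partition>\t<request>" record per line
printf 'ASIA\t/index.html\nEU\t/index.html\nASIA\t/about.html\n' > /tmp/sample_access.log
# Keep only the ASIA and EU partitions and count records per partition
awk -F'\t' '$1=="ASIA" || $1=="EU" {count[$1]++} END {for (p in count) print p, count[p]}' /tmp/sample_access.log
```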
> export YARN_OPTS="-Dhive.server.url=jdbc:hive2://localhost:10000/default \
-Dhive.server.user=root \
-Dhive.server.password=hadoop \
-Dhive.metastore.uris=thrift://sandbox.hortonworks.com:9083"
> yarn jar build/libs/cascading-hive-demo-1.0.jar cascading.hive.HCatTapDemo