
MongoDB adapter for Hadoop. This fork adds a small patch to the mongo-hadoop Pig adapter that lets you use MongoDB fields that start with an underscore by prefixing them with u_ (e.g., u__id instead of _id).


darthbear/mongo-hadoop

 
 


MongoDB Hadoop Adapter

The MongoDB Hadoop Adapter is a plugin for Hadoop that lets it use MongoDB as an input source, an output destination, or both.

CURRENT RELEASE: 1.0.0

This release primarily supports Hadoop 1.0 or Cloudera CDH3 Update 3 (which ships Hadoop 0.20.2). If you wish to use Hadoop Streaming with MongoDB, please see the notes on Streaming Hadoop versions below.

This product only supports MongoDB 2.0+, although it should (mostly) work with 1.8.x. We cannot provide support for legacy MongoDB builds.

Note: If you have questions, please email the mongodb-user mailing list rather than contacting contributors or maintainers directly.

Maintainers

Contributors

Support

You will need the MongoDB Java Driver 2.7.3+.

Issue tracking: https://jira.mongodb.org/browse/HADOOP/

Discussion: http://groups.google.com/group/mongodb-user/

Documentation and Build Details: http://api.mongodb.org/hadoop/MongoDB%2BHadoop+Connector.html

Small update in this fork

This fork updates the mongo-hadoop Pig adapter so that you can use field names that start with an underscore (e.g., _id). Pig Latin does not allow an underscore as the first character of a field name.

To use a MongoDB field that starts with an underscore, prefix the field name with u_, so _id becomes u__id. When the field is stored in MongoDB, the u_ prefix is removed.

Please note that this is a simple fix, so the code is not optimized.

To build the mongo-hadoop Pig package, run: ./sbt mongo-hadoop-pig/package
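The renaming rule described above can be sketched in a few lines of Java. This is only an illustration of the convention, not the code used in this fork: the class name PigFieldNames and both method names are hypothetical. The sketch also assumes that the u_ prefix is stripped only when the remaining name starts with an underscore, so that an ordinary field such as u_name is left alone. The actual patch may handle that case differently.

```java
/**
 * Illustrative sketch of the u_ prefix convention.
 * Class and method names are hypothetical, not taken from the fork.
 */
public final class PigFieldNames {
    // Prefix described in this README for escaping a leading underscore.
    private static final String PREFIX = "u_";

    private PigFieldNames() {}

    /** MongoDB field name to Pig-safe alias: "_id" becomes "u__id". */
    public static String toPigName(String mongoName) {
        return mongoName.startsWith("_") ? PREFIX + mongoName : mongoName;
    }

    /** Pig alias back to MongoDB field name: "u__id" becomes "_id". */
    public static String toMongoName(String pigName) {
        // Assumption: strip the prefix only when an underscore follows it,
        // so a regular field such as "u_name" is not altered.
        return pigName.startsWith(PREFIX + "_")
                ? pigName.substring(PREFIX.length())
                : pigName;
    }

    public static void main(String[] args) {
        System.out.println(toPigName("_id"));     // prints u__id
        System.out.println(toMongoName("u__id")); // prints _id
        System.out.println(toMongoName("user"));  // prints user
    }
}
```

In a Pig script you would refer to the field by its prefixed alias (u__id), and the adapter would write it back to MongoDB as _id.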

