Home
Jörn Franke edited this page Jul 19, 2018 · 55 revisions
Welcome to the hadoopcryptoledger wiki!
hadoopcryptoledger is a library for processing crypto ledgers, such as the Bitcoin and Ethereum blockchains, on Hadoop and its ecosystem components (e.g. Spark/Hive). It lets you analyse them and combine them with other data, such as stock markets, criminal evidence or weather patterns.
It contains the following components:
- Hadoop File Format to enable any MapReduce/Tez/Spark application to read blocks and transactions from files containing crypto ledger data in HDFS. This format supports both the original MapReduce API (mapred.*) and the newer MapReduce API (mapreduce.*).
- Javadoc, Unit Test Results, Mutation Test Results (deactivated until PITest supports JUnit 5), Security: OWASP Dependency Check (Note: all results related to Hadoop libraries depend on the version used by your Hadoop distribution!)
- Hive Serde for making blocks and transactions from files containing crypto ledger data in HDFS available as tables in Hive
- Hive UDF providing crypto-ledger-specific functionality to make working with ledger data in Hive easier
- Spark Datasource to use the HadoopCryptoLedger library via the Spark DataSource API
- Flink Datasource to use the HadoopCryptoLedger library in Apache Flink
Currently supported JDKs: 7 and 8.
Find here some HowTo-Guides:
- Bitcoin and Altcoins
- MapReduce: Count the number of transactions from files containing Bitcoin Blockchain data
- MapReduce: Count the total number of inputs of all transactions from files containing Bitcoin Blockchain data
- Spark: Use Spark to count the number of transactions from files containing Bitcoin Blockchain data
- Spark: Use Spark and Scala with Bitcoin Blockchain data
- Hive: Using Hive to analyze Bitcoin Blockchain data
- Hive: Use the HadoopCryptoLedger UDF to ease processing of Bitcoin specific data in Hive
- Spark: Using Spark+Scala+Graphx to analyze the Bitcoin transaction graph
- Spark: Use HadoopCryptoLedger library as Spark datasource
- Flink: Analyzing the Bitcoin Blockchain with Apache Flink
- Spark: Analyse Litecoin data using Apache Spark
- Spark: Analyse Namecoin data using Apache Spark
- Hive: Use Hive to analyse Namecoin data
- Support for Altcoins based on Bitcoin (e.g. Litecoin, Namecoin)
- Ethereum and Altcoins
- MapReduce: Count the number of transactions from files containing Ethereum Blockchain data
- Hive: Using Hive to analyze Ethereum Blockchain data
- Flink: Analyzing the Ethereum Blockchain with Apache Flink
- Spark: Use Spark to count the number of transactions from files containing Ethereum Blockchain data
- Spark: Use HadoopCryptoLedger library as Spark datasource to read Ethereum data
- Fetching Blockchain data - fetch blockchain data for analysis: Bitcoin, Ethereum, NameCoin, Litecoin etc.
- Useful Utility functions - for analysing Blockchains
- Recommended practice: ELT HIVE process for analyzing blockchain
Find here the status from the continuous integration (CI) platform:
Find here the status from the static code analyzer platform:
- Sonarqube: https://sonarcloud.io/dashboard?id=ZuInnoTe%3Ahadoopcryptoledger
- Codacy (includes also Scala): https://www.codacy.com/app/jornfranke/hadoopcryptoledger
Find here the OpenHub report.
Find here some release notes.
Join us on Gitter.im