- Make sure you installed Docker and Docker-compose.
- Spawn the container,
compose/build
- Open a new tab or window for your terminal and access bash inside that container,
compose/bash
- And you can start the instructions below!
You can check the HDFS file system from the NameNode web UI, Utilities -> Browse the file system.
- Get the job ID from the submission log,
2018-10-28 07:35:18,170 INFO impl.YarnClientImpl: Submitted application application_1540702588430_0006
2018-10-28 07:35:18,261 INFO mapreduce.Job: The url to track the job: http://6de2d313baf4:8088/proxy/application_1540702588430_0006/
2018-10-28 07:35:18,268 INFO mapreduce.Job: Running job: job_1540702588430_0006
- Kill the application,
yarn application -kill application_1540702588430_0006
- Delete an HDFS directory,
hdfs dfs -rm -r /user/output_lower
- Copy an HDFS directory to your local directory,
hadoop fs -get /user/output_lower
- Copy a file to HDFS,
hadoop fs -put dictionary-test.json /user/dictionary-test.json
Distributing lowercase, lowercase.py
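For reference, a streaming mapper like lowercase.py can be a tiny stdin-to-stdout filter. The snippet below is only a sketch of the expected shape, assuming the script lowercases every input line; the repository's actual script may differ.

```python
#!/usr/bin/env python
# Sketch of a streaming mapper in the spirit of lowercase.py (assumed behaviour,
# not the repository's exact code): read text lines from stdin and emit the
# lowercased line to stdout.
import sys

for line in sys.stdin:
    line = line.strip()
    if line:
        print(line.lower())
```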
- Run hadoop streaming,
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file lowercase.py -mapper lowercase.py -file reducer.py -reducer reducer.py -input /user/input_text/* -output /user/output_lower
- Copy the HDFS output to your local,
hadoop fs -get /user/output_lower
- Check output_lower/part-00000
i love you so much
the story is a shit
the story is really good
the story totally disgusting
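The streaming command above also ships reducer.py. Judging from the output, the reducer here behaves like a pass-through after Hadoop's shuffle/sort; a minimal sketch under that assumption:

```python
#!/usr/bin/env python
# Sketch of a pass-through streaming reducer (an assumption based on the outputs
# shown in this README): Hadoop sorts the mapper output before this step, and the
# reducer simply echoes each line.
import sys

for line in sys.stdin:
    line = line.strip()
    if line:
        print(line)
```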
Mapping words with a dictionary, mapping.py
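Because dictionary-test.json is shipped to every task with -file, the mapper can open it from its working directory. A rough sketch of what mapping.py might do, assuming the JSON maps each word to an integer id and unknown words become UNK:

```python
#!/usr/bin/env python
# Sketch of a mapping.py-style mapper (assumed behaviour): load the dictionary
# shipped via -file and emit "word: index" per token, or "word: UNK" for
# out-of-vocabulary words.
import json
import sys

# -file places dictionary-test.json in the task's working directory.
with open('dictionary-test.json') as fopen:
    dictionary = json.load(fopen)  # assumed format: {"word": integer_id, ...}

for line in sys.stdin:
    for word in line.strip().lower().split():
        print('%s: %s' % (word, dictionary.get(word, 'UNK')))
```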
- Run hadoop streaming,
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file mapping.py -file dictionary-test.json -mapper mapping.py -file reducer.py -reducer reducer.py -input /user/input_text/* -output /user/output_mapping
- Copy the HDFS output to your local,
hadoop fs -get /user/output_mapping
- Check output_mapping/part-00000
a: 5
disgusting: UNK
good: 45
i: 47
is: 9
is: 9
Distributing text classification, classification.py
You can check the architecture of the model, a very simple model just for an example, in the notebook.
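For orientation, the mapper can restore the frozen graph shipped with -file and score each line, roughly like the TensorFlow 1.x sketch below; the tensor names, label order, and input encoding are assumptions, so check the notebook and the real classification.py for the actual details.

```python
#!/usr/bin/env python
# Sketch of a classification.py-style mapper (assumed behaviour, TensorFlow 1.x API):
# restore the frozen graph shipped via -file, encode each line with the dictionary,
# and emit "sentence: label". Tensor names and label order below are assumptions.
import json
import sys

import numpy as np
import tensorflow as tf

with open('dictionary-test.json') as fopen:
    dictionary = json.load(fopen)

# Load the frozen graph exported from the notebook.
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def)

sess = tf.Session(graph=graph)
x = graph.get_tensor_by_name('import/Placeholder:0')  # assumed input tensor name
logits = graph.get_tensor_by_name('import/logits:0')  # assumed output tensor name
labels = ['negative', 'positive']                      # assumed label order

for line in sys.stdin:
    line = line.strip().lower()
    if not line:
        continue
    ids = [dictionary.get(word, 0) for word in line.split()]
    result = sess.run(logits, feed_dict={x: np.array([ids])})
    print('%s: %s' % (line, labels[int(np.argmax(result[0]))]))
```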
- Run hadoop streaming,
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file classification.py -file dictionary-test.json -file frozen_model.pb -mapper classification.py -file reducer.py -reducer reducer.py -input /user/input_text/* -output /user/output_classification
- Copy the HDFS output to your local,
hadoop fs -get /user/output_classification
- Check output_classification/part-00000
i love you so much: negative
the story is a shit: positive
the story is really good: positive
the story totally disgusting: negative