tin6150 / pyspark Public

Snippets of big-data Python that work in Apache Spark; a bridge between taxonomy_reporter and its Spark inspiration.

Files (5 commits):

README.rst
databrick_6.py
node2trace.py
passwd2person.sh
person.txt
ref.txt
spark_1.py
spark_2.py
spark_5.py
spark_acc2taxid.py
spark_person_1.py
spark_person_2.py
spark_sql3.py
spark_sql4.py
spark_sql_cloudera1.py
trace_load.py

pyspark

This is the public-facing repo. It will not contain private IP data.

Contains various code snippets for Spark using Python: trial code written as taxonomy_reporter is revamped to use Hadoop/Spark and SparkSQL.

The plan is to write generic, non-IP code here in personal hobby time. If anything proves usable later on, it may be "forked" or migrated to a private repo (e.g. on Bitbucket).

e.g.: node2trace.py, trace_load.py (TBA) --> taxoTraceTbl.py

config:

git init

git config --global user.email "[email protected]"
git config --global user.name tin
git config --global credential.helper 'cache --timeout=3600'
git config --global github.user   tin6150

git add *
git commit -m "first commit"
git remote add origin https://github.com/tin6150/pyspark.git
git push -u origin master