GitHub - tin6150/pyspark: snippets of big-data Python that work in Apache Spark; a bridge between taxonomy_reporter and its Spark inspiration

pyspark

This is the public-facing repo. It will not contain private IP data.

Contains various code snippets for Spark using Python: trial code written as taxonomy_reporter is revamped to use Hadoop Spark and SparkSQL.

The plan is to write generic, non-IP code here in personal hobby time. If anything proves usable later on, it may be "forked" or migrated to a private repo (e.g. on Bitbucket).

eg:

node2trace.py, trace_load.py
tba --> taxoTraceTbl.py

rst quirkiness

I prefer .rst over .md, but there are still some issues. Notably, a simple hard line break is not honored. So if several lines should read as a block, they need to be marked as preformatted: end a paragraph with WORD::, follow it with an empty line, and indent the subsequent lines like a block quote. The preformatted block ends when the indented block ends. See config:: below. Quick ref: http://docutils.sourceforge.net/docs/user/rst/quickstart.html#preformatting-code-samples

config:

git init

git config --global user.email "[email protected]"
git config --global user.name tin
git config --global credential.helper 'cache --timeout=3600'
git config --global github.user   tin6150

git add *
git commit -m "first commit"
git remote add origin https://github.com/tin6150/pyspark.git
git push -u origin master

rst eg... end of preformat here :)
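The preformatting rule described above (a paragraph ending in ``::``, an empty line, then an indented block that ends when the indent ends) can be sketched in a few lines of Python. This is only a rough illustration of the rule, not how docutils actually parses rst; the names ``literal_blocks`` and ``SAMPLE`` are made up for this example.

```python
import textwrap

# Hypothetical sample: a paragraph ending in '::', a blank line,
# an indented block, then normal text again.
SAMPLE = """\
some intro text, see config::

    git init
    git config --global user.name tin

    git add *

rst eg... end of preformat here :)
"""


def literal_blocks(rst_text):
    """Return the indented blocks that follow a paragraph ending in '::'.

    Simplified on purpose: real rst has more cases (e.g. a bare '::' line,
    quoted literal blocks) that this sketch ignores.
    """
    lines = rst_text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        if not lines[i].rstrip().endswith("::"):
            i += 1
            continue
        i += 1
        # Skip the empty line(s) that must follow the '::' paragraph.
        while i < len(lines) and not lines[i].strip():
            i += 1
        block = []
        # Collect lines for as long as they stay indented;
        # blank lines inside the block are kept.
        while i < len(lines) and (
            lines[i].startswith((" ", "\t")) or not lines[i].strip()
        ):
            block.append(lines[i])
            i += 1
        # Trailing blank lines belong to the gap after the block.
        while block and not block[-1].strip():
            block.pop()
        if block:
            blocks.append(textwrap.dedent("\n".join(block)))
    return blocks


for found in literal_blocks(SAMPLE):
    print(found)
```

The last line of ``SAMPLE`` is not indented, so it is not part of the block, which matches "the preformat ends when indent block ends".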

Files in this repository (8 commits):

README.rst
databrick_6.py
node2trace.py
passwd2person.sh
person.txt
ref.txt
spark_1.py
spark_2.py
spark_5.py
spark_acc2taxid.py
spark_person_1.py
spark_person_2.py
spark_sql3.py
spark_sql4.py
spark_sql_cloudera1.py
trace_load.py
