GitHub - TaskForceX/pydruid at 5a19cc104440375a09f5cd2a9af18b7356cbb5fe

#pydruid

pydruid exposes a simple API to create, execute, and analyze Druid queries. pydruid can parse query results into pandas DataFrame objects for subsequent data analysis, which offers a tight integration between Druid, the SciPy stack (for scientific computing), and scikit-learn (for machine learning). Additionally, pydruid can export query results to TSV or JSON for further processing with your favorite tool, e.g., R, Julia, Matlab, or Excel.
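To make the TSV/JSON export idea concrete, here is a minimal standard-library sketch of what such an export involves. It does not use pydruid's own export helpers. It assumes the row shape of a Druid timeseries response (a list of `{"timestamp": ..., "result": {...}}` objects). The names `flatten`, `to_tsv`, and `to_json` are hypothetical, and the sample numbers are made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample shaped like a Druid timeseries response:
# a list of {"timestamp": ..., "result": {metric: value, ...}} objects.
# The numbers are invented for illustration only.
response = [
    {"timestamp": "2014-02-02T00:00:00.000Z",
     "result": {"length": 1000.0, "count": 10.0, "avg_tweet_length": 100.0}},
    {"timestamp": "2014-02-03T00:00:00.000Z",
     "result": {"length": 2400.0, "count": 20.0, "avg_tweet_length": 120.0}},
]


def flatten(rows):
    """Turn nested timeseries rows into flat dicts with a 'timestamp' key."""
    return [{"timestamp": r["timestamp"], **r["result"]} for r in rows]


def to_tsv(rows):
    """Serialize flattened rows as tab-separated values with a header line."""
    flat = flatten(rows)
    # Column order: timestamp first, then the metric names sorted alphabetically.
    fields = ["timestamp"] + sorted(k for k in flat[0] if k != "timestamp")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(flat)
    return buf.getvalue()


def to_json(rows):
    """Serialize flattened rows as a JSON array of flat objects."""
    return json.dumps(flatten(rows))


print(to_tsv(response))
```

Flattening first keeps one row per time bucket and one column per metric. That layout loads directly into R, Excel, or `pandas.read_csv(..., sep="\t")`.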

#setup

#documentation

#examples

The following examples show how to execute and analyze the results of three types of queries: timeseries, topN, and groupby. We analyze the Twitter data set.

##timeseries query

What was the average tweet length, per day, surrounding the 2014 Sochi olympics?

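```python
# As in the original, the star import provides PyDruid together with the
# aggregator, post-aggregator, and filter helpers used below
# (doublesum, Field, Dimension).
from pydruid.client import *
import matplotlib.pyplot as plt

# Placeholder: replace with the base URL of your Druid broker.
bard_url = 'http://localhost:8080'
query = PyDruid(bard_url, 'druid/v2')

# Daily totals of tweet length and tweet count for tweets whose first
# hashtag is #sochi2014, over the four weeks starting 2014-02-02.
ts = query.timeseries(
    datasource='twitterstream',
    granularity='day',
    intervals='2014-02-02/p4w',
    aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')},
    # Average tweet length = total characters / number of tweets.
    post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))},
    filter=Dimension('first_hashtag') == 'sochi2014'
)

# Convert the result of the last query into a pandas DataFrame.
df = query.export_pandas()
# Keep only the date part of ISO 8601 timestamps (drop everything after 'T').
df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])
df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20,
        title='Sochi 2014')
plt.ylabel('avg tweet length (chars)')
plt.show()
```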
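The `timeseries` call above is a Python front end for a native Druid JSON query. As a rough, hand-written approximation (not pydruid's actual output, whose key order and optional fields may differ), the same query expressed directly in Druid's native format looks like the sketch below. It only builds and prints the JSON body; sending it to a broker is left out.

```python
import json

# Hand-written approximation of the native Druid timeseries query that the
# pydruid call above describes. pydruid generates its own version of this
# body, which may differ in key order or optional fields.
timeseries_query = {
    "queryType": "timeseries",
    "dataSource": "twitterstream",
    "granularity": "day",
    "intervals": ["2014-02-02/p4w"],
    # Summed per day: total characters and total number of tweets.
    "aggregations": [
        {"type": "doubleSum", "name": "length", "fieldName": "tweet_length"},
        {"type": "doubleSum", "name": "count", "fieldName": "count"},
    ],
    # avg_tweet_length = length / count, computed by Druid after aggregation.
    "postAggregations": [
        {
            "type": "arithmetic",
            "name": "avg_tweet_length",
            "fn": "/",
            "fields": [
                {"type": "fieldAccess", "fieldName": "length"},
                {"type": "fieldAccess", "fieldName": "count"},
            ],
        }
    ],
    # Keep only rows whose first_hashtag dimension equals "sochi2014".
    "filter": {"type": "selector", "dimension": "first_hashtag",
               "value": "sochi2014"},
}

# This body would be POSTed to the broker's druid/v2 endpoint, the same
# endpoint passed to the PyDruid client above.
body = json.dumps(timeseries_query, indent=2)
print(body)
```

Computing the average as a post-aggregation means Druid divides the daily sums on the server, so the client receives `avg_tweet_length` ready to plot instead of dividing the columns in pandas.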
