PyStore - Datastore for timeseries data

PyStore is a simple (yet powerful) datastore for timeseries data. It's built on top of Pandas, Numpy, Dask, and Parquet (via Fastparquet) to provide an easy-to-use datastore for Python developers that can query millions of rows per second per client.

PyStore is hugely inspired by Man AHL's Arctic. I highly recommend you check it out.

Quickstart

Install PyStore

Install using pip:

$ pip install PyStore

Or upgrade using:

$ pip install PyStore --upgrade --no-cache-dir

Using PyStore

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pystore
import quandl

# Connect to local datastore
store = pystore.Store('mydatastore')
# default path is `~/.pystore`, otherwise:
# store = pystore.Store('mydatastore', path='/usr/share/pystore')

# List existing collections
store.list_collections()

# Create a collection
store.create_collection('NASDAQ')

# Access the collection
collection = store.collection('NASDAQ')

# List items in collection
collection.list_items()

# Load some data from Quandl
aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

# Store the first 100 rows of the data in the collection under "AAPL"
collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})

# Reading the item's data
item = collection.item('AAPL')
data = item.data  # <-- Dask dataframe (see dask.pydata.org)
metadata = item.metadata
df = item.to_pandas()

# Append the rest of the rows to the "AAPL" item
collection.append('AAPL', aapl[100:])

# Reading the item's data
item = collection.item('AAPL')
data = item.data
metadata = item.metadata
df = item.to_pandas()
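
Because item.data is a Dask dataframe, you can filter and aggregate lazily and only materialize results when needed. Below is a minimal sketch of that workflow; the "Close" column name is an assumption based on the Quandl WIKI/AAPL dataset, so adjust it to your own data.

# Work with the item's Dask dataframe lazily (sketch; column name is an assumption)
item = collection.item('AAPL')
data = item.data

# Mean closing price, evaluated by Dask only when .compute() is called
mean_close = data['Close'].mean().compute()

# Rows matching a condition, materialized as a Pandas dataframe
high_close = data[data['Close'] > 100].compute()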

Concepts

PyStore provides namespaced collections of data. These collections allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace) maps to a directory containing partitioned parquet files for each item (e.g. symbol).

A good practice is to create collections that look something like this (see the sketch after this list):

  • collection.EOD
  • collection.ONEMINUTE
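
A minimal sketch of this bucketing pattern, using only the calls shown above (the eod_df and minute_df dataframes here are made-up placeholders for your own end-of-day and minute-bar data):

import pandas as pd
import pystore

store = pystore.Store('mydatastore')

# One collection (namespace) per data frequency
store.create_collection('EOD')
store.create_collection('ONEMINUTE')

# Tiny illustrative dataframes standing in for real EOD and minute-bar data
eod_df = pd.DataFrame(
    {'Close': [100.0, 101.5]},
    index=pd.to_datetime(['2018-01-02', '2018-01-03']))
minute_df = pd.DataFrame(
    {'Close': [100.0, 100.2]},
    index=pd.to_datetime(['2018-01-02 09:30', '2018-01-02 09:31']))

# Write the same symbol into each bucket
store.collection('EOD').write('AAPL', eod_df, metadata={'source': 'EOD feed'})
store.collection('ONEMINUTE').write('AAPL', minute_df, metadata={'source': 'minute feed'})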

Known Limitation

PyStore currently supports only the local filesystem.

I plan on adding support for Amazon S3 (via s3fs), Google Cloud Storage (via gcsfs) and Hadoop Distributed File System (via hdfs3) in the future.

Requirements

PyStore currently works with:

  • Python 3.5 or higher
  • Pandas
  • Numpy
  • Dask
  • Fastparquet

Tested to work on:

  • Linux
  • Unix
  • macOS

Acknowledgements

PyStore is hugely inspired by Man AHL's Arctic, which uses MongoDB for storage and allows for versioning and other features. I highly recommend you check it out.

Contributions welcome!

License

PyStore is licensed under the GNU Lesser General Public License v2.1, a copy of which is included in LICENSE.txt.


I'm very interested in your experience with PyStore. Please drop me a note with any feedback you have.

- Ran Aroussi
