Arctic is a high-performance datastore for numeric data. It supports Pandas, NumPy arrays and pickled objects out of the box, with pluggable support for other data types and optional versioning.
Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per MongoDB instance.
Arctic has been under active development at Man Group since 2012.
📢📢📢 BUILDING THE NEXT GENERATION OF ARCTIC 📢📢📢
This will offer the same intuitive Python-centric API whilst utilizing a custom C++ storage engine and modern S3 compatible object storage to provide a timeseries database that is:
- Fast: Capable of processing billions of rows in seconds
- Flexible: Designed to handle complex real-world datasets
- Familiar: Built for the modern Python Data Science ecosystem - Pandas In/Pandas Out!
For more information, please contact us at [email protected].
pip install git+https://github.com/man-group/arctic.git
mongod --dbpath <path/to/db_directory>
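from arctic import Arctic
import quandl
# Connect to a local MongoDB instance
store = Arctic('localhost')
# Create the library (VersionStore is the default library type)
store.initialize_library('NASDAQ')
library = store['NASDAQ']
# Load some data, here from Quandl, and store it in the library
aapl = quandl.get("WIKI/AAPL", authtoken="your token here")
library.write('AAPL', aapl, metadata={'source': 'Quandl'})
# Read the data and its metadata back
item = library.read('AAPL')
aapl = item.data
metadata = item.metadata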
VersionStore supports much more: See the HowTo!
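As a rough illustration of two of those extra features, the sketch below reads an earlier version of a symbol and reads from a point-in-time snapshot. The helper names `read_previous_and_latest` and `read_from_snapshot` are hypothetical and exist only for this example. It assumes `library` is a VersionStore library (such as the `library` object in the quickstart) whose `write` returns an item with a `version` attribute and whose `read` accepts an `as_of` argument that can be a version number or a snapshot name.

```python
def read_previous_and_latest(library, symbol, old_data, new_data):
    """Write two versions of `symbol`, then read both of them back."""
    first = library.write(symbol, old_data)   # creates one version
    library.write(symbol, new_data)           # creates the next version
    # as_of with a version number selects that specific version
    previous = library.read(symbol, as_of=first.version).data
    # without as_of, read returns the latest version
    latest = library.read(symbol).data
    return previous, latest


def read_from_snapshot(library, symbol, snapshot_name, new_data):
    """Snapshot the library, overwrite `symbol`, then read the snapshotted data."""
    library.snapshot(snapshot_name)           # point-in-time snapshot across symbols
    library.write(symbol, new_data)           # later write does not alter the snapshot
    # as_of with a snapshot name reads the data as it was when snapshotted
    return library.read(symbol, as_of=snapshot_name).data
```

With the quickstart's `library`, a call such as `read_from_snapshot(library, 'AAPL', 'before_update', new_frame)` would be expected to return the data as it was before `new_frame` was written; `'before_update'` and `new_frame` are placeholders.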
Plugging a custom class in as a library type is straightforward. This example shows how.
You can find complete documentation in the Arctic docs.
Arctic provides namespaced libraries of data. These libraries allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.).
Arctic supports multiple data libraries per user. A user (or namespace) maps to a MongoDB database (the granularity of mongo authentication). The library itself is composed of a number of collections within the database. Libraries look like:
- user.EOD
- user.ONEMINUTE
A library is mapped to a Python class. All library databases in MongoDB are prefixed with 'arctic_'.
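The naming convention above can be sketched in a few lines of Python. This is an illustration only, not Arctic's own code: `resolve_library` is a hypothetical helper, and the handling of names without a namespace (placing them in a database called plain `arctic`) is an assumption that the text above does not state.

```python
DB_PREFIX = "arctic"


def resolve_library(name):
    """Return (database, library) for a library name such as 'user.EOD'."""
    namespace, separator, library = name.partition(".")
    if not separator:
        # Assumption: a name with no namespace lives in a default database.
        return DB_PREFIX, name
    # 'user.EOD' -> database 'arctic_user', library 'EOD'
    return f"{DB_PREFIX}_{namespace}", library


print(resolve_library("user.EOD"))
print(resolve_library("user.ONEMINUTE"))
```

Under these assumptions, both example libraries resolve to the same `arctic_user` database, which matches the idea that a user maps to one MongoDB database holding several libraries.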
Arctic includes three storage engines:
- VersionStore: a key-value versioned TimeSeries store. It supports:
  - Pandas data types (other Python types pickled)
  - Multiple versions of each data item. Can easily read previous versions.
  - Create point-in-time snapshots across symbols in a library
  - Soft quota support
  - Hooks for persisting other data types
  - Audited writes: API for saving metadata and data before and after a write.
  - a wide range of TimeSeries data frequencies: End-Of-Day to Minute bars
  - See the HowTo
  - Documentation
- TickStore: Column-oriented tick database. Supports dynamic fields; chunks aren't versioned. Designed for large, continuously ticking data.
- ChunkStore: A storage type that stores data in customizable chunk sizes. Chunks aren't versioned, and can be appended to and updated in place.
Arctic storage implementations are pluggable. VersionStore is the default.
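Choosing a storage engine happens when a library is created. The sketch below is a minimal example under stated assumptions: `ensure_libraries` is a hypothetical helper, `store` is assumed to be an `Arctic` instance, and the engine is selected through the `lib_type` argument of `initialize_library`. The library names in the usage comment are placeholders.

```python
def ensure_libraries(store, lib_types):
    """Create each library in `lib_types` that does not exist yet.

    `lib_types` maps a library name to a library-type constant.
    Returns the names of the libraries that were created.
    """
    existing = set(store.list_libraries())
    created = []
    for name, lib_type in lib_types.items():
        if name not in existing:
            store.initialize_library(name, lib_type=lib_type)
            created.append(name)
    return created


# Usage against a running MongoDB (not executed here):
#   from arctic import Arctic, VERSION_STORE, TICK_STORE, CHUNK_STORE
#   store = Arctic('localhost')
#   ensure_libraries(store, {
#       'user.EOD': VERSION_STORE,      # versioned key-value store (the default)
#       'user.TICKS': TICK_STORE,       # column-oriented tick data
#       'user.ONEMINUTE': CHUNK_STORE,  # customizable chunk sizes
#   })
```

Checking `list_libraries()` first keeps the helper safe to call repeatedly, since it leaves libraries that already exist untouched.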
Arctic currently works with:
- Python 3.6, 3.7
- pymongo >= 3.6.0 <= 3.11.0
- Pandas >= 0.22.0 <= 1.0.3
- MongoDB >= 2.4.x <= 4.2.8
Operating Systems:
- Linux
- macOS
- Windows 10
Arctic wouldn't be possible without the work of the Man Data Engineering Team, including:
- Richard Bounds
- James Blackburn
- Vlad Mereuta
- Tom Taylor
- Tope Olukemi
- Drake Siard
- Slavi Marinov
- Wilfred Hughes
- Edward Easton
- Bryant Moscon
- Dimosthenis Pediaditakis
- Shashank Khare
- Duncan Kerr
- ... and many others ...
Contributions welcome!
Arctic is licensed under the GNU LGPL v2.1, a copy of which is included in LICENSE.