Standard Project Template for Databricks Labs Projects
This project makes it easier to perform machine learning experiments, ETL, and ad-hoc analytics on time series within Databricks using Apache Spark. It covers both data-parallel and model-parallel use cases encountered in the field.
Please note that all projects in the /databrickslabs github account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
Any issues discovered through the use of this project should be filed as GitHub Issues on this repo. They will be reviewed as time permits, but there are no formal SLAs for support.
After cloning the repo, it is highly advised that you create a virtual environment to isolate and manage packages for this project, like so:
python -m venv <path to project root>/venv
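Activate the environment before installing packages into it (the command below is for bash on macOS/Linux; on Windows, use venv\Scripts\activate instead):

source <path to project root>/venv/bin/activate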
You can then install the required modules via pip:
pip install -r requirements.txt
Once in the main project folder, build into a wheel using the following command:
python setup.py bdist_wheel
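The wheel is written to the dist/ directory. As an alternative to the UI upload described below, you could copy it to the FileStore with the Databricks CLI, assuming the CLI is installed and configured:

databricks fs cp dist/tca-0.1-py3-none-any.whl dbfs:/FileStore/tables/tca-0.1-py3-none-any.whl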
For installation in a Databricks notebook, you'll need to upload the wheel to the FileStore via the UI (or copy it to DBFS directly). If uploading via the UI, you may need to rename the file with the first command below. The commands that follow install the wheel into the notebook scope:
%fs cp /FileStore/tables/tca_0_1_py3_none_any-1f645.whl /FileStore/tables/tca-0.1-py3-none-any.whl
dbutils.library.install("/FileStore/tables/tca-0.1-py3-none-any.whl")
dbutils.library.restartPython()
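On newer Databricks runtimes, where dbutils.library is deprecated, a notebook-scoped pip install achieves the same thing:

%pip install /dbfs/FileStore/tables/tca-0.1-py3-none-any.whl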
Example usage of the project
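The example below assumes two Spark DataFrames, skewTrades and skewQuotes, each carrying a symbol column and an event timestamp. A minimal sketch of how such inputs might be loaded (the parquet paths are hypothetical, and the column names are inferred from the example rather than documented here):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

skewTrades = spark.read.parquet("/data/trades")  # hypothetical path; one row per trade event
skewQuotes = spark.read.parquet("/data/quotes")  # hypothetical path; one row per quote event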
from tca.base import newBaseTs

# Wrap the trades DataFrame in the package's base time series class
base_trades = newBaseTs(skewTrades)
# As-of join the quotes onto the trades, partitioned by symbol
normal_asof_result = base_trades.asofJoin(skewQuotes, partitionCols=["symbol"])
# Count the distinct trade-side (left) event timestamps in the joined result
normal_asof_result.select("EVENT_TS_left").distinct().count()
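Here the as-of join is expected to attach, for each trade, the most recent quote at or before that trade's timestamp within each symbol partition; the _left suffix on EVENT_TS_left denotes the left-hand (trades) side of the joined schema.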