Stars
ClickBench: a Benchmark For Analytical Databases
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Definition and DDLs for the OMOP Common Data Model (CDM)
Literature references for “Designing Data-Intensive Applications”
Single-binary Postgres read replica optimized for analytics
ParadeDB is a modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.
A curated list of awesome PostgreSQL software, libraries, tools and resources, inspired by awesome-mysql
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery: https://towardsdatascience.com/how-to-get-any-ethereum-smart-cont…
A curated list of engineering blogs
A Mozilla SpiderMonkey JavaScript engine embedded into the Python VM, using the Python engine to provide the JS host environment.
An efficient implementation of a rate limiter for asyncio.
Notes from books and other interesting things that I've read. Table of contents at the end 👇
The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrappers around the most common pandas functions that add extra logging.
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
Open-source cron job and background task monitoring service, written in Python and Django
Always know what to expect from your data.
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable,…
A browser automation framework and ecosystem.
Python library for Windows Remote Management (WinRM)
Python best practices guidebook, written for humans.
Fetch and install Boot Camp ESDs with ease.