Stars
InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
A Strict JSON Framework for LLM Outputs
Towards Human-Friendly, Fast Learning and Adaptable Agent Communities
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
This is a public repository to go over all the LLM-driven data engineering concepts.
A standard framework for modelling Deep Learning Models for tabular data
Collection of Kaggle Solutions and Ideas
Turns Data and AI algorithms into production-ready web applications in no time.
A light-weight, flexible, and expressive statistical data testing library
Data validation using Python type hints
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
Typer, build great CLIs. Easy to code. Based on Python type hints.
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Generate deterministic fake values: The same input will always generate the same fake-output.
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …
An open-source remote desktop application designed for self-hosting, as an alternative to TeamViewer.
DuckDB is an analytical in-process SQL database management system
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …