Stars
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
🚀🤖 Crawl4AI: Open-source LLM-Friendly Web Crawler & Scraper
System Design, Solution Architecture, Data Systems Practice
A series of DAGs/Workflows to help maintain the operation of Airflow
An orchestration platform for the development, production, and observation of data assets.
Data product portal created by Dataminded
An Awesome List of Open-Source Data Engineering Projects
Open Source Feature Flagging and A/B Testing Platform
Machine Learning Toolkit for Kubernetes
Kubernetes-native platform to run massively parallel data/streaming jobs
AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Fu…
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Generate an ERD from your dbt project model config.
The data-validation toolkit for enhanced dbt (data build tool) PR review
Chronon is a data platform for serving AI/ML applications.
Efficient data transformation and modeling framework that is backwards compatible with dbt.
Container runtimes on macOS (and Linux) with minimal setup
Package to assert rows in-line with dbt macros.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
This is a repo with links to everything you'd ever want to learn about data engineering
All the resources you need to get to Senior Engineer and beyond
Self-serve BI to 10x your data team ⚡️
A dbt SQL package for ensuring documentation and test coverage, with granular control.
This dbt package contains macros to support unit testing that can be (re)used across dbt projects.
Dynamically generate Apache Airflow DAGs from YAML configuration files