Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 10,735 3,069 Updated Jan 22, 2025

Python scraper based on AI

Python 17,352 1,456 Updated Jan 22, 2025

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript 22,305 1,792 Updated Jan 23, 2025

🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper

Python 26,839 2,103 Updated Jan 22, 2025

System Design, Solution Architecture, Data Systems Practice

35 6 Updated Dec 21, 2024

A series of DAGs/Workflows to help maintain the operation of Airflow

Python 1,697 399 Updated Jun 18, 2024

An orchestration platform for the development, production, and observation of data assets.

Python 12,349 1,553 Updated Jan 23, 2025

Data product portal created by Dataminded

TypeScript 172 32 Updated Jan 22, 2025

An Awesome List of Open-Source Data Engineering Projects

2,250 373 Updated Oct 4, 2024

Open Source Feature Flagging and A/B Testing Platform

TypeScript 6,314 526 Updated Jan 23, 2025

S3 vector database for LLM Agents and RAG.

Python 35 4 Updated Aug 15, 2023

Machine Learning Toolkit for Kubernetes

TypeScript 14,566 2,442 Updated Nov 26, 2024

Kubernetes-native platform to run massively parallel data/streaming jobs

Go 1,761 121 Updated Jan 23, 2025

AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Fu…

JavaScript 5,561 384 Updated Jan 2, 2025

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Python 38,404 14,554 Updated Jan 23, 2025

Generate an ERD from your dbt project model config.

Python 21 4 Updated Aug 18, 2024

The data-validation toolkit for enhanced dbt (data build tool) PR review

TypeScript 300 7 Updated Jan 22, 2025

Chronon is a data platform for serving AI/ML applications.

Scala 765 59 Updated Jan 23, 2025

Efficient data transformation and modeling framework that is backwards compatible with dbt.

Python 1,980 177 Updated Jan 23, 2025

🦉 Data Versioning and ML Experiments

Python 14,103 1,195 Updated Jan 21, 2025

Container runtimes on macOS (and Linux) with minimal setup

Go 20,704 415 Updated Jan 9, 2025

Package to assert rows in-line with dbt macros.

66 8 Updated Nov 15, 2024

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Python 2,995 199 Updated Jan 22, 2025

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook 25,051 4,882 Updated Jan 6, 2025

All the resources you need to get to Senior Engineer and beyond

14,118 1,270 Updated Dec 31, 2024

Self-serve BI to 10x your data team ⚡️

TypeScript 4,236 465 Updated Jan 22, 2025

A dbt SQL package for ensuring documentation and test coverage, with granular control.

SQL 119 14 Updated Nov 18, 2022

This dbt package contains macros to support unit testing that can be (re)used across dbt projects.

Shell 430 79 Updated Jul 23, 2024

Dynamically generate Apache Airflow DAGs from YAML configuration files

Python 1,235 184 Updated Jan 17, 2025

Making DAG construction easier

Python 252 11 Updated Jan 15, 2025