Stars
🪄 Create rich visualizations with AI
📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
A curated list of awesome Machine Learning frameworks, libraries and software.
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
This is a repo with links to everything you'd ever want to learn about data engineering
Implementing the 4 agentic patterns from scratch
This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive A…
NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retri…
This package contains macros and models to find DAG issues automatically
Using a pre-commit hook, Talisman validates the outgoing changeset for things that look suspicious — such as tokens, passwords, and private keys.
Free, simple, and intuitive online database diagram editor and SQL generator.
Code for "Efficient Data Processing in Spark" Course
A tool for exploring each layer in a Docker image
A curated list of awesome blogs, videos, tools and resources about Data Contracts
Open, Multi-modal Catalog for Data & AI
Understanding Deep Learning - Simon J.D. Prince
Shows how the CFT modules can be composed to build a secure cloud foundation
Deploys a secured BigQuery data warehouse
A collection of learning resources for curious software engineers
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
Dataform is a framework for managing SQL-based data operations in BigQuery
This Dataform project processes various marketing data sources and creates a Marketing Data Store (MDS) to be used in several use cases: a) retain historical marketing data; b) create high performanc…
📚 Tech blogs & talks by companies that run Apache Flink in production
A comprehensive list of books on Software Architecture.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!