Stars
A library that provides useful extensions to Apache Spark and PySpark.
A resource to help you become good at work 👇
The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-…
AStar example with community contributions
Uniffle is a high performance, general purpose Remote Shuffle Service.
Kubetools - Curated List of Kubernetes Tools
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
An Airflow operator that executes a task in a Kubernetes cluster, given a Kubernetes YAML configuration or an image reference.
Serverless patterns. Learn more at the website: https://serverlessland.com/patterns.
The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
Helper tool for generating and running BPFTrace scripts which trace and measure timings related to Linux Networking Stack, specifically SocKet Buffer contents
List of Computer Science courses with video lectures.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…
Fast web applications through dynamic, partially-stateful dataflow
🎆 Interactive Online Platform that Visualizes Algorithms from Code
Neural network 3D visualization framework: build interactive and intuitive models in browsers, with support for pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js
Data Lineage Tracking And Visualization Solution
A time-series database for high-performance real-time analytics packaged as a Postgres extension
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Easy CPU Profiling for Apache Spark applications
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra,…
CLI collection of utilities for working with CrateDB or PostgreSQL. Benchmark queries, insert data.
Simple JVM Profiler Using StatsD and Other Metrics Backends
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events