Stars
Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
DiceDB is an open-source in-memory database with query subscriptions.
LETSQL is a deferred compute system focused on smart composition of AI pipelines. Optimize performance with cross-engine caching and static planning. Easily go from research to production with port…
QuestDB is a high-performance, open-source, time-series database
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
Apache Doris is an easy-to-use, high-performance, unified analytics database.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
These are the best resources for System Design on the Internet
📟 Manage your monitors, alerts and notifications in OpenSearch Dashboards
📟 Get notified when your data meets certain conditions by setting up monitors, alerts, and notifications
Manage your detectors and identify atypical data in OpenSearch Dashboards
Identify atypical data and receive automatic notifications
🔐 Manage your internal users, roles, access control, and audit logs from OpenSearch Dashboards
🔐 Secure your cluster with TLS, numerous authentication backends, data masking, audit logging as well as role-based access control on indices, documents, and fields
🔎 Open source distributed and RESTful search engine.
Confluent Schema Registry for Kafka
Spark: The Definitive Guide's Code Repository
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
☸️ A community repository for Helm Charts of OpenSearch Project.
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Feathr – A scalable, unified data and AI engineering platform for enterprise
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and bat…
Open source platform for the machine learning lifecycle
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor
ThirdEye is an integrated tool for realtime monitoring of time series and interactive root-cause analysis. It enables anyone inside an organization to collaborate on effective identification and an…
A composable and fully extensible C++ execution engine library for data management systems.