35 repositories starred by leongkui, written in Scala

Source code for Twitter's Recommendation Algorithm

Scala 62,977 12,166 Updated Jul 10, 2024

CMAK is a tool for managing Apache Kafka clusters

Scala 11,881 2,508 Updated Aug 2, 2023

Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala

Scala 11,360 552 Updated Jan 19, 2025

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala 7,858 1,792 Updated Mar 7, 2025

The leader in Next-Generation Customer Data Infrastructure

Scala 6,893 1,190 Updated Mar 5, 2025

Arnold Schwarzenegger based programming language

Scala 6,756 292 Updated Jan 31, 2024

Apache OpenWhisk is an open source serverless cloud platform

Scala 6,619 1,172 Updated Jan 21, 2025

A machine learning package built for humans.

Scala 4,792 562 Updated Sep 23, 2024

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules

Scala 4,383 526 Updated Jun 29, 2022

Deploy and manage containers (including Docker) on top of Apache Mesos at scale.

Scala 4,060 839 Updated Sep 8, 2022

A Scala API for Cascading

Scala 3,514 706 Updated May 28, 2023

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala 3,372 552 Updated Mar 5, 2025

A distributed, fault-tolerant graph database

Scala 3,338 257 Updated Mar 16, 2017

Streaming MapReduce with Scalding and Storm

Scala 2,136 266 Updated Jan 19, 2022

TextTeaser is an automatic summarization algorithm.

Scala 1,976 252 Updated Feb 7, 2018

DataStax Connector for Apache Spark to Apache Cassandra

Scala 1,943 920 Updated Jan 20, 2025

A Scala kernel for Jupyter

Scala 1,610 248 Updated Mar 6, 2025

Base classes to use when writing tests with Spark

Scala 1,526 355 Updated Jan 14, 2025

MLeap: Deploy ML Pipelines to Production

Scala 1,516 314 Updated Nov 27, 2024

Distributed Prometheus time series database

Scala 1,433 230 Updated Mar 8, 2025

Build highly concurrent, distributed, and resilient message-driven applications using Java/Scala

Scala 1,313 159 Updated Mar 9, 2025

KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka f…

Scala 1,182 395 Updated Jan 5, 2017

The software used to extract structured data from Wikipedia

Scala 888 277 Updated Feb 19, 2025

[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark

Scala 748 161 Updated Jul 30, 2024

Mirror of Apache Toree (Incubating)

Scala 741 224 Updated Feb 20, 2025

Simplifying robust end-to-end machine learning on Apache Spark.

Scala 470 117 Updated Apr 18, 2017

Manage your Kafka ACL at scale

Scala 368 161 Updated Feb 1, 2024

The Nak Machine Learning Library

Scala 341 83 Updated Jul 18, 2017

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.

Scala 244 26 Updated Oct 16, 2024

Self-contained examples of Apache Spark streaming integrated with Apache Kafka.

Scala 199 126 Updated Apr 15, 2018