Stars
Apache Nemo (Incubating) - Data Processing System for Flexible Employment With Different Deployment Characteristics
Resource scheduling and cluster management for AI
Best Practices on Recommendation Systems
An open source ML system for the end-to-end data science lifecycle
version your SQL schemas with git + automatically migrate them
A Ruby Gem to detect under what license a project is distributed.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials,…
A cluster computing framework for processing large-scale geospatial data
This is my site. There are many like it, but this one is mine.
YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
A curated list of data engineering tools for software developers
A command-line tool to generate, analyze, convert and manipulate colors
Interactive and Reactive Data Science using Scala and Spark.
bamboolib - a GUI for pandas DataFrames
📝 An awesome Data Science repository to learn and apply for real world problems.
Distributed web crawler admin platform for managing spiders, supporting any language and framework.
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
bootOS is a monolithic operating system in 512 bytes of x86 machine code.
Knack - A Python command line interface framework
Create *beautiful* command-line interfaces with Python
Library for building powerful interactive command line applications in Python