Stars
Ongoing research training transformer models at scale
Python implementations of contextual bandit algorithms
Sparsity-aware deep learning inference runtime for CPUs
A memory-balanced, communication-efficient model-parallel implementation of a FullyConnected layer with CrossEntropyLoss in PyTorch
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
🤖 Open-source GenBI AI Agent that empowers data-driven teams to chat with their data to generate Text-to-SQL, charts, spreadsheets, reports, and BI. 📈📊📋🧑💻
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
Everything that makes working with databases easier
Awesome deals on Black Friday: Apps, SaaS, Books, Courses, etc.
An opinionated, actionable guide for software engineering interviews.
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"
An efficient, flexible, and lightweight goroutine pool that provides an easy way to handle concurrent tasks with limited resources.
Awesome Deep Learning papers for industrial Search, Recommendation and Advertisement. They focus on Embedding, Matching, Ranking (CTR/CVR prediction), Post Ranking, Large Model (Generative Recommen…
⭐️ Companies that don't have a broken hiring process
A Tensor Train-based library for compressing sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed th…
Xournal++ is handwriting note-taking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input fr…
Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set
A complete computer science study plan to become a software engineer.
Python implementation of GBDT (Gradient Boosting Decision Trees) for regression, binary classification, and multi-class classification. The details of the algorithm flow are presented, explained, and visualized so that GBDT can be understood step by step.
Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (Python and Scala)
Interactive examples of common C# concurrency patterns using channels.
Accelerating Inference for Recommendation Systems (WSDM'21)
Read and write TensorFlow TFRecord data from Apache Spark.
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models