Starred repositories
[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025)
📚 Process PDFs, Word documents and more with spaCy
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Convert PDF to markdown + JSON quickly with high accuracy
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
Code for "Is There a Replication Crisis in Finance?" by Jensen, Kelly and Pedersen (2023)
Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024)
Open-source web platform used to create live reporting dashboards from APIs, MongoDB, Firestore, MySQL, PostgreSQL, and more 📈📊
ETL, Analytics, Versioning for Unstructured Data
CLI tool to extract (meta)data from PDF and manipulate PDF files
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Top2Vec learns jointly embedded topic, document and word vectors.
Text preprocessing, representation and visualization from zero to hero.
A complete daily plan for studying to become a machine learning engineer.
Data science interview questions and answers
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
Source for book "Feature Engineering A-Z"