ghbacct

Starred repositories

An AI Hedge Fund Team

Python 4,234 629 Updated Dec 25, 2024

[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025)

1,354 156 Updated Dec 16, 2024

On-device intelligence.

Python 209 12 Updated Sep 11, 2024

📚 Process PDFs, Word documents and more with spaCy

Python 261 12 Updated Dec 24, 2024

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python 6,051 550 Updated Dec 24, 2024

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

Python 6,956 687 Updated Dec 16, 2024

Convert PDF to markdown + JSON quickly with high accuracy

Python 18,818 1,088 Updated Dec 20, 2024

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.

Python 1,193 65 Updated Dec 10, 2024

Code for "Is There a Replication Crisis in Finance" by Jensen, Kelly and Pedersen (2023)

R 260 122 Updated Nov 15, 2024
Jupyter Notebook 477 80 Updated Aug 22, 2023

Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024)

Python 359 22 Updated Oct 1, 2024

Open-source web platform used to create live reporting dashboards from APIs, MongoDB, Firestore, MySQL, PostgreSQL, and more 📈📊

JavaScript 2,701 327 Updated Dec 23, 2024

Blazing-fast Data-Wrangling toolkit

Rust 2,565 74 Updated Dec 26, 2024

Maestro: Netflix’s Workflow Orchestrator

Java 3,351 203 Updated Aug 9, 2024

ETL, Analytics, Versioning for Unstructured Data

Python 2,135 94 Updated Dec 26, 2024

CLI tool to extract (meta)data from PDF and manipulate PDF files

Python 114 18 Updated Dec 22, 2024

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Python 6,267 773 Updated Dec 18, 2024

Top2Vec learns jointly embedded topic, document and word vectors.

Python 2,965 374 Updated Nov 14, 2024

Text preprocessing, representation and visualization from zero to hero.

Python 2,896 239 Updated Aug 29, 2023
TypeScript 267 3,732 Updated Nov 22, 2024

A textual TUI for Prodigy

CSS 14 1 Updated Jun 8, 2023

A complete daily plan for studying to become a machine learning engineer.

28,241 6,200 Updated Jun 11, 2024

100 Days of ML Coding

45,890 10,686 Updated Dec 29, 2023

Data science interview questions and answers

HTML 9,049 1,997 Updated Sep 5, 2024
Jupyter Notebook 2,244 283 Updated Mar 26, 2024

TheBloke's Dockerfiles

Shell 300 59 Updated Mar 8, 2024

AICI: Prompts as (Wasm) Programs

Rust 1,973 78 Updated Nov 10, 2024

Distributed data engine for Python/SQL designed for the cloud, powered by Rust

Rust 2,442 174 Updated Dec 25, 2024

Draw datasets from within Jupyter.

JavaScript 814 81 Updated Dec 2, 2024

Source for book "Feature Engineering A-Z"

HTML 120 8 Updated Dec 23, 2024