Capital One AI Foundations
- New York
- https://gentawinata.com
- @gentaiscool
Stars
Fully open reproduction of DeepSeek-R1
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
Grassroots Science Website
potato: portable text annotation tool
GitHub repo for “Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey”
EZswitch is a framework designed to generate code-switched text, blending two languages within a single sentence or discourse. This tool incorporates Equivalence Constraint Theory (ECT) with and (L…
WorldCuisines is an extensive multilingual and multicultural benchmark that spans 30 languages, covering a wide array of global cuisines.
RewardBench: the first evaluation tool for reward models.
MetaMetrics is a calibrated meta-metric designed to evaluate generation tasks across different modalities in alignment with human preferences.
A curated list of research papers and resources on Cultural LLM.
Resources for cultural NLP research
A Python module for getting the GPU status from NVIDIA GPUs programmatically using nvidia-smi
nahidalam / LLaVA
Forked from haotian-liu/LLaVA. [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
ANSI color formatting for output in terminal
Mexican NLP 2024 Summer School Tutorial on Knowledge Distillation and Parameter-Efficient Finetuning
A Python implementation of global optimization with Gaussian processes.
A library to calculate similarity scores between two collections of text sequences encoded using transformer models for bitext mining, dense retrieval, retrieval-based classification, and retrieval…
A library of translation-based text similarity measures
MTEB: Massive Text Embedding Benchmark
Implementation of ProxyLM, a scalable and efficient LM performance prediction framework for NLP tasks using proxy models
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems