Stars
Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models
The repository of EMNLP 2023 "CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction"
MultiGEC-2025 shared task website, results and scripts.
The repository of EMNLP 2023 "MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction"
A code system for a grammatical error correction method. Code repository for the ACL 2024 main conference paper "Detection-Correction Structure via General Language Model for Grammatical Error Correction"
Customizing Judaeo-Arabic to Arabic transliteration for the needs of Princeton Geniza data
Automated backups of tabular data from Princeton Geniza Project
DSPy: The framework for programming—not prompting—language models
A blazing fast inference solution for text embedding models
Large Language Model Text Generation Inference
Utilities intended for use with Llama models.
Train transformer language models with reinforcement learning.
String-to-String Algorithms for Natural Language Processing
Official repository of the paper "mEdIT: Multilingual Text Editing via Instruction Tuning" (NAACL 2024)
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding"
TSAR2022 Shared Task on Lexical Simplification - Datasets and Evaluation scripts
This repository contains the code for "BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Representations".
[Kauf & Ivanova, ACL 2023] A Better Way to Do Masked Language Model Scoring
Code accompanying "How I learned to start worrying about prompt formatting".