NLP 💫
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
StereoSet: Measuring stereotypical bias in pretrained language models
The BLUE benchmark consists of five biomedical text-mining tasks with ten corpora.
Handwriting Synthesis with RNNs ✏️
Medical Question Answering Dataset of 47,457 QA pairs created from 12 NIH websites
Library for fast text representation and classification.
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tuning.
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
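Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing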
📺 Discover the latest machine learning / AI courses on YouTube.
Master list of curated resources on NLP and LLMs
NLP Projects playlist
An accurate, efficient, and easy-to-use Chinese NLP preprocessing and parsing package www.jionlp.com
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
🦙 Integrating LLMs into structured NLP pipelines