Stars
Unsupervised text tokenizer for Neural Network-based text generation.
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
CHisIEC: An Information Extraction Corpus for Ancient Chinese History
800,000 step-level correctness labels on LLM solutions to MATH problems
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
A dataset based on all arXiv publications, pre-processed for NLP, including structured full text and a citation network
An implementation of the paper "Fair Attribute Completion on Graph with Missing Attributes" (accepted at TMLR)
Source code for the paper "CAT: Interpretable Concept-based Taylor Additive Models".
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
Repository for the paper "CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search"
Text of the Dhammapada (Pali language) with Latin translation
Repository for Fine-grained Contrastive Learning for Relation Extraction
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
This repository implements our EMNLP 2022 research paper "A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach".
[KDD'2024] "LLM4Graph: A Survey of Large Language Models for Graphs"
[KDD'2024] "HiGPT: Heterogeneous Graph Language Models"
[ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
Official implementation of the paper "Autonomous Data Selection with Language Models for Mathematical Texts" (featured in Hugging Face Daily Papers: https://huggingface.co/papers/2402.07625)
PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning (ICML 2024)
Learning from Negative samples for Biomedical Generative Entity Linking