
- Kyoto University
- Kyoto, Japan
- shyyhs.github.io
Stars
Code for training a speech recognition model in which Whisper's decoder is replaced with llm-jp-1.3b-v1.0
Train transformer language models with reinforcement learning.
CycleResearcher: Improving Automated Research via Automated Review
acl-org / emnlp-2025
Forked from acl-org/emnlp-2024. Repository for the EMNLP 2025 conference
Code repository for the tutorial "Connecting Ideas in Lower-Resource Scenarios: NLP for National Varieties, Creoles, and Other Low-resource Languages" @ COLING 2025
EmoTa is an open-access Tamil Speech Emotion Recognition dataset with 936 utterances from 22 native speakers, covering five emotions (anger, happiness, sadness, fear, and neutrality). It supports e…
A High-Quality Multilingual Dataset for Structured Documentation Translation
Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric Evaluation of Machine Translation with a Densely Annotated P…
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
VarunGumma / fairseq
Forked from facebookresearch/fairseq. Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Streamlit — A faster way to build and share data apps.
String-to-String Algorithms for Natural Language Processing
Translation models for 22 scheduled languages of India
📋 A list of open LLMs available for commercial use.
Japanese Massive Multitask Language Understanding Benchmark