Organizations

@NLPforCOVID-19

POT : Python Optimal Transport

Python 2,517 515 Updated Mar 25, 2025

Gromov-Wasserstein Alignment of Embeddings

Python 65 14 Updated Sep 23, 2021

Releases from OpenAI Preparedness

Python 659 58 Updated Apr 11, 2025

Code for training a speech recognition model in which Whisper's decoder is replaced with llm-jp-1.3b-v1.0

Python 8 Updated Sep 7, 2024

Train transformer language models with reinforcement learning.

Python 13,167 1,793 Updated Apr 12, 2025

CycleResearcher: Improving Automated Research via Automated Review

Jupyter Notebook 139 9 Updated Apr 6, 2025

🙌 OpenHands: Code Less, Make More

Python 52,778 5,864 Updated Apr 13, 2025

Repository for the EMNLP 2025 conference

HTML 1 3 Updated Apr 9, 2025

Code repository for the tutorial "Connecting Ideas in Lower-Resource Scenarios: NLP for National Varieties, Creoles, and Other Low-resource Languages" @ COLING 2025

Jupyter Notebook 6 Updated Jan 20, 2025

EmoTa is an open-access Tamil Speech Emotion Recognition dataset with 936 utterances from 22 native speakers, covering five emotions (anger, happiness, sadness, fear, and neutrality). It supports e…

8 Updated Apr 11, 2025

A High-Quality Multilingual Dataset for Structured Documentation Translation

Python 36 7 Updated Jul 9, 2024

Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric Evaluation of Machine Translation with a Densely Annotated P…

Python 77 10 Updated Sep 21, 2023

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,386 525 Updated May 3, 2024

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python 24 4 Updated Mar 31, 2025

A library for minimum Bayes risk (MBR) decoding

Python 37 7 Updated Apr 10, 2025

Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

Python 897 39 Updated Apr 10, 2025

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 14,113 1,004 Updated Mar 17, 2025
C++ 817 118 Updated May 24, 2023

Compact Language Detector 2

C++ 857 131 Updated May 22, 2021
Python 15 2 Updated Mar 19, 2025

Streamlit — A faster way to build and share data apps.

Python 38,779 3,381 Updated Apr 13, 2025

String-to-String Algorithms for Natural Language Processing

Jupyter Notebook 542 28 Updated Jul 26, 2024

Go ahead and axolotl questions

Python 9,075 989 Updated Apr 13, 2025

Translation models for 22 scheduled languages of India

Python 300 81 Updated Mar 16, 2025
Python 11 Updated Apr 2, 2024
Jupyter Notebook 9,500 663 Updated Jul 29, 2024

The FLORES+ Machine Translation Benchmark

101 15 Updated Nov 12, 2024

📋 A list of open LLMs available for commercial use.

11,902 837 Updated Feb 13, 2025

Japanese Massive Multitask Language Understanding Benchmark

33 2 Updated Dec 16, 2024