cahya-wirawan

💭

Building a bunch of transformer-based Indonesian language models

Cahya Wirawan cahya-wirawan

System engineer, currently working on NLP, CV and Speech Recognition for fun and curiosity

318 followers · 47 following

Vienna, Austria
https://www.linkedin.com/in/cahyawirawan/
@CahyaWr

Achievements

x2 x2

Lists (3)

GAN

Machine Learning

3 repositories

Speech Synthesis

Stars

711 results for source starred repositories

ALucek / linear-adapter-embedding

Query Only Linear Adapter Training for Fine Tuned Embedding Model Query Representation

Jupyter Notebook 14 2 Updated Sep 12, 2024

UKPLab / sentence-transformers

State-of-the-Art Text Embeddings

Python 16,147 2,560 Updated Mar 5, 2025

ML-GSAI / LLaDA

Official PyTorch implementation for "Large Language Diffusion Models"

Python 965 50 Updated Mar 6, 2025

Zyphra / Zonos

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …

Python 5,879 603 Updated Mar 5, 2025

huggingface / agents-course

This repository contains the Hugging Face Agents Course.

Jupyter Notebook 13,812 834 Updated Mar 5, 2025

jina-ai / node-DeepResearch

Keeps searching, reading webpages, and reasoning until it finds the answer (or exceeds the token budget)

TypeScript 3,186 295 Updated Mar 6, 2025

SalesforceAIResearch / DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

Python 373 28 Updated Feb 3, 2025

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,415 591 Updated Mar 4, 2025

multimodal-art-projection / YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,257 454 Updated Mar 1, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,601 2,175 Updated Feb 1, 2025

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 215 23 Updated Mar 6, 2025

deepseek-ai / DeepSeek-R1

85,217 10,990 Updated Feb 24, 2025

bytedance / LatentSync

Taming Stable Diffusion for Lip Sync!

Python 2,819 419 Updated Jan 19, 2025

s-sakti / data_indsp_news_tts

Lex 5 2 Updated Sep 17, 2022

fakerybakery / simpletts

A lightweight Python library for running TTS models with a unified API.

Python 17 1 Updated Feb 18, 2025

theodorblackbird / lina-speech

Official implementation of the TTS model Lina-Speech

Jupyter Notebook 157 12 Updated Jan 9, 2025

ronantakizawa / cacheaugmentedgeneration

A demo of Cache-Augmented Generation (CAG) in an LLM

Jupyter Notebook 44 7 Updated Jan 1, 2025

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 18,847 1,351 Updated Mar 3, 2025

sunnynexus / Search-o1

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Python 687 76 Updated Mar 4, 2025

hhhuang / CAG

Cache-Augmented Generation: A Simple, Efficient Alternative to RAG

Python 1,066 157 Updated Feb 16, 2025

fixie-ai / ultravox

A fast multimodal LLM for real-time voice

Python 3,680 265 Updated Feb 14, 2025

AnswerDotAI / ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

Python 1,255 93 Updated Feb 20, 2025

THUDM / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English speech dialogue model

Python 2,721 222 Updated Dec 5, 2024

microsoft / TRELLIS

Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".

Python 8,187 635 Updated Dec 27, 2024

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,407 1,646 Updated Feb 26, 2025

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,558 123 Updated Aug 13, 2024

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 742 69 Updated Mar 6, 2025

microsoft / markitdown

Python tool for converting files and office documents to Markdown.

Python 39,573 1,840 Updated Mar 6, 2025

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,348 176 Updated Feb 14, 2025

dagmawibabi / ScholarXIV

ScholArxiv is an open-source, aesthetic, minimal and AI powered app that allows users to search, read, bookmark, share, download and view summaries of academic papers from the arXiv repository.

Dart 860 38 Updated Jan 1, 2025