- Vienna, Austria
- https://www.linkedin.com/in/cahyawirawan/
- @CahyaWr
Stars
Query Only Linear Adapter Training for Fine Tuned Embedding Model Query Representation
State-of-the-Art Text Embeddings
Official PyTorch implementation for "Large Language Diffusion Models"
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
This repository contains the Hugging Face Agents Course.
Keeps searching, reading webpages, and reasoning until it finds the answer (or exceeds the token budget)
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
Janus-Series: Unified Multimodal Understanding and Generation Models
malaysia-ai / StyleTTS2-MS
Forked from yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
A lightweight Python library for running TTS models with a unified API.
Official implementation of the TTS model Lina-Speech
A demo of Cache-Augmented Generation (CAG) in an LLM
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
Bringing BERT into modernity via both architecture changes and scaling
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Lollipop / Qwen2-Audio
Forked from QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Speech, Language, Audio, Music Processing with Large Language Model
Python tool for converting files and office documents to Markdown.