- Vienna, Austria
- https://www.linkedin.com/in/cahyawirawan/
- @CahyaWr
Stars
Query Only Linear Adapter Training for Fine Tuned Embedding Model Query Representation
State-of-the-Art Text Embeddings
Official PyTorch implementation for "Large Language Diffusion Models"
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
This repository contains the Hugging Face Agents Course.
Keeps searching, reading webpages, and reasoning until it finds the answer (or exceeds the token budget)
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
Janus-Series: Unified Multimodal Understanding and Generation Models
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
A lightweight Python library for running TTS models with a unified API.
Official implementation of the TTS model Lina-Speech
A demo of Cache-Augmented Generation (CAG) in an LLM
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
Bringing BERT into modernity via both architecture changes and scaling
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Speech, Language, Audio, Music Processing with Large Language Model
Python tool for converting files and office documents to Markdown.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
ScholArxiv is an open-source, aesthetic, minimal, AI-powered app that allows users to search, read, bookmark, share, download, and view summaries of academic papers from the arXiv repository.