Japan
- WIRED
- https://misumisumi.github.io
- @sumiNull
- @SumiSumiVRC
Highlights
- Pro
Lists (31)
👍 LifeTips
ASR
📖 LLM: Large Language Model
📷 Vision: Vision Model
🐈 VRChat: Tools for VRChat
☕ coffee break
Develop Envs: Developing environment tools
English
fediverse
github-workflow
GUI
k8s
latex
macos
🔍 ML: General ML
📝 Editor: Editor and plugins
MusicSourceSeparation
neovim
obsidian
📦 nix: NixOS, nixpkgs, and other nix-related things
📄 Dataset
📎 Paper and Docs
🐕‍🦺 Server
Shell
🔉 DSP: Digital Sound Processing
🔉 TTS: Text To Speech
🔉 VC: Voice Changer
🔉 Vocoder: Vocoder Model
speech-processing
🎮 Emulator
vm
Stars
Stable Diffusion web UI
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Robust Speech Recognition via Large-Scale Weak Supervision
Interact with your documents using the power of GPT, 100% privately, no data leaks
LlamaIndex is a data framework for your LLM applications
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Deezer source separation library including pretrained models.
Easily train a good VC model with voice data <= 10 mins!
Image-to-Image Translation in PyTorch
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …
Realtime Voice Changer
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference,…
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Flet enables developers to easily build realtime web, mobile and desktop apps in Python. No frontend experience required.
PyTorch implementation of the U-Net for image semantic segmentation with high quality images
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
vits2 backbone with multilingual-bert
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Self-hosted bookmark manager that is designed to be minimal, fast, and easy to set up using Docker.