Stars
Implementation of Voicebox, a new SOTA text-to-speech network from MetaAI, in PyTorch
Foundational Models for State-of-the-Art Speech and Text Translation
[ICLR24] Official PyTorch Implementation of Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
PyTorch implementation of DiffusionNet for fast and robust learning on 3D surfaces such as meshes or point clouds.
[ICCV 2023] StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding a…
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
Bringing Characters to Life with Computer Brains in Unity
Scaled-YOLOv4: Scaling Cross Stage Partial Network
Monocular, one-stage regression of multiple 3D people and their 3D positions and trajectories in camera and global coordinates. ROMP [ICCV21], BEV [CVPR22], TRACE [CVPR2023]
Official implementation of Diffusion Autoencoders
Human pose estimation related publications
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Visualizer for neural network, deep learning, and machine learning models
Command line utility for forced alignment using Kaldi
Official PyTorch implementation of "Neural Head Avatars from Monocular RGB Videos"
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs