Stars
Deep learning for image processing, including classification, object detection, and more.
🍀 PyTorch implementations of various attention mechanisms, MLPs, re-parameterization, and convolution modules, helpful for understanding the papers. ⭐⭐⭐
Implementations of different kinds of U-Net models for image segmentation: Unet, RCNN-Unet, Attention Unet, RCNN-Attention Unet, Nested Unet
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
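End-to-end Chinese speech recognition implemented with PaddlePaddle, from getting started to hands-on practice, with very simple introductory examples and highly practical enterprise projects. Supports the currently most popular DeepSpeech2, Conformer, and Squeezeformer models.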
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
This repo hosts the code and models of "Masked Autoencoders that Listen".
(IJCV2024 & ICCV2023) LSKNet: A Foundation Lightweight Backbone for Remote Sensing
Conformer-based Metric GAN for speech enhancement
Implementation of a U-net complete with efficient attention as well as the latest research findings
Transform-average-concatenate (TAC) method for end-to-end microphone permutation and number invariant ad-hoc beamforming.
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
[NeurIPS'22] Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
A simple library for theoretical research on direction-of-arrival (DOA) estimation in array signal processing.
[ICASSP 2023] Official TensorFlow implementation of "Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition".
A meta-package for the Tianbot autonomous AI racecar, based on NVIDIA development kits.
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
The implementation of "Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement"
A PyTorch implementation of Time-domain Audio Separation Network (TasNet) with Permutation Invariant Training (PIT) for speech separation.
A two-stage polyphonic sound event detection and localization method for both SED and DOA.
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]
The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization [INTERSPEECH2023 & TASLP2024]
The official PyTorch implementation of "Inter-SubNet: Speech Enhancement with Subband Interaction", accepted by ICASSP 2023.