AudioSR: Versatile Audio Super-resolution at Scale

Pass your audio in, and AudioSR will make it high fidelity!

Works on all types of audio (e.g., music, speech, dog barks, rain, ...) and all sampling rates.

Share your thoughts/samples/issues in our discord channel: https://discord.gg/HWeBsJryaf

Change Log

2024-12-31: The training code of AudioSR can be found here (for reference only; the code is not carefully organized).
2024-12-16: Add important things to know to make AudioSR work.
2023-09-24: Add Replicate demo (@nateraw); fix errors on Windows, librosa warnings, etc. (@ORI-Muchim).
2023-09-16: Fix DC shift issue. Fix duration padding bug. Update default DDIM steps to 50.

Gradio Demo

To run the Gradio demo locally:

Install dependencies: pip install -r requirements.txt
Run the app: python app.py
Open the URL displayed in the terminal to view the demo

Commandline Usage

Installation

# Optional
conda create -n audiosr python=3.9; conda activate audiosr
# Install AudioSR
pip3 install audiosr==0.0.7
# or
# pip3 install git+https://github.com/haoheliu/versatile_audio_super_resolution.git

Usage

Process a list of files. The result will be saved at ./output by default.

audiosr -il batch.lst

Process a single audio file.

audiosr -i example/music.wav
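
When many files need processing, the list file can be generated rather than written by hand. The following is a minimal sketch, not part of AudioSR itself: it assumes the list file holds one audio path per line (an assumption, not stated above), and the helper names `write_batch_list` and `run_batch` are hypothetical. Only the `-il` and `-s` flags come from the usage text.

```python
import subprocess
from pathlib import Path


def write_batch_list(audio_dir, list_path, pattern="*.wav"):
    """Write matching audio paths to list_path, one per line (assumed format)."""
    paths = sorted(str(p) for p in Path(audio_dir).glob(pattern))
    Path(list_path).write_text("\n".join(paths) + ("\n" if paths else ""))
    return paths


def run_batch(list_path, save_path="./output"):
    """Call the audiosr CLI on the list file; requires audiosr on PATH."""
    cmd = ["audiosr", "-il", str(list_path), "-s", str(save_path)]
    return subprocess.run(cmd, check=True)
```

A typical call would be `write_batch_list("my_audio", "batch.lst")` followed by `run_batch("batch.lst")`, where `my_audio` is a placeholder directory name.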

Full usage instructions

> audiosr -h

> usage: audiosr [-h] -i INPUT_AUDIO_FILE [-il INPUT_FILE_LIST] [-s SAVE_PATH] [--model_name {basic,speech}] [-d DEVICE] [--ddim_steps DDIM_STEPS] [-gs GUIDANCE_SCALE] [--seed SEED]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_AUDIO_FILE, --input_audio_file INPUT_AUDIO_FILE
                        Input audio file for audio super resolution
  -il INPUT_FILE_LIST, --input_file_list INPUT_FILE_LIST
                        A file that contains all audio files that need to perform audio super resolution
  -s SAVE_PATH, --save_path SAVE_PATH
                        The path to save model output
  --model_name {basic,speech}
                        The checkpoint you gonna use
  -d DEVICE, --device DEVICE
                        The device for computation. If not specified, the script will automatically choose the device based on your environment.
  --ddim_steps DDIM_STEPS
                        The sampling step for DDIM
  -gs GUIDANCE_SCALE, --guidance_scale GUIDANCE_SCALE
                        Guidance scale (Large => better quality and relavancy to text; Small => better diversity)
  --seed SEED           Change this value (any integer number) will lead to a different generation result.
  --suffix SUFFIX       Suffix for the output file
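
If you drive the CLI from a script, it can help to assemble the argument list in one place. The sketch below is illustrative only: `build_audiosr_command` is a hypothetical helper, and it uses only the flags shown in the help text above. It builds the command without running it, so you can inspect it or pass it to `subprocess.run`.

```python
import shlex

MODEL_NAMES = ("basic", "speech")  # choices listed in the help text


def build_audiosr_command(input_audio_file, save_path=None, model_name=None,
                          device=None, ddim_steps=None, guidance_scale=None,
                          seed=None, suffix=None):
    """Return the audiosr argument list; options left as None are omitted."""
    if model_name is not None and model_name not in MODEL_NAMES:
        raise ValueError(f"model_name must be one of {MODEL_NAMES}")
    cmd = ["audiosr", "-i", str(input_audio_file)]
    options = [
        ("-s", save_path),
        ("--model_name", model_name),
        ("-d", device),
        ("--ddim_steps", ddim_steps),
        ("-gs", guidance_scale),
        ("--seed", seed),
        ("--suffix", suffix),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_audiosr_command("example/music.wav", model_name="basic",
                            ddim_steps=50, seed=42)
print(shlex.join(cmd))
# prints: audiosr -i example/music.wav --model_name basic --ddim_steps 50 --seed 42
```

Leaving an option as `None` lets the CLI fall back to its own default, so the script does not need to duplicate default values.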

TODO

Add gradio demo.
Optimize the inference speed.

Cite our work

If you find this repo useful, please consider citing:

@inproceedings{liu2024audiosr,
  title={{AudioSR}: Versatile audio super-resolution at scale},
  author={Liu, Haohe and Chen, Ke and Tian, Qiao and Wang, Wenwu and Plumbley, Mark D},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing},
  pages={1076--1080},
  year={2024},
  organization={IEEE}
}

Understanding the Impact of Cutoff Patterns on AudioSR Performance

AudioSR is a powerful tool for audio super-resolution. However, its performance can be significantly influenced by the characteristics of the input data, especially the cutoff pattern.

🚩 When AudioSR May Fail

Input Audio with Unfamiliar Cutoff Patterns
If the input audio file contains a cutoff pattern that is significantly different from those used in training, AudioSR may fail to perform effectively.
Input Audio with Severe Distortions
Strong distortions such as excessive noise or reverb can degrade the performance of AudioSR.

❓ Why Do Cutoff Patterns Have Such a Huge Impact on AudioSR?

During training, our data was simulated using low-pass filtering. The model was not trained to handle other causes of high-frequency loss, such as MP3 compression. As a result, AudioSR struggles when encountering unfamiliar cutoff patterns.

For example, MP3 compression can introduce a cutoff pattern that looks like this:

Why This Matters

As you can see, there are spectrogram holes near the cutoff range, which differ significantly from the patterns seen during training. When you apply AudioSR to such data, the output may look like this:

The higher frequencies are not adequately inpainted due to the unfamiliar cutoff pattern.

A Simple Solution: Low-Pass Filtering

To mitigate this issue, you can apply a low-pass filter to the audio before feeding it into AudioSR. After low-pass filtering, the audio resembles a standard low-pass cutoff pattern, like this:

When processed by AudioSR, the output will then be as expected, with improved high-frequency inpainting:

By understanding the limitations and addressing them with preprocessing, you can maximize the performance of AudioSR!
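
The low-pass preprocessing step described above can be sketched in plain Python. This is one possible filter (a Hamming-windowed sinc FIR), not necessarily the one used to simulate the training data. It assumes the audio is already decoded to a list of float samples; `cutoff_hz` and `num_taps` are illustrative parameters. A reasonable starting point is a cutoff slightly below the frequency where the spectrogram holes begin, but that choice is an assumption to tune by inspection.

```python
import math


def design_lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Return Hamming-windowed sinc low-pass coefficients with unity DC gain."""
    if not 0 < cutoff_hz < sample_rate / 2:
        raise ValueError("cutoff_hz must lie between 0 and the Nyquist frequency")
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")
    fc = cutoff_hz / sample_rate  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response; k == 0 is the limit of the sinc.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def lowpass(samples, cutoff_hz, sample_rate, num_taps=101):
    """Apply the filter with centered, zero-padded convolution (same length out)."""
    taps = design_lowpass_fir(cutoff_hz, sample_rate, num_taps)
    mid = (num_taps - 1) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        out.append(acc)
    return out
```

The pure-Python loop keeps the sketch dependency-free but is slow on long recordings; for real files, a vectorized signal-processing library is the practical choice. Write the filtered samples back to an audio file and pass that file to `audiosr -i`.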
