Proof-of-concept (POC) code for the HarmonyCloak paper: https://mosis.eecs.utk.edu/harmonycloak.html
dagger.py
- Make Music Unlearnable for Generative AI
dagger.py is a proof-of-concept script that demonstrates how to render audio files unlearnable for generative AI models by introducing imperceptible noise.
Psychoacoustic Noise Generation: Introduces imperceptible noise aligned with dominant frequencies to protect audio from generative AI learning.
STFT-Based Processing: Uses Short-Time Fourier Transform for frequency analysis and noise insertion.
Command-Line Interface: Fully configurable via CLI options for flexibility.
WAV Support: Reads and writes mono WAV files; stereo input is not yet supported.
- Clone the repository:
git clone https://github.com/yourusername/harmonydagger.git
cd harmonydagger
- Install the required Python packages:
pip install numpy scipy librosa soundfile
Run the script from the command line:
python dagger.py <input_file> <output_file> [OPTIONS]
Required Arguments:
input_file: Path to the input WAV file.
output_file: Path to save the perturbed WAV file.
Optional Arguments:
--window_size: STFT window size in samples (default: 1024).
--hop_size: Hop size in samples between successive STFT frames; the overlap is window_size minus hop_size (default: 512).
--noise_scale: Scale of the generated noise (default: 0.01).
Example:
python dagger.py input.wav output_perturbed.wav --window_size 2048 --hop_size 1024 --noise_scale 0.02
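For readers who want to see how such an interface is typically wired up, here is a minimal sketch of an `argparse` parser that mirrors the arguments and defaults documented above. The function name `build_parser` and the help strings are assumptions made for illustration; the actual parser in dagger.py may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented dagger.py options."""
    parser = argparse.ArgumentParser(
        prog="dagger.py",
        description="Add imperceptible noise to a mono WAV file.",
    )
    # Required positional arguments.
    parser.add_argument("input_file", help="Path to the input WAV file.")
    parser.add_argument("output_file", help="Path to save the perturbed WAV file.")
    # Optional arguments with the defaults listed in this README.
    parser.add_argument("--window_size", type=int, default=1024,
                        help="STFT window size in samples.")
    parser.add_argument("--hop_size", type=int, default=512,
                        help="Hop size in samples between STFT frames.")
    parser.add_argument("--noise_scale", type=float, default=0.01,
                        help="Scale of the generated noise.")
    return parser


# Parse the same command line as the example above.
args = build_parser().parse_args([
    "input.wav", "output_perturbed.wav",
    "--window_size", "2048", "--hop_size", "1024", "--noise_scale", "0.02",
])
print(args.input_file, args.window_size, args.hop_size, args.noise_scale)
```

Omitting the optional flags falls back to the defaults (1024, 512, 0.01).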
- Frequency Analysis:
The script analyzes the input audio file using STFT to identify dominant frequencies.
- Noise Generation:
Imperceptible noise is generated based on psychoacoustic masking and aligned with the dominant frequencies.
- Noise Injection:
The noise is added to the original audio while preserving perceptual quality.
- Output:
The perturbed audio file is saved to the specified location.
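The steps above can be sketched roughly as follows. This is a simplified illustration, not the code in dagger.py and not the method from the HarmonyCloak paper: the function name `perturb`, the `seed` parameter, and the choice to model masking by placing noise only at each frame's strongest bin (scaled to a fraction of that bin's magnitude) are all assumptions. It uses `scipy.signal.stft` and `scipy.signal.istft` with their default Hann window.

```python
import numpy as np
from scipy.signal import stft, istft


def perturb(audio, sr, window_size=1024, hop_size=512, noise_scale=0.01, seed=0):
    """Illustrative sketch: add low-level noise at each frame's dominant bin."""
    noverlap = window_size - hop_size

    # 1. Frequency analysis: STFT, then the strongest bin in every frame.
    _, _, spec = stft(audio, fs=sr, nperseg=window_size, noverlap=noverlap)
    magnitude = np.abs(spec)
    frames = np.arange(spec.shape[1])
    dominant_bin = magnitude.argmax(axis=0)

    # 2. Noise generation: a random-phase component at the dominant bin,
    #    scaled to a small fraction of that bin's magnitude so the louder
    #    original component tends to mask it (a crude stand-in for a
    #    real psychoacoustic model).
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=frames.size)
    noise_spec = np.zeros_like(spec)
    noise_spec[dominant_bin, frames] = (
        noise_scale * magnitude[dominant_bin, frames] * np.exp(1j * phase)
    )

    # 3. Noise injection: add in the frequency domain and invert the STFT.
    _, perturbed = istft(spec + noise_spec, fs=sr,
                         nperseg=window_size, noverlap=noverlap)

    # 4. Output: trim the padding added by the STFT back to the input length.
    return perturbed[: audio.size]


# Usage on a synthetic one-second 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
protected = perturb(tone, sr)
print("max absolute change:", np.max(np.abs(protected - tone)))
```

A hop size of half the window, as in the defaults, keeps the Hann-windowed STFT invertible, so any difference between `tone` and `protected` comes from the injected noise rather than from reconstruction error.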
numpy
scipy
librosa
soundfile
Install them using:
pip install numpy scipy librosa soundfile
Input Audio: Ensure the input audio is in mono WAV format. Stereo files can be converted using tools like librosa.
Output Audio: The perturbed audio is intended to retain the perceptual quality of the original, so it can be distributed in its place.
Effectiveness: This script is a proof of concept and is intended for experimentation. Further enhancements are required for real-world robustness.
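As a sketch of the stereo-to-mono conversion mentioned in the input-audio note, the helper below averages the channels of an array shaped `(frames, channels)`, which is the layout `soundfile.read` returns for multi-channel files. The helper name `to_mono` and the file names in the comments are placeholders. Alternatively, `librosa.load(path, sr=None, mono=True)` downmixes while loading.

```python
import numpy as np


def to_mono(samples: np.ndarray) -> np.ndarray:
    """Average the channels of a (frames, channels) array; pass mono through."""
    if samples.ndim == 1:
        return samples  # already mono
    return samples.mean(axis=1)


# Typical use with soundfile (file names are placeholders):
#   import soundfile as sf
#   samples, sr = sf.read("stereo.wav")      # shape (frames, 2) for stereo
#   sf.write("mono.wav", to_mono(samples), sr)

# Small in-memory check with three stereo frames.
stereo = np.array([[0.25, 0.75], [-1.0, 1.0], [0.5, 0.5]])
mono = to_mono(stereo)
print(mono.shape, mono)
```

Averaging the channels rather than summing them keeps the result within the original amplitude range and avoids clipping.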
Support for multi-channel (stereo) WAV files.
Integration with more advanced psychoacoustic models.
Evaluation against specific generative AI models.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please fork the repository and submit a pull request with your improvements.