In this work, the integration and applicability of generative artificial intelligence in music production was analyzed through the development of a digital instrument. Using selected diffusion models, users can define sounds through textual descriptions and then play and manipulate them with standard music production tools. The diffusion models were evaluated for their suitability in this context and adapted for integration into the digital instrument. The instrument itself was built on top of existing frameworks and provides a user interface through which users can edit model- and instrument-specific parameters. The analysis showed that the models do not always respond adequately to the context of music production, resulting in unexpected sound patterns or abstract artifacts. Currently available text-to-audio models do not provide high-quality reproduction of familiar sounds, but they do offer opportunities for experimental applications. The implemented prototype of the digital instrument enables such experiments and the exploration of novel sound synthesis methods. However, functions for reproducing selected regions of the generated sounds or for playing them indefinitely are still missing. Nevertheless, interesting and unusual soundscapes can already be produced, which could find application in musical compositions.
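The thesis excerpt above does not name the specific diffusion models used. As a minimal, hypothetical sketch of the underlying text-to-audio workflow, the following Python snippet shows how a textual description could be turned into a short audio clip with a text-to-audio diffusion pipeline such as AudioLDM from the Hugging Face `diffusers` library; the model checkpoint, prompt, and output path are illustrative assumptions and are not taken from the thesis.

```python
# Minimal sketch: generating audio from a text prompt with a diffusion model.
# Assumes the Hugging Face diffusers library; the checkpoint, prompt and
# parameters below are illustrative and not taken from the thesis.
import scipy.io.wavfile
import torch
from diffusers import AudioLDMPipeline

# Load a pretrained text-to-audio diffusion pipeline (assumed checkpoint).
pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Describe the desired sound in text, as the digital instrument lets users do.
prompt = "a warm, evolving synthesizer pad with soft metallic overtones"
result = pipe(prompt, num_inference_steps=50, audio_length_in_s=5.0)
audio = result.audios[0]  # mono waveform as a NumPy float array

# AudioLDM generates audio at a 16 kHz sample rate.
scipy.io.wavfile.write("generated_pad.wav", rate=16000, data=audio)
```

In the digital instrument described above, such settings are not edited in code; the prompt and other model- and instrument-specific parameters are exposed through its user interface.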
The PDF contains the finished bachelor thesis.
The resulting digital instrument of the thesis is accessible under WaveGenSynth.
The installer and executables of the digital instrument can be found in the following release.
The source code for the evaluation page is available here.
The source code for generating the images of the thesis is available here.
The LaTeX source code is accessible here.