The Speech Signal Processing Toolkit (SPTK) is a suite of speech signal processing tools.
- SPTK consists of over 100 commands for speech signal processing.
- The data format used in SPTK is raw and header-less, i.e., a file contains only the data values with no additional structure.
Thanks to this format, file contents can be checked immediately from the command line.
dmp +s data.raw
- The data used in the commands is passed through standard input/output.
We can chain multiple processes using pipes.
x2x +sd < data.raw | clip | x2x +da | less
- The data type is, in general, an 8-byte little-endian double.
- The commands do not require interactive user inputs.
Parameters are set via command line options beforehand.
impulse -l 4 | sopr -m 10 | x2x +da
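To make the data format concrete, the following Python sketch mimics at the byte level what the last pipeline does. It is an illustration under assumptions, not SPTK code: the helper names (`impulse`, `scale`, `to_ascii`) are hypothetical stand-ins, and the only facts taken from the list above are that the data are header-less little-endian 8-byte doubles handed from one stage to the next.

```python
import struct

def impulse(length):
    """Return a unit impulse as raw bytes: header-less little-endian doubles."""
    if length < 1:
        raise ValueError("length must be at least 1")
    samples = [1.0] + [0.0] * (length - 1)
    return struct.pack("<%dd" % length, *samples)

def scale(raw, factor):
    """Multiply every double in a raw byte stream by `factor`."""
    count = len(raw) // 8  # 8 bytes per value, no header to skip
    samples = struct.unpack("<%dd" % count, raw)
    return struct.pack("<%dd" % count, *(factor * x for x in samples))

def to_ascii(raw):
    """Render a raw double stream as text, one value per line."""
    count = len(raw) // 8
    return "\n".join("%g" % x for x in struct.unpack("<%dd" % count, raw))

# Hypothetical stand-in for: impulse -l 4 | sopr -m 10 | x2x +da
raw = scale(impulse(4), 10.0)
text = to_ascii(raw)
print(len(raw))  # 4 values x 8 bytes each
print(text)
```

Because nothing but the values is stored, the file size divided by 8 gives the number of doubles in a stream.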
- GCC 4.8.5+ / Clang 3.5.0+ / Visual Studio 2015+
- CMake 3.1+
The latest release can be downloaded through Git. The install procedure is as follows.
git clone https://github.com/sp-nitech/SPTK.git
cd SPTK
make
Then the SPTK commands can be used by adding the bin/ directory to the PATH environment variable.
If you would like to use a part of the SPTK functions, please link the static library lib/libsptk.a.
On Windows, you may need to add cmake and MSBuild to the PATH environment variable in advance.
Please run make.bat, or open Command Prompt and follow the procedure below:
cd /path/to/SPTK # Please change here to your appropriate path.
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=.. # Please change install directory.
MSBuild /p:Configuration=Release INSTALL.vcxproj
You can compile SPTK via GUI instead of running MSBuild by opening the generated project file.
Then the SPTK functions can be used by linking the static library lib/sptk.lib.
- Analysis-synthesis via mel-cepstrum
- Parametric coding via line spectral pairs
SPTK provides some examples.
Go to an example directory and execute run.sh, e.g.,
cd egs/analysis_synthesis/mgc
./run.sh
Below is a simple example that decreases the volume of the input audio in input.wav by halving its amplitude.
You may need to install the sox command on your system.
sox -t wav input.wav -c 1 -t s16 -r 16000 - |
x2x +sd | sopr -m 0.5 | x2x +ds -r |
sox -c 1 -t s16 -r 16000 - -t wav output.wav
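For readers who want to see the same operation spelled out, the volume change can be sketched with Python's standard library alone. This is a rough equivalent under stated assumptions, not part of SPTK: `halve_volume` is a hypothetical helper, the input is assumed to be 16-bit PCM, and, unlike the sox commands above, no downmixing to one channel or resampling to 16 kHz is performed.

```python
import struct
import wave

def halve_volume(src_path, dst_path, gain=0.5):
    """Scale 16-bit PCM samples by `gain` and write the result as a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch assumes 16-bit PCM input")
        frames = src.readframes(params.nframes)

    count = len(frames) // 2
    # Decode little-endian int16 samples (the pipeline uses x2x +sd for this step).
    samples = struct.unpack("<%dh" % count, frames)
    # Scale in floating point (the pipeline uses sopr -m 0.5).
    scaled = [gain * s for s in samples]
    # Round with Python's round() and clamp to the int16 range before encoding.
    rounded = [max(-32768, min(32767, int(round(x)))) for x in scaled]

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # keep channel count and sampling rate of the input
        dst.writeframes(struct.pack("<%dh" % count, *rounded))
```

A call such as `halve_volume("input.wav", "output.wav")` mirrors the file names used in the pipeline; the rounding rule here is Python's and may differ from that of `x2x +ds -r` for values exactly halfway between two integers.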
If you would like to draw figures, please prepare a Python environment.
cd tools; make venv PYTHON_VERSION=3.8; cd ..
. ./tools/venv/bin/activate
impulse -l 32 | gseries impulse.png
deactivate
- Input and output types are changed to double from float
- Signal processing classes are written in C++ instead of C
- Drawing commands are implemented in Python
- Some option names are changed
- No memory leaks
- Thread-safe
- New features:
  - Aperiodicity extraction (ap)
  - Conversion from/to log area ratio (lar2par and par2lar)
  - Dynamic range compression (drc)
  - Entropy calculation (entropy)
  - Huffman coding (huffman, huffman_encode, and huffman_decode)
  - Magic number interpolation (magic_intpl)
  - Median filter (medfilt)
  - Mel-cepstrum postfilter (mcpf)
  - Mel-filter-bank extraction (fbank)
  - Nonrecursive MLPG (mlpg -R 1)
  - Pitch adaptive spectrum estimation (pitch_spec)
  - Pitch extraction by DIO used in WORLD (pitch -a 3)
  - PLP extraction (plp)
  - Pole-zero plot (gpolezero)
  - Scalar quantization (quantize and dequantize)
  - Sinusoidal generation from pitch (pitch2sin)
  - Spectrogram plot (gspecgram)
  - Stability check of LPC coefficients (lpccheck)
  - Subband decomposition (pqmf and ipqmf)
  - WORLD synthesis (world_synth)
  - Windows build support
- Obsoleted commands:
  - acep, agcep, and amcep -> amgcep
  - bell
  - c2sp -> mgc2sp
  - cat2 and echo2
  - da
  - ds, us, us16, and uscd -> sox
  - fig
  - gc2gc -> mgc2mgc
  - gcep, mcep, and uels -> mgcep
  - glsadf, lmadf, and mlsadf -> mglsadf
  - ivq and vq -> imsvq and msvq
  - lsp2sp -> mglsp2sp
  - mgc2mgclsp and mgclsp2mgc
  - psgr and xgr
  - raw2wav, wav2raw, wavjoin, and wavsplit -> sox
- Separated commands:
  - c2ir -> c2mpir and mpir2c
  - dtw -> dtw and dtw_merge
  - mglsadf -> mglsadf and imglsadf
  - train -> train and mseq
  - ulaw -> ulaw and iulaw
  - vstat -> vstat and median
- Renamed commands:
  - mgclsp2sp -> mglsp2sp
- Keiichi Tokuda - Produce and Design - Nagoya Institute of Technology
- Keiichiro Oura - Nagoya Institute of Technology
- Takenori Yoshimura - Main Maintainer - Nagoya Institute of Technology
- Takato Fujimoto - Nagoya Institute of Technology
- Akira Tamamori
- Cassia Valentini
- Chiyomi Miyajima
- Fernando Gil Resende Junior
- Gou Hirabayashi
- Heiga Zen
- Junichi Yamagishi
- Kazuhito Koishida
- Keiichi Tokuda
- Keiichiro Oura
- Kenji Chiba
- Masatsune Tamura
- Naohiro Isshiki
- Noboru Miyazaki
- Satoshi Imai
- Shinji Sako
- Tadashi Kitamura
- Takao Kobayashi
- Takashi Masuko
- Takashi Nose
- Takato Fujimoto
- Takayoshi Yoshimura
- Takenori Yoshimura
- Toru Takahashi
- Toshiaki Fukada
- Toshihiko Kato
- Toshio Kanno
- Yoshihiko Nankaku
This software is released under the Apache License 2.0.
@InProceedings{sp-nitech2023sptk,
author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
title = {{SPTK4}: An open-source software toolkit for speech signal processing},
booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
pages = {211--217},
year = {2023},
}