The Speech Signal Processing Toolkit (SPTK) is a suite of speech signal processing tools.
See this page for a reference manual.
Requirements:
- GCC 4.8.5+ / Clang 3.5.0+ / Visual Studio 2015+
- CMake 3.1+
The latest release can be obtained with Git and installed as follows:
```sh
git clone https://github.com/sp-nitech/SPTK.git
cd SPTK
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=..  # Change the install directory as needed.
make -j 4 install  # Adjust the number of parallel jobs to your machine.
```
The SPTK commands can then be used by adding the `bin/` directory to the `PATH` environment variable.
To use SPTK functions in your own program, link against the static library `lib/libsptk.a`.
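In a POSIX shell, `bin/` can be put on `PATH` as shown below. This is a minimal sketch. It assumes the install prefix from the commands above (`..`, i.e. the root of the cloned repository). `SPTK_ROOT` is a placeholder variable introduced here for illustration and is not defined by SPTK.

```sh
# Assumption: SPTK was installed with -DCMAKE_INSTALL_PREFIX=.. so the
# commands live in <repository root>/bin. SPTK_ROOT is a placeholder name;
# point it at your own checkout.
SPTK_ROOT="${SPTK_ROOT:-$HOME/SPTK}"

# Prepend bin/ to PATH only if it is not already listed, so sourcing
# this snippet more than once does not grow PATH.
case ":$PATH:" in
  *":$SPTK_ROOT/bin:"*) ;;
  *) PATH="$SPTK_ROOT/bin:$PATH" ;;
esac
export PATH
```

To make the setting persistent, add these lines to your shell's startup file (e.g. `~/.bashrc`). When linking your own program, the standard linker flags `-L"$SPTK_ROOT/lib" -lsptk` select `lib/libsptk.a`.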
On Windows, you may need to add `cmake` and `MSBuild` to the `PATH` environment variable in advance.
Then open Command Prompt and run the following:
```bat
REM Replace the path below with the location of your SPTK checkout.
cd \path\to\SPTK
mkdir build
cd build
REM Change the install directory as needed.
cmake .. -DCMAKE_INSTALL_PREFIX=..
REM Build with 4 parallel processes; adjust to your machine.
MSBuild -maxcpucount:4 /p:Configuration=Release INSTALL.vcxproj
```
Alternatively, you can build from the Visual Studio GUI instead of running MSBuild.
The SPTK functions can then be used by linking against the static library `lib/sptk.lib`.
Examples:
- Analysis-synthesis via mel-cepstrum
- Parametric coding via line spectral pairs
SPTK provides several example scripts under `egs/`. To run one, go to an example directory and execute `run.sh`, e.g.,
```sh
cd egs/analysis_synthesis/mgc
./run.sh
```
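The following shell function runs every example in turn. It is a sketch under two assumptions: each example sits two levels below `egs/` (as `egs/analysis_synthesis/mgc` does), and each `run.sh` is meant to be started from its own directory, as in the commands above. `run_all_examples` is a hypothetical helper name and is not part of SPTK.

```sh
# Hypothetical helper: run each egs/*/*/run.sh from its own directory.
# Call it from the repository root with SPTK's bin/ on PATH.
run_all_examples() {
  for script in egs/*/*/run.sh; do
    # Skip the literal pattern left behind when the glob matches nothing.
    [ -f "$script" ] || continue
    echo "==> $script"
    # Subshell: the cd does not leak into the caller's working directory.
    (cd "$(dirname "$script")" && ./run.sh) || return 1
  done
}
```

The function stops at the first failing example, so the error stays visible rather than scrolling away.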
The following is a simple example that halves the amplitude of the audio in `input.wav` and writes the result to `output.wav`.
You may need to install the `sox` command on your system first.
```sh
# WAV -> raw 16-bit mono at 16 kHz -> double -> x0.5 -> rounded 16-bit -> WAV
sox -t wav input.wav -c 1 -t s16 -r 16000 - |
x2x +sd | sopr -m 0.5 | x2x +ds -r |
sox -c 1 -t s16 -r 16000 - -t wav output.wav
```
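The same pipeline can be wrapped in a shell function so that the gain becomes a parameter. This is a minimal sketch. `scale_volume` is a hypothetical name. The SPTK options (`x2x +sd`, `sopr -m`, `x2x +ds -r`) are exactly those used above, and only the gain value is substituted.

```sh
# Hypothetical helper around the pipeline above.
# Usage: scale_volume IN.wav OUT.wav [GAIN]   (default GAIN=0.5)
scale_volume() {
  src=$1 dst=$2 gain=${3:-0.5}
  # Fail early on a missing input file.
  if [ ! -f "$src" ]; then
    echo "scale_volume: no such file: $src" >&2
    return 1
  fi
  # Fail early if a required command is not on PATH.
  for cmd in sox x2x sopr; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "scale_volume: command not found: $cmd" >&2
      return 1
    fi
  done
  # Same stages as above, with the multiplier taken from $gain.
  sox -t wav "$src" -c 1 -t s16 -r 16000 - |
    x2x +sd | sopr -m "$gain" | x2x +ds -r |
    sox -c 1 -t s16 -r 16000 - -t wav "$dst"
}
```

For example, `scale_volume input.wav output.wav 0.5` reproduces the command above. A gain of 0.5 corresponds to 20 log10(0.5) ≈ -6.02 dB. Gains above 1 can push samples beyond the 16-bit range.

Because the first `sox` stage sets `-c 1 -r 16000` on its output, the result is always mono at 16 kHz, whatever the input format.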
To draw figures, prepare a Python environment first:
```sh
cd tools; make venv PYTHON_VERSION=3.8; cd ..  # Build a Python virtual environment in tools/venv.
. ./tools/venv/bin/activate                     # Activate it.
impulse -l 32 | gseries impulse.png             # Plot a 32-sample impulse sequence.
deactivate
```
Changes from SPTK3:
- Input and output types are changed from float to double
- Signal processing classes are written in C++ instead of C
- Drawing commands are implemented in Python
- No memory leaks
- Thread-safe
- New features:
  - Conversion from/to log area ratio (`lar2par` and `par2lar`)
  - Dynamic range compression (`drc`)
  - Entropy calculation (`entropy`)
  - Huffman coding (`huffman`, `huffman_encode`, and `huffman_decode`)
  - Magic number interpolation (`magic_intpl`)
  - Median filter (`medfilt`)
  - Mel-cepstrum postfilter (`mcpf`)
  - Mel-filter-bank extraction (`fbank`)
  - Nonrecursive MLPG (`mlpg -R 1`)
  - Pitch extraction by DIO used in WORLD (`pitch -a 3`)
  - Pole-zero plot (`gpolezero`)
  - Scalar quantization (`quantize` and `dequantize`)
  - Spectrogram plot (`gspecgram`)
  - Stability check of LPC coefficients (`lpccheck`)
  - Subband decomposition (`pqmf` and `ipqmf`)
  - Windows build support (only static library)
- Obsoleted commands:
  - `acep`, `agcep`, and `amcep` -> `amgcep`
  - `bell`
  - `c2sp` -> `mgc2sp`
  - `cat2` and `echo2`
  - `da`
  - `ds`, `us`, `us16`, and `uscd` -> `sox`
  - `fig`
  - `gc2gc` -> `mgc2mgc`
  - `gcep`, `mcep`, and `uels` -> `mgcep`
  - `glsadf`, `lmadf`, and `mlsadf` -> `mglsadf`
  - `ivq` and `vq` -> `imsvq` and `msvq`
  - `lsp2sp` -> `mglsp2sp`
  - `mgc2mgclsp` and `mgclsp2mgc`
  - `psgr` and `xgr`
  - `raw2wav`, `wav2raw`, `wavjoin`, and `wavsplit` -> `sox`
- Separated commands:
  - `c2ir` -> `c2mpir` and `mpir2c`
  - `dtw` -> `dtw` and `dtw_merge`
  - `mglsadf` -> `mglsadf` and `imglsadf`
  - `train` -> `train` and `mseq`
  - `ulaw` -> `ulaw` and `iulaw`
  - `vstat` -> `vstat` and `median`
- Renamed commands:
  - `mgclsp2sp` -> `mglsp2sp`
Development team:
- Keiichi Tokuda - Produce and Design - Nagoya Institute of Technology
- Keiichiro Oura - Nagoya Institute of Technology
- Takenori Yoshimura - Main Maintainer - Nagoya Institute of Technology
- Takato Fujimoto - Nagoya Institute of Technology
Contributors:
- Akira Tamamori
- Cassia Valentini
- Chiyomi Miyajima
- Fernando Gil Resende Junior
- Gou Hirabayashi
- Heiga Zen
- Junichi Yamagishi
- Kazuhito Koishida
- Keiichi Tokuda
- Keiichiro Oura
- Kenji Chiba
- Masatsune Tamura
- Naohiro Isshiki
- Noboru Miyazaki
- Satoshi Imai
- Shinji Sako
- Tadashi Kitamura
- Takao Kobayashi
- Takashi Masuko
- Takashi Nose
- Takato Fujimoto
- Takayoshi Yoshimura
- Takenori Yoshimura
- Toru Takahashi
- Toshiaki Fukada
- Toshihiko Kato
- Toshio Kanno
- Yoshihiko Nankaku
This software is released under the Apache License 2.0.