examples
Folders and files
Name | Name | Last commit date | ||
---|---|---|---|---|
parent directory.. | ||||
this is the example directory of emoSyn. If you want to build a database following this example, you need the Entropic ESPS-tools and xwaves. I further assume that you use unix and have perl and preferably the sensyn synthesizer in your path. the filelist should show the following: README - this file diphonDB/ - example diphone-database for a german utterance modFiles/ - example emoSyn modification-files for different emotions example.sd - the utterance that got copysynthesized. example.ll - phone-labeling file example.dp - diphone-labeling file example.ed.fb - the manually corrected formant-tracks example.ed.f0 - the manually corrected f0- and amplitude-tracks example.phon - input file for emoSyn to generate the eph-file from the database example.eph - the resulting input file for emoSyn emoSyn.conf - klatt default parameters needed by emoSyn WHAT TO DO TO MAKE A COPY-SYNTHESIS ------------------------------------ 1.- record an utterance (speaker should be male and have a nice voice characteristic with easily seen formant tracks ;-)) 2.- convert to (esps) sd-format and launch xwaves. (see example-org.sd) 3. segment the file into diphones. It might be easier if you start with a phone segmenting (see example.ll) and than mark the diphone borders (see example.dp). While labeling the utterance you must keep in mind that all phone-names must be known to emoSyn. As I used emoSyn for German, I provided for all phones that are mentioned in the de1 database of mbrola (in Sampa, this should cover german). Each diphone starts with it's name, followed by some markers for the transitions. It ends with a ".". eg: 0.115 -1 a-n 0.145 -1 t1 0.155 -1 brd 0.155 -1 t2 0.175 -1 . means that diphone a-n starts at 0.115 sec., the transition from /a/ to /n/ starts at 0.145, the border between /a/ and /n/ is at 0.155 and the transition from /a/ to /n/ ends at 0.155 (i.e. abrupt). The steady state of the /n/ is given until 0.175 sec. If you have a stop as first phone of a diphone, t1 designates the time where the burst is finished and brd the time where the aspiration is finished. If you have a stop as second phone, t2 means nothing, it should be set t2=brd. Same goes for silence. 4. generate f0-file. make a f0-analysis (Menu Changes - Add extended Waveform Op - F0 analysis) and hand-edit the f0 and amplitude (rms) contour. (right click - Button modes - middle/left button - modify signal) The result is saved automatically by xwaves as <filname>.sd.ed.out. Save it under <filname>.ed.f0 . 5. generate fb-file. Make a spectrogram of the whole utterance. Make a format overlay over the spectrogram (Menu Changes - Add extended Image Op - formants (w/overlay)) and hand-edit the formant-tracks: (right click - Button modes - middle/left button - mark formants) The result is saved automatically by xwaves as <filname>(some numbers).fb.ed.sig. Save it under <filname>.ed.fb . 6. build the database use the perlscript filterStartWerte.pl that comes with this distribution to make all timevalues in the labelfile dividable by 5. use the perlscript segmentDiphon.pl that comes with this distribution to generate the database in directory <diphonDir> segmentDiphon.pl diphonLabelFile formantFile f0File diphonDir 7. generate the eph-file eph (extended pho) is the input-format of emoSyn. you can generate it with emoSyn for a test sentence, if you write a phon-file, where in each line is the name of the phone or a syllable border (see example.phon). then you can invoke emoSyn with the following line: bin/emoSyn -kd examples/emoSyn.conf -db examples/diphonDB/ -makePhoFromDB examples/example.phon > example.eph; 8. that's it. if you have "sensyn" (the formant-synthesizer) and "play" (for playing wav) in your path, you can listen to the result by using the testSenSyn.sh-script. (the sensyn-program should be patched to enable batch-mode). If some phones sound extremely strange, it might be that theu didn't occur in my testsentence and I never modeled them. You should provide for them by changing the methods phon::setSoundSource() and phon::setArticulationTractFilter() . some example modification-files (that's what emoSyn is all about) are in the directory modFiles/ .