forked from smarsland/AviaNZ
TODO.txt
* changelog and readme
* What else is on the git repository as issues?
* Segmentation options, especially merging, to be sorted (see below)
* The full set of features to be implemented, checked, and then used
* LSTM and CNN plus other learners
* tSNE and PCA
* For cheatsheet and zooniverse
-> sort out data
-> make the files
-> Cheatsheet: make the webpage
-> Zooniverse:
-> What data?
-> Picture of animal
-> Sample files
-> Put the data into the web portal
* Use the spectrogram inversion code to (a) change pitch, (b) clean the file as an image and reconvert
* Wigner-Ville distribution code (eventually cython)
* New fundamental frequency algorithm
* More thought about how we decide and encode certainty (certainly is, certainly isn't)
# ==============
# TODO
# Finish segmentation
# Add a minimum length of time for a segment -> make this a parameter
# Finish sorting out parameters for median clipping segmentation, energy segmentation
# Finish cross-correlation to pick out similar bits of spectrogram -> and what other methods?
# Add something that aggregates them -> needs planning
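A minimal sketch of the minimum-segment-length parameter mentioned above, assuming segments are stored as (start, end) pairs in seconds. The function name and the 0.5 s default are hypothetical, not taken from the AviaNZ code.

```python
def drop_short_segments(segments, min_length=0.5):
    """Keep only (start, end) segments, in seconds, that last at least min_length.

    min_length is the tunable parameter; 0.5 s is an arbitrary placeholder.
    """
    return [(start, end) for (start, end) in segments if end - start >= min_length]


if __name__ == "__main__":
    found = [(0.0, 0.2), (1.0, 2.5), (3.0, 3.3)]
    print(drop_short_segments(found, min_length=0.5))  # [(1.0, 2.5)]
```

Keeping the threshold as an argument means each segmenter (median clipping, energy, ...) could pass its own value.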
# Interface -> inverted spectrogram does not work - spec and amp do not synchronize
# Actions -> Denoise -> median filter check
# Make the median filter on the spectrogram have params and a dialog. Other options?
# Finish the raven features
# Would it be good to smooth the image? Actually, lots of ideas here! Might be nice way to denoise?
# Median filter, smoothing, consider also grab-cut
# Continue to play with inverting spectrogram
# Colourmaps
# HistogramLUTItem
# Context menu different for day and night birds?
# Minor:
# Consider always resampling to 22050Hz (except when it's less in file :) )?
# Font size to match segment size -> make it smaller, could also move it up or down as appropriate
# Where should label be written?
# Use intensity of colour to encode certainty?
# If don't select something in context menu get error -> not critical
# Colours of the segments to be visible with different colourmaps? Not important!
# Look at raven and praat and luscinia -> what else is actually useful? Other annotations on graphs?
# Don't really want to load the whole thing, just 5 mins, and then move through with arrows -> how?
# This is sometimes called paging, I think. (y, sr = librosa.load(filename, offset=15.0, duration=5.0) might help. Doesn't do much for the overview though)
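One possible way to drive the paging idea above: compute (offset, duration) pairs for fixed-size pages, then load one page at a time. This is only a sketch; page_offsets and its parameters are hypothetical names, and the 300 s default simply mirrors the "5 mins" in the note.

```python
def page_offsets(total_seconds, page_seconds=300.0, overlap_seconds=0.0):
    """Return a list of (offset, duration) pairs, in seconds, covering a recording.

    Each pair could be passed on as
    librosa.load(filename, offset=offset, duration=duration),
    so only one page is held in memory at a time.
    """
    if page_seconds <= overlap_seconds:
        raise ValueError("page_seconds must be larger than overlap_seconds")
    step = page_seconds - overlap_seconds
    pages = []
    offset = 0.0
    while offset < total_seconds:
        # The last page is shortened so it does not run past the end of the file.
        pages.append((offset, min(page_seconds, total_seconds - offset)))
        offset += step
    return pages


if __name__ == "__main__":
    # A 12 minute recording split into 5 minute pages.
    print(page_offsets(720.0))  # [(0.0, 300.0), (300.0, 300.0), (600.0, 120.0)]
```

The arrow buttons would then just move an index through this list. As the note says, this does not solve the overview, which still needs some summary of the whole file.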
# ===============
# TODO for AviaNZ
Tier 1 annotation - labels to use so that we can evaluate results in detail
## AviaNZ annotation ################################### in corresponding GT (-sec.txt) ##########
quality     male          female        cannot decide  #  time  presence/absence  type  quality  #
v close     'Kiwi(M)1'    'Kiwi(F)1'    'Kiwi1'        #  1     1                 M/F   *****    #
close       'Kiwi(M)2'    'Kiwi(F)2'    'Kiwi2'        #                                         #
faded       'Kiwi(M)3'    'Kiwi(F)3'    'Kiwi3'        #                                         #
v faded     'Kiwi(M)4'    'Kiwi(F)4'    'Kiwi4'        #                                         #
v v faded   'Kiwi(M)5'    'Kiwi(F)5'    'Kiwi5'        #  n     1                 M/F   *        #
                                                       #
Bittern annotation - same as kiwi                      #
'Bittern(B)1' to 'Bittern(B)5' for the booms           #
'Bittern(I)1' to 'Bittern(I)5' for the inhalations     #
##################################################################################################
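The label scheme above (species, optional call type in brackets, quality 1 to 5) can be split apart with a regular expression. This is a sketch only: parse_label and the returned tuple layout are hypothetical, and it assumes labels follow the table exactly.

```python
import re

# species name, optional "(X)" call type, then a quality digit from 1 (v close) to 5 (v v faded)
LABEL_RE = re.compile(r"^(?P<species>[A-Za-z]+)(?:\((?P<call_type>[A-Z])\))?(?P<quality>[1-5])$")


def parse_label(label):
    """Split a Tier 1 label such as 'Kiwi(M)1' into (species, call_type, quality).

    call_type is None for the 'cannot decide' labels such as 'Kiwi3'.
    """
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError("not a Tier 1 label: %r" % label)
    return match.group("species"), match.group("call_type"), int(match.group("quality"))


if __name__ == "__main__":
    for label in ["Kiwi(M)1", "Kiwi3", "Bittern(I)5"]:
        print(label, "->", parse_label(label))
```

Having the quality as an integer makes it straightforward to compare detections against the GT quality column when evaluating results.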
AviaNZ seems to be stable now! Just minor things:
Set Operator/Reviewer (Current File) -> does not remember the setting if the file has no annotation. Should we instead
disable the option when there is no annotation?
Spectrogram options: highest freq. 16000->4000. Does it need to be remembered when moving to the next file (currently it is not)?
Ask if you want to save it, then put it in the params file?
Bug to fix: the very first annotation gets lost.
#done
annotations lost after changing interface settings.
#done, update the overview
Add a sub menu item 'Save selected sound': useful to extract interesting sounds from long recordings (this stops me going back to Praat)
#done
Changing spectrogram parameters is not working smoothly
- if someone changes window length and hop they assume it is saved (when you open a new file changes are gone), so let it remember?
- low and high frequency is the same, in addition it is ugly!
GT format - GT needs to be in .txt format (no need to have it as .xlsx), then update the code
#done
Time axis is wrong when the file name has more text at the beginning, e.g. SM recordings
#done
Look into MFCC as a way to do template matching. Plus some others, but start there.
So for each segment identified by the wavelets, compute the coefficients, and plot them for now :)
# done some experiments with Ponui dataset and Tier1. MFCC + DTW could manage Ponui data but not good for Tier1, might need more examples
Training data
- needs to be less noisy and not overlapped with other species.
- how long should the examples be? Complete calls or individual syllables?
- I chose syllables, but then the problem was that test sounds (AviaNZ detections) are usually complete calls/more than one syllable.
length/#syllables in test and templates matter, so I took the average MFCC over all frames.
- how many MFCCs? tested with 12, 20, 48. 12 with delta was better.
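The averaging step described above can be sketched as follows: however many frames a call or syllable has, the mean over frames gives one fixed-length vector, so templates and test sounds of different lengths become comparable. The function name is hypothetical, and the frames are assumed to be already-computed MFCC vectors (e.g. 12 coefficients per frame).

```python
def mean_feature_vector(frames):
    """Average a list of per-frame feature vectors into a single vector.

    frames: list of equal-length lists, one per analysis frame.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    # zip(*frames) groups the values of each coefficient across all frames.
    return [sum(values) / n_frames for values in zip(*frames)]


if __name__ == "__main__":
    # Two toy frames with two coefficients each.
    print(mean_feature_vector([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

The trade-off is that averaging throws away the order of the frames, which is the information DTW would use.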
NP:
Introduce confidence using energy ratio after detection
# done, eRatio was good at picking level 1-3 according to my testing.
# I'm processing Tier1 data using wavelet segmentation followed by this eRatio and generating two excel
# outputs:
(1) 'Possible': includes all the segments that turned out to be kiwi
(2) 'withCofidence': includes only the segments confirmed with eRatio.
# and the annotation with 'kiwi' for segments with confidence and 'kiwi?' for possibles.
# Phase 1 (wavelet detection) of Tier1 data processing is completing on 15th Jan 2018. Started on 30th Oct, done mostly
between 3 machines (plus 6 from the undergrad lab for 1 week). Tried Azure but it wasn't easy and had to give it up (needs data to be uploaded).
SM:
from PyQt4 import QtGui
QtGui.QImageReader.supportedImageFormats()
Check the parameters for the clicks
And where to use -- current idea is to remove them during/after detection
And can we use the same idea for wind?
Adaptive noise floor/SNR
Finish feature vectors
Look into image denoising more
SM Choose best sets of nodes and evaluate
SM Make the general segmentation more sensitive
SM Get to grips with the bloody wavelet segmenter
NP Features
NP Good training set
NP Work out how not to mistake kiwi
NP Sort out 1 version of code for excel exporting with 1 minute presence/absence
Move from AviaNZ to SupportClasses
#Done
Add the 1 min workflow (see below)
#Done the first version - arrange output without changing the detectors
NP Check manual
#Done
We should start to decide what makes a specialised filter
Mingap, minlength
Threshold for segmenters
Wavelet nodes
Trained classifiers
(2) SRM And the video!
(6) SRM Multiprocessing where possible
It made things slower!
(7) NP Classify segments
(i) Wavelets better version
(ii) Learning based on whatever (check Raven features)
(iii) SRM Wavelet energy features
# What we want from the Tier 1 output:
Presence or absence at some time resolution (e.g., 1 min, or 5 mins)
(1) Pick up a 1 min section
(2) Detect kiwi
(3) As soon as a definite kiwi is detected, mark presence, move on to next section
(4) If a possible kiwi is detected, mark it and keep on going
(5) If you find a definite, delete possible, go to (1)
(6) If find no kiwi mark as absence
(7) If find only possibles, ask user at end of all file processing using Check segments interface
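Steps (1) to (7) above could be sketched like this, assuming the detector has already produced (start time in seconds, certainty) pairs with certainty either "definite" or "possible". All names are hypothetical. The real workflow would stop detecting in a section as soon as a definite kiwi is found, whereas this sketch only reproduces the resulting labels.

```python
import math


def presence_by_section(detections, total_seconds, section_seconds=60.0):
    """Label each section 'presence', 'possible' or 'absence'.

    detections: iterable of (start_seconds, certainty) pairs.
    Sections left as 'possible' are the ones to show the user at the end (step 7).
    """
    n_sections = int(math.ceil(total_seconds / section_seconds))
    labels = ["absence"] * n_sections              # step (6): default when nothing is found
    for start, certainty in sorted(detections):
        idx = int(start // section_seconds)
        if not 0 <= idx < n_sections or labels[idx] == "presence":
            continue                               # step (3): section already settled
        if certainty == "definite":
            labels[idx] = "presence"               # steps (3) and (5): definite replaces possible
        elif certainty == "possible":
            labels[idx] = "possible"               # step (4): mark and keep going
    return labels


if __name__ == "__main__":
    found = [(5.0, "possible"), (20.0, "definite"), (70.0, "possible")]
    print(presence_by_section(found, total_seconds=180.0))
    # ['presence', 'possible', 'absence']
```

Changing section_seconds to 300 gives the 5 minute resolution without touching the detectors.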
Work out how to make old wavelet method produce too many segments
Make a new method (not inside Segmentation):
(1) Perform segmentation using at least one of old wavelet method and any other segmenter, possibly multiple options
# Partly done, (FIR + median clip). Add in wavelets.
# And it does need parameters
(2) Combine the segments as appropriate
# Two versions are done. Opens it out so far, no max. Either with or without envelopes kept.
(3) Perform classification on each segment using at least one of wavelet energies, Raven features, MFCC, LPC, fundamental freq
# To be done (ready to modify wavelet segmentation part)
Plot the wavelet energy for noise, crickets, calls, etc.
Suppress the noise nodes, reconstruct -- what happens?
Parallel process the segmenters
get name from standard files and use it
# Name bit is done, not yet used
# Look into the invertible CQT
# Doesn't seem to help with tril1
# Dominant frequency
# warbleR features
# plots of eg xcorr
# DTW - 2D, fast
look into DP
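For reference, the textbook dynamic-programming form of DTW looks like the sketch below. It works on sequences of feature vectors (e.g. MFCC frames) and is the plain O(n*m) version, not one of the fast approximations. The function name and the Euclidean frame distance are choices made here, not taken from the AviaNZ code.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors.

    a, b: lists of equal-dimension vectors (lists of numbers).
    Uses Euclidean distance between frames and the standard three-way recurrence.
    """
    def frame_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of a with the first j frames of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [1.0], [1.0], [2.0]]   # same shape, one frame repeated
    print(dtw_distance(template, stretched))   # 0.0
```

A banded or windowed variant would restrict j to stay near i, which is the usual first step towards a faster version.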
NP:
Test the idea of using the wavelet nodes with either
(a or b or c) and not (d or e)
(a and (b or c))
or maybe both :)
Will have to count the number of times each occurs
Idea is to reduce (1) misclassifications, (2) crickets, (3) wide-band clicks, (4) rain
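The two node rules above can be written as small predicates over the set of wavelet nodes that fired for a segment, with a counter for how often each rule accepts. This is one reading of "count the number of times each occurs". The node numbers below are made up for illustration only.

```python
from collections import Counter


def rule_any_without(active, wanted, unwanted):
    """(a or b or c) and not (d or e): some wanted node fired and no unwanted node did."""
    return any(n in active for n in wanted) and not any(n in active for n in unwanted)


def rule_anchor_plus_any(active, anchor, support):
    """(a and (b or c)): the anchor node fired together with at least one supporting node."""
    return anchor in active and any(n in active for n in support)


if __name__ == "__main__":
    # Hypothetical node numbers: 35, 36, 43 stand for call nodes, 40 and 44 for noise nodes.
    segments_nodes = [{35, 36, 43}, {35, 40}, {44}, {36, 43}]
    counts = Counter()
    for active in segments_nodes:
        if rule_any_without(active, wanted={35, 36, 43}, unwanted={40, 44}):
            counts["any_without"] += 1
        if rule_anchor_plus_any(active, anchor=35, support={36, 43}):
            counts["anchor_plus_any"] += 1
    print(dict(counts))  # {'any_without': 2, 'anchor_plus_any': 1}
```

Requiring both rules at once ("or maybe both") is just the logical and of the two predicates.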
Denoising experiments
-> short time better than long?
-> denoise only segments
-> compare python and matlab
Segmentation experiments
-> like thesis but better :)
-> paper
(1) Minor bugs and extensions
NP: Read mp3 files
# SM: pysox
??: What to do with stereo sound? How about consistent sample rates?
NP: Make play sounds play the denoised version after denoising
# This seems to be complicated with undo etc. Still thinking about what the best way to do it is;
can easily add a separate button to play the denoised version, but that is not nice. I fixed the plotting problem though.
NP: Stop loading the file when choose to cancel the progress bar
# Removed the cancel button for now. Otherwise we have to unroll what happened inside loadFile when the user cancels.
File list dock becomes frozen - one user was experiencing this. Had to restart the program.
# Can we reproduce it?
NP: Fully integrate wavelet seg into program
NP: We actually want Kiwi (M) and Kiwi (F), and need to get all the ruru calls
NP: Want to have some form of machine learning
(1) decision tree
(2) MLP
(3) SVM
(4) boosting
SM: And need to think about the 95% confidence thing a lot more
SM: Make the segmentations work fully
Get the minimum time of a segment parameter sorted
Work out how to combine the methods, particularly, spot overlaps in segments and combine them
Work out parameters and how to set them
Remove cross-correlation? Or just improve?
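For the "spot overlaps in segments and combine them" item, a common approach is to sort all segments by start time and sweep through them once. The sketch below assumes (start, end) pairs in seconds pooled from several segmenters. merge_segments and max_gap are hypothetical names, with max_gap playing the role of a minimum-gap parameter.

```python
def merge_segments(segments, max_gap=0.0):
    """Merge (start, end) segments that overlap or lie within max_gap seconds of each other.

    With max_gap=0.0, overlapping and exactly touching segments are joined.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous segment; max() handles a segment nested inside another.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    pooled = [(5.0, 6.0), (0.0, 2.0), (1.0, 3.0)]
    print(merge_segments(pooled))               # [(0.0, 3.0), (5.0, 6.0)]
    print(merge_segments(pooled, max_gap=2.0))  # [(0.0, 6.0)]
```

This opens segments out without any upper limit on the merged length, so a maximum length would need to be enforced separately if wanted.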
Both: Think about nice ways to train a new wavelet filter
And get the whole workflow sorted for it
(4) Features
Use the wavelets
Finish the Raven features
Add MFCC (nearly done)
And whatever else seems interesting
More on fundamental frequency
smoothing?
harvest or bana (yaapt was awful)
Shape metrics
(5) Learning
Standard methods
MLP
Decision tree
Boosting
SVM
LSTM or GRU
HMM to string syllables together
(6) Other
Think more about the spectrogram inversion
If it works --> Stu's bats
Denoise the spectrogram fully (median filter, smoothing, consider grab-cut)
Any necessary database or metadata things?
Bats do keep on coming up...
Generative noise model
# Tier 1
(1) Segmentation that is as good as we can get it
(2) Wavelet recognition ditto
(3) Non-wavelet recognition ditto