PREP can't always flag low amplitude channels as bad-by-deviation #83
> Wouldn't channels with suspiciously low amplitudes already be flagged as bad-by-flat?
They'd only be flagged as bad-by-flat if their standard deviations were extremely low (i.e., practically no signal). This check seems to be for channels that still have a signal, but whose variance is considerably lower than the median variance. Decreasing the z-score threshold to 3.29 (which I think is what you have in mind; it corresponds to a two-tailed test at p < 0.001) would definitely make low-amplitude channels easier to detect, but even then the Z-test still rests on the assumption that the population is normally distributed. Since the population here is a bunch of variance measures (which will be right-skewed), the right thing to do would be either to use a different test that doesn't make that assumption, or to transform the variances so they can be treated as roughly normal.

I guess the most important question here is: how weak or strong should a signal need to be, relative to the other channels in the same dataset, to warrant exclusion from the average reference signal? If we can figure out a good rule of thumb for that, it should be easier to figure out how to improve the stats.
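For reference, the cutoff-to-p-value mapping mentioned above is easy to check with SciPy; this is just the standard normal two-tailed calculation, nothing pyprep-specific:

```python
# Two-tailed p-values for the z cutoffs discussed above,
# assuming a standard normal distribution.
from scipy.stats import norm

for z in (3.29, 5.0):
    print(f"|z| >= {z}: two-tailed p = {2 * norm.sf(z):.2e}")
# |z| >= 3.29 -> p ~ 1.0e-03; |z| >= 5.0 -> p ~ 5.7e-07
```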
I don't think the PREP devs thought of what they did as a statistical test with assumptions to satisfy (though maybe I'm wrong). Rather, I think they tried a bunch of thresholds on known robust metrics and "creative" robust metrics to see what works. Apparently they didn't notice that low-deviation channels aren't detected as easily, but it's also possible they were relying on other methods to catch those channels. If we can improve it, great. However, for your question:

I think that's highly context-dependent, and finding a good metric here could be its own scientific paper. If a threshold of 3.29 works for us for now, let's take it, document it, and stick with it until we find something better.
Probably, but I think it still makes sense to treat it like one: we're testing whether each channel's amplitude meaningfully deviates from all the others, so it's still a statistical procedure (like outlier removal for reaction times), if not exactly a statistical test. Regardless, I think you're right that this deserves some simulation tests: maybe I'll write an R script that reads in the eegbci dataset and visualizes the effects of a few different outlier-exclusion techniques. I'm not likely to touch that for a while, though; fixing the remaining MATLAB inconsistencies definitely takes priority!
Reading back on this, I think the log transform may be a good change indeed. Did you mean changing these lines (`pyprep/find_noisy_channels.py`, lines 288 to 294 at commit 8897804):
to: -chan_amplitudes = _mat_iqr(self.EEGData, axis=1) * IQR_TO_SD
+chan_amplitudes = np.log(_mat_iqr(self.EEGData, axis=1) * IQR_TO_SD )
... ? :-) |
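One wrinkle worth noting with that one-line change: a truly flat channel has an IQR of zero, and `np.log(0)` is `-inf`. Below is a minimal standalone sketch of the transformed computation; `np.percentile` stands in for pyprep's `_mat_iqr`, and the `eps` floor is an assumption added here, not part of the proposal:

```python
import numpy as np

IQR_TO_SD = 0.7413  # approx. 1/1.349, the normal-theory IQR-to-SD factor

def log_channel_amplitudes(eeg_data, eps=1e-12):
    """Log of each channel's robust amplitude; eeg_data is (n_channels, n_samples).

    np.percentile stands in for pyprep's _mat_iqr, and the eps floor
    (an assumption) keeps flat channels finite instead of log(0) = -inf.
    """
    q75, q25 = np.percentile(eeg_data, [75, 25], axis=1)
    amplitudes = (q75 - q25) * IQR_TO_SD
    return np.log(np.maximum(amplitudes, eps))

# Example: 4 channels of noise, one of them completely flat
data = np.random.default_rng(0).normal(size=(4, 1000))
data[2] = 0.0
print(log_channel_amplitudes(data))  # flat channel -> strongly negative value
```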
I've been picking away at refactoring the `NoisyChannels` unit tests so that each 'bad-by' type is covered by a separate test. In the process, though, I think I've discovered a weak spot in the original PREP logic: if I use a different test file, the bad-by-low-deviation test below (`pyprep/tests/test_find_noisy_channels.py`, lines 51 to 60 at commit bc46978) stops working.
This holds true even if I replace the "divide by 10" with a "divide by 100,000": the "robust channel deviation" z-score seems to plateau at around -4, never reaching the +/- 5 threshold for being bad-by-deviation.
Looking at the actual math, this makes sense. The robust channel deviation is computed in two steps: (1) estimate each channel's variance, then (2) robustly z-score those values against the other channels, i.e. `(variance - median_variance) / variance_of_the_variances`. The problem here is that in step 2, the variances calculated in step 1 have a minimum value of zero. As such, even a channel with an amplitude of zero won't be detected as bad-by-deviation if the median isn't at least 5x the variance of the variances.
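To put numbers on that floor, here is a minimal sketch with simulated right-skewed channel variances; the lognormal shape, the seed, and the IQR-based robust z-scoring are stand-ins for illustration, not pyprep's exact code:

```python
import numpy as np

rng = np.random.default_rng(42)
variances = rng.lognormal(mean=0.0, sigma=0.5, size=32)  # right-skewed, like real variances
variances[0] = 0.0                                       # completely dead channel

med = np.median(variances)
spread = (np.percentile(variances, 75) - np.percentile(variances, 25)) * 0.7413
z = (variances - med) / spread

# The dead channel's z-score is capped at exactly -med / spread, which
# for data shaped like this lands around -2: well short of +/- 5.
print(f"dead channel z = {z[0]:.2f} (floor = {-med / spread:.2f})")
```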
A quick-and-dirty fix is to z-score the log of the channel variances, which makes them more normally distributed and thus makes low-amplitude channels much easier to detect. However, this also raises the bar for detecting high-amplitude bad-by-deviation channels: the required multiplication factor for my test channel goes from 2.4 to 3.9.
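For comparison, the same toy setup after a log transform (the 1e-12 floor on the dead channel is an assumption to keep `log(0)` finite):

```python
import numpy as np

rng = np.random.default_rng(42)
variances = rng.lognormal(mean=0.0, sigma=0.5, size=32)
variances[0] = 1e-12  # dead channel, floored to avoid log(0)

log_var = np.log(variances)
med = np.median(log_var)
spread = (np.percentile(log_var, 75) - np.percentile(log_var, 25)) * 0.7413
z = (log_var - med) / spread

# On the log scale the dead channel is a huge negative outlier and
# blows far past the +/- 5 cutoff instead of plateauing near the floor.
print(f"dead channel z (log scale) = {z[0]:.1f}")
```

This is consistent with the trade-off noted above: the log compresses the high end of the scale, so a high-amplitude channel needs a larger multiplication factor to cross the same cutoff.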
Any ideas on how to better handle this?