Timing Errors (workaround for widescale tests) #289

Open
kkappler opened this issue Sep 15, 2023 · 1 comment

kkappler commented Sep 15, 2023

We have never carefully studied timing errors. The structure of MTH5 makes severe timing errors extremely unlikely, but minor errors are possible. These should eventually be addressed in MTH5 with a generic solution.

For now these are impeding the widescale test processing, and a workaround will be put into aurora.

This issue was encountered when processing Station ORF10 with remote reference ORG10.

Recall that the KernelDataset has a dataframe which lists the runs that need to be processed.
When there is a remote reference station, the rows of the dataframe are organized such that rows 0 and 1 correspond to simultaneous data at the Local and Reference stations respectively.
The same is true of rows 2 & 3, 4 & 5, and so on.
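
The row layout described above can be sketched as follows. This is only an illustration: plain dicts stand in for dataframe rows, and the `station` and `run` keys and the helper name `iter_run_pairs` are assumptions, not the actual KernelDataset columns or API.

```python
def iter_run_pairs(rows):
    """Yield (local, remote) tuples from rows that alternate local, remote."""
    if len(rows) % 2:
        raise ValueError("expected an even number of rows (local/remote pairs)")
    for i in range(0, len(rows), 2):
        yield rows[i], rows[i + 1]


# Hypothetical stand-in for the dataframe rows (keys are assumed names).
rows = [
    {"station": "ORF10", "run": "006"},  # row 0: local
    {"station": "ORG10", "run": "002"},  # row 1: remote reference
]

pairs = list(iter_run_pairs(rows))
for local, remote in pairs:
    print(local["station"], local["run"], "<->", remote["station"], remote["run"])
```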

What goes wrong is that when loading the time series from mth5, the "paired" time series can have different lengths. Even if the TS are off by a single sample (which is the only problem I have seen so far), the two STFT spectrograms can end up differing by one spectral estimate.

For example, here is a run-pair:
A time series corresponding to ORF10, run=006 is loaded with start time 2006-09-07 22:50:22 and end time 2006-09-18 14:59:13.875000. This TS has 7377056 samples.
A time series corresponding to ORG10, run=002 is loaded with identical start and end times, but the TS has 7377055 samples. Technically this is a timing error, but we are looking at +/- one sample of 8 Hz data over the course of roughly 11 days, so it is trivial.

TS ORF10 006 (7377056,) 2006-09-07 22:50:22+00:00 2006-09-18 14:59:13.875000+00:00
TS ORG10 002 (7377055,) 2006-09-07 22:50:22+00:00 2006-09-18 14:59:13.875000+00:00
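
As a sanity check on the two lengths above, the sample count implied by the start time, end time and the 8 Hz sample rate can be computed directly. This is a minimal sketch assuming uniformly sampled data with both endpoints included; the helper name `expected_n_samples` is hypothetical.

```python
from datetime import datetime


def expected_n_samples(start, end, sample_rate):
    """Samples in a uniformly sampled series that includes both endpoints."""
    duration = (end - start).total_seconds()
    return round(duration * sample_rate) + 1


start = datetime(2006, 9, 7, 22, 50, 22)
end = datetime(2006, 9, 18, 14, 59, 13, 875000)

n_expected = expected_n_samples(start, end, sample_rate=8.0)
print(n_expected)  # 7377056
```

Under these assumptions the ORF10 length (7377056) matches the expected count, and the ORG10 series is the one that is a sample short.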

However, as it turns out, with the default windows of length 128 and a 96-sample advance (32-sample overlap), the ORF10 TS has exactly enough samples for 76844 spectral estimates, but the ORG10 TS is one sample shy, so it yields only 76843. The failure surfaces downstream in effective_degrees_of_freedom_weights, but the root cause is that we started with non-uniformly sampled data.
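
The window counts quoted above follow from simple sliding-window arithmetic. The sketch below assumes only complete windows are kept (no padding of the final partial window); `n_windows` is a hypothetical helper, not an aurora function.

```python
def n_windows(n_samples, window_length=128, advance=96):
    """Number of complete sliding windows that fit in n_samples."""
    if n_samples < window_length:
        return 0
    return (n_samples - window_length) // advance + 1


n_local = n_windows(7377056)   # ORF10, run 006
n_remote = n_windows(7377055)  # ORG10, run 002
print(n_local, n_remote)  # 76844 76843
```

Since 7377056 - 128 = 7376928 = 96 * 76843 exactly, the last ORF10 window ends on the final sample, and losing one sample drops a whole window.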

The proposed workaround is to drop Fourier coefficients (FCs) that do not match one another. There are various ways to go about this.
A fairly straightforward solution would be to check that the timestamps from local and RR agree for each chunk, right before the spectrograms are merged across all runs.
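
One way the timestamp check could look is sketched below. It is not the aurora implementation: it assumes each spectrogram exposes a list of window start times, and the helper `common_window_mask` is hypothetical. Only windows whose timestamps appear in both local and RR are kept.

```python
def common_window_mask(local_times, remote_times):
    """Return boolean keep-masks for local and remote window timestamps."""
    shared = set(local_times) & set(remote_times)
    keep_local = [t in shared for t in local_times]
    keep_remote = [t in shared for t in remote_times]
    return keep_local, keep_remote


# Toy example: window start times in seconds (96 samples at 8 Hz = 12 s advance).
local_times = [0.0, 12.0, 24.0, 36.0]  # local has one extra estimate
remote_times = [0.0, 12.0, 24.0]

keep_local, keep_remote = common_window_mask(local_times, remote_times)
local_kept = [t for t, k in zip(local_times, keep_local) if k]
remote_kept = [t for t, k in zip(remote_times, keep_remote) if k]
print(local_kept == remote_kept)  # True
```

Matching on exact timestamps only works if both stations share the same window grid; if the start times were offset by a fraction of a sample, a tolerance-based comparison would be needed instead.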

The request dataframes are attached:
ORF09_request_dataframe.csv
ORG10_request_dataframe.csv
tfk_dataset.csv
tfk_dataset_time_periods.csv

kkappler added a commit that referenced this issue Sep 15, 2023
kkappler added a commit that referenced this issue Sep 20, 2023
- rename widescale_test.py to widescale.py
- modify associated imports
- reset tests to run broadly
@kkappler
Collaborator Author

N.B. My previous commit message incorrectly referenced 289, should have been 293, sigh ...
