To ensure that results are not unintentionally modified by changes to the tool, and that expected changes *are* being made, we need to validate and benchmark the results.
Validation datasets will be made available on CloudStor once they are finalised.
- Run analysis on sample data with the reference version of the tool:
  1. Check out the reference version of the tool (currently the latest master branch).
  2. Build the databases for the events in the `DER_disturbance_analysis/validation/data` directory using `build_validation_databases.R`.
  3. Run the tool using the metadata from the event directory as the config JSON.
  4. Batch save the results in the same directory as the data, entering `ref` as the file name (a sanity-check sketch follows this list).
  - (Optional) Instead of steps 3 and 4, use `run_validation_analysis.R` to generate the files.
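As a quick sanity check after the batch save, a minimal R sketch like the following can confirm that `ref` results exist for every event. The data path and the `ref` file-name prefix are assumptions based on the steps above, not part of the tool:

```r
# Minimal sketch: confirm every event directory contains batch-saved "ref"
# results. The data path and the "ref" file-name prefix are assumptions.
event_dirs <- list.dirs("DER_disturbance_analysis/validation/data",
                        recursive = FALSE)
for (event_dir in event_dirs) {
  if (length(list.files(event_dir, pattern = "^ref")) == 0) {
    warning("No 'ref' results saved in ", event_dir)
  }
}
```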
- Run analysis on sample data with the version of the tool to be tested:
  1. Check out the target branch.
  2. Build the databases for the events in the `DER_disturbance_analysis/validation/data` directory using `build_validation_databases.R`.
  3. Run the tool using the metadata from the event directory as the config JSON. IMPORTANT: make sure any changes in configuration in the target branch (new fields/values etc.) are reflected in the test metadata (see the sketch after this list).
  4. Batch save the results in the same directory as the data, entering `test` as the file name.
  - (Optional) Instead of steps 3 and 4, use `run_validation_analysis.R` to generate the files.
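If the target branch introduces new configuration fields, they need to be added to each event's test metadata, either by hand or with a small script. A hedged sketch using `jsonlite`, where the field name `new_setting` is hypothetical:

```r
# Hedged sketch: add a hypothetical new setting introduced by the target
# branch to every event's test_meta_data.json.
library(jsonlite)

for (event_dir in list.dirs("DER_disturbance_analysis/validation/data",
                            recursive = FALSE)) {
  meta_path <- file.path(event_dir, "test_meta_data.json")
  meta <- fromJSON(meta_path)
  meta$new_setting <- TRUE  # hypothetical field required by the target branch
  write_json(meta, meta_path, auto_unbox = TRUE, pretty = TRUE)
}
```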
- Compare the reference and test results using `validate_results.R`:
  1. Identify whether the results match.
  2. Check any discrepancies against the expected impact of the test version of the tool (an illustrative sketch follows this list).
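`validate_results.R` performs the actual comparison; purely as an illustration of the kind of check involved, comparing two saved result files might look like this (the event directory and file names are hypothetical):

```r
# Illustration only: validate_results.R is the real comparison script.
# The event directory and file names below are hypothetical.
ref  <- read.csv("DER_disturbance_analysis/validation/data/event_name/ref_circuit_summary.csv")
test <- read.csv("DER_disturbance_analysis/validation/data/event_name/test_circuit_summary.csv")
all.equal(ref, test)  # TRUE if they match, otherwise a summary of the differences
```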
Run `full_validation.sh` in your terminal program of choice from the tool's base directory, passing the test and reference branches as positional arguments, e.g. `./full_validation.sh branch_to_validate master`.
- `Error: Result must have length 36, not 0` or `Error in read.table(...) first five rows are empty: giving up` when running validation
  - This is likely caused by new or modified tool settings. Ensure that any new settings are reflected in the `test_meta_data.json` file for each event. Once this validation is complete, ensure that the `ref_meta_data.json` files are updated and uploaded to CloudStor for future use.
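Once the test run validates cleanly, the reference metadata can be brought in line with the new settings. A minimal sketch, assuming the validated test metadata is the desired new reference content:

```r
# Hedged sketch: copy each validated test_meta_data.json over the matching
# ref_meta_data.json so future reference runs use the updated settings.
for (event_dir in list.dirs("DER_disturbance_analysis/validation/data",
                            recursive = FALSE)) {
  file.copy(file.path(event_dir, "test_meta_data.json"),
            file.path(event_dir, "ref_meta_data.json"),
            overwrite = TRUE)
}
```

The updated `ref_meta_data.json` files still need to be uploaded to CloudStor as described above.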
A representative set of validation data is required. To capture this, we are currently using data from 2021-05-25 in QLD, from 2020-01-31 in SA, and one event using Tesla data (2021-03-12).
For each event, the following process needs to be followed:
- Build the database from the raw data for the event.
- Run the normal tool analysis with appropriate settings for that event. Ensure that frequency data is included if necessary, and that all category filters other than "raw" are checked.
- Batch save the results under a memorable name.
- Identify appropriate sample circuits based on the results in the circuit summary. Currently the following columns are used to identify unique circuits: `response_category`, `reconnection_compliance_status`, `ufls_status`, `compliance_status`, `Standard_Version`.
- Using the site IDs of the sample results, filter the raw data to include only the sampled circuits (covering the site_details, circuit_details and raw data files); see the sketch after this list.
- Save this filtered raw data as a new set of files under `DER_disturbance_analysis/validation/data`, in a sub-directory with a meaningful name representing the event.
- Copy any required supporting files: the metadata file from the sample selection should be included, as should any necessary network frequency data.
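As a sketch of the sampling and filtering steps above, using `dplyr`: the classification columns are those listed, but the file names and the `site_id` join column are assumptions.

```r
library(dplyr)

# Keep one circuit per unique combination of the classification columns.
circuit_summary <- read.csv("circuit_summary.csv")
sample_circuits <- circuit_summary %>%
  group_by(response_category, reconnection_compliance_status,
           ufls_status, compliance_status, Standard_Version) %>%
  slice(1) %>%
  ungroup()

# Cut the supporting files down to the sampled sites and save them to the
# event's sub-directory ("event_name" is a placeholder).
sample_sites <- unique(sample_circuits$site_id)
site_details <- read.csv("site_details.csv") %>%
  filter(site_id %in% sample_sites)
circuit_details <- read.csv("circuit_details.csv") %>%
  filter(site_id %in% sample_sites)
write.csv(site_details, "validation/data/event_name/site_details.csv",
          row.names = FALSE)
write.csv(circuit_details, "validation/data/event_name/circuit_details.csv",
          row.names = FALSE)
```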