Skip to content

Commit

Permalink
Merge pull request #8 from mhpi/hydrodl_dev
Browse files Browse the repository at this point in the history
Debug of v1.0 hydrology models from HydroDL1.0
  • Loading branch information
leoglonz authored Nov 19, 2024
2 parents cf8b97e + bbb7060 commit 0a22d05
Show file tree
Hide file tree
Showing 29 changed files with 5,464 additions and 1,111 deletions.
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -163,4 +163,8 @@ cython_debug/


# VCS versioning
src/hydroDL2/_version.py
src/hydroDL2/_version.py

# Other
*sacsma*
validation/
4 changes: 2 additions & 2 deletions docs/adding_models_and_modules.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
## Guidlines for Adding Models and Modules to hydroDL2


We illustrate this with a hydrology model HBV v2, or `HBV_v2`.
We illustrate this with a hydrology model HBV 1.2.
- In the `models/` directory, create a folder for the model type if it does not already exist, using only lowercase.
- e.g., we should create `models/hbv/`, or confirm it exists.

- Within `models/<your_model>/`, create a `.py` file for the model taking the name that will be exposed to users, converting to lowercase.
- e.g., HBV v2 goes to `hbv_v2.py`.
- e.g., HBV 1.2 goes to `hbv1_2.py`.

- The model file should only contain one model class. If you want to place multiple variants of your model in the same file, make sure to add a variant specification to your config and indicate
with the `ver_name` flag when loading your model with `load_model()`. Otherwise, the first model listed in your file will be loaded by default.
Expand Down
249 changes: 249 additions & 0 deletions src/hydroDL2/bmi/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,249 @@
A dPLhydro model BMI will live in `./ngen/extern/dpl_model_package` when
operating in NextGen.

Every file for this model that is not in this directory and not standard with
NextGen will be flagged with the identifier `***dPLModel File***`. This way we
can distinguish which files need to be added to a dPLModel module.



---

### NextGen

When you first git clone https://github.com/NOAA-OWP/ngen.git, run `cd ngen/` and `git submodule update --init --recursive`.

#### To build BMI Python Adapter Test (for Pythonic BMI testing in NextGen)
Then run in `cd ~/ngen/`:
1. `git submodule update --init --recursive -- test/googletest`
2. `cmake -DCMAKE_BUILD_TYPE=Debug -DNGEN_WITH_TESTS:BOOL=ON -B cmake-build-debug -S .` (If UDUNITS2_library is not
found, `conda install udunits2` (confirm that your .conda/[env] dir has `/include`. If not, conda install failed and
should be reattempted.) and run cmake with the following options:
cmake -DCMAKE_BUILD_TYPE=Debug \
-DNGEN_WITH_TESTS:BOOL=ON \
-B cmake-build-debug \
-S . \
-DUDUNITS2_LIBRARY=/data/lgl5139/.conda/envs/mulhydrodl/lib/libudunits2.so \
-DUDUNITS2_INCLUDE_DIR=/data/lgl5139/.conda/envs/mulhydrodl/include)

3. `cmake --build cmake-build-dir --target test_bmi_python -- -j 4` or with options `| less -iR`.
4. run `cmake-build-debug/test/test_bmi_python`


---

### BMI

To test BMI with nextgen:
1. `cd ngen/extern/dpl_model_package`
2. `python run_bmi_unit_test.py`






---

### Steps from `NOAA-OWP/lstm` for running a custom BMI in ngen:

S.D. Peckham
September 12, 2022
November 15, 2022 (updated)

-------------------------------------------
Steps to run the LSTM Package in NextGen
-------------------------------------------
(1) Download the new LSTM Python package (branch) from:
https://github.com/NOAA-OWP/lstm/tree/lstm_package

(2) Copy the lstm_package folder into the ngen project tree/repo at:
ngen/extern/lstm_package

(3) Copy the files in the "ngen_files" subfolder into the corresponding
locations in the ngen project tree (e.g. into ngen/data/lstm, etc.)

(4) Open a terminal window and set PYTHONPATH (for this session).
For example:
(base) % export PYTHONPATH='/Users/peckhams/Dropbox/GitHub/ngen/extern/lstm_py'
(base) % echo $PYTHONPATH
/Users/peckhams/Dropbox/GitHub/ngen/extern/lstm_py

Note: You may also need to install pytorch, or include its path in PYTHONPATH.
Note: Virtual environments in NextGen don't work yet.
Note: As a result of setting PYTHONPATH, if you now type:
(base) % pip list
you will see "lstm" in the list of packages.
Note: You can check that PYTHONPATH is set correctly via:
(base) % python
>>> import lstm

(5) In the original version of LSTM, it was assumed that you would be running
LSTM from its own repo folder, and therefore some file paths are set relative
to that folder. For example, in the LSTM model config (YML) files, the
"train_cfg_file" entry is initially set to:
./trained_neuralhydrology_models/hourly_slope_mean_precip/config.yml
To run LSTM from the top level of the ngen repo folder, this must be changed to:
./extern/lstm_py/trained_neuralhydrology_models/hourly_slope_mean_precip/config.yml
However, LSTM also uses some Python pickle files (created during training) that
make the same assumption, and you can't edit these. One easy way to resolve this
filepath issue --- so that you can run LSTM from the ngen repo folder --- is to
cd to the ngen repo folder and then create a symbolic link with the command:

% ln -s ./extern/lstm_py/trained_neuralhydrology_models ./trained_neuralhydrology_models

(7) Run NextGen for the HUC01 catchment "cat-67" with the commands:
(base) % cd <ngen_repo_tree>
(base) % ./cmake_build/ngen ./data/lstm/spatial/catchment_data_cat67.geojson "cat-67" ./data/lstm/spatial/nexus_data_nex65.geojson "nex-65" ./data/lstm/rc_files/realization_config_lstm_cat67b.json

The output should look like:

#### NextGen Output ####
NGen Framework 0.1.0
Building Nexus collection
Building Catchment collection
Catchment topology is dendridic.
Running Models
Running timestep 0
Running timestep 100
Running timestep 200
Running timestep 300
Running timestep 400
Running timestep 500
Running timestep 600
Running timestep 700
Finished 720 timesteps.
ngen(54702,0x10ddc0dc0) malloc: *** error for object 0x7f95afd54740: pointer being freed was not allocated
ngen(54702,0x10ddc0dc0) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort ./cmake_build/ngen ./data/catchment_data_cat67.geojson "cat-67" "nex-65"

#### Output files generated in ngen folder ####
cat-67.csv (34537 bytes)
nex-65_output.csv (24288 bytes)

Note: It appears that the "Pointer" error reported at the end is a NextGen problem.


(8) Run NextGen for 3 test catchments (in one CAMELS basin) with the commands:
(base) % cd <ngen_repo_tree>
(base) % ./cmake_build/ngen ./data/lstm/spatial/catchment_data_CAMELS-test.geojson "cat-67" ./data/lstm/spatial/nexus_data_CAMELS-test.geojson "nex-65" ./data/lstm/rc_files/realization_config_lstm_CAMELS-test.json

------------------------------------------------
Notes about realization config files for LSTM
------------------------------------------------

(1) Set "name" in the formulations block to "bmi_python".

(2) Set "python_type" to "lstm.bmi_lstm.bmi_LSTM",
which has the form: "package-name/module-name/class-name".
Note: The file: ngen/extern/lstm_py/lstm/__init__.py should be empty.

(3) Set "model_type_name" to "bmi_LSTM", which is the model class name.

(4) Set "init_config" as the complete path to an LSTM model config file.
For example:
"./data/lstm/yml_files/HUC01/cat-67.yml"
or
"./extern/lstm_py/bmi_config_files/cat-67.yml"
It can also contain a regular expression to match many files:
"./data/lstm/yml_files/HUCO01/{{id}}.yml"

(5) Set "main_output_variable" to
"land_surface_water__runoff_volume_flux",

(6) In the "variable_names_map" block, notice the line:
"streamflow_cms: "land_surface_water__runoff_volume_flux",
You do *not* need to add:
"water_input": "atmosphere_water__liquid_equivalent_precipitation_rate",

(7) In the "forcing" block, you no longer need to use a forcing file that
only contains data for the time range of interest (start_time to end_time).
For example, you can set "path" to:
"./data/forcing/HUC01-test/cat-67.csv"
or
"./data/forcing/CAMELS-test/{{id}}.csv"

(8) In the "time" block, make sure that "start_time" and "end_time" fall into
the range that is spanned by entries in the CSV forcing file.


--------------------------------------------------------------------
Steps to run the LSTM Package in NextGen for All HUC01 Catchments
--------------------------------------------------------------------

(1) From the Amazon S3 bucket, download the folder that has LSTM YML config files
for every HUC01 catchment: formulations-dev > HUC01 > LSTM
(This can be done with Cyberduck, as explained in another doc.)

(2) Copy this LSTM folder to:
ngen/extern/lstm_py/bmi_config_files/HUC01
or
ngen/data/lstm/yml_files/HUC01

(3) Download the files "catchment_data.geojson" and "nexus_data.geojson" from:
formulations-dev > HUC01
Copy them into: ngen/data/lstm/spatial with the new names:
catchment_data_HUC01.geojson
nexus_data_HUC01.geojson

NOTE: Each ID (e.g. "cat-27") that appears in:
ngen/data/lstm/realization_config_lstm.json
must have a corresponding entry in these two files.
(So older versions of these geojson files may not work.)

(4) Each YML file in: ngen/extern/lstm_py/bmi_config_files/HUC01
has the line:
train_cfg_file: ./trained_neuralhydrology_models/hourly_slope_mean_precip_temp/config.yml
Since we'll be running NextGen from the ngen folder, first make sure that the
trained_neuralhydrology_models folder contains this folder and file, then
create a symbolic link in the ngen folder with the command:
% ln -s ./extern/lstm_py/trained_neuralhydrology_models ./trained_neuralhydrology_models

(5) Create a realization config file (ngen/data/lstm/realization_config_lstm_HUC01.json) for
LSTM that uses the "{{id}}" regular expression to set "init_config", and that uses
the same forcing file (December 2015) for all catchments. Here it is:

{
"global": {
"formulations":
[
{
"name": "bmi_python",
"params": {
"python_type": "lstm.bmi_LSTM",
"model_type_name": "bmi_LSTM",
"init_config": "./extern/lstm_py/bmi_config_files/HUC01/{{id}}.yml",
"main_output_variable": "land_surface_water__runoff_volume_flux",
"uses_forcing_file": false,
"variables_names_map" : {
"streamflow_cms": "land_surface_water__runoff_volume_flux"},
"pytorch_model_path": "./data/lstm/sugar_creek_trained.pt",
"normalization_path": "./data/lstm/input_scaling.csv",
"initial_state_path": "./data/lstm/initial_states.csv",
"useGPU": false
}
}
],
"forcing": {
"path": "./data/forcing/HUC01-test/cat-67.csv"
}
},
"time": {
"start_time": "2015-12-01 00:00:00",
"end_time": "2015-12-30 23:00:00",
"output_interval": 3600
}
}

(6) Now type this command, where "" indicates all catchments:

(base) % cd <ngen-repo-folder>
(base) % ./cmake_build/ngen ./data/lstm/spatial/catchment_data_HUC01.geojson "" ./data/lstm/spatial/nexus_data_HUC01.geojson "" ./data/lstm/rc_files/realization_config_lstm_HUC01.json

--------------------------------
Note about model output files
--------------------------------
At this time, NextGen does not support setting a "output directory" for
model output. So all output files will be written to the same folder
where you ran NextGen from.
See: https://github.com/NOAA-OWP/ngen/issues/374
4 changes: 4 additions & 0 deletions src/hydroDL2/bmi/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Temporarily import dPLHydro_multimodel package (PMI) until BMI package is
# self-complete.

import hydro_multimodel.dPLHydro_multimodel as pmi
4 changes: 4 additions & 0 deletions src/hydroDL2/bmi/__main__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
from extern.dpl_model_package.run_dpl_model_bmi import execute

if __name__ == '__main__':
execute()
Loading

0 comments on commit 0a22d05

Please sign in to comment.