This repository implements GFlowNets, generative flow networks for probabilistic modelling, in PyTorch. A design guideline behind this implementation is the separation of the logic of the GFlowNet agent from that of the environments on which the agent can be trained. In other words, this implementation facilitates extension with new environments for new applications. The configuration is handled with Hydra.
Many wonderful scientists and developers have contributed to this repository: Alex Hernandez-Garcia, Nikita Saxena, Alexandra Volokhova, Michał Koziarski, Divya Sharma, Pierre Luc Carrier and Victor Schmidt. The GFlowNet implementation was initially part of github.com/InfluenceFunctional/ActiveLearningPipeline.
This repository has been used in at least the following research articles:
- Lahlou et al. A theory of continuous generative flow networks. ICML, 2023.
- Hernandez-Garcia, Saxena et al. Multi-fidelity active learning with GFlowNets. RealML at NeurIPS 2023.
- Mila AI4Science et al. Crystal-GFN: sampling crystals with desirable properties and constraints. AI4Mat at NeurIPS 2023 (spotlight).
- Volokhova, Koziarski et al. Towards equilibrium molecular conformation generation with GFlowNets. AI4Mat at NeurIPS 2023.
Quickstart: If you simply want to install everything, run setup_all.sh.
- This project requires python 3.10 and cuda 11.8.
- Setup is currently only supported on Ubuntu. It should also work on OSX, but you will need to handle the package dependencies.
- The recommended installation is as follows:
python3.10 -m venv ~/envs/gflownet # Initialize your virtual env.
source ~/envs/gflownet/bin/activate # Activate your environment.
./prereq_ubuntu.sh # Installs some packages required by dependencies.
./prereq_python.sh # Installs python packages with specific wheels.
./prereq_geometric.sh # OPTIONAL - for the molecule environment.
pip install .[all] # Install the remaining elements of this package.
Aside from the base packages, you can optionally install development tools with the dev tag, materials dependencies with the materials tag, or molecules packages with the molecules tag. The simplest option is to use the all tag, as above, which installs all dependencies.
To train a GFlowNet model with the default configuration, simply run
python main.py user.logdir.root=<path/to/log/files/>
Alternatively, you can create a user configuration file in config/user/<username>.yaml specifying a logdir.root and run
python main.py user=<username>
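As an illustration of the user configuration file mentioned above, the following sketch writes one with Python. It assumes that the dotted name logdir.root corresponds to a nested YAML key (logdir containing root); the username alice, the log path and the helper name write_user_config are hypothetical and only serve the example. The demo writes into a temporary directory, so no real repository is modified.

```python
from pathlib import Path
import tempfile


def write_user_config(repo_root: Path, username: str, logdir_root: str) -> Path:
    """Write config/user/<username>.yaml with a logdir.root entry; return its path."""
    user_dir = repo_root / "config" / "user"
    user_dir.mkdir(parents=True, exist_ok=True)
    config_path = user_dir / f"{username}.yaml"
    # Assumption: the dotted name logdir.root maps to a nested YAML key.
    config_path.write_text(f"logdir:\n  root: {logdir_root}\n")
    return config_path


# Hypothetical values, written to a throwaway directory for demonstration.
demo_root = Path(tempfile.mkdtemp())
demo_path = write_user_config(demo_root, "alice", "/tmp/gflownet-logs")
print(demo_path.read_text())
```

With such a file in place in the actual repository, the matching command would be python main.py user=alice.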
Using Hydra, you can easily specify any variable of the configuration in the command line. For example, to train GFlowNet with the trajectory balance loss, on the continuous torus (ctorus) environment and the corresponding proxy:
python main.py gflownet=trajectorybalance env=ctorus proxy=torus
The above command will override the default env and proxy configuration with the configuration files in config/env/ctorus.yaml and config/proxy/torus.yaml, respectively.
Hydra configuration is hierarchical, so nested variables are addressed with dotted names. For instance, a handy variable to change while debugging is the one that controls logging to wandb: you can avoid logging by setting logger.do.online=False.
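To make the dotted notation concrete, here is a minimal sketch of how an override such as logger.do.online=False can be read as a path into a nested configuration. This is an illustration in plain Python, not Hydra's actual implementation: the helpers parse_value and apply_override are hypothetical, and the sketch does not cover config-group selections such as env=ctorus, which swap in a whole configuration file.

```python
import copy


def parse_value(text):
    """Convert the right-hand side of an override to bool, int, float or str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_override(config, override):
    """Return a copy of config with one 'a.b.c=value' override applied."""
    dotted_key, _, raw_value = override.partition("=")
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    *parents, leaf = dotted_key.split(".")
    node = updated
    for name in parents:
        # Walk down the hierarchy, creating intermediate levels if needed.
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw_value)
    return updated


# Hypothetical starting configuration with online logging switched on.
config = {"logger": {"do": {"online": True}}}
new_config = apply_override(config, "logger.do.online=False")
print(new_config)
```

Each dot in the key descends one level, which is why logger.do.online addresses the online entry inside do, inside logger.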
Currently, the implementation includes the following GFlowNet losses:
- Flow-matching (FM): gflownet=flowmatch
- Trajectory balance (TB): gflownet=trajectorybalance
- Detailed balance (DB): gflownet=detailedbalance
- Forward-looking (FL): gflownet=forwardlooking
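The options listed above can be combined with the env and proxy overrides when launching a run. The sketch below assembles the corresponding command line from a loss abbreviation; the helper build_train_command and the LOSS_OPTIONS table are hypothetical conveniences written for this example, not part of the repository, and only the option values themselves come from the list above.

```python
import shlex

# Loss abbreviations mapped to the gflownet= options listed above.
LOSS_OPTIONS = {
    "FM": "flowmatch",
    "TB": "trajectorybalance",
    "DB": "detailedbalance",
    "FL": "forwardlooking",
}


def build_train_command(loss, env=None, proxy=None, overrides=()):
    """Return the argument list for a training run with the chosen loss."""
    if loss not in LOSS_OPTIONS:
        raise ValueError(f"Unknown loss {loss!r}; expected one of {sorted(LOSS_OPTIONS)}")
    args = ["python", "main.py", f"gflownet={LOSS_OPTIONS[loss]}"]
    if env is not None:
        args.append(f"env={env}")
    if proxy is not None:
        args.append(f"proxy={proxy}")
    # Any further overrides are passed through unchanged.
    args.extend(overrides)
    return args


command = build_train_command("TB", env="ctorus", proxy="torus")
print(shlex.join(command))
```

The returned list can be passed to subprocess.run to launch the run from a script, for example when comparing the four losses on the same environment.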
The repository supports logging of train and evaluation metrics to wandb.ai, but it is disabled by default. In order to enable it, set the configuration variable logger.do.online to True.
Bibtex Format
@misc{hernandez-garcia2024,
author = {Hernandez-Garcia, Alex and Saxena, Nikita and Volokhova, Alexandra and Koziarski, Michał and Sharma, Divya and Viviano, Joseph D and Carrier, Pierre Luc and Schmidt, Victor},
title = {gflownet},
url = {https://github.com/alexhernandezgarcia/gflownet},
year = {2024},
}
Alternatively, use the CFF file.