The physical symbolic optimization ( physo
is a symbolic regression package that fully leverages physical units constraints. For more details see: [Tenachi et al 2023].
demo_light.mp4
The package has been tested on:
- Linux
- OSX (ARM & Intel)
Running physo
on Windows is not recommended.
To install the package it is recommended to first create a conda virtual environment:
conda create -n PhySO python=3.8
And activate it:
conda activate PhySO
physo
can be downloaded using git:
git clone https://github.com/WassimTenachi/PhySO
Or by direct downloading a zip of the repo:
https://github.com/WassimTenachi/PhySO/zipball/master
From the repository root:
Installing essential dependencies :
conda install --file requirements.txt
Installing optional dependencies (for advanced debugging in tree representation) :
conda install --file requirements_display1.txt
pip install -r requirements_display2.txt
In addition, latex should be installed on the system.
Installing physo
(from the repository root):
pip install -e .
python3
>>> import physo
This should result in physo
being successfully imported.
From the repository root:
python -m unittest discover -p "*UnitTest.py"
This should result in all tests being successfully passed (except for program_display_UnitTest tests if optional dependencies were not installed).
Symbolic regression (SR) consists in the inference of a free-form symbolic analytical function
Given a dataset
import numpy as np
z = np.random.uniform(-10, 10, 50)
v = np.random.uniform(-10, 10, 50)
X = np.stack((z, v), axis=0)
y = 1.234*9.807*z + 1.234*v**2
Where
Given the units input variables
import physo
expression, logs = physo.SR(X, y,
X_units = [ [1, 0, 0] , [1, -1, 0] ],
y_units = [2, -2, 1],
fixed_consts = [ 1. ],
fixed_consts_units = [ [0,0,0] ],
free_consts_units = [ [0, 0, 1] , [1, -2, 0] ],
)
(Allowing the use of a fixed constant
It should be noted that here the units vector are of size 3 (eg: [1, 0, 0]
) as in this example the variables have units dependent on length, time and mass only.
However, units vectors can be of any size
It should also be noted that free constants search starts around 1. by default. Therefore when using default hyperparameters, normalizing the data around an order of magnitude of 1 is strongly recommended.
Finally, please note that SR capabilities of physo
are heavily dependent on hyperparameters, it is therefore recommended to tune hyperparameters to your own specific problem for doing science.
Summary of currently available hyperparameters configurations:
Configuration | Notes |
---|---|
config0 | Light config for demo purposes. |
config1 | Tuned on a few physical cases. |
config2 | [work in progress] Good starting point for doing science. |
By default, config0
is used, however it is recommended to use the latest configuration currently available (config1
) as a starting point for doing science by specifying it:
expression, logs = physo.SR(X, y,
X_units = [ [1, 0, 0] , [1, -1, 0] ],
y_units = [2, -2, 1],
fixed_consts = [ 1. ],
fixed_consts_units = [ [0,0,0] ],
free_consts_units = [ [0, 0, 1] , [1, -2, 0] ],
run_config = physo.config.config1.config1
)
You can also specify the choosable symbolic operations for the construction of
expression, logs = physo.SR(X, y,
X_names = [ "z" , "v" ],
X_units = [ [1, 0, 0] , [1, -1, 0] ],
y_name = "E",
y_units = [2, -2, 1],
fixed_consts = [ 1. ],
fixed_consts_units = [ [0,0,0] ],
free_consts_names = [ "m" , "g" ],
free_consts_units = [ [0, 0, 1] , [1, -2, 0] ],
op_names = ["mul", "add", "sub", "div", "inv", "n2", "sqrt", "neg", "exp", "log", "sin", "cos"]
)
physo.SR
saves monitoring curves, the pareto front (complexity vs accuracy optimums) and the logs.
It also returns the best fitting expression found during the search which can be inspected in regular infix notation (eg. in ascii or latex) via:
>>> print(expression.get_infix_pretty(do_simplify=True))
⎛ 2⎞
m⋅⎝g⋅z + v ⎠
>>> print(expression.get_infix_latex(do_simplify=True))
'm \\left(g z + v^{2}\\right)'
Free constants can be inspected via:
>>> print(expression.free_const_values.cpu().detach().numpy())
array([9.80699996, 1.234 ])
physo.SR
also returns the log of the run from which one can inspect Pareto front expressions:
for i, prog in enumerate(pareto_front_expressions):
# Showing expression
print(prog.get_infix_pretty(do_simplify=True))
# Showing free constant
free_consts = prog.free_const_values.detach().cpu().numpy()
for j in range (len(free_consts)):
print("%s = %f"%(prog.library.free_const_names[j], free_consts[j]))
# Showing RMSE
print("RMSE = {:e}".format(pareto_front_rmse[i]))
print("-------------")
Returning:
2
m⋅v
g = 1.000000
m = 1.486251
RMSE = 6.510109e+01
-------------
g⋅m⋅z
g = 3.741130
m = 3.741130
RMSE = 5.696636e+01
-------------
⎛ 2⎞
m⋅⎝g⋅z + v ⎠
g = 9.807000
m = 1.234000
RMSE = 1.675142e-07
-------------
This demo can be found in demo/demo_quick_sr.ipynb
.
[Coming soon] In the meantime you can have a look at our demo folder ! :)
[Coming soon]
- Defining function token
If you want to add a custom choosable function to physo
, you can do so by adding you own Token to the list OPS_UNPROTECTED
in functions.py.
For example a token such as
OPS_UNPROTECTED = [
...
Token (name = "n5" , sympy_repr = "n5" , arity = 1 , complexity = 1 , var_type = 0, function = lambda x :torch.pow(x, 5) ),
...
]
Where:
name
(str) is the name of the token (used for selecting it in the config of a run).sympy_repr
(str) is the name of the token to use when producing the sympy / latex representation.arity
(int) is the number of arguments that the function takes.complexity
(float) is the value to consider for expression complexity considerations (1 by default).var_type
(int) is the type of token, it should always be 0 when defining functions like here.function
(callable) is the function, it should be written in pytorch to support auto-differentiation.
More details about Token attributes can be found in the documentation of the Token object : here
- Behavior in dimensional analysis
The newly added custom function should also be listed in its corresponding behavior (in the context of dimensional analysis) in the list of behaviors in OP_UNIT_BEHAVIORS_DICT
.
- Additional information
In addition, the custom function should be :
- Listed in
TRIGONOMETRIC_OP
if it is a trigonometric operation (eg. cos, sinh, arcos etc.) so it can be treated as such by priors if necessary. - Listed in
INVERSE_OP_DICT
along with its corresponding inverse operation$f^{-1}$ if it has one (eg. arcos for cos) so they can be treated as such by priors if necessary. - Listed in
OP_POWER_VALUE_DICT
along with its power value (float) if it is a power token (eg. 0.5 for sqrt) so it can be used in dimensional analysis.
- Protected version (optional)
If your custom function has a protected version ie. a version defined everywhere on OPS_PROTECTED
in functions.py with the similar attributes but with the protected version of the function for the function
attribute.
- Running the functions unit test (optional)
After adding a new function, running the functions unit test via python ./physo/physym/tests/functions_UnitTest.py
is highly recommended.
If you found the function you have added useful, don't hesitate to make a pull request so other people can use it too !
The main performance bottleneck of physo
is free constant optimization, therefore, performances are almost linearly dependent on the number of free constant optimization steps and on the number of trial expressions per epoch (ie. the batch size).
In addition, it should be noted that generating monitoring plots takes ~3s, therefore we suggest making monitoring plots every >10 epochs for low time / epoch cases.
Summary of expected performances with physo
:
Time / epoch | Batch size | # free const | free const opti steps |
Example | Device |
---|---|---|---|---|---|
~20s | 10k | 2 | 15 | eg: demo_damped_harmonic_oscillator | CPU: Mac M1 RAM: 16 Go |
~30s | 10k | 2 | 15 | eg: demo_damped_harmonic_oscillator | CPU: Intel W-2155 10c/20t RAM: 128 Go |
~250s | 10k | 2 | 15 | eg: demo_damped_harmonic_oscillator | GPU: Nvidia GV100 VRAM : 32 Go |
~3s | 1k | 2 | 15 | eg: demo_mechanical_energy | CPU: Mac M1 RAM: 16 Go |
~3s | 1k | 2 | 15 | eg: demo_mechanical_energy | CPU: Intel W-2155 10c/20t RAM: 128 Go |
~4s | 1k | 2 | 15 | eg: demo_mechanical_energy | GPU: Nvidia GV100 VRAM : 32 Go |
Please note that using a CPU typically results in higher performances than when using a GPU.
Uninstalling the package.
conda deactivate
conda env remove -n PhySO
@ARTICLE{2023arXiv230303192T,
author = {{Tenachi}, Wassim and {Ibata}, Rodrigo and {Diakogiannis}, Foivos I.},
title = "{Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws}",
journal = {arXiv e-prints},
keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Physics - Computational Physics},
year = 2023,
month = mar,
eid = {arXiv:2303.03192},
pages = {arXiv:2303.03192},
doi = {10.48550/arXiv.2303.03192},
archivePrefix = {arXiv},
eprint = {2303.03192},
primaryClass = {astro-ph.IM},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv230303192T},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}