Chemcharts
is an open source program designed to visualize the distribution of molecules in chemical space. It allows for data parsing and preparation and then generates various plots or movies. In order to generate graphical representations that are interpretable, ChemCharts
reduces the high-dimensional fingerprint representation to 2D embedded coordinates. There are two ways to run it, either via a command-line entry point, or, for full control over all settings, by specifying a JSON configuration file. While ChemCharts
is input agnostic to some extent, it is particularly useful as a post-processing step to compound generative modelling with REINVENT
, which has a so-called "scaffold memory" as its final output.
ChemCharts
accepts input from one or multiple datasets in csv format. The file needs to include columns with molecules in SMILES format and scores (when time resolution is desired, an epoch/step column should be included too). The scatter_boxplot_plot allows group colouring which requires an additional column defining the belonging of molecules. For the developing process as well as the notebook, datasets have been generated with REINVENT
(open source), see Example Dataset.
- In the first step
ChemCharts
transforms SMILES to fingerprints by using theRDKit
fingerprint functions (the user can choose between standard, Morgan and MACCS fingerprints)
COc1cccc(-c2c3c(cc4ccccc24)C(=O)NC3=O)c1 ⮕ [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, ...]
-
Then,
ChemCharts
reduces the fingerprints with theUMAP
dimensionality reduction algorithm. -
For further adjustments, the user can define a filter range and/ or the desired number of clusters (using KMeans).
-
Binning of median scores is possible and will be visualized in a histogram plot (the amount of bins can thereby be defined by the user).
-
When it comes to the plot generation step, the following types can be chosen:
- scatter static plot
- scatter boxplot plot (colour grouping)
- scatter interactive plot (only static preview available)
- scatter interactive mol plot (only static preview available)
- contour plot
- histogram plot (scores)
- trisurf static plot
- trisurf interactive plot (only static preview available)
- hexagonal plot (default)
For interactive plots a view setting can be chosen, which allows for a pop-up browser window of the interactive plot.
If multiple datasets are provided, plots will be merged in columns of three (except interactive plots).
-
And last,
ChemCharts
can also make a movie for most plot types. This works by treating incremental epoch numbers as some sort of time axis, i.e. one can follow the agent sample over the course of the reinforcement learning.
- You might want to put
ChemCharts
into a bespoke environment, e.g.conda
:
conda create -n chemcharts "python==3.9.16" pip
conda activate chemcharts
- Installation the package:
pip install chemcharts
- Install local copy:
pip install .
- Note, that you need
ffmpeg
installed on your computer in case you want to generate movies. On Ubuntu it will look something like:
sudo apt update
sudo apt install ffmpeg
- Execution of command-line interface (CLI) / entry point:
chemcharts_cli -input_data data/scaffold_memory.csv -output_plot test.png
- Execution of
JSON
interface / entry point:
chemcharts -conf examples/json/data_prep_plot.json
chemcharts -conf examples/json/simple_plot_test.json
- Execution of unit tests:
python main_test.py
For detailed guides on how to use JSON
for generating ChemCharts
plots or movies, see the notebook templates Learning Demo Chemcharts Json Plot or Learning Demo Chemcharts Json Movie. For in-depth explanations of the ChemCharts
entry points, please see: Learning Demo Chemcharts Entry Point.
Please don't hesitate to let us know (open an issue) if you find a bug, want to request a feature or to make a pull request.
- Sophie Margreitter @smargreitter
- Christian Margreitter @cmargreitter
- Esben Jannik Bjerrum @EBjerrum