
ChemCharts is a module that allows you to plot chemical space in various figure types


SMargreitter/ChemCharts


ChemCharts

hex_pic.png

Description

ChemCharts is an open-source program designed to visualize the distribution of molecules in chemical space. It parses and prepares the input data and then generates various plots or movies. To produce interpretable graphical representations, ChemCharts reduces the high-dimensional fingerprint representation of each molecule to 2D embedded coordinates. There are two ways to run it: via a command-line entry point or, for full control over all settings, by specifying a JSON configuration file. While ChemCharts is input-agnostic to some extent, it is particularly useful as a post-processing step for compound generative modelling with REINVENT, whose final output is a so-called "scaffold memory".

Data input

ChemCharts accepts input from one or multiple datasets in CSV format. Each file needs to include a column with molecules in SMILES format and a column with scores; when time resolution is desired, an epoch/step column should be included too. The scatter_boxplot_plot supports group colouring, which requires an additional column assigning each molecule to a group. For development as well as for the notebook, datasets were generated with REINVENT (open source); see Example Dataset.
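As a rough illustration of the expected input, the sketch below reads a scaffold-memory-style CSV with Python's standard `csv` module and groups the molecules by epoch. The column names `SMILES`, `total_score` and `Step` are assumptions about a typical REINVENT export, not names ChemCharts is documented to require; adjust them to the headers in your own file.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; adjust to the headers in your own CSV file.
SMILES_COL = "SMILES"
SCORE_COL = "total_score"
EPOCH_COL = "Step"


def load_molecules(handle):
    """Read rows with a SMILES string, a float score and an optional int epoch."""
    molecules = []
    for row in csv.DictReader(handle):
        epoch = row.get(EPOCH_COL)
        molecules.append({
            "smiles": row[SMILES_COL],
            "score": float(row[SCORE_COL]),
            # The epoch column is only needed when time resolution (movies) is desired.
            "epoch": int(epoch) if epoch not in (None, "") else None,
        })
    return molecules


def group_by_epoch(molecules):
    """Map each epoch number to the list of SMILES sampled in that epoch."""
    groups = defaultdict(list)
    for mol in molecules:
        groups[mol["epoch"]].append(mol["smiles"])
    return dict(groups)


# Small in-memory example standing in for a real file such as data/scaffold_memory.csv.
example = io.StringIO(
    "SMILES,total_score,Step\n"
    "c1ccccc1,0.42,1\n"
    "CCO,0.10,1\n"
    "COc1cccc(-c2c3c(cc4ccccc24)C(=O)NC3=O)c1,0.87,2\n"
)
molecules = load_molecules(example)
print(group_by_epoch(molecules))
```

For a real file, pass an open file handle (`open(path, newline="")`) instead of the `StringIO` object.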

What can ChemCharts do?

  1. In the first step, ChemCharts transforms SMILES into fingerprints using the RDKit fingerprint functions (the user can choose between standard, Morgan and MACCS fingerprints).

example_molecule.png

COc1cccc(-c2c3c(cc4ccccc24)C(=O)NC3=O)c1 ⮕ [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, ...]
  2. Then, ChemCharts reduces the fingerprints to 2D embedded coordinates with the UMAP dimensionality reduction algorithm.

  3. For further adjustments, the user can define a filter range and/or the desired number of clusters (using KMeans).

  4. Median scores can be binned and visualized in a histogram plot (the user can define the number of bins).

  5. For the plot generation step, the following types can be chosen:

    For interactive plots, a view setting can be enabled to open the interactive plot in a pop-up browser window.

    If multiple datasets are provided, plots are merged in columns of three (except for interactive plots).

  6. Finally, ChemCharts can also make a movie for most plot types. This works by treating incremental epoch numbers as a time axis, i.e. one can follow the agent's samples over the course of the reinforcement learning run.
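The filtering, clustering and binning steps above can be pictured with the plain-Python sketch below. It is not ChemCharts' implementation: the fingerprint and UMAP steps need RDKit and UMAP and are skipped, so the sketch starts from hypothetical 2D coordinates and scores. The filter range is assumed to be a score interval, KMeans is a minimal hand-written version of Lloyd's algorithm (ChemCharts presumably relies on a library implementation), and "binning of median scores" is interpreted, as an assumption, as equal-width score bins reported with their count and median.

```python
import statistics

# Hypothetical data: 2D embedded coordinates (as a UMAP step would produce) plus one score each.
points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
scores = [0.15, 0.30, 0.50, 0.70, 0.80, 0.90]


def filter_by_score(points, scores, low, high):
    """Keep only molecules whose score lies within [low, high] (assumed filter semantics)."""
    kept = [(p, s) for p, s in zip(points, scores) if low <= s <= high]
    return [p for p, _ in kept], [s for _, s in kept]


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm; deterministically uses the first k points as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


def bin_scores(scores, n_bins):
    """Split the score range into n_bins equal-width bins; report (count, median) per bin."""
    low, high = min(scores), max(scores)
    width = (high - low) / n_bins or 1.0  # avoid a zero width when all scores are equal
    bins = [[] for _ in range(n_bins)]
    for s in scores:
        index = min(int((s - low) / width), n_bins - 1)  # the maximum score goes in the last bin
        bins[index].append(s)
    return [(len(b), statistics.median(b) if b else None) for b in bins]


pts, scs = filter_by_score(points, scores, 0.25, 1.0)
labels, centroids = kmeans(pts, k=2)
print(labels)
print(bin_scores(scs, n_bins=2))
```

A hand-rolled KMeans keeps the sketch dependency-free; for real data, a library implementation with smarter initialisation (such as k-means++) is the better choice.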

hexagonal_contour_movie.gif
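To make the idea of epochs as a time axis concrete, the hypothetical helper below groups embedded points by epoch and orders them into movie frames. Whether a ChemCharts frame shows only the current epoch or everything sampled so far is not stated here, so both variants are offered through a `cumulative` flag (an assumption). Each frame would then be rendered as one plot and the images encoded into a movie, which is presumably where ffmpeg comes in.

```python
from collections import defaultdict


def epoch_frames(records, cumulative=True):
    """Turn (epoch, x, y) records into an ordered list of (epoch, points) movie frames.

    Epochs are sorted so that incremental epoch numbers act as a time axis.
    With cumulative=True each frame also contains all earlier epochs (an assumption);
    with cumulative=False it shows only the molecules sampled in that epoch.
    """
    by_epoch = defaultdict(list)
    for epoch, x, y in records:
        by_epoch[epoch].append((x, y))
    frames = []
    shown = []
    for epoch in sorted(by_epoch):
        # Build a new list each time so earlier frames are not modified later.
        shown = shown + by_epoch[epoch] if cumulative else list(by_epoch[epoch])
        frames.append((epoch, shown))
    return frames


# Hypothetical embedded coordinates tagged with the REINVENT step they were sampled in.
records = [(2, 1.0, 1.5), (1, 0.0, 0.5), (1, 0.2, 0.1), (3, 2.5, 2.0)]
for epoch, frame_points in epoch_frames(records):
    print(f"frame for epoch {epoch}: {len(frame_points)} points")
```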

Requirements

  • You might want to put ChemCharts into a bespoke environment, e.g. conda:
conda create -n chemcharts "python==3.9.16" pip
conda activate chemcharts
  • Install the package:

pip install chemcharts

  • Install a local copy (run from the root of the cloned repository):

pip install .

  • Note that you need ffmpeg installed on your computer if you want to generate movies. On Ubuntu, this will look something like:
sudo apt update
sudo apt install ffmpeg
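Before launching a movie run, it can be handy to confirm that ffmpeg is actually visible from Python. A minimal sketch using the standard library's `shutil.which`; this check is our own suggestion, not something ChemCharts is documented to do itself.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before generating ChemCharts movies.")
```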

Usage

  • Run the command-line interface (CLI) entry point:

chemcharts_cli -input_data data/scaffold_memory.csv -output_plot test.png

  • Run the JSON interface entry point:
chemcharts -conf examples/json/data_prep_plot.json
chemcharts -conf examples/json/simple_plot_test.json
  • Run the unit tests:

python main_test.py
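If you need to plot many datasets, the CLI call can be scripted. The sketch below uses only the `-input_data` and `-output_plot` flags from the CLI example above; the helper names and the one-PNG-per-CSV layout are our own illustration, and any further options should be taken from the entry-point notebook rather than guessed.

```python
import subprocess
from pathlib import Path


def build_cli_command(input_csv, output_png):
    """Assemble a chemcharts_cli call using only the two flags shown in the usage example."""
    return ["chemcharts_cli", "-input_data", str(input_csv), "-output_plot", str(output_png)]


def plot_all(input_dir, output_dir, run=False):
    """Build (and optionally execute) one CLI call per CSV file found in input_dir."""
    commands = [
        build_cli_command(csv_path, Path(output_dir) / f"{csv_path.stem}.png")
        for csv_path in sorted(Path(input_dir).glob("*.csv"))
    ]
    if run:
        for command in commands:
            # check=True raises CalledProcessError if chemcharts_cli exits with an error.
            subprocess.run(command, check=True)
    return commands


print(build_cli_command("data/scaffold_memory.csv", "test.png"))
```

Keeping `run=False` as the default lets you inspect the generated commands before anything is executed.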

Instructions and tutorials

For detailed guides on how to use JSON for generating ChemCharts plots or movies, see the notebook templates Learning Demo Chemcharts Json Plot or Learning Demo Chemcharts Json Movie. For in-depth explanations of the ChemCharts entry points, please see: Learning Demo Chemcharts Entry Point.

Bugs and feature requests

Please don't hesitate to let us know by opening an issue if you find a bug or want to request a feature, or to make a pull request.
