OpenADMET
The OpenADMET project seeks to proactively characterize the chemical space accessible to ADMET-associated proteins (“anti-targets”). By applying recent advances in experimental and computational techniques, the project will generate a comprehensive open library of experimental and structural datasets.
AIRCHECK
AIRCHECK is a platform that provides access to a large collection of high-quality datasets for drug discovery and development. The datasets are curated from various sources and are available in a standardized format. The current focus appears to be on DNA-encoded library (DEL) data.
Polaris
Polaris aims to improve the state of benchmarking so that ML can have a greater impact on real-world drug discovery. To start, Polaris hopes to provide a single source of truth that aggregates datasets and benchmarks and provides simple access to them.
PLINDER
PLINDER is an academic-industry collaboration driven by VantAI, NVIDIA, the Computational Structural Biology group at the University of Basel and the SIB Swiss Institute of Bioinformatics (co-organizers of CASP), and MIT. PLINDER aims to provide a gold-standard dataset and evaluations to push the field of computational protein-ligand interaction prediction forward.
Charlie’s Substack
Charlie Harris writes about applications of AI in drug discovery.
Most recently, his posts have focused on efforts to reproduce AlphaFold3.
Practical Cheminformatics
This is a blog where I post once a month or so. These posts typically contain code that demonstrates various aspects of cheminformatics, such as clustering, machine learning, and data visualization. I occasionally throw in posts containing opinions on topics like AI and getting a job.
Is Life Worth Living
A great blog from Iwatobipen (aka pen), whose posts are chock full of code examples. Pen always seems to be up on the latest methods and posts interesting examples on a variety of topics ranging from quantum chemistry to machine learning.
The RDKit Blog
Greg Landrum is the primary contributor to, and BDFL of, the RDKit. In addition to the latest and greatest features in the RDKit, Greg's posts also touch on a number of key issues in cheminformatics, such as dealing with unbalanced datasets and the impact of fingerprint folding on similarity searching.
Practical Cheminformatics Tutorials
This is a collection of Jupyter notebooks that I put together to demonstrate various aspects of cheminformatics and machine learning. The notebooks cover topics ranging from cheminformatics basics to more advanced machine learning. The tutorials all use open-source software and can run on Google Colab without installing anything locally.
TeachOpenCADD
A great set of tutorials from Andrea Volkamer's group that use open-source software to teach computer-aided drug design (CADD) concepts, including molecular similarity, applications of machine learning, and pharmacophore analysis.
The RDKit Cookbook
A terrific resource that provides "recipes" for a number of common RDKit tasks.