PatWalters/resources_2025

Machine Learning in Drug Discovery Resources 2025

Datasets

You'll notice the conspicuous absence of two widely used datasets, MoleculeNet and the Therapeutic Data Commons (TDC), from this list. Both of these datasets are highly flawed and should not be used. For more on the reasons why, please consult this blog post.

OpenADMET seeks to proactively characterize the chemical space accessible to ADMET-associated proteins (“anti-targets”). By applying recent advances in experimental and computational techniques, a comprehensive open library of experimental and structural datasets will be generated. It's early days for OpenADMET, but knowing the folks involved, I'm highly optimistic.

AIRCHECK is a platform that provides access to a large collection of high-quality datasets for drug discovery and development. The datasets are curated from various sources and are available in a standardized format. The current focus appears to be on DNA encoded library (DEL) data.

Polaris aims to improve the state of benchmarking so ML can have a greater impact on real-world drug discovery scenarios. To start, Polaris hopes to provide a single source of truth that aggregates and provides simple access to datasets & benchmarks.

PLINDER is an academic-industry collaboration to collect and organize protein-ligand interaction data. The effort is driven by VantAI, NVIDIA, the Computational Structural Biology group at the University of Basel & SIB Swiss Institute of Bioinformatics (co-organizers of CASP), and MIT. PLINDER aims to provide a gold standard dataset and evaluations to push the field of computational protein-ligand interaction prediction forward.

Blogs

Eric J Ma's Website Eric's blog provides an excellent introduction to the application of cutting-edge informatics in drug discovery.

Oxford Protein Informatics Group (OPIG) This blog contains a lot of great [Bio|Chem]informatics content, chock full of code.

Charlie’s Substack Charlie Harris writes about applications of AI in drug discovery. Most recently, his posts have focused on efforts to reproduce AlphaFold3.

Morgan Thomas' Cheminformatics Blog This one is new, but based on the first post, it looks promising.

Jon Swain's Blog Jon Swain, a second-generation cheminformatics blogger, has a great set of Jupyter notebooks demonstrating key concepts.

Practical Cheminformatics This is a blog where I post once a month or so. These posts typically contain code that demonstrates various aspects of cheminformatics: clustering, machine learning, data visualization, etc. I occasionally throw in posts containing opinions on things like AI and getting a job.

Is Life Worth Living A great blog from Iwatobipen (aka pen), whose posts are chock full of great code examples. Pen always seems to be up on the latest methods and posts interesting examples on a variety of topics ranging from quantum chemistry to machine learning.

The RDKit Blog Greg Landrum is the primary contributor to, and BDFL of, the RDKit. In addition to the latest and greatest features in the RDKit, Greg's posts also touch on a number of key issues in Cheminformatics, such as dealing with unbalanced datasets and the impact of fingerprint folding on similarity searching.

Models to molecules A new blog by Dries Van Rompaey that seems to be off to a great start.

Tutorials

Practical Cheminformatics Tutorials This is a collection of Jupyter notebooks that I put together to demonstrate various aspects of cheminformatics and machine learning. The notebooks demonstrate a range of topics from cheminformatics basics to more advanced machine learning. The tutorials all use open source software and can run on Google Colab without installing software locally.

TeachOpenCADD A great set of tutorials from Andrea Volkamer's group that use Open Source software to teach Computer-Aided Drug Design concepts including molecular similarity, applications of machine learning, and pharmacophore analysis.

The RDKit Cookbook A terrific resource that provides "recipes" for a number of common tasks.

Vina Colab Tutorials A set of tutorials showing how to run AutoDock Vina and the associated protein and ligand setup utilities on Google Colab.
