
Commit

update
Greg Landrum committed Jan 18, 2023
1 parent c136bd7 commit 4b99830
Showing 2 changed files with 136 additions and 137 deletions.
@@ -6,14 +6,10 @@
  "source": [
  "Goal: construct a set of molecular pairs that can be used to compare similarity methods to each other.\n",
  "\n",
- "Update from http://rdkit.blogspot.com/2016/04/revisiting-similarity-comparison-set.html\n",
- "The earlier version of this notebook (http://rdkit.blogspot.ch/2013/10/building-similarity-comparison-set-goal.html or https://github.com/greglandrum/rdkit_blog/blob/master/notebooks/Building%20A%20Similarity%20Comparison%20Set.ipynb)included a number of molecules that have counterions (from salts). Because this isn't really what we're interested in (and because the single-atom fragments that make up many salts triggered a bug in the RDKit's Morgan fingerprint implementation), I repeat the analysis here and restrict it to single-fragment molecules (those that do not include a `.` in the SMILES).\n",
+ "This works with ChEMBL30.\n",
  "\n",
- "The other big difference from the previous post is that an updated version of ChEMBL is used; this time it's ChEMBL21.\n",
- "\n",
- "I want to start with molecules that have some connection to each other, so I will pick pairs that have a baseline similarity: a Tanimoto similarity using count based Morgan0 fingerprints of at least 0.7. I also create a second set of somewhat more closely related molecules where the baseline similarity is 0.6 with a Morgan1 fingerprint. Both thresholds were selected empirically.\n",
- "\n",
- "**Note:** this notebook and the data it uses/generates can be found in the github repo: https://github.com/greglandrum/rdkit_blog"
+ "I want to start with molecules that have some connection to each other, so I will pick pairs that have a baseline similarity: a Tanimoto similarity using count based Morgan0 fingerprints of at least 0.65. I also create a second set of somewhat more closely related molecules where the baseline similarity is 0.55 with a Morgan1 fingerprint. \n",
+ "The thresholds were selected based on the analysis [in this blog post](https://greglandrum.github.io/rdkit-blog/posts/2021-05-21-similarity-search-thresholds.html)\n"
  ]
  },
  {
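The pair-selection rule in the notebook cell above (single-fragment molecules only, count-based Tanimoto of at least 0.65 with Morgan0 fingerprints, and at least 0.55 with Morgan1 fingerprints for the more closely related set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: fingerprints are assumed to be already available as feature-to-count mappings (in RDKit these would typically come from something like `rdFingerprintGenerator.GetMorganGenerator(radius=0).GetCountFingerprint(mol)`), the helper names `is_single_fragment`, `count_tanimoto`, and `select_pairs` are hypothetical, and the exhaustive all-pairs loop is only for illustration and is not how the notebook draws pairs from ChEMBL.

```python
from collections import Counter
from itertools import combinations

# Thresholds quoted in the notebook cell (hypothetical constant names).
MORGAN0_THRESHOLD = 0.65  # count-based Morgan0 (radius 0)
MORGAN1_THRESHOLD = 0.55  # count-based Morgan1 (radius 1)


def is_single_fragment(smiles: str) -> bool:
    """Keep only molecules whose SMILES has no '.', i.e. no counterions or other fragments."""
    return "." not in smiles


def count_tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto similarity on count vectors: sum(min) / (sum(a) + sum(b) - sum(min))."""
    shared = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values()) - shared
    return shared / total if total else 0.0


def select_pairs(fps: dict, threshold: float) -> list:
    """Return (id1, id2, similarity) for every pair at or above the threshold.

    Hypothetical helper: an exhaustive O(n^2) loop, fine only for small toy inputs.
    """
    pairs = []
    for (id1, fp1), (id2, fp2) in combinations(fps.items(), 2):
        sim = count_tanimoto(fp1, fp2)
        if sim >= threshold:
            pairs.append((id1, id2, sim))
    return pairs


if __name__ == "__main__":
    # Made-up feature counts, standing in for real Morgan count fingerprints.
    fps = {
        "a": Counter({1: 3, 2: 1}),
        "b": Counter({1: 2, 2: 1, 3: 1}),
        "c": Counter({4: 5}),
    }
    print(select_pairs(fps, MORGAN1_THRESHOLD))
    print(select_pairs(fps, MORGAN0_THRESHOLD))
```

With these toy counts, "a" and "b" share 3 of 5 total counts (similarity 0.6), so they pass the 0.55 cut but not the 0.65 one. Using count vectors rather than bit vectors means that repeated occurrences of an environment contribute to the similarity, which matters most at radius 0, where the features are just atom types.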
263 changes: 133 additions & 130 deletions notebooks/Colliding Bits III - expanded.ipynb

Large diffs are not rendered by default.
