Comments on the boundaries #2
Thank you for reaching out! The boundaries are indeed a little artificial. However, many residues sit far from their supposed group for a reason. For the histidine you mention, if you compare it to the other charged AAs, it is in fact quite different. In fact,
Therefore, that might be the reason why we see H closer to the uncharged residues. That's also why having a representation in continuous space could be useful. For your technical questions,
Hope these answer your concerns. |
Thanks for your reply! I am confused by the
but you call it with 1500
I understand it as the size of the learned vector per amino acid, so it shouldn't be larger than 20, since there are only 20 AAs. Is that correct? |
Ah, now I remember. The size of 1500 is from when I was building embeddings for k-mers instead of single AAs. When we generate the graph for single AAs, the model should output an embedding of size 20, per the comment. |
Since you mentioned it, I am also very interested in k-mers. How did the k-mer experiment go? |
Some experiment figures are here: https://github.com/WesleyyC/Amino-Acid-Embedding/tree/master/Figure. You should be able to open them in MATLAB, but we didn't continue this project, so the context part is not well maintained. |
Thanks! What's your main conclusion, then? I am doing something similar with the nr database (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz), and I don't see any particular pattern among AAs. My thought is that if there is no strong pattern (if any at all), it suggests that in nature any AA is about equally likely to neighbour any other AA, so at the character level there isn't much difference among them. Would you agree with that? |
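One way to test the "any AA can neighbour any other AA" idea directly is to compare how often each adjacent pair occurs with how often it would occur if residues were placed independently. The sketch below is only an illustration under that assumption: `neighbor_log_odds` is a hypothetical helper, the toy sequences are made up, and nothing here comes from the repository or from nr.

```python
from collections import Counter
from math import log2


def neighbor_log_odds(sequences):
    """Return log2(observed / expected) for every adjacent residue pair seen.

    Expected frequencies assume residues are placed independently, using the
    overall single-residue frequencies as the background.
    """
    singles = Counter()
    pairs = Counter()
    for seq in sequences:
        singles.update(seq)
        pairs.update(zip(seq, seq[1:]))  # ordered adjacent pairs (left, right)

    total_singles = sum(singles.values())
    total_pairs = sum(pairs.values())

    scores = {}
    for (a, b), count in pairs.items():
        observed = count / total_pairs
        expected = (singles[a] / total_singles) * (singles[b] / total_singles)
        scores[(a, b)] = log2(observed / expected)
    return scores


# Made-up toy input, purely to show the call; not real protein data.
toy_sequences = ["MKVLAAGIVK", "MKKLLAAGHH"]
scores = neighbor_log_odds(toy_sequences)
```

Scores near zero for every pair would support the "no pattern" reading; consistently positive or negative scores for particular pairs would argue against it. On a database the size of nr the counting would need to be streamed rather than held as a list of strings.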
Sorry for the late response; I missed the notification for some reason. For single-AA embeddings, we do see a strong pattern related to biochemical properties. In addition, we computed a distance matrix from the embeddings and compared it to the BLOSUM matrix, and the two seem to be highly correlated. For k-mer embeddings, we do see a pattern in our graph, but we are not sure whether it is an artifact of the way we generate k-mers or truly a pattern. |
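The comparison between embedding distances and a substitution matrix can be sketched as follows. The thread does not say which distance or correlation measure was used, so Euclidean distance and Pearson correlation are assumptions here, and `embedding_vs_substitution` is a hypothetical helper. The toy embeddings and scores use placeholder symbols and invented numbers; they are not BLOSUM values.

```python
from itertools import combinations
from math import dist, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def embedding_vs_substitution(embeddings, substitution):
    """Correlate pairwise embedding distances with substitution scores.

    embeddings:   dict mapping residue -> coordinate tuple
    substitution: dict mapping (residue_a, residue_b) -> score, keyed with
                  residue_a < residue_b in sorted order
    """
    residues = sorted(embeddings)
    distances, scores = [], []
    for a, b in combinations(residues, 2):
        distances.append(dist(embeddings[a], embeddings[b]))  # Euclidean
        scores.append(substitution[(a, b)])
    return pearson(distances, scores)


# Invented toy data with placeholder symbols, only to show the call.
toy_embeddings = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0)}
toy_scores = {("A", "B"): 2, ("A", "C"): -3, ("B", "C"): -2}
r = embedding_vs_substitution(toy_embeddings, toy_scores)
```

Because substitution scores are similarities and embedding distances are dissimilarities, agreement between the two shows up as a strongly negative coefficient in this formulation.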
The boundaries on your front page look quite artificial to me, e.g. H is much closer to the uncharged AAs than to the charged AAs. Do you have any comment on why, please? I also have a few technical questions:
Thank you.