2400 Mycobacterium phage genomes annotated, only 700 were successful #39
Comments
I can run sphae annotate on the provided fasta file without a problem; I didn't run into an error. Can you confirm that you are running the latest version of sphae, which is v1.4.5? If you are, can you send me the sphae.log generated in the output directory and the command you ran? Finally, the other suggestion is to download and run the sphae docker instance, which would ensure the correct versions are run. |
Thank you for your reply. I have retested and it was successful this time.
Additionally, I retested two earlier runs that had failed, and they were
also successful. I plan to re-annotate the 2400 genomes and provide feedback
if there are any issues. Thank you.
|
That's great; let me know if you run into any other errors. |
I conducted a batch analysis of 2,492 phage genomes. After the program
finished running, it indicated that there were some errors. Upon checking,
I found 2,490 results, meaning 2 were missing. However, due to the large
volume, I have not yet identified which specific two were not completed.
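One way to track down the missing two would be to compare the input file names against the entries in the results directory. The following is only a minimal sketch: the function name `find_missing`, the `strip_suffix` parameter, and the idea that each finished genome leaves one entry named after the input file stem are all assumptions, not something confirmed for sphae's output layout.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def find_missing(input_dir, results_dir, strip_suffix=""):
    """List input genomes that have no matching entry in results_dir.

    Assumption: every finished genome leaves one entry (file or folder)
    in results_dir whose name is the input file stem, optionally followed
    by strip_suffix. Adjust this to the real output layout.
    """
    # Stems of all FASTA files that were submitted for annotation.
    inputs = {
        path.stem
        for path in Path(input_dir).iterdir()
        if path.suffix.lower() in FASTA_SUFFIXES
    }
    # Names of everything that was actually produced.
    done = set()
    for entry in Path(results_dir).iterdir():
        name = entry.name
        if strip_suffix and name.endswith(strip_suffix):
            name = name[: -len(strip_suffix)]
        done.add(name)
    # Inputs with no corresponding result.
    return sorted(inputs - done)
```

Calling `find_missing` with the genome folder and the results folder returns the sorted names of the inputs that have no result, which for this batch should be a list of two.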
|
Hi, I have now added a script to this repo: https://github.com/linsalrob/sphae/blob/main/misc/merging_output.py. To run it, copy the summary.txt files to a new directory and then run the script on that directory, for example: python misc/merging_output.py <directory with summary files> <output file name>. I look for the columns "Failed during assembly" and "No contigs assigned viral"; if either column lists "Yes", the sample failed, either because the assembly didn't generate any contigs, or because it did generate contigs but they were too short or none of them were assigned as viral. Hope this helps; let me know if you run into any errors with this script or if you need it changed to add any other features. |
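The column check described above could be automated roughly as follows. This is a sketch under stated assumptions, not the behaviour of merging_output.py itself: it assumes the merged output is a tab-separated table with a header row, that the sample name sits in a column called "Sample" (a hypothetical name), and that failures are marked with the literal value "Yes". Only the two failure column names come from this thread.

```python
import csv

# Column names quoted in the thread; everything else is an assumption.
FAILURE_COLUMNS = ("Failed during assembly", "No contigs assigned viral")


def failed_samples(table_path, sample_column="Sample", delimiter="\t"):
    """Return {sample: [failure columns marked "Yes"]} for failed rows.

    Assumes a delimited table with a header row; sample_column is a
    hypothetical column name and may need to be changed.
    """
    failures = {}
    with open(table_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            # Collect every failure column that is flagged for this row.
            hits = [
                column
                for column in FAILURE_COLUMNS
                if (row.get(column) or "").strip().lower() == "yes"
            ]
            if hits:
                failures[row[sample_column]] = hits
    return failures
```

The returned dictionary keeps the reason alongside each sample, so assembly failures and "no viral contigs" cases can be followed up separately.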
wxd.zip
<https://drive.google.com/file/d/1wMbz_tlmyXijNW5eYsooPe-3F8kN9Fz-/view?usp=drive_web>
Thank you very much. I wrote a script myself to extract information from
summary.txt, but this only works for genomes that were annotated
successfully. Genomes that fail annotation do not generate a summary.txt
file, so failed annotations cannot be identified this way.
I also extract the positions of specific genes from summary.functions
(multiple genes can be extracted at once). These two features might be
quite useful, and I recommend packaging them into the software. I wrote the
second feature to extract repressor genes, to help determine whether a
bacteriophage is lysogenic, which is also very useful for deciding whether
it can be used in clinical treatment. I’m not sure if there is a better and
more direct method to help users make this judgment. Note: my code contains
a function that extracts classifications from genome names, which I might
be the only one to use.
Attached are the original software code, the summary extraction results
from the 2490 genomes I ran, and the execution commands.
Additionally, I tested de novo assembly on 5 genomes (which I had
successfully assembled using Unicycler), but all of them failed because the
raw data were too large. I’m unsure how to send them to you for testing.
Currently, I am only using the annotation feature.
|
That sounds good; it sounds like a useful script. Can you commit it to a miscellaneous directory in the repo? I will take a look and add it to the main repo accordingly. Regarding the lifestyle, we currently don't use repressor genes; we use the presence of integrase as an indicator of a lysogenic lifestyle. Do you have a reference for using repressor genes to indicate this? It would be good to add it in as well. Regarding the 5 genomes that didn't assemble, as the raw data is really large, you can subsample it with a tool such as rasusa (https://github.com/mbhall88/rasusa). I am happy to help as well, if you can email me a link to the data. |
I am an ICU doctor and not very familiar with operating GitHub, so in my
last email, I sent you the source files of the code I wrote.
Regarding the issue of phage lysogeny, I am also not very familiar with it
since I have only been studying phages for a year. This time, I analyzed
2,400 phage genomes from https://phagesdb.org/, and the results showed that
515 of them lack the *int* gene. However, 164 of these phages are
classified as "Temperate" according to the cluster information on
https://phagesdb.org/clusters/ (see the attachment for details).
That’s why I thought about extracting the *rep* gene for analysis. However,
there are 13 phages that lack both the *rep* gene and the *int* gene but
are still classified as "Temperate." I plan to systematically review the
literature soon to address this issue.
I also hope to get your help regarding this matter. Thank you.
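The comparison described above (phages lacking *int*, those among them classified as "Temperate", and those lacking both *int* and *rep*) comes down to a few set operations. The sketch below uses made-up toy records; the record layout, the gene labels "int" and "rep", and the function name `lifestyle_conflicts` are assumptions for illustration only, not the format of the attached files.

```python
def lifestyle_conflicts(phages):
    """Summarise phages whose gene content disagrees with their lifestyle.

    phages maps a phage name to {"genes": set of gene labels,
    "lifestyle": cluster-level lifestyle string}. This layout is
    hypothetical and only serves the example.
    """
    # Phages with no integrase annotated.
    no_int = {name for name, rec in phages.items() if "int" not in rec["genes"]}
    # Phages whose cluster is listed as temperate.
    temperate = {
        name for name, rec in phages.items() if rec["lifestyle"] == "Temperate"
    }
    # Temperate by cluster, but no integrase found.
    no_int_temperate = no_int & temperate
    # Temperate by cluster, with neither integrase nor repressor found.
    neither = {
        name for name in no_int_temperate if "rep" not in phages[name]["genes"]
    }
    return {
        "no_int": sorted(no_int),
        "no_int_temperate": sorted(no_int_temperate),
        "no_int_no_rep_temperate": sorted(neither),
    }


# Toy records with invented names, not real phagesdb entries.
toy = {
    "phageA": {"genes": {"int", "rep"}, "lifestyle": "Temperate"},
    "phageB": {"genes": {"rep"}, "lifestyle": "Temperate"},
    "phageC": {"genes": set(), "lifestyle": "Temperate"},
    "phageD": {"genes": set(), "lifestyle": "Lytic"},
}
summary = lifestyle_conflicts(toy)
```

Keeping the names rather than only the counts makes it easier to look up the phages in the last group individually when reviewing the literature.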
|
Regarding the lack of integrase in some of the phages I analyzed, I have
found literature that explains it:
https://pmc.ncbi.nlm.nih.gov/articles/PMC6282025/
|
I downloaded 2400 genomes of mycobacterial bacteriophages from https://phagesdb.org/ and annotated them in bulk, but only a little over 700 were annotated successfully. I don't know where the problem lies.
Taking the genome in the attachment as an example, it fails to annotate whether I run it individually or in bulk.
https://phagesdb.org/media/fastas/Abinghost.fasta