This is a repository for the SARS-CoV-2 Bioinformatics for Beginners Course
*May 2023 update note: access to file download links may change in the first two weeks of May 2023, which would affect the input data for the example commands. Please expect errors during this period.*
SARS-CoV-2 variant lineage identification is key to pandemic tracking and to enabling a public health response. This course introduces bioinformatics through the skills used in SARS-CoV-2 genomic data analysis. It will be a distributed-classroom course run across Africa, Latin America and the Caribbean, and Asia. The distributed classroom model was developed by H3ABioNet; see this publication for more info.
Bioinformatics skills are fundamental to the management and assessment of viral sequences. This course will introduce you to processing data programmatically, the data formats used in viral sequencing, how to determine the variant lineage (Delta, Omicron, etc.), and how to share data so that others around the world can benefit. These skills are the building blocks for scaling up analysis to pandemic response levels.
This course uses Google Colab (https://colab.research.google.com/), a free-to-use service.
Access to Colab requires a Google Account, which can be created for free.
Contact sessions will run twice a week, and each session will last 4 hours. The course will run from the 31st of October to the 2nd of December 2022. Sessions will be held in two time zones: Oceania and Asia will share one block of time, and Latin America and Africa will share another, with regional time differences within each block.
The course is aimed at postgraduate scientists, postdoctoral scientists, junior faculty members, and clinicians or healthcare professionals based in Africa, Asia, and Latin America and the Caribbean. No prior bioinformatics skills are required.
The programme will cover the following core topics:
- Intro to Python Notebooks
- Intro to Unix/Linux & running commands
- Introduction to NGS Technologies employed in SARS-CoV-2 sequencing
- Data quality control
- Workflows for sequencing analysis
- Pangolin for lineage identification
- Exploring genomics data in a global context
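To give a flavour of the file format and quality control topics, the following is a minimal sketch, not taken from the course notebooks, of reading FASTA records in plain Python and reporting the length and the fraction of ambiguous `N` bases for each sequence. The function names `read_fasta` and `n_fraction` and the two example records are hypothetical, made up for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them piece by piece.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(sequence):
    """Return the fraction of 'N' bases in a sequence (0.0 if it is empty)."""
    return sequence.count("N") / len(sequence) if sequence else 0.0


# Hypothetical two-record FASTA text, for illustration only.
example = StringIO(">sample_1\nACGTNNAC\nGT\n>sample_2\nACGT\n")

summary = [(name, len(seq), n_fraction(seq)) for name, seq in read_fasta(example)]
for name, length, frac in summary:
    print(f"{name}\tlength={length}\tN_fraction={frac:.2f}")
```

In practice a real file would be opened with `open(path)` in place of the `StringIO` object, and a dedicated library or command line tool would usually be preferred for large datasets.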
After completing the course, participants should be able to:
- Apply command line tools for sequence data quality control
- List file formats commonly used in SARS-CoV-2 sequencing
- Use Pangolin to assign viral lineages to sets of existing sequence data
- List key metadata that must be included when uploading sequences to online repositories
- Describe the broad principles of translating analysis outputs into outbreak, epidemic, and pandemic response
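As an illustration of working with lineage assignments, the sketch below tallies lineages from a Pangolin-style report. It assumes the report is a CSV file with `taxon` and `lineage` columns; check the column names against the output of your own Pangolin version. The helper `count_lineages` and the three example rows are hypothetical.

```python
import csv
from collections import Counter
from io import StringIO


def count_lineages(handle):
    """Count how many sequences were assigned to each lineage.

    Assumes a CSV with a header row that includes a 'lineage' column.
    """
    reader = csv.DictReader(handle)
    return Counter(row["lineage"] for row in reader)


# Hypothetical report content, for illustration only.
example_report = StringIO(
    "taxon,lineage\n"
    "sample_1,B.1.617.2\n"
    "sample_2,BA.1\n"
    "sample_3,BA.1\n"
)

lineage_counts = count_lineages(example_report)
# Print lineages from most to least frequent.
for lineage, count in lineage_counts.most_common():
    print(lineage, count)
```

With a real report, `open("lineage_report.csv", newline="")` (an assumed file name) would replace the `StringIO` object.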
- Ariel Amadio, Universidad Nacional de Rafaela, IDICAL, INTA-CONICET, Argentina
- Blanca Taboada, Instituto de Biotecnología, Universidad Nacional Autónoma de México (UNAM); Consorcio Mexicano de Vigilancia Genómica (CoViGen-Mex)
- Carlo Lapid, Philippine Genome Centre, Philippines
- Elizabeth Batty, Mahidol Oxford Tropical Medicine Research Unit (MORU), Thailand
- Idowu Olawoye, African Centre of Excellence for Genomics of Infectious Diseases (ACEGID), Nigeria
- Johan F Bernal, AGROSAVIA, Colombia
- John Juma, International Livestock Research Institute (ILRI), Kenya; SANBI, University of the Western Cape, South Africa
- Jorge Batista da Rocha, COG-Train, Wellcome Connecting Science, United Kingdom; Sydney Brenner Institute for Molecular Bioscience (SBIMB), University of the Witwatersrand, South Africa
- Leigh Jackson, COG-Train, University of Exeter, United Kingdom
- Marcela Suarez Esquivel, Universidad Nacional, Costa Rica
- Paul Oluniyi, Chan Zuckerberg Biohub, University of California, San Francisco, USA
- Progress Dube, National Biotechnology Authority, Zimbabwe
- Una Ren, Special Pathogens Unit, Environmental Science and Research, New Zealand
- Varun Shammana, Central Research Laboratory, Kempegowda Institute of Medical Sciences, India
- Zahra Waheed, European Bioinformatics Institute, Wellcome Genome Campus, United Kingdom
Introduction Week
Introduction Notebook - Begin here
Video Playlist - Introduction Week
Module 1: Introduction to Notebooks & Unix command line
Module 1 Video Playlist (Parts 1 and 2)
Module 1 Part 1 and Part 2 Notebook Instructions
Bonus Videos for NGS technologies
Module 2: Data QC and Consensus sequences
Module 2 Video Playlist (Parts 1 and 2)
Module 2 Data QC and Consensus Notebook Instructions Parts 1,2,3
Module 3: Variant Lineage Identification
Module 3 Video - Variant Lineage Identification
Module 3 Variant Lineage Identification Notebook Instructions
Module 3 Part 2 Day Plan - Exercise
Module 4: Data sharing and interpretation
Module 4 Video Playlist
(Please watch Sections 1-2 for Day 1, and Sections 3-7 for Day 2)
Module 4 Data Sharing and Interpretation Notebook Instructions
Module 4 Part 2 Day Plan (Exercises for Day 2 are in the videos for Sections 5-7)
WCS LMS
COG-Train Online courses
Your digital mentor podcast
WCS courses and conferences
Any reuse of the course materials, data or code is encouraged with due acknowledgement.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) licence.