Virus Interpreter

Virus Interpreter is a post-processing algorithm for VIRUSBreakend. It takes the final VIRUSBreakend summary as input, adds annotation and interpretation, and filters the results for reporting. It writes a "virus.annotated.tsv" file in which every line is an annotated viral presence from the VIRUSBreakend summary file.

Annotation

Virus Interpreter picks the reference taxid that should be displayed in a report and looks it up in the taxonomy database to find the matching virus name.

Interpretation

Virus Interpreter allows any species taxid to be mapped to one of "HPV", "EBV", "MCV", "HBV" or "HHV-8". Within the Hartwig pipeline this configuration is used to map all clinically relevant HPV species to "HPV", which in turn is used to label patients as "HPV positive" or "HPV negative".
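The mapping described above can be sketched as a simple lookup table. This is a minimal illustration, not the tool's actual implementation: the names `SPECIES_TO_INTERPRETATION`, `interpret` and `hpv_status` are hypothetical, the taxids are placeholders rather than real taxonomy ids, and deriving the HPV label directly from the list of species taxids is a simplifying assumption.

```python
from typing import Dict, Iterable, Optional

# The interpretation labels named in this README.
ALLOWED_INTERPRETATIONS = {"HPV", "EBV", "MCV", "HBV", "HHV-8"}

# Hypothetical configuration: species taxid -> interpretation label.
# The taxids are placeholders for illustration, NOT real taxonomy ids.
SPECIES_TO_INTERPRETATION: Dict[int, str] = {
    1001: "HPV",
    1002: "HPV",
    2001: "EBV",
}


def interpret(species_taxid: int,
              mapping: Dict[int, str] = SPECIES_TO_INTERPRETATION) -> Optional[str]:
    """Return the interpretation label for a species taxid, or None if unmapped."""
    label = mapping.get(species_taxid)
    if label is not None and label not in ALLOWED_INTERPRETATIONS:
        raise ValueError(f"Unsupported interpretation: {label}")
    return label


def hpv_status(species_taxids: Iterable[int]) -> str:
    """Label a sample based on whether any detected species maps to "HPV"."""
    if any(interpret(taxid) == "HPV" for taxid in species_taxids):
        return "HPV positive"
    return "HPV negative"
```

Keeping the mapping as configuration rather than code means several species can share one label without changing the lookup logic.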

Reporting

Every virus found by VIRUSBreakend is evaluated for reporting. For a virus to be reported, the following conditions need to be met:

  • The virus must be present in the whitelist.
  • One of the following must hold:
    • For "HPV", "MCV", "HBV" or "HHV-8": VIRUSBreakend found at least 1 integration site into the tumor DNA.
    • For "EBV": VIRUSBreakend found at least 1 integration site into the tumor DNA, and in addition:
      • Coverage of the virus is greater than 90%
      • Mean depth of the virus is greater than the expected clonal depth
    • For "HPV", "MCV", "HBV", "EBV" or "HHV-8" with no integration sites into the tumor DNA:
      • Coverage of the virus is greater than 90%
      • Mean depth of the virus is greater than the expected clonal depth
  • The VIRUSBreakend QC status must not be LOW_VIRAL_COVERAGE.
  • The virus must not be blacklisted.

The blacklist is configurable and is used in the Hartwig pipeline to prevent any form of HIV from being reported. The whitelist is also configurable and is used in the Hartwig pipeline to select which viruses are eligible for reporting.
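One possible reading of the reporting conditions is sketched below. It is an interpretation of the rules in this section, not the tool's actual code. The `VirusCall` class and its field names are hypothetical, keying the whitelist and blacklist on taxid is an assumption, and so is the choice not to report viruses without an interpretation label (the rules only mention the five labels).

```python
from dataclasses import dataclass
from typing import Optional, Set

INTEGRATION_ONLY = {"HPV", "MCV", "HBV", "HHV-8"}
ALL_LABELS = INTEGRATION_ONLY | {"EBV"}


@dataclass
class VirusCall:
    """Hypothetical record for one viral presence."""
    taxid: int
    interpretation: Optional[str]
    qc_status: str
    integrations: int
    percentage_covered: float
    mean_coverage: float
    expected_clonal_mean_coverage: float


def has_sufficient_coverage(call: VirusCall) -> bool:
    """Coverage above 90% and mean depth above the expected clonal depth."""
    return (call.percentage_covered > 90.0
            and call.mean_coverage > call.expected_clonal_mean_coverage)


def is_reported(call: VirusCall, whitelist: Set[int], blacklist: Set[int]) -> bool:
    # Assumption: both lists are sets of taxids.
    if call.taxid not in whitelist or call.taxid in blacklist:
        return False
    if call.qc_status == "LOW_VIRAL_COVERAGE":
        return False
    # Assumption: viruses without one of the five labels are never reported.
    if call.interpretation not in ALL_LABELS:
        return False
    # HPV, MCV, HBV and HHV-8 are reported on integration evidence alone.
    if call.integrations >= 1 and call.interpretation in INTEGRATION_ONLY:
        return True
    # EBV with integrations, and any labelled virus without integrations,
    # additionally needs sufficient coverage and depth.
    return has_sufficient_coverage(call)
```

Under this reading the coverage and depth thresholds act as a fallback for viruses without integration evidence and as an extra requirement for EBV.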

Output data

Virus Interpreter produces a TSV file in which every line (record) is an entry from the VIRUSBreakend summary file. The following fields are stored per viral presence:

| Field | Description |
| --- | --- |
| taxid | The reference taxid of the virus that is called with VIRUSBreakend |
| name | The name of the virus, matching with the taxid |
| qcStatus | The QC status as reported by VIRUSBreakend |
| integrations | The number of detected integrations of this virus into the sample genome as reported by VIRUSBreakend |
| interpretation | The output of the interpretation step of Virus Interpreter |
| percentageCovered | The percentage of the viral reference sequence that has been covered in the tumor sample as reported by VIRUSBreakend |
| meanCoverage | The mean coverage of the virus as reported by VIRUSBreakend //TODO: improve |
| expectedClonalMeanCoverage | The expected coverage assuming the virus is clonally integrated once in the tumor DNA |
| reported | A boolean indicating whether the detected viral presence is considered a driver |
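Reading the output file could look roughly like the sketch below. It assumes the file is tab-separated with a header row that uses the field names listed above, that `reported` is written as `true`/`false`, and that an empty `interpretation` means no label was assigned. None of these details are confirmed by this README. The function name `read_annotated_viruses` and the sample row are made up for illustration.

```python
import csv
import io
from typing import Any, Dict, List, TextIO


def read_annotated_viruses(handle: TextIO) -> List[Dict[str, Any]]:
    """Parse a virus.annotated.tsv-like stream into typed records."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "taxid": int(row["taxid"]),
            "name": row["name"],
            "qcStatus": row["qcStatus"],
            "integrations": int(row["integrations"]),
            # Assumption: an empty field means "no interpretation".
            "interpretation": row["interpretation"] or None,
            "percentageCovered": float(row["percentageCovered"]),
            "meanCoverage": float(row["meanCoverage"]),
            "expectedClonalMeanCoverage": float(row["expectedClonalMeanCoverage"]),
            # Assumption: booleans are serialized as "true"/"false".
            "reported": row["reported"].strip().lower() == "true",
        })
    return records


# Made-up sample content, built in memory instead of read from disk.
HEADER = ["taxid", "name", "qcStatus", "integrations", "interpretation",
          "percentageCovered", "meanCoverage", "expectedClonalMeanCoverage",
          "reported"]
SAMPLE_ROW = ["1001", "Example virus", "OK", "2", "HPV", "95.5", "12.0", "8.0", "true"]
SAMPLE_TSV = "\t".join(HEADER) + "\n" + "\t".join(SAMPLE_ROW) + "\n"

records = read_annotated_viruses(io.StringIO(SAMPLE_TSV))
reported_names = [r["name"] for r in records if r["reported"]]
```

For a real file, the same function can be called with `open(path, newline="")` instead of the in-memory sample.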

Version History and Download Links

  • 1.1 (coming)
    • New reporting strategy that reports only clinically relevant viruses
  • 1.0
    • Initial release.