PROTECT determines the clinical evidence that applies to a particular tumor sample, based on all genomic events and signatures called by the Hartwig pipeline. PROTECT works exclusively with a clinical database generated by SERVE and uses a set of rules to match every type of SERVE evidence against the genomic events.
- How is evidence matched against genomic events?
- When is evidence considered on-label?
- What evidence is considered relevant for reporting?
- What output is produced by PROTECT?
- Version history and download links
Genomic events are grouped into six categories, and evidence is matched for each category independently.
For small variants (SNVs and INDELs) called by PURPLE, the following matching is performed:
- If the evidence is defined on the exact variant (hotspot), the evidence is always considered applicable.
- If the variant falls within the genomic range for which the evidence is defined, the evidence is applicable if the variant's mutation type passes the filter defined in the SERVE evidence rule.
- If a variant affects a gene for which evidence is defined on gene level (activation, inactivation or any mutation), the evidence is considered applicable if the combined variants affecting that gene have a high driver likelihood (> 80%).
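The three small-variant rules can be sketched as simple predicates. This is a minimal illustration, not PROTECT's implementation: the `Variant`, `HotspotEvidence` and `RangeEvidence` classes, their field names and the mutation-type labels are all hypothetical, and the gene-level driver likelihood is assumed to be computed upstream (by PURPLE) for all variants in the gene combined.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical, simplified models; field names and mutation-type labels are
# illustrative and do not reflect the actual PURPLE or SERVE data formats.
@dataclass(frozen=True)
class Variant:
    chromosome: str
    position: int
    ref: str
    alt: str
    mutation_type: str


@dataclass(frozen=True)
class HotspotEvidence:
    chromosome: str
    position: int
    ref: str
    alt: str


@dataclass(frozen=True)
class RangeEvidence:
    chromosome: str
    start: int
    end: int
    allowed_mutation_types: FrozenSet[str]


DRIVER_LIKELIHOOD_THRESHOLD = 0.8


def hotspot_matches(variant: Variant, evidence: HotspotEvidence) -> bool:
    # Exact variant match: always applicable.
    return (variant.chromosome, variant.position, variant.ref, variant.alt) == (
        evidence.chromosome, evidence.position, evidence.ref, evidence.alt)


def range_matches(variant: Variant, evidence: RangeEvidence) -> bool:
    # Inside the range AND passing the mutation-type filter of the evidence rule.
    return (variant.chromosome == evidence.chromosome
            and evidence.start <= variant.position <= evidence.end
            and variant.mutation_type in evidence.allowed_mutation_types)


def gene_level_matches(gene: str, gene_driver_likelihood: float, evidence_gene: str) -> bool:
    # Gene-level evidence (activation, inactivation, any mutation) applies when the
    # combined variants in the gene have a driver likelihood above 80%.
    return gene == evidence_gene and gene_driver_likelihood > DRIVER_LIKELIHOOD_THRESHOLD
```

Note that the gene-level rule deliberately ignores the individual variants: only the combined driver likelihood of the gene decides whether gene-level evidence applies.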
Note that germline and somatic variants are treated equally: for clinical evidence it does not matter whether a variant was already present in the germline or was acquired somatically by the tumor.
Evidence on amplifications and deletions is considered applicable when PURPLE has classified the gene as amplified or deleted, respectively. In addition, a deletion is assumed to inactivate the gene, so evidence on gene inactivation is also considered applicable for a deleted gene.
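A minimal sketch of the copy-number rule, assuming PURPLE's gene classification is available as a plain string. The labels `"AMPLIFICATION"`, `"DELETION"` and `"INACTIVATION"` and the function name are hypothetical stand-ins for whatever enums PURPLE and SERVE actually use.

```python
from typing import Optional


# Hypothetical labels for the PURPLE gene classification and the evidence event;
# the real enums in PURPLE/SERVE may be named differently.
def copy_number_evidence_applies(gene: str, purple_call: Optional[str],
                                 evidence_gene: str, evidence_event: str) -> bool:
    if gene != evidence_gene:
        return False
    if purple_call == "AMPLIFICATION":
        return evidence_event == "AMPLIFICATION"
    if purple_call == "DELETION":
        # A deletion is assumed to inactivate the gene, so inactivation evidence also applies.
        return evidence_event in ("DELETION", "INACTIVATION")
    # Gene not classified as amplified or deleted: no copy-number evidence applies.
    return False
```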
When LINX has classified a gene as homozygously disrupted, evidence is applicable if it is defined on gene level with an event of inactivation, deletion or any mutation.
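For homozygous disruptions the check reduces to a gene match plus a set of accepted gene-level events. The event labels and the function name below are hypothetical; the sketch only restates the rule above.

```python
# Hypothetical event labels; a homozygous disruption called by LINX matches
# gene-level evidence of any of these event types.
DISRUPTION_MATCHING_EVENTS = frozenset({"INACTIVATION", "DELETION", "ANY_MUTATION"})


def disruption_evidence_applies(disrupted_gene: str, evidence_gene: str,
                                evidence_event: str) -> bool:
    return disrupted_gene == evidence_gene and evidence_event in DISRUPTION_MATCHING_EVENTS
```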
For fusions that LINX deems reportable, the following matching is performed:
- Evidence defined on a promiscuous gene is always considered applicable if a fusion involving that gene is reported, in either the 5' or the 3' position.
- Evidence defined on an exact fusion pair applies only if the actual fusion pair matches and the fused exons fall within the (optional) exon range defined as a restriction on the evidence.
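The two fusion rules can be sketched as follows. The `Fusion` and `FusionPairEvidence` classes and their exon fields are hypothetical; an exon bound of `None` is assumed to mean "no restriction", and callers are assumed to pass only fusions that LINX reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fusion:
    gene_up: str    # 5' partner
    gene_down: str  # 3' partner
    exon_up: int    # fused exon in the 5' partner
    exon_down: int  # fused exon in the 3' partner


@dataclass(frozen=True)
class FusionPairEvidence:
    gene_up: str
    gene_down: str
    # Optional exon restrictions; None means the bound is not restricted.
    min_exon_up: Optional[int] = None
    max_exon_up: Optional[int] = None
    min_exon_down: Optional[int] = None
    max_exon_down: Optional[int] = None


def _in_range(exon: int, low: Optional[int], high: Optional[int]) -> bool:
    return (low is None or exon >= low) and (high is None or exon <= high)


def promiscuous_matches(fusion: Fusion, evidence_gene: str) -> bool:
    # The gene may sit in either the 5' or the 3' position.
    return evidence_gene in (fusion.gene_up, fusion.gene_down)


def pair_matches(fusion: Fusion, evidence: FusionPairEvidence) -> bool:
    # Partners must match in orientation and the fused exons must lie in the allowed ranges.
    return (fusion.gene_up == evidence.gene_up
            and fusion.gene_down == evidence.gene_down
            and _in_range(fusion.exon_up, evidence.min_exon_up, evidence.max_exon_up)
            and _in_range(fusion.exon_down, evidence.min_exon_down, evidence.max_exon_down))
```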
To match viral presence against evidence, PROTECT uses the interpretation by Virus Interpreter. If Virus Interpreter has classified at least one viral presence as "HPV", any evidence for "HPV Positive" matches this sample.
Evidence on signatures is matched based on the interpretation of the algorithm producing the signature. For example, when CHORD suggests a sample is HR deficient, any evidence for HR deficiency is considered applicable.
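Both viral-presence and signature matching rely on an upstream interpretation rather than on raw data. A minimal sketch, assuming the Virus Interpreter interpretations arrive as a list of strings and the CHORD result as a status string; the label `"HR_DEFICIENT"` and both function names are assumptions for illustration.

```python
from typing import Iterable


def hpv_positive_applies(virus_interpretations: Iterable[str]) -> bool:
    # "HPV Positive" evidence matches if at least one viral presence was interpreted as HPV.
    return any(interpretation == "HPV" for interpretation in virus_interpretations)


def hr_deficiency_applies(chord_status: str) -> bool:
    # Assumed status label; HR-deficiency evidence matches when CHORD calls the sample HR deficient.
    return chord_status == "HR_DEFICIENT"
```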
Evidence is considered on-label if it is defined for the tumor type of the evaluated sample, or for any tumor type that is a parent of the sample's tumor type. For example, evidence on solid tumors is considered on-label for a colorectal cancer sample, since solid tumor is a parent of colorectal cancer.
PROTECT uses DOID exclusively for matching and expects every piece of evidence to be defined for a single DOID entry.
Some additional notes:
- Since a tumor can belong to multiple separate branches of the DOID tree, a tumor sample may have multiple DOIDs. In this case, evidence is on-label if any of the sample's DOIDs matches the evidence tumor type (or is a child thereof).
- The tumor sample DOIDs are optional in the PROTECT algorithm. If they are not provided, all evidence is considered off-label (including pan-cancer evidence).
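The on-label check amounts to walking up the DOID tree from each sample DOID and testing whether the evidence DOID is reached. This sketch assumes the ontology is given as a hypothetical `parents` mapping from a term to its parent terms (a term may have more than one parent), and uses readable tumor-type names instead of real DOID identifiers.

```python
from typing import Dict, Iterable, Optional, Set


def self_and_ancestors(doid: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect a DOID and every term reachable through its parent links."""
    seen: Set[str] = set()
    stack = [doid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def is_on_label(evidence_doid: str, sample_doids: Optional[Iterable[str]],
                parents: Dict[str, Set[str]]) -> bool:
    doids = list(sample_doids or [])
    if not doids:
        # No sample DOIDs: all evidence is off-label, including pan-cancer evidence.
        return False
    # On-label if the evidence DOID equals one of the sample DOIDs or is an ancestor of one.
    return any(evidence_doid in self_and_ancestors(d, parents) for d in doids)
```

With a toy tree where "colorectal cancer" has "solid tumor" as an ancestor, evidence on "solid tumor" is on-label for a colorectal sample, but not the other way around.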
After evidence has been collected for the six distinct categories of genomic events and labeled as on-label or off-label, it is consolidated and evaluated for reporting. The following steps are executed:
- Evidence is consolidated on source level: if the exact same evidence for the same event is found in multiple sources, it is consolidated into a single instance of applicable evidence. PROTECT has no preference for any source, but sorts sources alphabetically for consistency.
- Evidence is not reported if it is based on a genomic event that is not reportable.
- Evidence is filtered for reporting based on the maximum reporting level configured for its source:
  - For CKB, evidence is reported up to level C, or up to level B for predicted evidence.
  - For VICC and iClusion, evidence is reported up to level B.
  - For all other sources, only level A evidence is reported.
- For every event/treatment/direction combination, only the highest level of evidence is reported:
  - Off-label evidence is only reported if its evidence level is higher than that of the highest on-label evidence.
  - Clinical trials are only reported when they are on-label.
- Some evidence is never reported, regardless of the event that caused it or its evidence level:
  - Evidence based on an event affecting TP53.
  - Evidence for non-specific chemotherapy, aspirin or steroids.
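The consolidation and reporting steps above can be sketched as a small pipeline. This is one possible reading of the rules, not PROTECT's code: the `Evidence` model, the simplified source labels (`"CKB"`, `"VICC"`, `"ICLUSION"`), the excluded gene and treatment sets, and the choice to let consolidated evidence pass the level cap if any of its sources allows it are all assumptions. "Only the highest level" is applied separately to on-label and off-label evidence.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Iterable, List, Tuple

LEVELS = "ABCD"  # index 0 (A) is the strongest evidence level

# Hypothetical stand-ins for the "never reported" rules.
EXCLUDED_GENES = frozenset({"TP53"})
EXCLUDED_TREATMENTS = frozenset({"chemotherapy", "aspirin", "steroids"})


@dataclass(frozen=True)
class Evidence:
    gene: str
    event: str
    treatment: str
    direction: str
    level: str
    on_label: bool
    sources: FrozenSet[str]
    event_reportable: bool = True
    is_trial: bool = False
    is_predicted: bool = False


def consolidate(items: Iterable[Evidence]) -> List[Evidence]:
    """Merge identical evidence from several sources; sort sources when writing output."""
    merged: Dict[Evidence, FrozenSet[str]] = {}
    for e in items:
        key = replace(e, sources=frozenset())
        merged[key] = merged.get(key, frozenset()) | e.sources
    return [replace(key, sources=srcs) for key, srcs in merged.items()]


def max_level(source: str, predicted: bool) -> str:
    if source == "CKB":
        return "B" if predicted else "C"
    if source in ("VICC", "ICLUSION"):
        return "B"
    return "A"


def passes_filters(e: Evidence) -> bool:
    # Assumption: consolidated evidence passes the level cap if any of its sources allows it.
    level_ok = any(LEVELS.index(e.level) <= LEVELS.index(max_level(s, e.is_predicted))
                   for s in e.sources)
    return (e.event_reportable and level_ok
            and e.gene not in EXCLUDED_GENES
            and e.treatment.lower() not in EXCLUDED_TREATMENTS
            and (e.on_label or not e.is_trial))  # trials only when on-label


def report(items: Iterable[Evidence]) -> List[Evidence]:
    candidates = [e for e in consolidate(items) if passes_filters(e)]
    # Best (lowest-index) level per event/treatment/direction, split by label.
    best: Dict[Tuple[str, str, str, bool], int] = {}
    for e in candidates:
        key = (e.event, e.treatment, e.direction, e.on_label)
        best[key] = min(best.get(key, len(LEVELS)), LEVELS.index(e.level))
    reported = []
    for e in candidates:
        rank = LEVELS.index(e.level)
        if rank != best[(e.event, e.treatment, e.direction, e.on_label)]:
            continue  # not the highest level within its label group
        if not e.on_label:
            best_on = best.get((e.event, e.treatment, e.direction, True), len(LEVELS))
            if rank >= best_on:
                continue  # off-label must beat the best on-label level
        reported.append(e)
    return reported
```

For example, on-label B and off-label A evidence for the same event, treatment and direction would both be reported, whereas off-label B next to on-label A would be dropped.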
PROTECT produces a TSV file containing every piece of applicable evidence after consolidation has been performed.
| Field | Description | Example |
|---|---|---|
| event | The genomic event for which evidence is applicable | BRAF p.Val600Glu |
| germline | Whether the genomic event is present in the germline or was acquired somatically | true/false |
| reported | Whether the evidence passed all filters for reporting | true/false |
| treatment | Name of the treatment (trial or drug(s)) | Vemurafenib |
| onLabel | Whether the evidence is valid for the specific tumor type of the sample for which the match was made | true/false |
| level | Evidence level (from A to D) | A |
| direction | Whether the evidence is responsive or resistant | RESPONSIVE |
| sources | A list of sources from which the evidence has been extracted | vicc_cgi,vicc_civic |
| urls | A list of URLs with additional information about the evidence | https://pubmed.ncbi.nlm.nih.gov |
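Downstream tools can read this TSV with Python's standard `csv` module. A minimal sketch, assuming the header row uses exactly the field names listed in the table, booleans are written as lowercase `true`/`false`, and multi-valued columns are comma-separated as in the example column; the function name is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_reported_evidence(handle: TextIO) -> List[Dict[str, object]]:
    """Return the rows with reported=true, with boolean and list columns parsed."""
    rows: List[Dict[str, object]] = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["reported"] != "true":
            continue
        rows.append({
            "event": row["event"],
            "germline": row["germline"] == "true",
            "treatment": row["treatment"],
            "onLabel": row["onLabel"] == "true",
            "level": row["level"],
            "direction": row["direction"],
            # Assumption: multi-valued columns are comma-separated, as in the example column.
            "sources": [s for s in row["sources"].split(",") if s],
            "urls": [u for u in row["urls"].split(",") if u],
        })
    return rows


if __name__ == "__main__":
    # Tiny in-memory example built from the example values in the table.
    example = (
        "event\tgermline\treported\ttreatment\tonLabel\tlevel\tdirection\tsources\turls\n"
        "BRAF p.Val600Glu\tfalse\ttrue\tVemurafenib\ttrue\tA\tRESPONSIVE\t"
        "vicc_cgi,vicc_civic\thttps://pubmed.ncbi.nlm.nih.gov\n"
    )
    for evidence in read_reported_evidence(io.StringIO(example)):
        print(evidence["event"], evidence["treatment"], evidence["level"], evidence["sources"])
```

For a real output file, pass an open file handle instead of the `io.StringIO` object.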
- 1.4
  - Output of Virus Interpreter is loaded by PROTECT and matched against viral evidence
  - Evidence is determined for non-reported genomic events (amps, dels, variants, fusions, viral presence)
  - Evidence on gene DELETION is considered applicable for homozygous disruptions
  - Maximum reporting levels are configured per SERVE source (C for CKB, B for iClusion/VICC, A for all others)
  - Ref genome version must be configured externally rather than inferred by PROTECT
- 1.3
  - Renamed actionable signatures to actionable characteristics
  - Switched from BACHELOR germline variants to SAGE/PURPLE germline variants
  - Renamed LINX input file from linx driver to linx driver catalog
- 1.2
  - Treat partial amplifications and full amplifications identically
  - Removed specific evidence filtering, since this is already applied in SERVE
- 1.1
  - Split database loading from running PROTECT
- 1.0
  - Initial release