
VCF Creation Issue #14

Closed
Briteguy opened this issue Nov 2, 2020 · 3 comments · Fixed by #16

Briteguy commented Nov 2, 2020

I have used the following command to generate the BLAST reference files (*.nhr, *.nin, and *.nsq) for VCF creation, for both GRCh37 and GRCh38.

makeblastdb -in file.fasta -input_type fasta -dbtype nucl
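As a quick sanity check before running the cluster analysis, it can help to verify that makeblastdb actually wrote the nucleotide index files next to the reference. This is a minimal sketch; check_blastdb is a hypothetical helper, and file.fasta stands in for your actual reference path:

```shell
#!/bin/sh
# Hypothetical helper: confirm that the nucleotide BLAST index files
# (.nhr, .nin, .nsq) produced by makeblastdb sit beside the FASTA.
check_blastdb() {
    ref="$1"
    for ext in nhr nin nsq; do
        if [ ! -f "${ref}.${ext}" ]; then
            echo "missing: ${ref}.${ext}"
            return 1
        fi
    done
    echo "ok: ${ref} has .nhr/.nin/.nsq index files"
}
```

Usage: check_blastdb /path/to/file.fasta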

However, when I run the Cluster analysis, I get the following error:

Done analyzing MEIs
Writing VCF file
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Error: subscript contains invalid names
Execution halted

This occurs with either reference (37 or 38).

Any ideas how I could go about troubleshooting this step?

Thanks in advance

@your-highness
Contributor

Dear @Briteguy and @rebecca810 ,

Using the test data supplied in the ./validation/ subfolder, scramble did not raise an error.

However, when invoking it (using the bioconda build 1.0.1) on a custom capture sequencing sample aligned with bwa mem to hs37d5 (numeric identifiers for autosomes), the error "Error: subscript contains invalid names" was raised:

scramble.sh --out-name v3 --cluster-file v3_clusters.txt --ref hs37d5.fa --eval-meis --eval-dels

The following output files were created and seem to be fine:

v3_MEIs.txt
v3_PredictedDeletions.txt

The following log was produced:

Running sample: v3_clusters.txt
Running scramble with options:
blastRef : hs37d5.fa
clusterFile : v3_clusters.txt  
deletions : TRUE
indelScore : 80
INSTALL.DIR : /opt/conda/miniconda3/envs/hum-analysis_21-q1_mei-detection/share/scramble/bin
mei.refs : /opt/conda/miniconda3/envs/hum-analysis_21-q1_mei-detection/share/scramble/resources/MEI_consensus_seqs.fa
meis : TRUE
meiScore : 50
minDelLen : 50
nCluster : 5
outFilePrefix : v3
pctAlign : 90
polyAdist : 100
polyAFrac : 0.75
Useful Functions Loaded
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

Done analyzing l1 
Done analyzing sva 
Done analyzing alu 
Done analyzing l1 
Done analyzing sva 
Done analyzing alu 
Sample had 14 MEI(s)
Done analyzing MEIs
214 clusters out of 497 were removed due to simple sequence
BLAST Database error: No alias or index file found for nucleotide database [/ramdisk/HUM/bwa_index/hs37d5.fa] in search path [/tmp/Rtmpy3iGE6::]
Number of alignments meeting thresholds: 283 
Number of best alignments: 0 
[1] "Two-End-Deletions: Working on contig 1"
[1] "Two-End-Deletions: Working on contig 10"
[1] "Two-End-Deletions: Working on contig 11"
[1] "Two-End-Deletions: Working on contig 12"
[1] "Two-End-Deletions: Working on contig 13"
[1] "Two-End-Deletions: Working on contig 14"
[1] "Two-End-Deletions: Working on contig 15"
[1] "Two-End-Deletions: Working on contig 16"
[1] "Two-End-Deletions: Working on contig 17"
[1] "Two-End-Deletions: Working on contig 18"
[1] "Two-End-Deletions: Working on contig 19"
[1] "Two-End-Deletions: Working on contig 2"
[1] "Two-End-Deletions: Working on contig 20"
[1] "Two-End-Deletions: Working on contig 21"
[1] "Two-End-Deletions: Working on contig 22"
[1] "Two-End-Deletions: Working on contig 3"
[1] "Two-End-Deletions: Working on contig 4"
[1] "Two-End-Deletions: Working on contig 5"
[1] "Two-End-Deletions: Working on contig 6"
[1] "Two-End-Deletions: Working on contig 7"
[1] "Two-End-Deletions: Working on contig 8"
[1] "Two-End-Deletions: Working on contig 9"
[1] "Two-End-Deletions: Working on contig GL000220.1"
[1] "Two-End-Deletions: Working on contig hs37d5"
[1] "Two-End-Deletions: Working on contig X"
[1] "finished one end dels"
Sample had 0 deletions
Done analyzing deletions
Warning message:
In predict.BLAST(bl, seq, BLAST_args = "-dust no") :
  BLAST did not return a match!
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Error: subscript contains invalid names
Execution halted

What does the predict.BLAST error mean?

@your-highness
Contributor

I identified the problem in make.vcf.R. A pull request will follow.


fjmuzengyiheng commented Apr 15, 2021

It seems that SCRAMble writes the VCF using BLAST, so you may need to index your FASTA file with makeblastdb like this (at least it works for me):

makeblastdb -in YOUR_REFERENCE.fasta -dbtype nucl -parse_seqids
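If the BLAST+ tools are on your PATH, blastdbcmd can confirm that the database built by makeblastdb is readable. A hedged sketch; the inspect_blastdb wrapper and its fallback message are mine, and YOUR_REFERENCE.fasta is a placeholder:

```shell
#!/bin/sh
# Hedged sketch: print a summary of the BLAST database (title,
# number of sequences) when BLAST+ is installed; otherwise say so.
inspect_blastdb() {
    db="$1"
    if command -v blastdbcmd >/dev/null 2>&1; then
        blastdbcmd -db "$db" -info
    else
        echo "blastdbcmd not found on PATH; install BLAST+ to inspect the database"
    fi
}
```

Usage: inspect_blastdb YOUR_REFERENCE.fasta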

My log:
Running sample: /cluster/home/zengyiheng/project/zyh-pipeline/12_scramble/28787.clusters.txt
Running scramble with options:
INSTALL.DIR : /app/cluster_analysis/bin
blastRef : /cluster/home/zengyiheng/project/zyh-pipeline/reference/hg19_22XY_chr/hg19_22XY.fasta
clusterFile : /cluster/home/zengyiheng/project/zyh-pipeline/12_scramble/28787.clusters.txt
deletions : TRUE
indelScore : 80
mei.refs : /app/cluster_analysis/resources/MEI_consensus_seqs.fa
meiScore : 50
meis : TRUE
minDelLen : 50
nCluster : 5
no.vcf : FALSE
outFilePrefix : /cluster/home/zengyiheng/project/zyh-pipeline/12_scramble/28787
pctAlign : 90
polyAFrac : 0.75
polyAdist : 100
Useful Functions Loaded
Done analyzing l1
Done analyzing sva
Done analyzing alu
Done analyzing l1
Done analyzing sva
Done analyzing alu
Sample had 17 MEI(s)
Done analyzing MEIs
2700 clusters out of 7801 were removed due to simple sequence
Number of alignments meeting thresholds: 1112103
Number of best alignments: 368
[1] "Two-End-Deletions: Working on contig chr1"
[1] "Two-End-Deletions: Working on contig chr10"
[1] "Two-End-Deletions: Working on contig chr11"
[1] "Two-End-Deletions: Working on contig chr12"
[1] "Two-End-Deletions: Working on contig chr13"
[1] "Two-End-Deletions: Working on contig chr14"
[1] "Two-End-Deletions: Working on contig chr15"
[1] "Two-End-Deletions: Working on contig chr16"
[1] "Two-End-Deletions: Working on contig chr17"
[1] "Two-End-Deletions: Working on contig chr18"
[1] "Two-End-Deletions: Working on contig chr19"
[1] "Two-End-Deletions: Working on contig chr2"
[1] "Two-End-Deletions: Working on contig chr20"
[1] "Two-End-Deletions: Working on contig chr21"
[1] "Two-End-Deletions: Working on contig chr22"
[1] "Two-End-Deletions: Working on contig chr3"
[1] "Two-End-Deletions: Working on contig chr4"
[1] "Two-End-Deletions: Working on contig chr5"
[1] "Two-End-Deletions: Working on contig chr6"
[1] "Two-End-Deletions: Working on contig chr7"
[1] "Two-End-Deletions: Working on contig chr8"
[1] "Two-End-Deletions: Working on contig chr9"
[1] "Two-End-Deletions: Working on contig chrX"
[1] "Two-End-Deletions: Working on contig chrY"
[1] "finished one end dels"
Sample had 170 deletions
Done analyzing deletions
Writing VCF file
