
VCF Creation Issue #14

Closed
Briteguy opened this issue Nov 2, 2020 · 3 comments · Fixed by #16

Briteguy commented Nov 2, 2020

I have used the following command to generate the BLAST reference files (*.nhr, *.nin, and *.nsq) for VCF creation, for both GRCh37 and GRCh38.

makeblastdb -in file.fasta -input_type fasta -dbtype nucl
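As a quick sanity check before running the cluster analysis, it can help to verify that makeblastdb actually wrote the nucleotide index files next to the reference. This is a minimal sketch; check_blastdb is a hypothetical helper, and file.fasta stands in for your actual reference path:

```shell
#!/bin/sh
# Hypothetical helper: confirm that the nucleotide BLAST index files
# (.nhr, .nin, .nsq) produced by makeblastdb sit beside the FASTA.
check_blastdb() {
    ref="$1"
    for ext in nhr nin nsq; do
        if [ ! -f "${ref}.${ext}" ]; then
            echo "missing: ${ref}.${ext}"
            return 1
        fi
    done
    echo "ok: ${ref} has .nhr/.nin/.nsq index files"
}
```

Usage: check_blastdb /path/to/file.fasta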

However, when I run the Cluster analysis, I get the following error:

Done analyzing MEIs
Writing VCF file
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Error: subscript contains invalid names
Execution halted

This occurs with either reference (37 or 38).

Any ideas how I could go about troubleshooting this step?

Thanks in advance

@your-highness
Contributor

Dear @Briteguy and @rebecca810 ,

Using the test data supplied in the ./validation/ subfolder, scramble did not raise an error.

However, when invoking it (using the bioconda build 1.0.1) on a custom capture sequencing sample aligned with bwa mem to hs37d5 (numeric identifiers for autosomes), the error "Error: subscript contains invalid names" was raised:

scramble.sh --out-name v3 --cluster-file v3_clusters.txt --ref hs37d5.fa --eval-meis --eval-dels

The following output files were created and seem to be fine:

v3_MEIs.txt
v3_PredictedDeletions.txt

The following log was produced:

Running sample: v3_clusters.txt
Running scramble with options:
blastRef : hs37d5.fa
clusterFile : v3_clusters.txt  
deletions : TRUE
indelScore : 80
INSTALL.DIR : /opt/conda/miniconda3/envs/hum-analysis_21-q1_mei-detection/share/scramble/bin
mei.refs : /opt/conda/miniconda3/envs/hum-analysis_21-q1_mei-detection/share/scramble/resources/MEI_consensus_seqs.fa
meis : TRUE
meiScore : 50
minDelLen : 50
nCluster : 5
outFilePrefix : v3
pctAlign : 90
polyAdist : 100
polyAFrac : 0.75
Useful Functions Loaded
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

Done analyzing l1 
Done analyzing sva 
Done analyzing alu 
Done analyzing l1 
Done analyzing sva 
Done analyzing alu 
Sample had 14 MEI(s)
Done analyzing MEIs
214 clusters out of 497 were removed due to simple sequence
BLAST Database error: No alias or index file found for nucleotide database [/ramdisk/HUM/bwa_index/hs37d5.fa] in search path [/tmp/Rtmpy3iGE6::]
Number of alignments meeting thresholds: 283 
Number of best alignments: 0 
[1] "Two-End-Deletions: Working on contig 1"
[1] "Two-End-Deletions: Working on contig 10"
[1] "Two-End-Deletions: Working on contig 11"
[1] "Two-End-Deletions: Working on contig 12"
[1] "Two-End-Deletions: Working on contig 13"
[1] "Two-End-Deletions: Working on contig 14"
[1] "Two-End-Deletions: Working on contig 15"
[1] "Two-End-Deletions: Working on contig 16"
[1] "Two-End-Deletions: Working on contig 17"
[1] "Two-End-Deletions: Working on contig 18"
[1] "Two-End-Deletions: Working on contig 19"
[1] "Two-End-Deletions: Working on contig 2"
[1] "Two-End-Deletions: Working on contig 20"
[1] "Two-End-Deletions: Working on contig 21"
[1] "Two-End-Deletions: Working on contig 22"
[1] "Two-End-Deletions: Working on contig 3"
[1] "Two-End-Deletions: Working on contig 4"
[1] "Two-End-Deletions: Working on contig 5"
[1] "Two-End-Deletions: Working on contig 6"
[1] "Two-End-Deletions: Working on contig 7"
[1] "Two-End-Deletions: Working on contig 8"
[1] "Two-End-Deletions: Working on contig 9"
[1] "Two-End-Deletions: Working on contig GL000220.1"
[1] "Two-End-Deletions: Working on contig hs37d5"
[1] "Two-End-Deletions: Working on contig X"
[1] "finished one end dels"
Sample had 0 deletions
Done analyzing deletions
Warning message:
In predict.BLAST(bl, seq, BLAST_args = "-dust no") :
  BLAST did not return a match!
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Error: subscript contains invalid names
Execution halted

What does the predict.BLAST error mean?

@your-highness
Contributor

I identified the problem in make.vcf.R. A pull request will follow.


fjmuzengyiheng commented Apr 15, 2021

It seems that SCRAMble writes the VCF using BLAST, so you may need to index your FASTA file with makeblastdb like this (at least it works for me):

makeblastdb -in YOUR_REFERENCE.fasta -dbtype nucl -parse_seqids
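If the BLAST+ tools are on your PATH, blastdbcmd can confirm that the database built by makeblastdb is readable. A hedged sketch; the inspect_blastdb wrapper and its fallback message are mine, and YOUR_REFERENCE.fasta is a placeholder:

```shell
#!/bin/sh
# Hedged sketch: print a summary of the BLAST database (title,
# number of sequences) when BLAST+ is installed; otherwise say so.
inspect_blastdb() {
    db="$1"
    if command -v blastdbcmd >/dev/null 2>&1; then
        blastdbcmd -db "$db" -info
    else
        echo "blastdbcmd not found on PATH; install BLAST+ to inspect the database"
    fi
}
```

Usage: inspect_blastdb YOUR_REFERENCE.fasta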

My log:
Running sample: /cluster/home/zengyiheng/project/zyh-pipeline/12_scramble/28787.clusters.txt
Running scramble with options:
INSTALL.DIR : /app/cluster_analysis/bin
blastRef : /cluster/home/zengyiheng/project/zyh-pipeline/reference/hg19_22XY_chr/hg19_22XY.fasta
clusterFile : /cluster/home/zengyiheng/project/zyh-pipeline/12_scramble/28787.clusters.txt
deletions : TRUE
indelScore : 80
mei.refs : /app/cluster_analysis/resources/MEI_consensus_seqs.fa
meiScore : 50
meis : TRUE
minDelLen : 50
nCluster : 5
no.vcf : FALSE
outFilePrefix : /cluster/home/zengyiheng/project/zyh-pipeline/12_scramble/28787
pctAlign : 90
polyAFrac : 0.75
polyAdist : 100
Useful Functions Loaded
Done analyzing l1
Done analyzing sva
Done analyzing alu
Done analyzing l1
Done analyzing sva
Done analyzing alu
Sample had 17 MEI(s)
Done analyzing MEIs
2700 clusters out of 7801 were removed due to simple sequence
Number of alignments meeting thresholds: 1112103
Number of best alignments: 368
[1] "Two-End-Deletions: Working on contig chr1"
[1] "Two-End-Deletions: Working on contig chr10"
[1] "Two-End-Deletions: Working on contig chr11"
[1] "Two-End-Deletions: Working on contig chr12"
[1] "Two-End-Deletions: Working on contig chr13"
[1] "Two-End-Deletions: Working on contig chr14"
[1] "Two-End-Deletions: Working on contig chr15"
[1] "Two-End-Deletions: Working on contig chr16"
[1] "Two-End-Deletions: Working on contig chr17"
[1] "Two-End-Deletions: Working on contig chr18"
[1] "Two-End-Deletions: Working on contig chr19"
[1] "Two-End-Deletions: Working on contig chr2"
[1] "Two-End-Deletions: Working on contig chr20"
[1] "Two-End-Deletions: Working on contig chr21"
[1] "Two-End-Deletions: Working on contig chr22"
[1] "Two-End-Deletions: Working on contig chr3"
[1] "Two-End-Deletions: Working on contig chr4"
[1] "Two-End-Deletions: Working on contig chr5"
[1] "Two-End-Deletions: Working on contig chr6"
[1] "Two-End-Deletions: Working on contig chr7"
[1] "Two-End-Deletions: Working on contig chr8"
[1] "Two-End-Deletions: Working on contig chr9"
[1] "Two-End-Deletions: Working on contig chrX"
[1] "Two-End-Deletions: Working on contig chrY"
[1] "finished one end dels"
Sample had 170 deletions
Done analyzing deletions
Writing VCF file
