vSNP step 2 not producing output #14

lucyjanekelly · 2025-02-06T18:21:42Z

I am running vSNP step 2, and it says the process completed and that the VCF files were saved in the default directories, but I can't find any output. I'm not getting any errors either. Here is my script:

```
# Set paths
VCF_DIR="/xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/vsnp_step1_output"
REFERENCE="${VCF_DIR}/NC_002945v4.fasta"

# Check if the directory exists
if [ ! -d "$VCF_DIR" ]; then
    echo "Error: Directory $VCF_DIR does not exist."
    exit 1
fi

# Check if the reference file exists
if [ ! -f "$REFERENCE" ]; then
    echo "Error: Reference file $REFERENCE does not exist."
    exit 1
fi

# Process each VCF file
for VCF in "$VCF_DIR"/*.vcf; do
    if [ -f "$VCF" ]; then
        BASENAME=$(basename "$VCF" .vcf)
        echo "Processing $BASENAME..."

        echo "Running command: vSNP_step2.py -v \"$VCF\" -r \"$REFERENCE\""

        vSNP_step2.py -v "$VCF" -r "$REFERENCE" 2> "$VCF_DIR/${BASENAME}_error.log"

        if [ $? -eq 0 ]; then
            echo "Step 2 completed for $BASENAME"
        else
            echo "Error processing $BASENAME"
        fi
    fi
done

echo "All VCF files processed. Check output in vSNP default directories."
```


stuber · 2025-02-06T18:34:12Z

Hi,

Can you change your working directory to the directory containing the VCF files, list its contents (ls -lh), and send me the output? I'm curious whether everything is there as expected.

Thanks,

lucyjanekelly · 2025-02-06T18:39:15Z


I'm running 100 samples, so here's an example of what I have for each sample:

```
drwxr-xr-x. 2 lucykelly student          2.0K Dec 11 17:00 SRR8600328
-rw-r--r--. 1 lucykelly lilianasalvador  195M Jan 18 00:04 SRR8600328_all.bam
-rw-r--r--. 1 lucykelly lilianasalvador  142M Jan 18 00:05 SRR8600328.bam
-rw-r--r--. 1 lucykelly lilianasalvador   14K Jan 18 00:05 SRR8600328.bam.bai
-rw-r--r--. 1 lucykelly lilianasalvador     0 Feb  5 17:00 SRR8600328_filtered_hapall_error.log
-rw-r--r--. 1 lucykelly lilianasalvador   82K Jan 18 00:09 SRR8600328_filtered_hapall.vcf
-rw-r--r--. 1 lucykelly lilianasalvador     0 Feb  5 17:00 SRR8600328_mapfix_hapall_error.log
-rw-r--r--. 1 lucykelly lilianasalvador   25M Jan 18 00:09 SRR8600328_mapfix_hapall.vcf
-rw-r--r--. 1 lucykelly lilianasalvador   74K Jan 18 00:09 SRR8600328_R1_unaligned.fastq
-rw-r--r--. 1 lucykelly lilianasalvador   90K Jan 18 00:09 SRR8600328_R2_unaligned.fastq
-rw-r--r--. 1 lucykelly lilianasalvador  560M Jan 18 00:04 SRR8600328.sam
-rw-r--r--. 1 lucykelly lilianasalvador  139M Jan 18 00:04 SRR8600328_sorted.bam
-rw-r--r--. 1 lucykelly lilianasalvador   14K Jan 18 00:04 SRR8600328_sorted.bam.bai
-rw-r--r--. 1 lucykelly lilianasalvador   777 Jan 18 00:09 SRR8600328_unaligned_contigs.fasta
-rw-r--r--. 1 lucykelly lilianasalvador     0 Feb  5 17:00 SRR8600328_unfiltered_hapall_error.log
-rw-r--r--. 1 lucykelly lilianasalvador   25M Jan 18 00:09 SRR8600328_unfiltered_hapall.vcf
```

stuber · 2025-02-06T18:44:28Z

When running step 2, start with a working directory that contains only the *_zc.vcf files. Step 2 takes only these files as its input.
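A minimal sketch of setting up that step 2 working directory, assuming the step 1 outputs sit somewhere under one parent directory (for example, one subdirectory per sample). The function name `collect_zc_vcfs` and its two directory arguments are hypothetical, not part of vSNP. The sketch only stages the files; it deliberately does not call vSNP_step2.py, because the step 2 options should come from the vSNP/vsnp3 documentation rather than be guessed here.

```shell
#!/usr/bin/env bash
# collect_zc_vcfs SRC DEST  (hypothetical helper, not part of vSNP)
# Copies every *_zc.vcf found anywhere under SRC into DEST, so DEST holds
# only the files step 2 starts from. DEST should be a new directory
# outside SRC, otherwise find could revisit the copies.
collect_zc_vcfs() {
    local src=$1 dest=$2
    if [ ! -d "$src" ]; then
        echo "Error: $src is not a directory" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -name matches the file name only, so other step 1 VCFs such as
    # *_filtered_hapall.vcf or *_mapfix_hapall.vcf are skipped
    find "$src" -type f -name '*_zc.vcf' -exec cp {} "$dest"/ \;
    # Report how many files were staged (tr strips padding some wc versions add)
    local count
    count=$(find "$dest" -maxdepth 1 -type f -name '*_zc.vcf' | wc -l | tr -d ' ')
    echo "Staged $count *_zc.vcf file(s) in $dest"
}
```

For example, `collect_zc_vcfs /path/to/vsnp_step1_output /path/to/step2_input` followed by `cd /path/to/step2_input` gives a directory containing just the *_zc.vcf files. A staged count far below the number of samples is a quick sign that step 1 did not finish for most of them.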

I'm also just noticing that this issue is on the vSNP repo. I recommend using vSNP from the vsnp3 repo instead. A conda version of vsnp3 is available if you're interested in using the conda setup.

lucyjanekelly · 2025-02-06T19:07:49Z

It looks like I only have a *_zc.vcf file for SRR8599992. Does that mean step 1 didn't actually run correctly?

stuber · 2025-02-06T19:17:32Z

Yeah, something went wrong. It's good, though, that at least one completed, so you know the script is working. Are you running vsnp3? The instructions for step 1 are for running just one sample. To run many at once, you'll want to put the FASTQs for each sample in their own folder, then iterate the step 1 command over each directory, controlling how many run based on the resources available. It would look something like this:

Directory with FASTQs for many samples:

```
# Move each FASTQ into a folder named after its sample
# (the part of the file name before the first "_" or ".")
for i in *.fastq*; do
    n=$(echo "$i" | sed 's/_.*//' | sed 's/\..*//')
    echo "n is : $n"
    mkdir -p "$n"
    mv "$i" "$n/"
done
```

```
# Run step 1 inside each sample folder, one after another
for sample_dir in ./*/; do
    cd "$sample_dir" || continue
    vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r /path/to/reference.fasta
    cd ..
done
```

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
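To see exactly which samples step 1 did not finish, here is a small sketch assuming the one-folder-per-sample layout described above. The function name `list_missing_zc` is hypothetical, and it treats "no *_zc.vcf anywhere inside the sample folder" as the sign of an incomplete run, which is an assumption based on this thread rather than on vSNP's own checks.

```shell
#!/usr/bin/env bash
# list_missing_zc [PARENT]  (hypothetical helper, not part of vSNP)
# Prints the name of each sample subdirectory of PARENT (default: the
# current directory) that contains no *_zc.vcf file anywhere below it.
list_missing_zc() {
    local parent=${1:-.}
    local dir
    for dir in "$parent"/*/; do
        # With no subdirectories the glob stays literal; skip it
        [ -d "$dir" ] || continue
        # Search recursively; an empty result means no *_zc.vcf was written
        if [ -z "$(find "$dir" -type f -name '*_zc.vcf' | head -n 1)" ]; then
            basename "$dir"
        fi
    done
}
```

Running `list_missing_zc > missing_samples.txt` from the parent directory gives a list of sample folders to rerun, and their step 1 logs are the first place to look for what went wrong.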

lucyjanekelly · 2025-02-06T19:25:35Z

Does this script look good? Apologies, I'm very new to coding and bioinformatics.

```
#!/bin/bash

#SBATCH --job-name=vsnp
#SBATCH --account=lilianasalvador
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16gb
#SBATCH --time=99:00:00

# Step 1: Organize fastq.gz files into directories named after their sample
for i in *.fastq*; do
    n=$(echo "$i" | sed 's/_.*//' | sed 's/\..*//')
    echo "Moving $i to directory: $n"
    mkdir -p "$n"
    mv "$i" "$n/"
done

# Step 2: Process each sample directory
for sample_dir in ./*/; do
    cd "$sample_dir" || exit  # Change into sample directory, exit if it fails
    smpl=$(basename "$sample_dir")  # Extract sample name
    echo "Processing sample: $smpl"

    vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r /xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/NC_002945v4.fasta

    cd ..  # Move back to the parent directory
done
```

stuber · 2025-02-06T19:29:16Z

Looks good, just be careful starting too many at once. I would start with 2-3 samples then build up from there.
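The loop above runs the samples one after another. If, after a small trial run, you want a few samples running at the same time with a hard cap, one option is `xargs -P`. This is a sketch assuming the one-folder-per-sample layout above, vSNP_step1.py on PATH, and an absolute reference path (each job changes into its sample folder, so a relative path would break). The function name `run_step1_limited` is hypothetical; the step 1 command line is the one used in this thread. Keep MAX_JOBS within the CPUs and memory you request from SLURM, since the script above asks for a single task.

```shell
#!/usr/bin/env bash
# run_step1_limited MAX_JOBS REFERENCE  (hypothetical helper, not part of vSNP)
# Runs step 1 in every sample subdirectory of the current directory,
# with at most MAX_JOBS samples running at once.
# Needs an xargs that supports -0 and -P (GNU and BSD xargs both do).
run_step1_limited() {
    local max_jobs=$1 reference=$2
    # Emit one NUL-terminated folder name per sample. xargs appends each name
    # as the last argument, so inside sh: $1 = REFERENCE, $2 = sample folder.
    printf '%s\0' ./*/ |
        xargs -0 -n 1 -P "$max_jobs" sh -c '
            cd "$2" || exit 1
            # The FASTQ globs expand inside the sample folder, after the cd
            vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r "$1"
        ' _ "$reference"
}
```

For example, `run_step1_limited 2 /xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/NC_002945v4.fasta` from the directory holding the sample folders keeps two samples running at a time; raise the number only after checking how much memory each run actually uses.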
