vSNP step 2 not producing output #14

Open
lucyjanekelly opened this issue Feb 6, 2025 · 7 comments
Comments

@lucyjanekelly

I am running vSNP step 2. It reports that the process completed and that the VCF files were saved in the default directories, but I can't find any output, and I'm not getting any errors either. Here is my script:

# Set paths
VCF_DIR="/xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/vsnp_step1_output"
REFERENCE="${VCF_DIR}/NC_002945v4.fasta"

# Check if the directory exists
if [ ! -d "$VCF_DIR" ]; then
    echo "Error: Directory $VCF_DIR does not exist."
    exit 1
fi

# Check if the reference file exists
if [ ! -f "$REFERENCE" ]; then
    echo "Error: Reference file $REFERENCE does not exist."
    exit 1
fi

# Process each VCF file
for VCF in "$VCF_DIR"/*.vcf; do
    if [ -f "$VCF" ]; then
        BASENAME=$(basename "$VCF" .vcf)
        echo "Processing $BASENAME..."

        echo "Running command: vSNP_step2.py -v \"$VCF\" -r \"$REFERENCE\""

        vSNP_step2.py -v "$VCF" -r "$REFERENCE" 2> "$VCF_DIR/${BASENAME}_error.log"

        if [ $? -eq 0 ]; then
            echo "Step 2 completed for $BASENAME"
        else
            echo "Error processing $BASENAME"
        fi
    fi
done

echo "All VCF files processed. Check output in vSNP default directories."

@stuber
Contributor

stuber commented Feb 6, 2025

Hi,

Can you change your working directory to the directory containing the VCF files, list its contents with ls -lh, and send me the output? I'm curious whether everything is there as expected.

Thanks,

@lucyjanekelly
Author

I'm running 100 samples, so here is an example of what I have for each one:

drwxr-xr-x. 2 lucykelly student 2.0K Dec 11 17:00 SRR8600328
-rw-r--r--. 1 lucykelly lilianasalvador 195M Jan 18 00:04 SRR8600328_all.bam
-rw-r--r--. 1 lucykelly lilianasalvador 142M Jan 18 00:05 SRR8600328.bam
-rw-r--r--. 1 lucykelly lilianasalvador 14K Jan 18 00:05 SRR8600328.bam.bai
-rw-r--r--. 1 lucykelly lilianasalvador 0 Feb 5 17:00 SRR8600328_filtered_hapall_error.log
-rw-r--r--. 1 lucykelly lilianasalvador 82K Jan 18 00:09 SRR8600328_filtered_hapall.vcf
-rw-r--r--. 1 lucykelly lilianasalvador 0 Feb 5 17:00 SRR8600328_mapfix_hapall_error.log
-rw-r--r--. 1 lucykelly lilianasalvador 25M Jan 18 00:09 SRR8600328_mapfix_hapall.vcf
-rw-r--r--. 1 lucykelly lilianasalvador 74K Jan 18 00:09 SRR8600328_R1_unaligned.fastq
-rw-r--r--. 1 lucykelly lilianasalvador 90K Jan 18 00:09 SRR8600328_R2_unaligned.fastq
-rw-r--r--. 1 lucykelly lilianasalvador 560M Jan 18 00:04 SRR8600328.sam
-rw-r--r--. 1 lucykelly lilianasalvador 139M Jan 18 00:04 SRR8600328_sorted.bam
-rw-r--r--. 1 lucykelly lilianasalvador 14K Jan 18 00:04 SRR8600328_sorted.bam.bai
-rw-r--r--. 1 lucykelly lilianasalvador 777 Jan 18 00:09 SRR8600328_unaligned_contigs.fasta
-rw-r--r--. 1 lucykelly lilianasalvador 0 Feb 5 17:00 SRR8600328_unfiltered_hapall_error.log
-rw-r--r--. 1 lucykelly lilianasalvador 25M Jan 18 00:09 SRR8600328_unfiltered_hapall.vcf

@stuber
Contributor

stuber commented Feb 6, 2025

When running step 2, start in a working directory that contains only the *_zc.vcf files. Step 2 takes only these files as its input.

Also, I just noticed this issue is on the vSNP repo. I recommend using the version in the vsnp3 repo instead. A conda version of vsnp3 is available if you are interested in using the conda setup.
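
A minimal shell sketch of that setup, for anyone following along: it gathers the *_zc.vcf files into a fresh directory so nothing else is present when step 2 starts. The function name collect_zc_vcfs and the directory names are hypothetical, and the sketch assumes the *_zc.vcf files sit somewhere beneath the step 1 output directory and that the new directory is outside that tree. None of this comes from the vSNP documentation.

```shell
#!/bin/bash
# Hypothetical helper: copy every *_zc.vcf found under a step 1 output
# tree into a separate, clean working directory for step 2.
collect_zc_vcfs() {
    local src_dir="$1"   # where step 1 wrote its output (assumed layout)
    local work_dir="$2"  # new directory, outside src_dir, for the *_zc.vcf files

    mkdir -p "$work_dir" || return 1

    # Copy each matching file; find handles any nesting depth.
    find "$src_dir" -type f -name '*_zc.vcf' -exec cp {} "$work_dir/" \;

    # Print how many *_zc.vcf files the working directory now holds.
    find "$work_dir" -maxdepth 1 -type f -name '*_zc.vcf' | wc -l | tr -d ' '
}
```

For example, collect_zc_vcfs vsnp_step1_output step2_input followed by cd step2_input would leave a directory holding only the *_zc.vcf files. If the printed count is much smaller than the number of samples, step 1 probably did not finish for most of them.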

@lucyjanekelly
Author

It looks like I only have a *_zc.vcf file for SRR8599992. Do you think that means step 1 didn't actually run correctly?
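
One way to check this across all 100 samples is to list the sample directories that contain no *_zc.vcf file. The sketch below assumes one directory per sample under a common parent. The function name list_samples_missing_zc is hypothetical and not part of vSNP.

```shell
#!/bin/bash
# Hypothetical helper: print the name of each sample directory under $1
# that contains no *_zc.vcf file at any depth.
list_samples_missing_zc() {
    local parent_dir="$1"
    local sample_dir
    for sample_dir in "$parent_dir"/*/; do
        # Skip the literal pattern left behind when the glob matches nothing.
        [ -d "$sample_dir" ] || continue
        # An empty result means no *_zc.vcf was found for this sample.
        if [ -z "$(find "$sample_dir" -type f -name '*_zc.vcf' | head -n 1)" ]; then
            basename "$sample_dir"
        fi
    done
}
```

Running list_samples_missing_zc . from the directory that holds the sample folders prints one sample name per line. An empty result would mean every sample has a *_zc.vcf.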

@stuber
Contributor

stuber commented Feb 6, 2025 via email

@lucyjanekelly
Author

Does this script look good? I apologize, I'm very new to coding and bioinformatics.

#!/bin/bash

#SBATCH --job-name=vsnp
#SBATCH --account=lilianasalvador
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16gb
#SBATCH --time=99:00:00

# Step 1: Organize fastq.gz files into directories named after their sample

for i in *.fastq*; do
    n=$(echo "$i" | sed 's/_.*//' | sed 's/\..*//')
    echo "Moving $i to directory: $n"
    mkdir -p "$n"
    mv "$i" "$n/"
done

# Step 2: Process each sample directory

for sample_dir in ./*/; do
    cd "$sample_dir" || exit # Change into sample directory, exit if it fails
    smpl=$(basename "$sample_dir") # Extract sample name
    echo "Processing sample: $smpl"

    vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r /xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/NC_002945v4.fasta

    cd ..  # Move back to the parent directory
done
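
The first loop appears intended to name each directory after the part of the file name before the first underscore or dot. That naming step can be checked on its own before any files are moved. The sketch below uses shell parameter expansion in place of sed; the function name sample_name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: derive a sample name from a FASTQ file name by
# dropping everything from the first underscore or the first dot onward.
sample_name() {
    local name="$1"
    name="${name%%_*}"   # strip from the first "_" to the end
    name="${name%%.*}"   # strip from the first "." to the end
    printf '%s\n' "$name"
}
```

Calling sample_name on both files of a read pair, for instance SRR8600328_R1.fastq.gz and SRR8600328_R2.fastq.gz, should give the same name, so both files land in the same directory. Printing the names in a loop over the FASTQ files, without the mkdir and mv steps, works as a dry run.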

@stuber
Contributor

stuber commented Feb 6, 2025 via email
