vSNP step 2 not producing output #14
Hi, can you change your working directory to the directory containing the VCF files, list its contents with `ls -lh`, and send me the output? I'm curious whether everything is there as expected. Thanks,
I'm running 100 samples, so here's an example of what I have for each sample: `drwxr-xr-x. 2 lucykelly student 2.0K Dec 11 17:00 SRR8600328`
When running step 2, start with a working directory that contains only the `*_zc.vcf` files; step 2 starts from these files alone. I'm also just noticing this issue is on the vSNP repo. I recommend using vSNP from the vsnp3 repo instead. A conda version of vsnp3 is available if you'd like to use a conda setup.
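A minimal sketch of preparing that step 2 working directory, assuming each sample has its own folder under one parent directory (the layout used for step 1 in this thread) and that step 1 leaves its `*_zc.vcf` somewhere inside that folder. The function name `gather_zc_vcfs` and the default target folder `step2_vcfs` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every *_zc.vcf produced by step 1 into one
# folder that contains nothing else, ready to be used as the step 2 directory.
# Assumes one folder per sample under $1 (default: current directory).
gather_zc_vcfs() {
    local step1_root="${1:-.}"
    local step2_dir="${2:-step2_vcfs}"
    local sample_dir vcf count=0

    mkdir -p "$step2_dir" || return 1
    for sample_dir in "$step1_root"/*/; do
        [ -d "$sample_dir" ] || continue   # no sample folders: glob stays literal
        # Skip the step 2 folder itself if it lives under the same parent.
        if [ "$(cd "$sample_dir" && pwd)" = "$(cd "$step2_dir" && pwd)" ]; then
            continue
        fi
        # find also picks up *_zc.vcf files in nested step 1 output subfolders.
        while IFS= read -r -d '' vcf; do
            cp "$vcf" "$step2_dir"/
            count=$((count + 1))
        done < <(find "$sample_dir" -type f -name '*_zc.vcf' -print0)
    done
    echo "Copied $count *_zc.vcf file(s) into $step2_dir"
}
```

For example, `gather_zc_vcfs . step2_vcfs && cd step2_vcfs && ls -lh` collects the files and shows what step 2 will see.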
It looks like I only have a `*_zc.vcf` file for SRR8599992. Do you think that means step 1 didn't actually run correctly?
Yeah, something went wrong. It's good that at least one completed, though, so you know the script works. Are you running vsnp3?
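To see exactly which samples step 1 did not finish, a quick check like this lists every sample folder that contains no `*_zc.vcf`. It is a sketch assuming one folder per sample under a common parent; the function name `missing_zc_vcf` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of each sample folder under $1
# that has no *_zc.vcf anywhere inside it, i.e. where step 1 likely failed.
missing_zc_vcf() {
    local root="${1:-.}"
    local sample_dir
    for sample_dir in "$root"/*/; do
        [ -d "$sample_dir" ] || continue
        # -print -quit stops at the first match, so large folders stay fast.
        if [ -z "$(find "$sample_dir" -type f -name '*_zc.vcf' -print -quit)" ]; then
            basename "$sample_dir"
        fi
    done
}
```

Running `missing_zc_vcf . > failed_samples.txt` gives a list of samples to rerun, and its line count shows how many are affected.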
The step 1 instructions cover running a single sample. To run many at once, put each sample's FASTQs in its own folder, then loop the step 1 command over those folders, limiting how many run at the same time based on the resources available.
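Something like this. From a directory with FASTQs for many samples, first move each sample's files into a folder named after the sample, then run step 1 inside each folder: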
```
for i in *.fastq*; do n=`echo $i | sed 's/_.*//' | sed 's/\..*//'`; echo "n is : $n"; mkdir -p $n; mv $i $n/; done
```
```
for sample_dir in ./*/; do
    cd "$sample_dir" || continue   # skip a folder we cannot enter
    vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r /path/to/reference.fasta
    cd ..
done
```
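The loop above runs the samples one after another. To run several at once while keeping the number of simultaneous step 1 jobs within the available resources, one option is to let `xargs -P` cap the concurrency. This is a sketch, not part of vSNP: it assumes one folder per sample and reuses the `vSNP_step1.py` command and placeholder reference path from the loop above; `MAX_JOBS`, `run_one_sample`, `run_step1_all`, and the per-sample `step1.log` are hypothetical additions.

```shell
#!/usr/bin/env bash
# Sketch: run step 1 in every sample folder, at most MAX_JOBS at a time.
# The step 1 command line is copied from the loop above; adjust REFERENCE.
MAX_JOBS="${MAX_JOBS:-2}"
REFERENCE="${REFERENCE:-/path/to/reference.fasta}"
export REFERENCE

run_one_sample() {
    [ -n "$1" ] && [ -d "$1" ] || return 1
    # A subshell means a failed cd can never leave us in the wrong folder.
    (
        cd "$1" || exit 1
        vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r "$REFERENCE" > step1.log 2>&1
    )
}
export -f run_one_sample   # make the function visible to the bash -c children

run_step1_all() {
    local root="${1:-.}"
    # -print0/-0 keep folder names intact; -n 1 passes one folder per job;
    # -P caps how many jobs run concurrently.
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -n 1 -P "$MAX_JOBS" bash -c 'run_one_sample "$1"' _
}
```

Following the advice to start small, `MAX_JOBS=2 run_step1_all .` runs two samples at a time; raise `MAX_JOBS` once memory and CPU use look comfortable.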
From: lucyjanekelly ***@***.***>
Date: Thursday, February 6, 2025 at 12:08 PM
To: USDA-VS/vSNP ***@***.***>
Cc: Stuber, Tod - MRP-APHIS ***@***.***>, Comment ***@***.***>
Subject: Re: [USDA-VS/vSNP] vSNP step 2 not producing output (Issue #14)
It looks like I only have a *_zc.vcf file for SRR8599992, do you think that means that step 1 didn't actually run correctly?
This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
Does this script look good? I apologize, I'm very new to coding and bioinformatics.
Looks good; just be careful about starting too many at once. I would start with 2-3 samples and build up from there.
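On a SLURM cluster, another way to follow that advice is a job array with a throttle: `--array=1-N%K` runs at most K array tasks at once, and each task reads its index from `SLURM_ARRAY_TASK_ID`. A sketch, assuming a hypothetical `samples.txt` that lists one sample folder per line and reusing the account, partition, memory, time, and step 1 command from the script in this thread; `SAMPLE_LIST` and `run_array_task` are hypothetical names.

```shell
#!/bin/bash
#SBATCH --job-name=vsnp_step1
#SBATCH --account=lilianasalvador
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16gb
#SBATCH --time=99:00:00
#SBATCH --array=1-100%2

# The --array line: 100 samples, at most 2 running at any one time.
# samples.txt (hypothetical): one sample folder per line, e.g. made with
#   ls -d SRR*/ | sed 's#/$##' > samples.txt
SAMPLE_LIST="${SAMPLE_LIST:-samples.txt}"
REFERENCE="${REFERENCE:-/xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/NC_002945v4.fasta}"

run_array_task() {
    local task_id="$1" sample
    sample=$(sed -n "${task_id}p" "$SAMPLE_LIST")   # line N of the list
    if [ -z "$sample" ]; then
        echo "No sample on line $task_id of $SAMPLE_LIST" >&2
        return 1
    fi
    (
        cd "$sample" || exit 1
        echo "Task $task_id: processing $sample"
        vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r "$REFERENCE"
    )
}

# Only do work when launched as a SLURM array task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    run_array_task "$SLURM_ARRAY_TASK_ID"
fi
```

A trial run on the first few samples can use `sbatch --array=1-3 script.sh`, since a command-line `--array` overrides the one in the script.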
From: lucyjanekelly ***@***.***>
Date: Thursday, February 6, 2025 at 12:26 PM
To: USDA-VS/vSNP ***@***.***>
Cc: Stuber, Tod - MRP-APHIS ***@***.***>, Comment ***@***.***>
Subject: Re: [USDA-VS/vSNP] vSNP step 2 not producing output (Issue #14)
Does this script look good? I apologize I'm very new to coding and bioinformatics
#!/bin/bash
#SBATCH --job-name=vsnp
#SBATCH --account=lilianasalvador
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=16gb
#SBATCH --time=99:00:00
# Step 1: Organize fastq.gz files into directories named after their sample
for i in *.fastq*; do
n=$(echo "$i" | sed 's/_.*//' | sed 's/\..*//')
echo "Moving $i to directory: $n"
mkdir -p "$n"
mv "$i" "$n/"
done
# Step 2: Process each sample directory
for sample_dir in ./*/; do
cd "$sample_dir" || exit # Change into sample directory, exit if it fails
smpl=$(basename "$sample_dir") # Extract sample name
echo "Processing sample: $smpl"
vSNP_step1.py -r1 *_R1*gz -r2 *_R2*gz -r /xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/NC_002945v4.fasta
cd .. # Move back to the parent directory
done
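The two `sed` calls that name each sample folder keep only the part of the file name before the first `_`, then before the first `.`. The same result can be had with bash parameter expansion, which avoids starting two extra processes per file; a small sketch with a hypothetical `sample_name` function:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the sample folder name from a FASTQ file name
# the same way the sed pipeline does: cut at the first "_", then at the first ".".
sample_name() {
    local n="${1%%_*}"          # drop everything from the first "_" onward
    printf '%s\n' "${n%%.*}"    # then drop everything from the first "." onward
}

sample_name SRR8600328_1.fastq.gz   # prints SRR8600328
sample_name SRR8599992.fastq.gz     # prints SRR8599992
```

Inside the loop this becomes `n=$(sample_name "$i")`.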
I am running vSNP step 2, and it says the process completed and that the VCF files were saved in the default directories, but I can't find any output. I'm not getting any errors either. Here is my script:
# Set paths
VCF_DIR="/xdisk/lilianasalvador/lucykelly/kristina_pipeline/SNP/vsnp_step1_output"
REFERENCE="${VCF_DIR}/NC_002945v4.fasta"
# Check if the directory exists
if [ ! -d "$VCF_DIR" ]; then
    echo "Error: Directory $VCF_DIR does not exist."
    exit 1
fi
# Check if the reference file exists
if [ ! -f "$REFERENCE" ]; then
    echo "Error: Reference file $REFERENCE does not exist."
    exit 1
fi
# Process each VCF file
for VCF in "$VCF_DIR"/*.vcf; do
    if [ -f "$VCF" ]; then
        BASENAME=$(basename "$VCF" .vcf)
        echo "Processing $BASENAME..."
    fi
done
echo "All VCF files processed. Check output in vSNP default directories."
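In this script the loop only prints each file's base name; nothing in it calls vSNP step 2, so the closing "All VCF files processed" message comes from the final `echo` alone and no step 2 output should be expected. Also, `VCF_DIR` is the step 1 output folder, which (judging by the `REFERENCE` path) also holds the reference FASTA, whereas step 2 is meant to start from a folder holding only `*_zc.vcf` files. A minimal pre-flight check for that folder is sketched below; the function name `check_step2_dir` is hypothetical, and the step 2 command itself is left out because its options depend on the vSNP version in use.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a step 2 working directory before launching step 2.
# Step 2 is meant to start from a folder holding only *_zc.vcf files.
check_step2_dir() {
    local dir="$1" n_zc=0 n_other=0 f
    if [ ! -d "$dir" ]; then
        echo "Error: $dir does not exist" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subfolders and an unmatched glob
        case "$f" in
            *_zc.vcf) n_zc=$((n_zc + 1)) ;;
            *)        n_other=$((n_other + 1)) ;;
        esac
    done
    echo "$dir: $n_zc *_zc.vcf file(s), $n_other other file(s)"
    if [ "$n_other" -gt 0 ]; then
        echo "Warning: move non-*_zc.vcf files out before running step 2" >&2
    fi
    [ "$n_zc" -gt 0 ]                    # succeed only if step 2 has input
}
```

Typical use is `check_step2_dir step2_vcfs && cd step2_vcfs`, then running the step 2 command from inside that folder and checking it with `ls -lh` afterwards.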