Incorrect file names #251

Open
tirohia opened this issue May 19, 2019 · 2 comments

tirohia commented May 19, 2019

I have a pipeline file with the following run command:

run {
        "%_*" * [trim] + "%_*" * [align] + "%_*" * [removeSupplementary]  + "%_*" * [removeDuplicates] +  branches * [pileUp]
}
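
As a rough illustration of what the "%_*" pattern in this run block is meant to do, here is a minimal Python sketch. It assumes that % marks the part of the file name used as the branch name and that * is a plain wildcard; the function name group_by_branch is hypothetical, and this is only an approximation, not Bpipe's actual matching code. The file names are the trim outputs reported later in this issue.

```python
import re

def group_by_branch(pattern, filenames):
    """Group file names by the part matched by '%' in a Bpipe-style pattern.

    Rough approximation only: '%' is treated as the branch name (shortest
    match) and '*' as a wildcard. Assumes exactly one '%' in the pattern.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append("(.+?)")   # captured: the branch name
        elif ch == "*":
            parts.append(".*")      # wildcard, not captured
        else:
            parts.append(re.escape(ch))
    regex = re.compile("^" + "".join(parts) + "$")

    groups = {}
    for name in filenames:
        match = regex.match(name)
        if match:
            # Keep the input order within each branch
            groups.setdefault(match.group(1), []).append(name)
    return groups

files = [
    "sample1_R1.fastq.gz.trim",
    "sample1_R1.fastq.gz.2.trim",
]
print(group_by_branch("%_*", files))
```

Under these assumptions both files land in a single branch named sample1, which is consistent with the align stage receiving them together as $input1 and $input2.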

I'm expecting this to produce an intermediate file called sample3_R1.fastq.gz.trim.align.removeSupplementary.removeDuplicates.bam.

It does not. It takes two fastq files as input, and it produces intermediate fastq files called
sample1_R1.fastq.gz.trim
sample1_R1.fastq.gz.2.trim

It then takes those two files into the align step, as I'm expecting it to, and produces a single bam file, sample1_R1.fastq.gz.align.bam, i.e. it's dropping the stage name trim from the output filename. It has to be doing this in the align step, but I've stripped my align step back to:

align = {
    output.dir="/nobackup/data/intermediateFiles"
    SAMPLE=$branch
    exec """bwa mem -t $threads -R '@RG\\tID:$SAMPLE\\tSM:$SAMPLE\\tLB:$SAMPLE\\tPL:ILLUMINA' $bwaIndex $input1 $input2 | samtools sort -@12 -O BAM -o $output.bam""", "align"
}

I've tried forcing the trim step to produce *.fastq files and the align stage to take *.fastq as input; it still doesn't work. I can echo the $input1 and $input2 variables, and it tells me that they are sample1_R1.fastq.gz.2.trim and sample1_R1.fastq.gz.2.trim. I have no idea why this step would truncate that last stage name, which is screwing with other things I'm trying to do further downstream.
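
To make the mismatch concrete, the following Python sketch models only the naming expectation described in this comment: each stage appends its name to the first input's file name, and the final extension comes last. The helper expected_output is hypothetical and does not reproduce Bpipe's real naming rules; it simply compares the expected name with the one reported above.

```python
def expected_output(first_input, stages, ext):
    """Build the name expected in this issue: input + each stage name + extension.

    This models the poster's expectation only, not Bpipe's naming logic.
    """
    return ".".join([first_input] + list(stages) + [ext])

expected = expected_output("sample1_R1.fastq.gz", ["trim", "align"], "bam")
observed = "sample1_R1.fastq.gz.align.bam"  # name reported in the issue

print(expected)
print(observed)
print(expected == observed)
```

The two names differ only by the missing trim component, which is the behaviour being reported.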

tirohia closed this as completed May 20, 2019
tirohia reopened this May 20, 2019

tirohia commented May 20, 2019

To clarify, when I specify file extensions, it still doesn't work, but slightly differently. Enforcing *.gz extensions in the input for the trim stage, and *.fastq in the output, and *.fastq in the input for the alignment stage, results in bpipe dropping the gz from the filename.

@ssadedin
Owner

I tried a minimal reproduction but it didn't seem to do what you originally stated. My example looks like:

trim = {
    exec """
        cat > $output1 < $input1.fastq.gz ; cat > $output2 < $input2.fastq.gz
    """
}

align = {
    output.dir="/tmp/intermediateFiles"
    SAMPLE=$branch
    exec """cat $input1 $input2 > $output.bam""", "align"
}

run {
    "%_*" * [trim] + 
    "%_*" * [align] 
}

It produced /tmp/intermediateFiles/A0028_1_1.fastq.gz.align.bam.

Did I misunderstand something about the question, or is there something else making it not reproduce? (It's not completely impossible that this is a bug that has since been fixed, if you are not using a recent version.)
