CFIA-NCFAD/nf-flu

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

[3.5.2] - 2024-10-18

This patch release fixes a few issues when running the pipeline.

Changes

fix: better handling of empty IRMA consensus sequences to avoid downstream analysis errors with VADR and BLASTN (peterk87/nf-flu #22)
fix: Clair3 versions.yml indentation issue (#87)
fix: removed capturing of cat and gzip versions in CAT_ILLUMINA_FASTQ process (#46) to avoid issues in some execution environments.
docs: update README.md

[3.5.1] - 2024-10-08

This patch release fixes an issue (#84) with long sample names (over 50 characters) causing VADR to fail. --noseqnamemax has been added to the default arguments for VADR to avoid this issue.

Changes

fix: Added --noseqnamemax to VADR default arguments to avoid issues with long sample names causing VADR to fail.
config: Output directory paths for IRMA and Bcftools consensus VADR annotation results were made more explicit and clear for the Illumina workflow.

[3.5.0] - 2024-10

This release expands the Illumina workflow by adding BLAST analysis, coverage plots, variant calling, and MultiQC reports. Modifications were made to existing modules, and new modules were added.

Changes

feat: Added variant calling, BLAST analysis, coverage plots, and MultiQC to the Illumina workflow to match the capabilities of the Nanopore workflow.
feat: Introduced a new module, Freebayes, for Illumina variant calling.
refactor: Rearranged the Illumina workflow to integrate the new changes and enhance compatibility.
update: Updated Bcftools filtering to add missing tags with fill-tags plugin and to set genotype with the setGT plugin based on major/minor_allele_fraction thresholds to influence consensus sequence output.
config: Changed process labels for IRMA and MultiQC modules to "long" to avoid timeouts for large short-read datasets.
enhance: Changed VADR staged file to use the FTP NCBI link to bypass certificate issues during Nextflow staging.
rollback: Reverted VADR containers to an earlier version to resolve potential issues on Singularity.
refactor: Rearranged modules_illumina.config for consistency with the updated workflow.
container: Switched to Biocontainers images for Clair3 v1.0.10. The issue with full alignment not working with the Biocontainers Docker/Apptainer images seems to have been resolved. This should also resolve a CI failure caused by not being able to pull the official Clair3 image hkubal/clair3 from Docker Hub.
dev: Added tests/run-illumina-test.sh to make it more convenient to run the Illumina test locally with the same conditions as GitHub Actions CI.

[3.4.1] - 2024-08-02

This patch release fixes an issue (#75) with CAT_ILLUMINA_FASTQ where 1:N:0:. or 2:N:0:. may be mistakenly appended to Q-score lines beginning with @.

Changes

fix: updated Perl regex to better match Illumina FASTQ header lines starting with @. At least one space is expected in the header line. The match regex has been changed from /^@.*/ to /^@.* .*/ so that Q-score lines, which do not contain spaces in standard Phred+33 encoding, should no longer be matched.
dev: replaced nf-core/modules DUMPSOFTWAREVERSIONS with mqc_versions_table v0.2.0, a statically compiled Nim binary that parses versions.yml and outputs the YAML with HTML content needed to display the process and tool versions table in the MultiQC report. In theory DUMPSOFTWAREVERSIONS should be using the same Docker/Singularity image/Conda env as the MultiQC process, but DUMPSOFTWAREVERSIONS uses an older version of MultiQC and only uses it for the pyyaml library. mqc_versions_table was developed to handle this with a small 200KB binary instead.
dev: harmonize Docker/Singularity containers and Conda envs used across processes.
ci: use symlink mode for publishDir by default for test_nanopore.config and test_illumina.config to limit disk usage during CI.
Updated to Bioconda channel VADR v1.6.4 since the STAPH-B offered container with the flu model packaged is very large at 6GB vs 1.45GB for quay.io/biocontainers/vadr 1.6.4. However, it's now required that the flu model be downloaded and installed prior to VADR annotation with --vadr_model_targz. The default model tarball is the vadr-models-flu-1.6.3-2.tar.gz (38MB) from the NCBI FTP site uploaded to Zenodo.
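The header regex fix in this release can be illustrated with a minimal sketch. The two patterns are the ones quoted in the changelog entry, rewritten as Python regexes; the FASTQ header and quality strings are made-up examples, and the helper name is hypothetical. This is not the pipeline's Perl code, only a demonstration of why requiring a space separates header lines from Q-score lines that begin with @.

```python
import re

# Patterns quoted in the changelog entry, expressed as Python regexes.
OLD_HEADER = re.compile(r"^@.*")     # matches any line starting with '@'
NEW_HEADER = re.compile(r"^@.* .*")  # additionally requires at least one space

# Made-up example lines (assumptions for illustration only).
header_line = "@M00001:1:000000000-ABCDE:1:1101:15000:1000 1:N:0:1"
quality_line = "@@CFFFDDHHHHHJJJ"  # a Q-score line that happens to start with '@'


def looks_like_header(line: str, pattern: "re.Pattern[str]") -> bool:
    """Return True if the line matches the given header pattern."""
    return pattern.match(line) is not None


# The old pattern cannot tell the two lines apart; the new one can.
old_results = (looks_like_header(header_line, OLD_HEADER),
               looks_like_header(quality_line, OLD_HEADER))
new_results = (looks_like_header(header_line, NEW_HEADER),
               looks_like_header(quality_line, NEW_HEADER))
```

With these inputs the old pattern accepts both lines, while the new pattern accepts only the header line.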

[3.4.0] - 2024-07-24

This release adds Influenza virus sequence annotation using VADR.

Changes

Add VADR for Influenza consensus sequence annotation
Add table2asn for Feature Table conversion to Genbank
Add pre- and post-table2asn processing to work around sequence ID length limits imposed by table2asn when converting from Feature Table format to Genbank

[3.3.10] - 2024-05-31

Fix MultiQC report generation due to module filter paths not working like in v1.12.

Software Updates

multiqc: 1.21 -> 1.22.1

Changes

test: add tests/run-nanopore-test.sh to conveniently run Nanopore test locally

[3.3.9] - 2024-05-30

Long overdue software updates release.

Software Updates

bcftools: 1.15.1 -> 1.20
blast: 2.14.0 -> 2.15.0
clair3: 1.0.5 -> 1.0.9
minimap2: 2.24 -> 2.28
mosdepth: 0.3.3 -> 0.3.8
multiqc: 1.12 -> 1.21
seqtk: 1.3 -> 1.4

Changes

dev: update GitHub Actions versions for CI and linting workflows

[3.3.8] - 2024-02-16

This bugfix patch release fixes an issue where a large number of ambiguous bases in the IRMA consensus can hinder reference selection (#67). This release also addresses an issue with using the Clair3 Biocontainers image resulting in incomplete variant calling results, affecting nf-flu executions with the docker or singularity profiles. The official Clair3 image is used instead. nf-flu executions using Conda and Mamba are unaffected.

Changes

Create majority consensus from IRMA allAlleles.txt files for BLASTN search
Add irma-alleles2fasta.v, statically compiled binary (irma-alleles2fasta) and Bash build script for parsing IRMA allAlleles.txt to output naive majority consensus (i.e. whatever the top non-dash allele is at each position) so that the sequence used for BLASTN search does not contain any ambiguous bases.
Updated nanopore.nf subworkflow to use the IRMA majority consensus with no ambiguous bases for the BLASTN search, so that longer, more contiguous matches are possible to aid top reference sequence selection in some cases.
Updated parse_influenza_blast_results.py to better handle extraction of sample name and segment number from BLASTN query accession/version (qaccver).
Using official Clair3 Docker image and updating Clair3 to v1.0.5
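The naive majority consensus described above (the top non-dash allele at each position) can be sketched as follows. This is only an illustration of the idea, not the irma-alleles2fasta implementation: the input here is a hypothetical list of (position, allele, count) tuples, and the actual column layout of IRMA allAlleles.txt files is not reproduced.

```python
from typing import Dict, Iterable, Tuple


def majority_consensus(allele_counts: Iterable[Tuple[int, str, int]]) -> str:
    """Build a consensus from (position, allele, count) rows.

    For each position, keep the non-dash allele with the highest count.
    Positions where only '-' was observed are omitted from the output.
    """
    best: Dict[int, Tuple[str, int]] = {}
    for position, allele, count in allele_counts:
        if allele == "-":
            continue  # deletions never contribute to the naive consensus
        if position not in best or count > best[position][1]:
            best[position] = (allele, count)
    # Emit alleles in position order.
    return "".join(best[position][0] for position in sorted(best))


# Hypothetical allele counts for a 3-base region.
rows = [
    (1, "A", 90), (1, "G", 10),
    (2, "-", 60), (2, "C", 40),  # dash is most common, but C is the top non-dash allele
    (3, "T", 55), (3, "C", 45),
]
consensus = majority_consensus(rows)
```

Because every position resolves to a single base, the resulting sequence contains no ambiguity codes, which is the property the BLASTN search relies on.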

[3.3.7] - 2024-02-09

This bugfix patch release fixes an issue with mislabeling of PB1 and PB2 segments for Influenza B virus results (#65).

[3.3.6] - 2023-11-01

Fixes

docs updated to show proper profile to run test profiles for Illumina and Nanopore locally (#52)
test_nanopore profile has been updated to run locally with the test samplesheet.csv updated with URLs to FASTQ files at CFIA-NCFAD/nf-test-datasets
read samplesheet CSV in parse_influenza_blast_results.py with all columns read as string rather than inferred (#54)
handle cloud storage paths and non-HTTP/FTP URLs in user samplesheets (#55)

[3.3.5] - 2023-09-15

Fixes

handling of empty IRMA amended_consensus/ when running a negative control or blank sequence (#47)

[3.3.4] - 2023-08-18

Fixes

Subtyping report summary sheet "1_Subtype Predictions" shows only N subtype results

[3.3.3] - 2023-08-16

This release fixes issues with subtype report generation script (parse_influenza_blast_results.py), primarily subtype predictions being N/A for samples where the top BLAST hits are user-specified sequences for the HA and NA segments.

Fixes

subtype prediction based off majority H/N prediction of all BLAST hits instead of just the top X matches (#40)
the top hit for H/N can also be a user-specified sequence without subtype information
top segment matches are now sorted by sample name, segment name and BLAST bitscore
output concatenated Nanopore FASTQ to ${outdir}/fastq by default (#43)
Handle ambiguous bases in reference sequences by having Clair3 not convert those positions to N and Bcftools produce a warning instead of an error (#42)
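The majority-based subtype prediction mentioned in the fixes above can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the logic in parse_influenza_blast_results.py: the function name is hypothetical, and each BLAST hit is reduced to a subtype string, with None standing in for a user-specified sequence that carries no subtype information.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_subtype(hit_subtypes: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most common subtype among all hits, ignoring hits without one."""
    known = [subtype for subtype in hit_subtypes if subtype]
    if not known:
        return None  # no hit carried subtype information
    return Counter(known).most_common(1)[0][0]


# The top hit is a user-specified sequence without a subtype (None),
# yet the prediction is still recovered from the remaining hits.
ha_hits = [None, "H1", "H1", "H3", "H1"]
predicted_h = majority_subtype(ha_hits)
```

Counting over all hits means a single untyped top hit no longer forces the prediction to N/A.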

Changes

subtyping report results are now ordered in the same order as the input samplesheet.csv, that is the order of the samples in the report is the same as the order of the samples in the samplesheet.csv file

[3.3.2] - 2023-08-03

This patch release fixes an IBV subtype/genotype parsing issue when generating subtyping report using the new metadata format introduced in 3.3.0 (#32).

[3.3.1] - 2023-08-02

Fixes

Conda/Mamba env creation when using conda/mamba profile (#35)

[3.3.0] - 2023-07-11

This release migrates to more recently updated Influenza virus sequences, since the last update for the NCBI Influenza DB FTP data was on 2020-10-13. By default, nf-flu now uses all Orthomyxoviridae virus sequences parsed from the daily updated NCBI Viruses AllNucleotide.fa and AllNuclMetadata.csv.gz and uploaded to Figshare as Zstd compressed files. nf-flu no longer uses the influenza.fna.gz and genomeset.dat.gz files for Influenza sequences and metadata, respectively.

Fixes

More up-to-date Influenza sequences database used by default (#24)

[3.2.1] - 2023-07-07

Fixes

Empty BLAST results file parsing NoDataError (#27) (Thanks @MatFish for reporting this issue!)

[3.2.0] - 2023-06-22

Added

Influenza B virus support (#14)
Polars for faster parsing of BLAST results (#14)

Fixes

Irregular Illumina paired-end FASTQ files not producing IRMA assemblies (#20)

Updates

Updated README.md to include references and citations

[3.1.6] - 2023-05-31

This is a patch release for a minor change to use Biocontainers Docker and Singularity images for Clair3 to avoid hitting limits on pulls from Docker Hub and since Biocontainers images are half the size of hkubal/clair3 images.

Also, updated CI workflow and added issue template forms for feature request and questions.

[3.1.5] - 2023-05-30

Added

--use_mamba to enable using Mamba in place of Conda when using -profile conda for faster creation of Conda environments

Updates

Clair3: 0.1.10 -> 1.0.2

Fixes

user-specified Clair3 models not being found (#11)
Conda profile not enabling Conda (#15)
IRMA wanting too much /tmp space; IRMA's tmp dir will be output to the current working directory of the process job (#13) (Thanks @Codes1985 for reporting and solving this issue!)

[3.1.4] - 2023-05-17

This release addresses issue #11 by adding a new option --clair3_user_variant_model <PATH TO CLAIR3 MODEL> that allows users to provide a Clair3 model not included with Clair3, e.g. a Rerio Clair3 model for r10 flowcells.

[3.1.3] - 2023-04-28

Patch release to fix handling of lowercase subtypes (e.g. h1n5) from NCBI Influenza DB.
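As a rough illustration of what handling lowercase subtypes can look like, the sketch below normalizes subtype strings case-insensitively. The function name and regex are assumptions made for this example and are not taken from the pipeline's code.

```python
import re
from typing import Optional

# Case-insensitive pattern for influenza A subtypes such as H1N5 or h1n5.
SUBTYPE_PATTERN = re.compile(r"^H(\d+)N(\d+)$", re.IGNORECASE)


def normalize_subtype(raw: str) -> Optional[str]:
    """Return the subtype in canonical uppercase form, or None if unrecognized."""
    match = SUBTYPE_PATTERN.match(raw.strip())
    if match is None:
        return None
    h_number, n_number = match.groups()
    return f"H{int(h_number)}N{int(n_number)}"


normalized = [normalize_subtype(value) for value in ("h1n5", "H3N2", " h5N1 ", "mixed")]
```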

[3.1.2] - 2022-09-01

Patch release to fix an issue that occurred when a user reference sequences FASTA was specified but the Channel created from the file was not treated as a value. The code has been reverted to use the Nextflow file function.

[3.1.1] - 2022-08-31

Patch release to fix issue when a user-specified sequences FASTA is provided and the FASTA is concatenated with the NCBI influenza sequences FASTA, but there is no new-line character at the end of the FASTA files. New line characters are added to the FASTA files to avoid incorrect concatenation.
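The concatenation problem described above is easy to reproduce. The following minimal sketch works on in-memory strings rather than files, and the function name is hypothetical; it shows how a missing trailing newline fuses the last sequence line of one FASTA with the first header of the next, and how appending a newline where needed avoids that.

```python
from typing import Iterable


def concat_fasta(fasta_texts: Iterable[str]) -> str:
    """Concatenate FASTA contents, ensuring each non-empty part ends with a newline."""
    parts = []
    for text in fasta_texts:
        if text and not text.endswith("\n"):
            text += "\n"  # add the missing newline before the next record starts
        parts.append(text)
    return "".join(parts)


user_fasta = ">user_seq\nACGT"       # no trailing newline
ncbi_fasta = ">ncbi_seq\nTTTT\n"

naive = user_fasta + ncbi_fasta      # ">user_seq\nACGT>ncbi_seq\nTTTT\n" (records fused)
safe = concat_fasta([user_fasta, ncbi_fasta])
```

In the naive result the second header ends up on the same line as the first sequence, so a FASTA parser would see only one record.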

[3.1.0] - 2022-05-31

The workflow's name has been changed from nf-iav-illumina to nf-flu and the official repo for nf-flu will be CFIA-NCFAD/nf-flu going forward.

Added back bin/fastq_dir_to_samplesheet.py for Illumina --input samplesheet creation from Illumina FASTQ reads directory
Fixed issue #12. Nanopore sample sheet can specify a mix of single FASTQ files and/or directories containing FASTQ files. Different reads with the same sample name will be merged prior to analysis. FASTQs can be GZIP compressed and have the extensions: .fastq, .fq, .fastq.gz, .fq.gz. Updated CI tests to test for this flexible sample sheet handling.
Switched to GitHub YAML form for bug report template from Markdown template.
CI tests now output results/pipeline_info/ and .nextflow.log as artifacts for easier debugging of issues.
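The flexible Nanopore sample sheet handling described above can be sketched as follows. This is an illustrative approximation, not the pipeline's Nextflow code: the function names are hypothetical, extension matching is assumed to be case-sensitive, and directory entries (which the pipeline also accepts) are not expanded here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Extensions listed in the changelog entry.
FASTQ_EXTENSIONS = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_fastq(path: str) -> bool:
    """Return True if the path ends with one of the accepted FASTQ extensions."""
    return path.endswith(FASTQ_EXTENSIONS)


def group_reads_by_sample(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect FASTQ paths per sample name so they can be merged before analysis."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for sample, path in rows:
        if is_fastq(path):
            grouped[sample].append(path)
    return dict(grouped)


# Hypothetical sample sheet rows: (sample name, reads path).
rows = [
    ("sampleA", "run1/a.fastq.gz"),
    ("sampleA", "run2/a.fq"),
    ("sampleB", "run1/b.fastq"),
    ("sampleB", "run1/notes.txt"),  # ignored: not a FASTQ extension
]
grouped = group_reads_by_sample(rows)
```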

[3.0.0] - 2022-05-24

This is a major release adding a Nanopore influenza sequence analysis subworkflow using IRMA for initial assembly and BLAST against NCBI Influenza DB sequences and optionally, user-specified sequences to identify the top reference sequence for each segment for each sample. A standard read mapping/variant calling analysis is performed: for each sample, Nanopore reads are mapped separately against each gene segment reference sequence using Minimap2; variant calling of read alignments is performed using Clair3; depth-masked consensus sequence is generated using Bcftools. Consensus sequences are BLAST searched against NCBI Influenza (and user-specified sequences) to generate a BLAST summary report and H/N subtyping report. MultiQC is used to summarize results into an interactive HTML report.

NOTE: Read mapping/variant calling analysis has not been ported to the Illumina sequence analysis subworkflow.

[2.0.1] - 2021-06-15

Patch release to fix issue #5; added check that IRMA amended_consensus/ exists before concatenation of consensus FASTA files.

[2.0.0] - 2021-06-10

⚠️ Major enhancements

Samplesheet input (--input samplesheet.csv) replaces path to reads (--reads "reads/*_R{1,2}_*.fastq.gz"). Sample sheet can be tab-delimited (TSV) or CSV and must have a header line and 3 columns (sample name, FASTQ path/URL to forward reads, FASTQ path/URL to reverse reads).
Pipeline has been re-implemented in Nextflow DSL2
All software containers are now exclusively obtained from Biocontainers
Updated minimum Nextflow version to v21.04.0 (see nextflow#572)
Add IRMA params
- irma_module: IRMA module (default: FLU-utr)
- keep_ref_deletions: set consensus sequence deletion by ambiguation (i.e. replace ref seq with Ns) (default: true)
Add BLAST subtyping params:
- pident_threshold: % identity threshold (default: 0.85)
- min_aln_length: min alignment length (default: 50)
Replace Azure Pipelines CI with GitHub Actions CI
add nextflow_schema.json and nf-core helper Jar file and Groovy scripts for params validation, printing help
Use nf-core modules where possible
Use nf-core module style for all processes
Added usage and output docs
Updated README

Parameters

Old parameter	New parameter
`--reads`	`--input`
	`--irma_module`
	`--keep_ref_deletions`
	`--pident_threshold`
	`--min_aln_length`
	`--ncbi_influenza_fasta`
	`--ncbi_influenza_metadata`
	`--slurm_queue_size`
	`--publish_dir_mode`
	`--validate_params`
	`--enable_conda`
	`--singularity_pull_docker_container`
	`--show_hidden_params`
	`--schema_ignore_params`

NB: Parameter has been updated if both old and new parameter information is present.
NB: Parameter has been added if just the new parameter information is present.
NB: Parameter has been removed if new parameter information isn't present.

Software dependencies

Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.

Dependency	Old version	New version
`blast`	2.9.0	2.10.0
`irma`	0.6.7	1.2.1
`python`	3.7.3	3.9.0

NB: Dependency has been updated if both old and new version information is present.
NB: Dependency has been added if just the new version information is present.
NB: Dependency has been removed if new version information isn't present.
