helpful_links

Links to resources related to bioinformatics and data analysis.

ATAC-seq

  • nf-core/atacseq - ATAC-seq peak-calling, QC and differential analysis pipeline.

BAM and SAM

  • alimanfoo/pysamstats - Reports simple statistics for genome positions based on sequence alignments from a SAM or BAM file.
  • genome/bam-readcount - Generates low-level information about sequencing data at specific nucleotide positions in a BAM or CRAM file.
  • shiquan/bamdst - Generate BAM file statistics.

ChIP-seq

  • nf-core/chipseq - ChIP-seq peak-calling, QC and differential analysis pipeline.

Containers

  • Docker Hub - Find and share Docker container images.
  • Quay - Find and share container images.

Data sharing and management

  • datalad/datalad - Keep code, data, containers under control with git and git-annex.
  • Zenodo - Store research-related data, software, and reports and make them citable using a DOI.

Data visualization

Diagrams and flowcharts

Document conversion

  • jgm/pandoc - A library for converting from one markup format to another, and a command-line tool that uses this library.

EMBL-EBI

  • Tools & Data Resources - The European Bioinformatics Institute (EMBL-EBI) maintains the world’s most comprehensive range of freely available and up-to-date molecular data resources.

Genome annotation and characterization

  • evotools/nf-LO - A Nextflow workflow to generate liftOver files for any pair of genomes.
  • fmalmeida/bacannot - Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports.
  • ncbi/pgap - NCBI Prokaryotic Genome Annotation Pipeline.
  • nextgenusfs/funannotate - Eukaryotic genome annotation pipeline.
  • jotech/gapseq - Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks.
  • tseemann/mlst - Scan contig files against traditional PubMLST typing schemes.
  • tseemann/prokka - Rapid prokaryotic genome annotation.
  • WrightonLabCSU/DRAM - Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes.

Genome assembly

  • ablab/spades - SPAdes genome assembler.
  • adigenova/wengan - An accurate and ultra-fast hybrid genome assembler.
  • alekseyzimin/masurca - The MaSuRCA (Maryland Super Read Cabog Assembler) genome assembly and analysis toolkit.
  • fenderglass/Flye - Fast and accurate de novo assembler for single molecule sequencing reads.
  • Kinggerm/GetOrganelle - A fast and versatile toolkit for accurate assembly of organelle genomes.
  • malonge/RagTag - Tools for fast and flexible genome assembly scaffolding and improvement.
  • rrwick/Bandage - A tool that allows users to interact with the assembly graphs made by de novo assemblers such as Velvet, SPAdes, and MEGAHIT.
  • rrwick/Trycycler - A tool for generating consensus long-read assemblies for bacterial genomes.
  • rrwick/Unicycler - A hybrid assembly pipeline for bacterial genomes.
  • tseemann/shovill - Assemble bacterial isolate genomes from Illumina paired-end reads.
  • vgl-hub/gfastats - Generate FASTA file summary statistics and manipulate FASTA files.
  • whatshap/whatshap - Read-based phasing of genomic variants.

GTF and GFF

  • agshumate/Liftoff - A tool that accurately maps annotations in GFF or GTF between assemblies of the same or closely related species.
  • gpertea/gffread - GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more.
  • NBISweden/AGAT - A suite of tools to handle gene annotations in any GTF/GFF format.

GWAS

  • brentp/vcfassoc - Perform genotype-phenotype-association tests on a VCF with logistic regression.
  • chrchang/plink-ng - A comprehensive update to the PLINK association analysis toolset.
  • MareesAT/GWA_tutorial - A comprehensive tutorial about GWAS and PRS.
  • xiaolei-lab/rMVP - A memory-efficient, visualization-enhanced, and parallel-accelerated tool for GWAS.

Images

Link collections

Machine learning

Metagenomics

Methyl-seq

  • EpiDiverse - A collection of Nextflow pipelines for epigenome analysis.
  • nf-core/methylseq - Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel.

Multiomics

  • bioFAM/MOFA2 - A factor analysis model that provides a general framework for the integration of multiomic data sets in an unsupervised fashion.
  • mixOmics - An R package that offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection.

NCBI

  • All Resources - The National Center for Biotechnology Information (NCBI) advances science and health by providing access to biomedical and genomic information.
  • NCBI-Hackathons/EDirectCookbook - Examples illustrating the use of NCBI's Entrez Direct (EDirect), which provides access to the NCBI's suite of interconnected databases.

Phylogenetics and phylogenomics

Programming

Raw sequence data

  • gear-genomics/tracy - Basecalling, alignment, assembly and deconvolution of Sanger Chromatogram trace files.
  • huishenlab/biscuit - Perform alignment, DNA methylation and mutation calling, and allele specific methylation from bisulfite sequencing data.
  • kblin/ncbi-acc-download - Download genome files from NCBI by accession.
  • kingfisher-download - Easier download/extract of FASTA/Q read data and metadata from the ENA, NCBI, AWS or GCP.
  • lh3/seqtk - Toolkit for processing sequences in FASTA/Q formats.
  • nf-core/fetchngs - Pipeline to fetch metadata and raw FASTQ files from public and private databases.
  • OpenGene/fastp - An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging).

RNA-Seq

  • nf-core/nanoseq - Nanopore demultiplexing, QC and alignment pipeline.
  • nf-core/rnaseq - RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
  • STAR-Fusion/STAR-Fusion - Uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads.
  • suhrig/arriba - Fast and accurate gene fusion detection from RNA-Seq data.

Statistics

Tabular data

  • apache/arrow - A multi-language toolbox for accelerated data interchange and in-memory processing.
  • arq5x/bedtools2 - Intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
  • bedops/bedops - High-performance genomic feature operations.
  • BurntSushi/xsv - A fast CSV command line toolkit written in Rust.
  • harelba/q - Run SQL directly on delimited files and multi-file sqlite databases.
  • johnkerl/miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.
  • markfairbanks/tidytable - tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.
  • OpenRefine - OpenRefine (previously Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
  • pandas-dev/pandas - Flexible and powerful data analysis and manipulation library for Python.
  • pstaender/csv2md - Converts CSV data to Markdown tables.
  • ropensci/skimr - A frictionless, pipeable approach to dealing with summary statistics.
  • saulpw/visidata - A terminal spreadsheet multitool for discovering and arranging data.
  • sqlitebrowser/sqlitebrowser - A high quality, visual, open source tool to create, design, and edit database files compatible with SQLite.
  • wireservice/csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.

Utilities

VCF

  • BGI-shenzhen/VCF2Dis - A simple and efficient tool to calculate a p-distance matrix from VCF files.
  • brentp/vcfanno - Annotate a VCF with other VCFs/BEDs/tabixed files.
  • samtools/bcftools - A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
  • vcflib/vcflib - C++ library and command-line tools for parsing and manipulating VCF files.
  • vcftools/vcftools - A set of tools written in Perl and C++ for working with VCF files.

Variant identification and analysis

  • ACEnglish/truvari - A toolkit for benchmarking, merging, and annotating structural variants.
  • barricklab/breseq - A computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes.
  • CRG-CNAG/CalliNGS-NF - GATK RNA-Seq variant calling in Nextflow.
  • fritzsedlazeck/SURVIVOR - Toolset for SV simulation, comparison and filtering.
  • nf-core/sarek - Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing.
  • PoisonAlien/maftools - Summarize, analyze and visualize MAF files from TCGA or in-house studies.

Vim

Workflow development and workflows

  • maxplanck-ie/snakepipes - Customizable workflows based on snakemake and python for the analysis of NGS data.
  • nextflow-io/nextflow - A bioinformatics workflow manager that enables the development of portable and reproducible workflows.
  • nf-core - A community effort to collect a curated set of analysis pipelines built using Nextflow.
  • ploomber/ploomber - A framework to build collaborative and modular pipelines.
  • ropensci/targets - A Make-like pipeline tool for statistics and data science in R.
  • Snakemake workflow catalog - A comprehensive catalog of standards compliant, public, Snakemake workflows.
  • snakemake/snakemake-wrappers - A collection of reusable wrappers for adding popular command-line tools to Snakemake workflows.
  • snakemake/snakemake - A tool to create reproducible and scalable data analyses.
