Links to resources related to bioinformatics and data analysis.
Table of Contents
- ATAC-seq
- BAM and SAM
- ChIP-seq
- Containers
- Data sharing and management
- Data visualization
- Diagrams and flowcharts
- Document conversion
- EMBL-EBI
- Genome annotation and characterization
- Genome assembly
- GTF and GFF
- GWAS
- Images
- Link collections
- Machine learning
- Metagenomics
- Methyl-seq
- Multiomics
- NCBI
- Phylogenetics and phylogenomics
- Programming
- Raw sequence data
- RNA-Seq
- Statistics
- Tabular data
- Utilities
- VCF
- Variant identification and analysis
- Vim
- Workflow development and workflows
- nf-core/atacseq - ATAC-seq peak-calling, QC and differential analysis pipeline.
- alimanfoo/pysamstats - Reports simple statistics for genome positions based on sequence alignments from a SAM or BAM file.
- genome/bam-readcount - Generates low-level information about sequencing data at specific nucleotide positions in a BAM or CRAM file.
- shiquan/bamdst - Generate BAM file statistics.
- nf-core/chipseq - ChIP-seq peak-calling, QC and differential analysis pipeline.
- Docker Hub - Find and share Docker container images.
- Quay - Find and share container images.
- datalad/datalad - Keep code, data, containers under control with git and git-annex.
- Zenodo - Store research-related data, software, and reports and make them citable using a DOI.
- arvestad/alv - A console-based alignment viewer.
- ChartsCSS/charts.css - Open source CSS framework for data visualization.
- cytoscape/cytoscape.js - Graph theory (network) library for visualisation and analysis.
- dreamRs/esquisse - RStudio add-in to make plots interactively with ggplot2.
- gamcil/clinker - Gene cluster comparison figure generator.
- GenomeVIS USASK - A variety of browser-based visualization tools to support genomics research.
- hms-dbmi/UpSetR - An R implementation of the UpSet set visualization technique.
- krassowski/complex-upset - A library for creating complex UpSet plots with ggplot2 geoms.
- metagenlab/mummer2circos - Circular bacterial genome plots based on BLAST or NUCMER/PROMER alignments.
- mw201608/SuperExactTest - Statistical testing and visualization of intersections among multiple sets.
- rich-iannone/DiagrammeR - Graph and network visualization using tabular data in R.
- ryanlayer/samplot - Plot structural variant signals from many BAMs and CRAMs.
- slowkow/ggrepel - Provides geoms for ggplot2 to repel overlapping text labels.
- taiyun/corrplot - A visual exploratory tool for correlation matrices.
- thackl/gggenomes - A versatile graphics package for comparative genomics.
- thomasp85/patchwork - The goal of patchwork is to make it ridiculously simple to combine separate ggplots into the same graphic.
- wilkox/gggenes - Draw gene arrow maps in ggplot2.
- jgraph/drawio - A configurable diagramming application.
- mermaid-js/mermaid - Generation of diagrams and flowcharts from text.
- jgm/pandoc - A library for converting from one markup format to another, and a command-line tool that uses this library.
- Tools & Data Resources - The European Bioinformatics Institute (EMBL-EBI) maintains the world’s most comprehensive range of freely available and up-to-date molecular data resources.
- evotools/nf-LO - A Nextflow workflow to generate liftOver files for any pair of genomes.
- fmalmeida/bacannot - Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports.
- ncbi/pgap - NCBI Prokaryotic Genome Annotation Pipeline.
- nextgenusfs/funannotate - Eukaryotic genome annotation pipeline.
- jotech/gapseq - Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks.
- tseemann/mlst - Scan contig files against traditional PubMLST typing schemes.
- tseemann/prokka - Rapid prokaryotic genome annotation.
- WrightonLabCSU/DRAM - Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes.
- ablab/spades - SPAdes genome assembler.
- adigenova/wengan - An accurate and ultra-fast hybrid genome assembler.
- alekseyzimin/masurca - The MaSuRCA (Maryland Super Read Cabog Assembler) genome assembly and analysis toolkit.
- fenderglass/Flye - Fast and accurate de novo assembler for single molecule sequencing reads.
- Kinggerm/GetOrganelle - A fast and versatile toolkit for accurate assembly of organelle genomes.
- malonge/RagTag - Tools for fast and flexible genome assembly scaffolding and improvement.
- rrwick/Bandage - A tool that allows users to interact with the assembly graphs made by de novo assemblers such as Velvet, SPAdes, and MEGAHIT.
- rrwick/Trycycler - A tool for generating consensus long-read assemblies for bacterial genomes.
- rrwick/Unicycler - A hybrid assembly pipeline for bacterial genomes.
- tseemann/shovill - Assemble bacterial isolate genomes from Illumina paired-end reads.
- vgl-hub/gfastats - Generate FASTA file summary statistics and manipulate FASTA files.
- whatshap/whatshap - Read-based phasing of genomic variants.
- agshumate/Liftoff - A tool that accurately maps annotations in GFF or GTF between assemblies of the same or closely related species.
- gpertea/gffread - GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more.
- NBISweden/AGAT - A suite of tools to handle gene annotations in any GTF/GFF format.
- brentp/vcfassoc - Perform genotype-phenotype-association tests on a VCF with logistic regression.
- chrchang/plink-ng - A comprehensive update to the PLINK association analysis toolset.
- MareesAT/GWA_tutorial - A comprehensive tutorial about GWAS and PRS.
- xiaolei-lab/rMVP - A memory-efficient, visualization-enhanced, and parallel-accelerated tool for GWAS.
- faressoft/terminalizer - Record your terminal and generate animated GIF images or share a web player.
- flameshot-org/flameshot - Powerful yet simple to use screenshot software.
- nbedos/termtosvg - Record terminal sessions as SVG animations.
- sindresorhus/pageres-cli - Capture website screenshots.
- cmdcolin/awesome-genome-visualization - Interesting genome browser or genome-browser-like implementations.
- crazyhottommy/getting-started-with-genomics-tools-and-resources - Unix, R, and Python tools for genomics and data science.
- danielecook/Awesome-Bioinformatics - Awesome bioinformatics libraries and software.
- ibraheemdev/modern-unix - A collection of modern alternatives to common Unix commands.
- j-andrews7/awesome-bioinformatics-benchmarks - Bioinformatics benchmarking papers and resources.
- josephmisiti/awesome-machine-learning - A curated list of awesome machine learning frameworks, libraries and software.
- kmhernan/awesome-bioinformatics-formats - Bioinformatics formats and publications.
- raivivek/awesome-biology - Learning resources, research papers, tools, and other resources across different fields of biology.
- seandavi/awesome-single-cell - Software packages and data resources for single-cell data analysis.
- sindresorhus/awesome - Awesome lists about all kinds of interesting topics.
- DeepLabCut/DeepLabCut - Markerless pose estimation of user-defined features with deep learning for all animals.
- nidhaloff/igel - A machine learning tool that allows you to train, test and use models without writing code.
- scikit-learn/scikit-learn - Machine learning in Python.
- ssusnic/Machine-Learning-Flappy-Bird - Machine learning for Flappy Bird using neural networks and a genetic algorithm.
- Ecogenomics/GTDBTk - A toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
- fbreitwieser/pavian - Interactive analysis of metagenomics data.
- metagenome-atlas/atlas - Metagenome-Atlas is an easy-to-use metagenomic pipeline based on Snakemake.
- MrOlm/drep - Rapid comparison and dereplication of genomes.
- nf-core/mag - Assembly and binning of metagenomes.
- EpiDiverse - A collection of Nextflow pipelines for epigenome analysis.
- nf-core/methylseq - Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel.
- bioFAM/MOFA2 - A factor analysis model that provides a general framework for the integration of multiomic data sets in an unsupervised fashion.
- mixOmics - An R package that offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection.
- All Resources - The National Center for Biotechnology Information (NCBI) advances science and health by providing access to biomedical and genomic information.
- NCBI-Hackathons/EDirectCookbook - Examples illustrating the use of NCBI's Entrez Direct (EDirect), which provides access to the NCBI's suite of interconnected databases.
- AstrobioMike/GToTree - A user-friendly workflow for phylogenomics.
- YuLab-SMU/ggtree - Visualization and annotation of phylogenetic trees.
- Automate the Boring Stuff with Python - Practical programming for total beginners.
- R for Data Science - Learn how to get your data into R, get it into the most useful structure, transform it, visualize it and model it.
- The Modern JavaScript Tutorial - From the basics to advanced topics with simple, but detailed explanations.
- ziishaned/learn-regex - Learn regex the easy way.
- gear-genomics/tracy - Basecalling, alignment, assembly and deconvolution of Sanger Chromatogram trace files.
- huishenlab/biscuit - Perform alignment, DNA methylation and mutation calling, and allele specific methylation from bisulfite sequencing data.
- kblin/ncbi-acc-download - Download sequences from GenBank/RefSeq by accession through the NCBI Entrez API.
- kingfisher-download - Easier download/extract of FASTA/Q read data and metadata from the ENA, NCBI, AWS or GCP.
- lh3/seqtk - Toolkit for processing sequences in FASTA/Q formats.
- nf-core/fetchngs - Pipeline to fetch metadata and raw FASTQ files from public and private databases.
- OpenGene/fastp - An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging).
- nf-core/nanoseq - Nanopore demultiplexing, QC and alignment pipeline.
- nf-core/rnaseq - RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
- STAR-Fusion/STAR-Fusion - Uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads.
- suhrig/arriba - Fast and accurate gene fusion detection from RNA-Seq data.
- easystats/easystats - A collection of R packages designed to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.
- IndrajeetPatil/ggstatsplot - Enhancing ggplot2 plots with statistical analysis.
- kassambara/factoextra - Extract and visualize the results of multivariate data analyses.
- paulvanderlaken/ppsr - R implementation of Predictive Power Score.
- apache/arrow - A multi-language toolbox for accelerated data interchange and in-memory processing.
- arq5x/bedtools2 - Intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF.
- bedops/bedops - High-performance genomic feature operations.
- BurntSushi/xsv - A fast CSV command line toolkit written in Rust.
- harelba/q - Run SQL directly on delimited files and multi-file sqlite databases.
- johnkerl/miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.
- markfairbanks/tidytable - tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.
- OpenRefine - OpenRefine (previously Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
- pandas-dev/pandas - Flexible and powerful data analysis and manipulation library for Python.
- pstaender/csv2md - Converts CSV data to Markdown tables.
- ropensci/skimr - A frictionless, pipeable approach to dealing with summary statistics.
- saulpw/visidata - A terminal spreadsheet multitool for discovering and arranging data.
- sqlitebrowser/sqlitebrowser - A high-quality, visual, open source tool to create, design, and edit database files compatible with SQLite.
- wireservice/csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
- adrianlopezroche/fdupes - A program for identifying or deleting duplicate files residing within specified directories.
- alacritty/alacritty - A cross-platform, GPU-accelerated terminal emulator.
- Clipy/Clipy - Clipboard extension app for macOS.
- joh/when-changed - Execute a command when a file is changed.
- jonschlinkert/markdown-toc - API and CLI for adding a table of contents to a Markdown file.
- lindenb/jvarkit - Java utilities for bioinformatics.
- phiresky/ripgrep-all - Wraps ripgrep and enables it to search more file types.
- schollz/croc - Easily and securely send things from one computer to another.
- stevenvachon/broken-link-checker - Find broken links within HTML.
- tcort/markdown-link-check - Check hyperlinks in Markdown text.
- BGI-shenzhen/VCF2Dis - A simple and efficient tool to calculate a p-distance matrix from VCF files.
- brentp/vcfanno - Annotate a VCF with other VCFs/BEDs/tabixed files.
- samtools/bcftools - A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
- vcflib/vcflib - C++ library and command-line tools for parsing and manipulating VCF files.
- vcftools/vcftools - A set of tools written in Perl and C++ for working with VCF files.
- ACEnglish/truvari - A toolkit for benchmarking, merging, and annotating structural variants.
- barricklab/breseq - A computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes.
- CRG-CNAG/CalliNGS-NF - GATK RNA-Seq variant calling in Nextflow.
- fritzsedlazeck/SURVIVOR - Toolset for SV simulation, comparison and filtering.
- nf-core/sarek - Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing.
- PoisonAlien/maftools - Summarize, analyze and visualize MAF files from TCGA or in-house studies.
- iggredible/Learn-Vim - Vim guide for beginner and advanced users.
- maxplanck-ie/snakepipes - Customizable workflows based on snakemake and python for the analysis of NGS data.
- nextflow-io/nextflow - A bioinformatics workflow manager that enables the development of portable and reproducible workflows.
- nf-core - A community effort to collect a curated set of analysis pipelines built using Nextflow.
- ploomber/ploomber - A framework to build collaborative and modular pipelines.
- ropensci/targets - A Make-like pipeline tool for statistics and data science in R.
- Snakemake workflow catalog - A comprehensive catalog of standards-compliant, public Snakemake workflows.
- snakemake/snakemake-wrappers - A collection of reusable wrappers for adding popular command-line tools to Snakemake workflows.
- snakemake/snakemake - A tool to create reproducible and scalable data analyses.