CRISPRviewR: an R package for repairing, comparing, and visualizing CRISPRs across environmental datasets
CRISPRviewR uses the output from minCED to associate, compare, and visualize CRISPR arrays across environmental samples. To get a sense for the shape of minCED data, check out the example files.
This package relies on the functions of other packages for data cleaning and plotting, including the following:
- dplyr, ggplot2, and other components of the tidyverse
- Biostrings
- ggpubr
- ggnewscale
- ggseqlogo
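Before installing, it can help to check which of the packages listed above are already available. The snippet below is a minimal sketch using only base R. The helper `missing_packages()` is hypothetical and not part of CRISPRviewR, and `dplyr` and `ggplot2` stand in for the tidyverse components the package may actually use.

```r
# Packages named in the dependency list above; dplyr and ggplot2 stand in
# for "components of the tidyverse".
deps <- c("dplyr", "ggplot2", "Biostrings", "ggpubr", "ggnewscale", "ggseqlogo")

# Hypothetical helper (not part of CRISPRviewR): return the packages in `pkgs`
# whose namespaces cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(deps)
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

Note that Biostrings is distributed through Bioconductor rather than CRAN, so if it is reported as missing it is typically installed with `BiocManager::install("Biostrings")` instead of `install.packages()`.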
Future versions will be available on CRAN or Bioconductor. For now, you can install the development version from GitHub:
devtools::install_github("acvill/CRISPRviewR")
RStudio users may have to run RStudio with administrator privileges for the devtools installation to work.
Please see the CRISPRviewR vignette for a suggested workflow.
The CRISPRviewR functions make no assumptions about the completeness of the CRISPR arrays annotated by minCED or the structure of the underlying assembly. In that regard, users of CRISPRviewR should be aware of the following possibilities.
- Due to their abundance of direct repeats, CRISPR arrays are often misassembled, particularly in the absence of sufficient coverage.
- minCED does not predict orientation of arrays, which requires either identification of cas genes or the annotation of a leader sequence. Therefore, CRISPRviewR plots may be backwards with respect to the direction of transcription. If this is a problem, use a tool like CRISPRleader to get strand orientation, then export your plots and invert manually.
- For fragmented assemblies, CRISPR arrays may occur at contig boundaries.
- For time-course metagenomic assemblies, differences in CRISPR array structure through time may reflect standing variation rather than array expansion or recombination.
- minCED relies on CRISPR Recognition Tool (CRT) to detect CRISPR repeats. The CRT algorithm requires repeats to be identical, and this stringency can lead to the misassignment of portions of repeat sequences to spacers. Consider setting fix_repeats = TRUE when calling read_minced() to address this issue. See "Fix truncated repeats" in the vignette and my related blog post for more details.
- CRISPRviewR has only been tested with the output from minCED v0.4.2.
- If you find a bug or want to suggest a new feature, please open an issue or make a pull request.
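The caveat above about identical repeats can be illustrated with a toy example. The sketch below uses base R and invented sequences to show how requiring exact repeat matches shortens the called repeat and pushes the leftover bases into the neighboring spacers. It only illustrates the problem; it is not CRT's algorithm or the implementation behind `fix_repeats`, and the helper names are hypothetical.

```r
# Toy example: three copies of a 12-bp repeat, where the last copy differs
# at its final base. All sequences are invented for illustration.
repeat_copies <- c("GTTTTAGAGCTA", "GTTTTAGAGCTA", "GTTTTAGAGCTG")
true_spacers  <- c("ACGTACGTAC", "TTGACCATGA", "CCATGGTTAA")

# Hypothetical helper: length of the longest prefix shared by every string in `x`.
common_prefix_length <- function(x) {
  chars <- strsplit(x, "")
  n <- min(lengths(chars))
  len <- 0
  for (i in seq_len(n)) {
    bases <- vapply(chars, `[`, character(1), i)
    if (length(unique(bases)) > 1) break
    len <- i
  }
  len
}

# A detector that demands identical repeats can only report the shared core.
core_len <- common_prefix_length(repeat_copies)
called_repeat <- substr(repeat_copies[1], 1, core_len)
called_repeat
# "GTTTTAGAGCT"

# The leftover repeat bases end up at the start of the called spacers.
leftover <- substr(repeat_copies, core_len + 1, nchar(repeat_copies))
called_spacers <- paste0(leftover, true_spacers)
called_spacers
# "AACGTACGTAC" "ATTGACCATGA" "GCCATGGTTAA"

# Hypothetical helper: fraction of spacers sharing the most common base
# at position `i`. Unusually high agreement at the spacer edges hints at
# repeat bases that were misassigned to spacers.
agreement_at <- function(spacers, i) {
  bases <- substr(spacers, i, i)
  max(table(bases)) / length(spacers)
}
agreement_at(called_spacers, 1)  # 2 of 3 spacers start with "A"
agreement_at(called_spacers, 2)  # all three spacers differ here
```

In real arrays with many spacers, true spacer positions tend to show much lower agreement than positions that belong to a truncated repeat, which is why the misassigned bases stand out.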