Several changes after successfully running count_matrix
ShangZhang committed Sep 10, 2020
1 parent 2c841bd commit 2f3f985
Showing 6 changed files with 148 additions and 67 deletions.
141 changes: 102 additions & 39 deletions README.md
@@ -34,7 +34,7 @@

## Installation

#### Docker image
### Docker image
For easy installation, you can use the [exVariance image](https://hub.docker.com/) of [docker](https://www.docker.com) with all dependencies installed:

```bash
@@ -44,33 +44,64 @@ For easy installation, you can use the [exVariance image](https://hub.docker.com
- dependencies
1. [docker](https://www.docker.com/) version>=19.03.4

#### Singularity image
### Singularity image
Alternatively, you can use [singularity](https://singularity.lbl.gov/) or [udocker](https://github.com/indigo-dc/udocker) to run the container if your Linux kernel is older than version 3 or if you don't have permission to use docker.

#### Homemade
**Best Practice**: Alternatively, you can use the [GitHub](https://github.com/ShangZhang/exVariance) source code and install the dependencies listed below:
### Homemade (Best Practice)
Alternatively, you can use the [GitHub](https://github.com/ShangZhang/exVariance) source code and install the dependencies listed below:

```bash
git clone https://github.com/ShangZhang/exVariance.git
```

- dependencies:
1. [Anaconda3](https://www.anaconda.com)/[Miniconda3](http://conda.pydata.org/miniconda.html) conda version=4.8.4
2. [Python](https://www.python.org/) version=3.7.9
3. [Snakemake](https://snakemake.readthedocs.io) version=5.23.0


> **Note:**
> - How do I install a specific version of Snakemake?
1. The default conda solver is a bit slow and sometimes has issues with selecting the latest package releases. Therefore, we recommend installing Mamba as a drop-in replacement via
```
conda install -c conda-forge mamba
```
2. You can then install Snakemake with
```
mamba create -n exVariance -c conda-forge -c bioconda python=3.7 snakemake=5.23.0 -y
```
#### Dependencies:
1. [Anaconda3](https://www.anaconda.com)/[Miniconda3](http://conda.pydata.org/miniconda.html) conda version 4.8.4 or later
2. [Python](https://www.python.org/) version 3.7.0 or later
3. [Snakemake](https://snakemake.readthedocs.io) version=5.14.0
4. [R](https://www.r-project.org/) version=3.6.3
5. [R packages](https://www.r-project.org/)
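The minimum versions above can be checked programmatically. The sketch below is a hypothetical helper (not part of exVariance) that compares dotted version strings numerically rather than as text, which avoids the trap where the string "3.10.0" sorts before "3.7.0". It assumes purely numeric versions such as `5.14.0`; pre-release tags like `rc1` are not handled.

```python
import sys


def version_tuple(version: str) -> tuple:
    """Convert a dotted version string such as '4.8.4' into (4, 8, 4)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`.

    Hypothetical helper: only purely numeric versions are supported.
    """
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    # Version of the running interpreter, e.g. '3.7.9'
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    status = "OK" if meets_minimum(python_version, "3.7.0") else "too old"
    print("Python", python_version, status)
```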

#### How to install all the dependencies:
1. Install **Anaconda3/Miniconda3** and **Python**
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
- Whilst running the installation script, follow the commands listed on screen, and press the enter key to scroll.
- Make sure to answer yes when asked if you want to prepend Miniconda3 to PATH.
Close your terminal, open a new one and you should now have Conda working! Test by entering:
```
conda update conda
```
- Press y to confirm the conda updates
2. Install **Mamba**
   The default conda solver is a bit slow and sometimes has trouble selecting specific package versions. We therefore recommend installing Mamba as a drop-in replacement via
```bash
conda install -c conda-forge mamba
```
3. Install **Snakemake 5.14.0** and **R 3.6.3**
```
mamba create -n exvariance4 -c conda-forge -c bioconda snakemake=5.14.0 r-base=3.6.3 -y
```
4. Install related **R packages**
```R
install.packages(c("argparse","clusterSim","ggpubr","BiocManager","devtools"))
BiocManager::install(c("scater","scran","SingleCellExperiment","sva","edgeR","RUVSeq"))
devtools::install_github(c("hemberg-lab/scRNA.seq.funcs","theislab/kBET"))
```
**OR**
```bash
conda install -c r r-argparse -y
conda install -c conda-forge r-clustersim r-ggpubr -y
conda install -c bioconda bioconductor-scater bioconductor-scran bioconductor-singlecellexperiment bioconductor-sva bioconductor-edger bioconductor-ruvseq -y
conda install -c eugene_t r-kbet -y
conda install -c r r-devtools -y
```
```r
devtools::install_github(c("hemberg-lab/scRNA.seq.funcs","theislab/kBET"))
```
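After finishing these steps, a quick way to confirm that the expected executables are reachable is to look them up on `PATH`. The sketch below uses only the standard library; the tool list is an assumption (R is checked through `Rscript`). Because Snakemake and R are installed into the environment created in step 3, the check is only meaningful after activating it, for example with `conda activate exvariance4`.

```python
import shutil

# Executables assumed to be provided by the installation steps above
REQUIRED_TOOLS = ("conda", "mamba", "snakemake", "Rscript")


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```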
## Download Reference
exVariance depends on reference files, which are available for the supported species listed below: <u>hg38</u>
@@ -91,17 +122,26 @@ usage: exVariance [-h] --user_config_file USER_CONFIG_FILE
[--singularity SINGULARITY]
[--singularity-wrapper-dir SINGULARITY_WRAPPER_DIR]
{quality_control,cutadapt,quality_control_clean,mapping,bigwig,
count_matrix,normalization,differential_expression,fusion_transcripts,
SNP,RNA_editing,AS,APA,WGBS,RRBS,ctdna,wgbs_rrbs,seal_methyl-cap_medip,
mcta,dna-seq}
{ RNA_seq_pre_process,RNA_seq_exp_matrix,
RNA_seq_fusion_transcripts,RNA_seq_RNA_editing,
RNA_seq_SNP,RNA_seq_APA,RNA_seq_AS,
DNA_seq_ctDNA_mutation,DNA_seq_NP,
DNA_meth_WGBS,DNA_meth_RRBS,
DNA_meth_Seal_seq,DNA_meth_Methyl-cap_seq,
DNA_meth_MeDIP_seq,DNA_meth_MCTA_seq
}
exVariance is a tool for integrated analysis of liquid biopsy sequencing data.
positional arguments:
{quality_control,cutadapt,quality_control_clean,mapping,bigwig,count_matrix,
normalization,differential_expression,fusion_transcripts,SNP,RNA_editing,AS,APA,
WGBS,RRBS,ctdna,wgbs_rrbs,seal_methyl-cap_medip,mcta,dna-seq}
{ RNA_seq_pre_process,RNA_seq_exp_matrix,
RNA_seq_fusion_transcripts,RNA_seq_RNA_editing,
RNA_seq_SNP,RNA_seq_APA,RNA_seq_AS,
DNA_seq_ctDNA_mutation,DNA_seq_NP,
DNA_meth_WGBS,DNA_meth_RRBS,
DNA_meth_Seal_seq,DNA_meth_Methyl-cap_seq,
DNA_meth_MeDIP_seq,DNA_meth_MCTA_seq
}
optional arguments:
-h, --help show this help message and exit
@@ -122,39 +162,62 @@ optional arguments:
positional arguments:
{quality_control,cutadapt,quality_control_clean,mapping,bigwig,count_matrix,normalization,differential_expression,fusion_transcripts,SNP,RNA_editing,AS,APA,WGBS,RRBS,ctdna,wgbs_rrbs,seal_methyl-cap_medip,mcta,dna-seq}
{ RNA_seq_pre_process,RNA_seq_exp_matrix,
RNA_seq_fusion_transcripts,RNA_seq_RNA_editing,
RNA_seq_SNP,RNA_seq_APA,RNA_seq_AS,
DNA_seq_ctDNA_mutation,DNA_seq_NP,
DNA_meth_WGBS,DNA_meth_RRBS,
DNA_meth_Seal_seq,DNA_meth_Methyl-cap_seq,
DNA_meth_MeDIP_seq,DNA_meth_MCTA_seq
}
For additional help or support, please visit https://github.com/ShangZhang/exVariance
```
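The usage message has the layout produced by Python's `argparse`: a required `--user_config_file` option plus one positional module name restricted to a fixed set of choices. The sketch below reproduces that shape with the standard `argparse` module. It is not exVariance's actual source, and only a few of the new module names are included for illustration.

```python
import argparse

# Illustrative subset of the module names from the new usage message
MODULES = [
    "RNA_seq_pre_process",
    "RNA_seq_exp_matrix",
    "DNA_meth_WGBS",
    "DNA_meth_RRBS",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one positional module choice and a required config option."""
    parser = argparse.ArgumentParser(
        prog="exVariance",
        description="exVariance is a tool for integrated analysis of liquid biopsy sequencing data.",
    )
    # Positional argument: which analysis module to run
    parser.add_argument("module", choices=MODULES)
    # Required option pointing to the user configuration file
    parser.add_argument("--user_config_file", required=True)
    return parser
```

For example, `build_parser().parse_args(["RNA_seq_exp_matrix", "--user_config_file", "config/example.yaml"])` returns a namespace whose `module` is `"RNA_seq_exp_matrix"`, while an unknown module name such as the old `count_matrix` makes `argparse` print an error and exit.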

### Input files

Several examples can be found in the `demo` directory with the following structure:
RNA-seq related examples can be found in the `demo` directory with the following structure:

```text
./demo/*/
|-- config
| |-- default_config.yaml
| `-- example.yaml
| |-- <data_name>.yaml
| |-- dapars_configure.txt
| `-- RNAEditor_configure.txt
|-- data
| |-- fastq
| |-- fastq/
| |-- sample_ids.txt
| |-- sample_classes.txt
| |-- compare_groups.yaml
| `-- batch_info.txt
|-- output
`-- summary
```

Other examples can be found in the `demo` directory with the following structure:

```text
./demo/*/
|-- config
| |-- default_config.yaml
| `-- <data_name>.yaml
|-- data
| |-- fastq/
| `-- sample_ids.txt
|-- genome
| `-- fasta
|-- output
`-- tmp
`-- summary
```

> **Note:**
>
> - `config/default_config.yaml`: the default configuration file. Do not modify it unless you understand what each setting does.
> - `config/<data_name>.yaml`: the user-defined configuration file, which specifies the paths used in the analysis.
> - `data/fastq/` : directory containing the sample files, with the suffix 'fasta.gz' or 'fastq.gz'.
> - `data/example/sample_ids.txt`: list of sample names (file names with the suffix 'fasta.gz' or 'fastq.gz' removed)
> - `genome/f` : the genome directory
> - `data/fastq/` : directory containing the sample files, with the suffix 'fastq', 'fasta.gz' or 'fastq.gz'.
> - `data/sample_ids.txt`: list of sample names (file names with the suffix 'fastq', 'fasta.gz' or 'fastq.gz' removed)
> - `output/`: the output directory
> - `tmp/` : contains the temporary files
> - `summary/` : contains the summary files

You can create your own data directory with the above directory structure.
Multiple datasets can be put in the same directory by replacing "example" with your own dataset names.
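As described in the note, `data/sample_ids.txt` holds the sample names obtained by stripping the file suffix from the files in `data/fastq/`. The hypothetical helper below (not part of exVariance) sketches that step: it strips the suffixes named in the note and writes one sorted, de-duplicated name per line. It assumes one file per sample; paired-end naming schemes with mate suffixes would need an extra rule.

```python
from pathlib import Path

# Suffixes listed in the note above
SUFFIXES = (".fastq.gz", ".fasta.gz", ".fastq")


def sample_name(filename: str):
    """Return `filename` without a recognised suffix, or None if it has none."""
    for suffix in SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return None


def write_sample_ids(fastq_dir: str, out_file: str) -> list:
    """Write sorted, de-duplicated sample names from `fastq_dir` to `out_file`, one per line."""
    names = set()
    for path in Path(fastq_dir).iterdir():
        name = sample_name(path.name) if path.is_file() else None
        if name:
            names.add(name)
    ordered = sorted(names)
    Path(out_file).write_text("".join(n + "\n" for n in ordered))
    return ordered
```

Files with other suffixes (for example a stray `notes.txt`) are ignored rather than reported, which keeps the output clean but means a misnamed FASTQ file would silently be left out.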
@@ -220,4 +283,4 @@ Our own servers have 64GB of ram and 16 cores.

Copyright (C) Lu Lab @ Tsinghua University, Beijing, China 2020 All rights reserved

## Citation
## Citation
4 changes: 2 additions & 2 deletions snakemake/RNA_seq/diff_exp/count_matrix_long.snakemake
@@ -155,12 +155,12 @@ rule count_matrix:


# remove features not in transcript table
gene_ids = gene_ids[~(transcript_table.loc[gene_ids, 'gene_id'].isna().values)]
gene_ids = gene_ids[~(transcript_table.reindex(gene_ids)['gene_id'].isna().values)]
matrix = matrix.loc[gene_ids]
# read gene lengths
gene_lengths = pd.read_table(input.gene_length, sep='\t', index_col=0, dtype='str').loc[:, 'merged']
# remove features not in gene length
gene_ids = gene_ids[~(gene_lengths.loc[gene_ids].isna().values)]
gene_ids = gene_ids[~(gene_lengths.reindex(gene_ids).isna().values)]
matrix = matrix.loc[gene_ids]
# annotate features
feature_names = transcript_table.loc[gene_ids, 'gene_id'].values \
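The `count_matrix` change swaps `.loc[gene_ids]` for `.reindex(gene_ids)` when filtering features. Since pandas 1.0, `.loc` with a list of labels raises `KeyError` if any label is missing, whereas `.reindex` returns NaN for each missing label, so the following `.isna()` mask can drop those features instead of crashing. The toy table below is made up for illustration; the same pattern applies to the `gene_lengths` Series.

```python
import numpy as np
import pandas as pd

# Made-up transcript table indexed by feature ID
transcript_table = pd.DataFrame({"gene_id": ["G1", "G2"]}, index=["T1", "T2"])
gene_ids = np.array(["T1", "T2", "T3"])  # 'T3' is absent from the table


def filter_known(ids, table):
    """Keep only IDs that are present in `table` with a non-missing gene_id."""
    # reindex inserts NaN rows for unknown IDs instead of raising KeyError
    mask = ~table.reindex(ids)["gene_id"].isna().values
    return ids[mask]


try:
    transcript_table.loc[gene_ids, "gene_id"]
except KeyError:
    print(".loc raised KeyError for the missing label 'T3'")

print(filter_known(gene_ids, transcript_table))  # ['T1' 'T2']
```

One caveat of this design: `.reindex` raises a `ValueError` if the table's own index contains duplicate labels, so it relies on feature IDs being unique.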
42 changes: 21 additions & 21 deletions snakemake/RNA_seq/diff_exp/mapping_long_pe.snakemake
@@ -393,24 +393,24 @@ rule summarize_mapping_star:



rule summary_mapping_pe:
input:
mapped_read_length_by_sample= expand('{output_dir}/stats/mapped_read_length_by_sample/{sample_id}',output_dir=output_dir, sample_id=sample_ids),
mapped_insert_size_by_sample=expand('{output_dir}/stats/mapped_insert_size_by_sample/{sample_id}',output_dir=output_dir, sample_id=sample_ids),
mapping_star=expand('{output_dir}/summary/mapping_star.txt',output_dir=output_dir)
output:
summary_mapped_read_length_by_sample=expand('{summary_dir}/alignment/mapped_read_length_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
summary_mapped_insert_size_by_sample=expand('{summary_dir}/alignment/mapped_insert_size_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
summary_alignment_stat=expand('{summary_dir}/alignment/mapping_star.txt',summary_dir=summary_dir)
params:
mapped_read_length_by_sample= expand('{output_dir}/stats/mapped_read_length_by_sample/',output_dir=output_dir, sample_id=sample_ids),
mapped_insert_size_by_sample=expand('{output_dir}/stats/mapped_insert_size_by_sample/',output_dir=output_dir, sample_id=sample_ids),
mapping_star=expand('{output_dir}/summary/',output_dir=output_dir),
summary_mapped_read_length_by_sample=expand('{summary_dir}/alignment/mapped_read_length_by_sample/',summary_dir=summary_dir),
summary_mapped_insert_size_by_sample=expand('{summary_dir}/alignment/mapped_insert_size_by_sample/',summary_dir=summary_dir),
summary_alignment_stat=expand('{summary_dir}/alignment/',summary_dir=summary_dir)
shell:
''' cp {params.mapped_read_length_by_sample}/* {params.summary_mapped_read_length_by_sample} ;
cp {params.mapped_insert_size_by_sample}/* {params.summary_mapped_insert_size_by_sample} ;
cp {params.mapping_star} {params.summary_alignment_stat} ;
'''
# rule summary_mapping_pe:
# input:
# mapped_read_length_by_sample= expand('{output_dir}/stats/mapped_read_length_by_sample/{sample_id}',output_dir=output_dir, sample_id=sample_ids),
# mapped_insert_size_by_sample=expand('{output_dir}/stats/mapped_insert_size_by_sample/{sample_id}',output_dir=output_dir, sample_id=sample_ids),
# mapping_star=expand('{output_dir}/summary/mapping_star.txt',output_dir=output_dir)
# output:
# summary_mapped_read_length_by_sample=expand('{summary_dir}/alignment/mapped_read_length_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
# summary_mapped_insert_size_by_sample=expand('{summary_dir}/alignment/mapped_insert_size_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
# summary_alignment_stat=expand('{summary_dir}/alignment/mapping_star.txt',summary_dir=summary_dir)
# params:
# mapped_read_length_by_sample= expand('{output_dir}/stats/mapped_read_length_by_sample/',output_dir=output_dir, sample_id=sample_ids),
# mapped_insert_size_by_sample=expand('{output_dir}/stats/mapped_insert_size_by_sample/',output_dir=output_dir, sample_id=sample_ids),
# mapping_star=expand('{output_dir}/summary/',output_dir=output_dir),
# summary_mapped_read_length_by_sample=expand('{summary_dir}/alignment/mapped_read_length_by_sample/',summary_dir=summary_dir),
# summary_mapped_insert_size_by_sample=expand('{summary_dir}/alignment/mapped_insert_size_by_sample/',summary_dir=summary_dir),
# summary_alignment_stat=expand('{summary_dir}/alignment/',summary_dir=summary_dir)
# shell:
# ''' mkdir -p {params.summary_mapped_read_length_by_sample} ; cp {params.mapped_read_length_by_sample}/* {params.summary_mapped_read_length_by_sample} ; \
# mkdir -p {params.summary_mapped_insert_size_by_sample} ; cp {params.mapped_insert_size_by_sample}/* {params.summary_mapped_insert_size_by_sample} ; \
# mkdir -p {params.summary_alignment_stat} ; cp {params.mapping_star} {params.summary_alignment_stat} ;
# '''
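The commented-out version of `summary_mapping_pe` prefixes each `cp` with `mkdir -p`, which guards against `cp src/* dest` failing when `dest` does not exist yet. If this step is revived, the same copy could be expressed in Python, for instance inside a Snakemake `run:` block. The helper below is a hypothetical sketch of that approach using only the standard library, not code from the repository.

```python
import shutil
from pathlib import Path


def copy_dir_contents(src_dir: str, dest_dir: str) -> list:
    """Copy every regular file in `src_dir` into `dest_dir`, creating `dest_dir` first.

    Similar in spirit to `mkdir -p dest_dir ; cp src_dir/* dest_dir`.
    Returns the destination paths of the copied files.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        # Sub-directories are skipped (plain `cp` without -r would refuse to copy them)
        if path.is_file():
            copied.append(Path(shutil.copy2(path, dest / path.name)))
    return copied
```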
4 changes: 2 additions & 2 deletions snakemake/RNA_seq/diff_exp/normalization.snakemake
@@ -12,7 +12,7 @@
###------------------------The output section---------------------------###

# The following tools need to be installed; they will later be added to conda to build a conda env
# conda install -c r r-argparse
# conda install -c r r-argparse -y
# conda install -c conda-forge r-clustersim -y
# conda install -c bioconda bioconductor-scater bioconductor-scran bioconductor-singlecellexperiment -y

@@ -21,7 +21,7 @@
# conda install -c conda-forge r-ggpubr -y
# conda install -c bioconda bioconductor-ruvseq -y

# install.packages("devtools")
# conda install -c r r-devtools -y
# devtools::install_github("hemberg-lab/scRNA.seq.funcs")

rule filter_step:
6 changes: 3 additions & 3 deletions snakemake/RNA_seq/exp_matrix.snakemake
@@ -44,9 +44,9 @@ def get_all_inputs(wildcards):
map_paired=expand('{output_dir}/bam/{sample_id}/{map_step}.bam',output_dir=output_dir, sample_id=sample_ids, map_step=map_steps),
map_paired_sorted_by_name=expand('{output_dir}/bam_sorted_by_name/{sample_id}/{map_step}.bam',output_dir=output_dir, sample_id=sample_ids, map_step=map_steps),
# mapping long pe # summary section
summary_mapped_read_length_by_sample=expand('{summary_dir}/alignment/mapped_read_length_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
summary_mapped_insert_size_by_sample=expand('{summary_dir}/alignment/mapped_insert_size_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
summary_alignment_stat=expand('{summary_dir}/alignment/mapping_star.txt',summary_dir=summary_dir)
# summary_mapped_read_length_by_sample=expand('{summary_dir}/alignment/mapped_read_length_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
# summary_mapped_insert_size_by_sample=expand('{summary_dir}/alignment/mapped_insert_size_by_sample/{sample_id}',summary_dir=summary_dir, sample_id=sample_ids),
# summary_alignment_stat=expand('{summary_dir}/alignment/mapping_star.txt',summary_dir=summary_dir)
)

# bigwig_long.snakemake
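The summary inputs toggled in `get_all_inputs` are built with Snakemake's `expand()`, which formats a pattern once for every combination of the supplied wildcard values. The function below is a simplified stand-in written for illustration; it is not Snakemake's implementation and ignores features such as custom combinators and wildcard escaping. It only shows which paths a pattern like `{summary_dir}/alignment/mapped_read_length_by_sample/{sample_id}` yields.

```python
from itertools import product


def expand_sketch(pattern: str, **wildcards) -> list:
    """Format `pattern` once for every combination (cartesian product) of wildcard values."""
    keys = list(wildcards)
    # Treat a single string as a one-element list of values
    values = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*values)]


paths = expand_sketch(
    "{summary_dir}/alignment/mapped_read_length_by_sample/{sample_id}",
    summary_dir="summary",
    sample_id=["sample1", "sample2"],
)
print(paths)
```

With one `summary_dir` and two sample IDs this produces two paths, one per sample; a pattern with three wildcards such as `{output_dir}/bam/{sample_id}/{map_step}.bam` produces one path per (output_dir, sample_id, map_step) combination.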
18 changes: 18 additions & 0 deletions snakemake/envs/normalization.yaml
@@ -0,0 +1,18 @@
channels:
- bioconda
- conda-forge
- r
- eugene_t

dependencies:
- r-argparse
- r-clustersim
- bioconductor-scater
- bioconductor-scran
- bioconductor-singlecellexperiment
- r-kbet
- bioconductor-sva
- bioconductor-edger
- r-ggpubr
- bioconductor-ruvseq
- r-devtools
