Manual for the matam_16s module

Dependencies

SortMeRNA, MATAM, Usearch and seqtk

Notes

  1. ⚠️ Like the link module, the matam_16s module assumes that the ids of paired reads have the format XXXX.1 and XXXX.2, differing only in the last character. You can rename your reads with MarkerMAG's rename_reads module (manual).

  2. The reconstruction of 16S rRNA genes by Matam is affected by sequencing depth (data not published). We therefore recommend running Matam on the same dataset multiple times at different read subsampling rates, then combining the Matam assemblies from all depths and dereplicating them.

  3. The following command extracts 16S rRNA reads and subsamples them at rates of 1, 5, 10, 25, 50, 75 and 100%. The 16S rRNA gene sequences reconstructed from all subsets are combined and clustered at an identity cut-off of 99.9% (recommended). The longest sequence from each cluster is kept.

    MarkerMAG matam_16s -p soil -r1 soil_R1.fastq -r2 soil_R2.fastq -pct 1,5,10,25,50,75,100 -i 0.999 -d /srv/scratch/z5039045/DB/SILVA/SILVA_138_1_SSURef_NR99_id99/SILVA_138.1_SSURef_NR99_tax_silva_NR99 -t 12
    
  4. ⚠️ The default SILVA SSU database used by Matam is the 128 release. If you want to run Matam with the latest release of the SILVA SSU database, please refer to the steps below.

  5. ⚠️ Reconstructing 16S rRNA genes with Matam is time-consuming (especially with multiple rounds of subsampling), so be patient!
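
To make the subsampling rates in note 3 more concrete, here is a minimal dry-run sketch that only prints the commands one might use to subsample a read pair by hand. It is not MarkerMAG's internal code: seqtk is listed as a dependency, but the use of `seqtk sample` in this exact way, the function name `print_subsample_cmds`, the seed and the output file names are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustrative dry run: print (do not execute) commands that would subsample
# a read pair at each percentage. Hypothetical helper, not part of MarkerMAG.

# usage: print_subsample_cmds PREFIX R1 R2 PCT_LIST [SEED]
print_subsample_cmds() {
    local prefix="$1" r1="$2" r2="$3" pct_list="$4" seed="${5:-100}"
    local pct frac
    local IFS=','                      # split the percentage list on commas
    for pct in $pct_list; do
        if [ "$pct" -eq 100 ]; then
            # 100% means all reads: no sampling needed, just copy the inputs
            echo "cp $r1 ${prefix}_100_R1.fastq"
            echo "cp $r2 ${prefix}_100_R2.fastq"
        else
            # convert the percentage to a fraction below 1 for seqtk sample
            frac=$(awk -v p="$pct" 'BEGIN { printf "%.2f", p / 100 }')
            # using the same seed (-s) for both files keeps read pairs in sync
            echo "seqtk sample -s$seed $r1 $frac > ${prefix}_${pct}_R1.fastq"
            echo "seqtk sample -s$seed $r2 $frac > ${prefix}_${pct}_R2.fastq"
        fi
    done
}

# same prefix, read files and rates as in the matam_16s example
print_subsample_cmds soil soil_R1.fastq soil_R2.fastq 1,5,10,25,50,75,100
```

The 100% case is handled separately because `seqtk sample` treats a value that is not below 1 as a number of reads rather than a fraction.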

Prepare Matam database with the latest SILVA SSU database (v138.1)

  1. Download SILVA SSU sequences (v138.1)

    # specify a location where you want to store the db files
    matam_db_folder='/srv/scratch/z5039045/DB/Matam'
    
    # create the folder if it does not exist yet
    mkdir -p "$matam_db_folder"
    
    # download the SILVA SSU sequence file to the specified folder and decompress it
    cd "$matam_db_folder"
    wget https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/README.txt
    wget https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/SILVA_138.1_SSURef_NR99_tax_silva.fasta.gz
    gunzip SILVA_138.1_SSURef_NR99_tax_silva.fasta.gz
    
  2. Format SILVA SSU sequences with Matam

    matam_db_folder='/srv/scratch/z5039045/DB/Matam'
    cd $matam_db_folder
    matam_db_preprocessing.py --clustering_id_threshold 0.99 --max_memory 30000 --cpu 12 -v -i SILVA_138.1_SSURef_NR99_tax_silva.fasta -d SILVA_138_1_SSURef_NR99_id99
    
  3. Provide the generated db files to Matam with -d $matam_db_folder/SILVA_138_1_SSURef_NR99_id99/SILVA_138.1_SSURef_NR99_tax_silva_NR99, either when running Matam directly or through matam_16s:

    # run matam directly
    matam_db_folder='/srv/scratch/z5039045/DB/Matam'
    matam_assembly.py -i filtered_reads_R1_R2.fastq -o Matam_outputs -d $matam_db_folder/SILVA_138_1_SSURef_NR99_id99/SILVA_138.1_SSURef_NR99_tax_silva_NR99 -v --cpu 12 --max_memory 30000 
    
    # run matam with matam_16s
    matam_db_folder='/srv/scratch/z5039045/DB/Matam'
    MarkerMAG matam_16s -p soil -r1 soil_R1.fastq -r2 soil_R2.fastq -pct 1,5,10,25,50,75,100 -i 0.999 -d $matam_db_folder/SILVA_138_1_SSURef_NR99_id99/SILVA_138.1_SSURef_NR99_tax_silva_NR99 -t 12
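
Because a matam_16s run can take a long time, it may be worth confirming beforehand that the read ids follow the XXXX.1 / XXXX.2 convention described in the Notes. The sketch below is a hypothetical helper (`count_bad_ids` is not part of MarkerMAG) and assumes uncompressed FASTQ files with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of MarkerMAG.
# Assumes uncompressed FASTQ with exactly 4 lines per record.

# usage: count_bad_ids FASTQ SUFFIX
# Prints how many read ids in FASTQ do not end with SUFFIX (e.g. ".1").
count_bad_ids() {
    awk -v suffix="$2" '
        NR % 4 == 1 {                  # header line of each 4-line record
            id = substr($1, 2)         # drop the leading "@" and any description
            n = length(suffix)
            if (substr(id, length(id) - n + 1) != suffix) bad++
        }
        END { print bad + 0 }          # print 0 when no id is malformed
    ' "$1"
}

# Example calls (file names taken from the example command):
#   count_bad_ids soil_R1.fastq .1
#   count_bad_ids soil_R2.fastq .2
```

A result of 0 for both files means every id ends with the expected suffix. Any other value suggests renaming the reads with the rename_reads module first.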