
Commit

updated trimming table
Randy Klabacka committed Mar 2, 2023
1 parent d6e20b2 commit 3dbb176
Showing 6 changed files with 2,689 additions and 9 deletions.
Binary file added .README.md.swp
Binary file not shown.
16 changes: 7 additions & 9 deletions README.md
@@ -81,7 +81,7 @@ cd example_fastqc
open fastqc_report.html
```
> note: scp is a way to securely copy a file. The first parameter is the path to the remote file. The second parameter is the path to the destination location (in this scenario we just used the current directory ```.```)
> note: This will copy to whatever directory you are in.
> note: The ```.``` at the end of your scp command means the file you are copying will land in the directory you are in.
At the top of the page, you should see information about the 'Basic statistics' for your reads in the file example.fastq. You have 25 total sequences in this file, each of which is of length 100 bp. If you look at the example.fastq file (e.g., ```less example.fastq```), you'll see that each read is 100 bp in length (this is your read length).
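
The read count and read length reported under 'Basic statistics' can also be checked by hand. The following is a minimal Python sketch, assuming the plain four-line-per-record FASTQ layout (header, sequence, `+`, qualities); the function name `read_length_summary` and the toy records are made up for illustration and are not part of this tutorial's files.

```python
import io
from collections import Counter


def read_length_summary(handle):
    """Return (number_of_reads, Counter of read lengths) for a FASTQ stream.

    Assumes each record occupies exactly four lines.
    """
    lengths = Counter()
    for line_number, line in enumerate(handle):
        # The second line of every four-line record holds the sequence.
        if line_number % 4 == 1:
            lengths[len(line.strip())] += 1
    return sum(lengths.values()), lengths


# Two made-up records, used only to show the call.
toy_fastq = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTAC\n+\nIIIIII\n"
)
total, lengths = read_length_summary(toy_fastq)
print(total, dict(lengths))
```

To run it on a real file, pass an open file handle instead of the toy stream, e.g. `with open("example.fastq") as handle: read_length_summary(handle)`.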

@@ -95,25 +95,21 @@ This shows you the average quality score for your reads at each position. On thi

# Trimming

To trim our reads, we will use the program 'fastp'. The syntax for this program is simple, and is described on their [documentation](https://github.com/OpenGene/fastp). Execute the following commands.
To trim our reads, we will use the program 'fastp'. The syntax for this program is described in their [documentation](https://github.com/OpenGene/fastp). Execute the following command to trim reads within the file ```example_raw.fastq```:

```
module load fastp/0.20.1
fastp \
--in1 example_raw.fastq --in2 NEED
-i example_raw.fastq \
-q 15 \
-u 40 \
-e 30 \
-l 15 \
-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
--adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
-M 25 \
-W 5 \
-5 \
-3 \
-c \
-m --merged_out merged --out1 unmerged1 --out2 unmerged2 --unpaired1 unpaired1 --unpaired2 unpaired2
-D
-o example_cleaned.fastq
```
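
To make the `-q 15` and `-u 40` options in the command above more concrete, here is a rough Python sketch of that kind of filter: a base counts as "qualified" if its Phred score is at least 15, and a read is dropped when more than 40% of its bases are unqualified. This is an illustration of the idea only, not fastp's implementation; the function names are hypothetical, and the Phred+33 quality encoding is an assumption.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(character) - offset for character in quality_string]


def passes_quality_filter(quality_string, min_base_quality=15,
                          max_unqualified_percent=40):
    """Return True if the read has few enough low-quality bases to be kept."""
    scores = phred_scores(quality_string)
    if not scores:
        return False  # an empty read is never kept
    unqualified = sum(1 for score in scores if score < min_base_quality)
    return 100.0 * unqualified / len(scores) <= max_unqualified_percent


# Made-up quality strings: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
high_quality_read = "IIIIIIIIII"
mixed_quality_read = "IIIII#####"
print(passes_quality_filter(high_quality_read))
print(passes_quality_filter(mixed_quality_read))
```

The defaults mirror the values used in the command; the other options (`-e`, `-l`, `-W`, `-M`, `-5`, `-3`) are not modelled here.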

@@ -138,10 +134,12 @@ Information on what the options for this program do is provided in the table bel
| | ```--merged_out``` | Filename for storing merged reads | NA |
| ```-o``` and ```-O``` | ```--out1``` and ```--out2``` | Filenames for unmerged reads that passed trimming filters | NA |
| | ```--unpaired1``` and ```--unpaired2``` | Filenames for reads that can't be merged because one didn't pass filters | NA |
| ```-D``` | ```--dedup``` | Duplicate reads\*\*\* (reads with the exact same sequence) are removed | ON |
| | ```--dedup``` | Duplicate reads\*\*\* (reads with the exact same sequence) are removed | ON |
##### \* If no adapter sequence is specified, the adapter sequence is intuited by fastp (which is faster, but can be inaccurate)
##### \*\* The TruSeq adapter sequences are ```AGATCGGAAGAGCACACGTCTGAACTCCAGTCA``` (for read 1) and ```AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT``` (for read 2).
##### \*\*\* This is to remove PCR duplicates
##### \*\*\* This is to remove PCR duplicates; however, this feature is only available in fastp versions after 0.22
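
The idea behind removing duplicates (reads with the exact same sequence) can be sketched in a few lines of Python. This is only an illustration of the concept; fastp's own deduplication may work differently internally, and the function name and toy records below are hypothetical.

```python
def drop_exact_duplicates(records):
    """Keep the first read for each distinct sequence.

    `records` is an iterable of (name, sequence, quality) tuples.
    """
    seen_sequences = set()
    kept = []
    for name, sequence, quality in records:
        if sequence in seen_sequences:
            continue  # identical sequence already kept, treat as a duplicate
        seen_sequences.add(sequence)
        kept.append((name, sequence, quality))
    return kept


# Made-up reads: r1 and r2 share the same sequence.
toy_records = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACGT", "IIII"),
    ("r3", "TTTT", "IIII"),
]
unique_records = drop_exact_duplicates(toy_records)
print([name for name, _, _ in unique_records])
```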

You'll notice that many of the options in this table aren't used in your command above. One reason is that example_raw.fastq contains reads from single-end (SE) sequencing. Options such as ```--in2```, ```--adapter_sequence_r2```, ```--correction```, ```--merged``` (and the other

You can also split the output into multiple fastq files, which can be helpful if you plan to do mapping in parallel. The options to create 3 output files for a single individual are shown below (we don't include them in this example, but they would decrease downstream processing time).
```fastp --split_prefix_digits=4 --out1=out.fq --split=3```
84 changes: 84 additions & 0 deletions example_cleaned.fastq
@@ -0,0 +1,84 @@
@D3NH4HQ1:149:C1H5KACXX:3:1101:2106:2242 2:N:0:GCTCGGTA
AAGCCAGCAAACCTTGTTTTACCTCACTGATATAGATTAGATATTTCAAGACAAATTTGTTGCCAATGTTAGATTATTAACATTATTTATTATAAAAATA
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIJIJJJJJJJJJJJJJJJJJJJJJJIJJJJJJIJJJJHHHHHHHFFFFFFFEEEEEEC
@D3NH4HQ1:149:C1H5KACXX:3:1101:2575:2234 2:N:0:GCTCGGTA
TAGCCATCGTCTAACAACACTAACTCTGGATAGTGTACAAAAATTAAAAGCCAACCATTAAAACCAGTTACCCATGTGCATCAGCTAATAACACTTTCCA
+
@CCFFFFFHHHHHJJJJIIJJJJJJJJJJIJJJHIGHIJJJJJJJJIJJIHHIJJJIJJJJJJIJJJDHFHHHHHFFFFFFEEEEEEDDCDDDDDDDDDE
@D3NH4HQ1:149:C1H5KACXX:3:1101:2881:2141 2:N:0:GCTCGGTA
CACGTGGATGTAAACTGTAGTGCATTGTCATTAGTTTCCTGGTTTTTCGGTCCAAAATGTCCAAATCAGCTTGTGTCCAGTTATGTCAGCTTCTGCTTTG
+
?;@AADDDFHADA4BFFEDEHFBEEHB<C4F4A?1?HHBEHIJ*1?DA:?8?@<DGGJGG@CHGGIIIJJJEIAEEEHBE@BDEECEDCE:>C>@>>@CC
@D3NH4HQ1:149:C1H5KACXX:3:1101:2870:2153 2:N:0:GCTCGGTA
CCGGGCTCAAGCCCACCGCCTCTCCCGTCCTGGGCTCCGAAGGGGCCGTCCACCTGTTCCTTCCTTGCGTGAGAAATCTGCACACCGA
+
@?<DD@D??DADHIBG?GGGGIJGBCGFGEIGHFEEAGA6;8B@@=EB9>9?CD<?,>@A@A@C>C@:>958<3:>@>ACAA:A189>
@D3NH4HQ1:149:C1H5KACXX:3:1101:2928:2247 2:N:0:GCTCGGTA
AGGAGGGGTTTTGAAGAGGCTGATTGTGCTTTAGCAGTCTGGCTTCTGGTTCTGGGTCGTGCCTCCTCATCAGTTTGTGTTTTTGGTTGTTTTCTTTGTG
+
CCCFFFFFFHHHHJJJJJJJJJGIJJIJIJJJIJIIJDHHHIJIJJJIJC@FHGIJCHHEEFFFFDEECEDDDADEDBD<CCDDDDABDDDDDDDDDCCA
@D3NH4HQ1:149:C1H5KACXX:3:1101:4547:2045 2:N:0:GCTCGGTA
GTAAGTGTGAGTTTTAATGGGGATTTCACTTATAAGCCTCTTGACTAGTTATCAGTTATTTCTTTTCATCTTTTTCAGGTATCGCGCACCTGAAGTTTTA
+
=:=DDAB?D4CCFHIHEF@BEA<@FHHFIHH?E9?CH?DA?GGDGG@DFG<D9DEGEGGGI@FGIIJBFCHAG>GG@@ED@D);;?<AABCCA(55>AC@
@D3NH4HQ1:149:C1H5KACXX:3:1101:5962:2073 2:N:0:GCTCGGTA
TAGTATACATGTGGTTATATATTATGAAAATGAATATATATATATTGTGTGTGGAGATGTGGGTATACACACACACACACACACACACACATATCTCTTC
+
@CCDFFFFHHHFFIHIJIJJJJJJJJIJJJJJJJIJJIJJJJJJGJJIIIFHIJJJJJIGIJJ<CGHIIIIJIIJJHHHFFFDDDDDDDBBDCEDEDEDD
@D3NH4HQ1:149:C1H5KACXX:3:1101:6097:2207 2:N:0:GCTCGGTA
TGCCTAAACGCCTTTTTCCGGAGGGCGGCCGGATGTCAGCCTCTGCTTTGCTGCTGCTTCGGCTGCCATGAAACGGGGAGGGAGGGCATGGTA
+
@B@DFFFFGHDDHIGIJIJJIIIGIIJJFBEDA>C5;@@A=?CC:9@CDDACDCDACCDD<B;8;8?:>A4:4>@D>B-09>&5@B&(+2<44
@D3NH4HQ1:149:C1H5KACXX:3:1101:6870:2142 2:N:0:GCTCGGTA
CAATGCTGGCAAATCCAGCCCAAATCTTGCAACTTGTTGGAAGAGTTGGTCTCGGAGACCCTCTACTACTAGCTGATCTTGTCTCAGCTGT
+
BC@DFFDDDFFHHGJ>FHEH?DDHC@HIG>BEFDG9CDEI>DHGD09BF19BFDF<FFAE=CHDEC?=CEFC@>@BEE>A@C@:;@5(>>5
@D3NH4HQ1:149:C1H5KACXX:3:1101:7326:2099 2:N:0:GCTCGGTA
CCGTGCTATAATTCTACAATCCAAACCTTTGGTTTCTAAAATCATTTGTTGAGTCTTACACAGAGGCTACTTTTAAGCAGCCATGGGCATGTACATTTTC
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJFHJJJIIJJJJJJJJJJJIIIIJIJJIJIJIIJIJJJJJIJIIHHHFEFFFFFEDEEDDDEDDEFEEE
@D3NH4HQ1:149:C1H5KACXX:3:1101:8426:2077 2:N:0:GCTCGGTA
TGGGATCTCACGGCCACCAACAGGAAATGCAGCAGATTAACCTCCCACACAGACCTAGAGCCTCATTAGGATACGCCAGTTCCGTTTGGAAGGGATGCTA
+
CCCFFFFFHHHHHJJJJJJJJJJJJHIIJJJJJJJJJJJJJJIIJJJJJJHIJIJJJJJHHHHHFFFFFFEEEEDDDDD@CDDDACCBBCDDDDBCDDDD
@D3NH4HQ1:149:C1H5KACXX:3:1101:8736:2198 2:N:0:GCTCGGTA
CTCTGATTGCAAACCAATTTGACTGAGGGATCAGAGAAGAGTGAGCAATGCATATTCCTGGTTGCAGTCTCCTCCTCCTCCTCCTCCTCCTCTCCACATT
+
@C@DDFBBHDHHHJIGBHIJJHJIJIIJJHGGCGBHBHIIIGHGGI@GHGFHIJGIGHIIJFHIJIIGGIJIHHHHCDF>DCECCCC??CCCCCC:?@AA
@D3NH4HQ1:149:C1H5KACXX:3:1101:10672:2051 2:N:0:GCTCGGTA
TAACCAGTTCTGGAGAACCGCTAGTGGAAATTCTGAGTAATTTGGAGAACCGGTAAATACCACCTCTGACTGGCCCCGCCCCCATCTATTTTCTGCCTCC
+
@C@FFFFFGHHHFIIJIJIIGIGGFGHBHGIIJJJJGFHGEEHHDGGHHIFAE@FGGIIJCEEEHFEBDEFCB;@?>?8?BDBDCCDDDA@DA@A@CC@<
@D3NH4HQ1:149:C1H5KACXX:3:1101:11206:2137 2:N:0:GCTCGGTA
GAATGGGGGTAGCAAAGCAGAAACAATTCTTTAAATTTTAAGGTGCTAGCAGCCTAATCTTGACAGGTAAGTCTGTGTGCTTTTAATCTAAAAGTAGTCC
+
@@@DFFFFHFHDGIIEGHJJJIJJIJJJCIEGIGGBIIJCHGGBDEGGGIBFDHHGHGIJGIHCHFHFFFFDDDEED;@ACCCCDCCCDDDCDDDDDFE>
@D3NH4HQ1:149:C1H5KACXX:3:1101:12161:2150 2:N:0:GCTCGGTA
AAATTAAACCCCCTAAATGTCATAACCATTTTTAGTTTGTAGCTTTTTAATGTGGAGTAATCAGACTACATGCACTATTTTAATATGGTTGCATTGTTTA
+
CCCFFFFFHHHHHJJJJJJIJJJJJJJJJJJJJJJHIJJIJIIJJJJJJJGHHIJIJFHIJJJGJJJJJJJJJJHHHHHHHFFFFFFDCEDEDEEDCDCC
@D3NH4HQ1:149:C1H5KACXX:3:1101:12654:2145 2:N:0:GCTCGGTA
TCTACAAGTAACTGGTGGTAATAACTTGTGGTTAATGCATATGTAGAATCATATAGTTGGAAGATGCCAATAACACAATTCTCAAAATACATGAAAATAA
+
CCCFFFFFHHHHHJJEHIHIJJJJJJJJIJJHHIJJJJJJJJJIIJJJJJGHIJJJJIJJJJJJJIJJJJIIJJJJIIJHHHHHFFFFDFFEEEDEEDDD
@D3NH4HQ1:149:C1H5KACXX:3:1101:12735:2247 2:N:0:GCTCGGTA
GGCCTATGTGGAAAACAGGGGGGACTCGGGCTGGGAAACCATTGATTAATAAATATTATGTACTGGGGCCTGAGAAACTCAGAGTTAGGTTTATTTCAGC
+
@CCFFFFFHHHHHJIJJJIJIIDBDDDDDCDDDDDDDDDDDDDCDEDEDECCDEEDEEEDBCDDD>BDBDDDDDDCDDDDCDDD:@ACD@CCCCDEEEDD
@D3NH4HQ1:149:C1H5KACXX:3:1101:12933:2090 2:N:0:GCTCGGTA
TATTTTGCCATCTTTAACCAAGCAGAAAAAAGTTAAGGATTCTTTTTTTTCTTACATTAGCCTTTCTGACATTTATTCCACAACTCCAGGCTATCTCTGT
+
CCCFFFFFHHHHHJJJIJJJJJJIJHIJIJEHCHGGIJIJJJJJIJJJJJIIJIHHHHHHFFFFFFEEEEEEEEFEDEEEDDDDDDCDDDDDDDDDDEDD
@D3NH4HQ1:149:C1H5KACXX:3:1101:12879:2207 2:N:0:GCTCGGTA
ATTGAAATGGAATTATAAATACCATCAACACAAAAGGGAATTTGCTCAGAAGCTGAAATACACCAAGTGAATTAGAGGAAGAGTGGGAAGCAGAGGAGAG
+
@@@DBADFHHDFBEFDE>FF<HFGJGDHDHIIIJJFGED;HIGHDFCHG@DDBFF<FFFGHGHIEAC7=@7@@AAH>=6;;2@6;?A;ABBDCAAB28A8
@D3NH4HQ1:149:C1H5KACXX:3:1101:12973:2245 2:N:0:GCTCGGTA
TGTCTCCTTCCAACCAGGATCCCTGGGCAATAAGGTAACTTCCACCACTATTGGCAGGTGGTTGCGAACCTGCATTGCCACTGCGTACCAATCGCAGGTG
+
CC@FFFFFHHHHHJJJJJJJJJJIJJJJJJJJJJJFHIJIJJJJJJJJJIJIIJJJJJ=CHAHEHBDFACCBBDCD@ACCCACC@;?BB@CCCDD?<B8:
@D3NH4HQ1:149:C1H5KACXX:3:1101:13551:2136 2:N:0:GCTCGGTA
CCCGGGCGATGTGTTCTACCTGCACTCTCGTCTTCTGGAAAGAGCAGCCAAAATGAACGATGCATTTGGAGGGGGATCCCTGACCGCTCTCCCAGTCATC
+
CCCFFFFFHHHHGIIJJJJIJIJJIJJJJJHJIJJJJJJGIJIJJJJJJIGHJHHHHHFFFFDDEEEEDDDDDDDB@BDDDDDDDDDDDDDDDDDDDDEC
File renamed without changes.