
To run the assembler on datasets without errors:

python assembler.py input_file kmer_size

kmer_size is a range of values, so a single run can try multiple k-mer sizes. Format it as start-end.
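The start-end format can be illustrated with a small Python sketch that expands the range and generates the k-mers of a read for each size. The helper names parse_kmer_range and kmers are hypothetical, and treating both endpoints as inclusive is an assumption; assembler.py may parse the argument differently.

```python
def parse_kmer_range(spec):
    """Parse a 'start-end' string such as '20-25' into a range of k values.

    Assumption: both endpoints are inclusive (not confirmed by assembler.py).
    """
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected start-end, got {spec!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid k-mer range {spec!r}")
    return range(start, end + 1)


def kmers(read, k):
    """Return every overlapping substring of length k in read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


if __name__ == "__main__":
    # Try each k in the range, as a multi-k run would.
    for k in parse_kmer_range("3-4"):
        print(k, kmers("ACGTAC", k))
```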

To run the assembler on datasets with errors:

python assembler.py input_file kmer_size error_threshold

error_threshold is a k-mer count cutoff: it sets how many times a k-mer must appear before it is trusted. When k-mers are generated from reads, an error in a read produces k-mers that appear less often than k-mers from error-free stretches, so rare k-mers are likely errors. Try different error_threshold values and check whether the longest contig length changes.
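As a rough illustration of how such a threshold can filter k-mers, the sketch below counts every k-mer across the reads and keeps only those seen at least error_threshold times. filter_kmers is a hypothetical name, and the inclusive `>=` comparison is an assumption; assembler.py may use a strict cutoff or filter at a different stage.

```python
from collections import Counter


def filter_kmers(reads, k, error_threshold):
    """Count k-mers across all reads and keep those seen at least
    error_threshold times (assumed cutoff; rarer k-mers are treated as errors)."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n >= error_threshold}


reads = ["ACGTT", "ACGTT", "ACGAT"]  # the third read carries an error
print(sorted(filter_kmers(reads, 3, 2)))  # prints ['ACG', 'CGT', 'GTT']
```

Here the erroneous read contributes CGA and GAT once each, so a threshold of 2 drops them while the k-mers shared by the two correct reads survive.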

To save the output to a file for easier reading, redirect stdout:

python assembler.py input_file kmer_size [error_threshold] > output_file

To run the mapping project:

python mapper.py genome_file reads_file thread_count

thread_count is the number of cores to run the program on, which speeds up mapping. Note: all arguments are required; the program fails if any of them are missing.

To write the output to a SAM file for use with samtools:

python mapper.py genome_file reads_file thread_count > file_name.sam
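A minimal sketch of how reads might be spread across thread_count workers and written as SAM records, assuming plain exact matching on the forward strand. map_read, map_reads, and the reference name `genome` are hypothetical. The sketch uses worker processes from multiprocessing.Pool because CPU-bound pure-Python code does not run in parallel across threads under CPython's GIL; how mapper.py actually parallelizes is not stated.

```python
from functools import partial
from multiprocessing import Pool


def map_read(genome, named_read):
    """Exact-match one read against the genome and format one SAM line.

    Hypothetical helper: reports only the first exact forward-strand hit.
    """
    name, seq = named_read
    pos = genome.find(seq)
    if pos == -1:
        # Unmapped read: FLAG 4, no reference name, POS 0, CIGAR '*'.
        fields = [name, "4", "*", "0", "0", "*", "*", "0", "0", seq, "*"]
    else:
        # SAM POS is 1-based; an ungapped full-length match is CIGAR '<len>M'.
        # MAPQ 255 means "quality not available".
        fields = [name, "0", "genome", str(pos + 1), "255", f"{len(seq)}M",
                  "*", "0", "0", seq, "*"]
    return "\t".join(fields)


def map_reads(genome, named_reads, thread_count):
    """Distribute reads over thread_count worker processes, keeping input order."""
    with Pool(processes=thread_count) as pool:
        return pool.map(partial(map_read, genome), named_reads)


if __name__ == "__main__":
    genome = "ACGTACGTTTGACCA"
    reads = [("r1", "GTTTGA"), ("r2", "CCCCCC")]
    # @SQ header ties the RNAME column to a reference length for samtools.
    print(f"@SQ\tSN:genome\tLN:{len(genome)}")
    for line in map_reads(genome, reads, thread_count=2):
        print(line)
```

Because everything goes to stdout, redirecting this sketch's output with `> file_name.sam` produces a file in the same shape as the mapper command's output.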

To run the mapping project on reads with errors:

python mapper.py genome_file reads_file thread_count kmer_size

kmer_size is the length of the k-mers each read is broken into for error handling.
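One common way to use k-mers for error tolerance is seed-and-vote: split the read into non-overlapping k-mers, look each one up in an index of the genome, and let every exact hit vote for the read's implied start position. The sketch below assumes that approach; build_kmer_index and locate_read are hypothetical and not necessarily what mapper.py does.

```python
from collections import Counter, defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def locate_read(read, index, k):
    """Vote on the read's start position using its non-overlapping k-mers.

    A mismatch only spoils the k-mer that contains it, so the remaining
    k-mers can still agree on where the read belongs. Returns a 0-based
    start position, or None if no k-mer matches exactly.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


genome = "TTACGGATCCGATTACA"
index = build_kmer_index(genome, 4)
# This read matches genome[2:14] except for one substitution (C -> A).
print(locate_read("ACGGATACGATT", index, 4))  # prints 2
```

The middle k-mer ATAC contains the error and finds no hit, but ACGG and GATT both vote for position 2, so the read is still placed correctly.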

jy19/CS418
