This is a collection of scripts and example datasets intended for internal use with commonly used bioinformatics pipelines, including antiSMASH, BiG-SCAPE, genome assembly, local BLAST, and differential abundance analysis. To use a tool, open its folder and follow the installation and usage instructions in the corresponding README.md file. Each pipeline includes a folder titled "example_data" that serves as a test case and demonstrates proper script function.
Almost all bioinformatics programs are written without a graphical user interface (GUI, basically a window) and must be run from the command line interface (CLI). We will use Windows Subsystem for Linux (WSL) to install a Linux environment in Windows, which lets us use the CLI to control our bioinformatics programs. This is also convenient because the Linux file system is accessible from Windows File Explorer and vice versa.
Refer to this page for details: https://learn.microsoft.com/en-us/windows/wsl/install
- Open the search menu and type "Windows PowerShell", then right-click it and select "Run as administrator"
- Enter one of these commands to install WSL together with a Linux distribution:
# This installs the default Linux distribution, which is Ubuntu. To use a different distribution, see below.
wsl --install
# Use this command to choose a specific Linux distribution; this example uses Debian
wsl --install -d Debian
- Restart your computer to complete the installation
Now you should be ready to use Linux on your Windows machine!
To access the Linux distribution we just installed, we will need a CLI program (Terminal).
- In the Windows search bar, type "Microsoft Store"
- In the Microsoft Store, search for "Windows Terminal" and install it
- Open Windows Terminal and click the dropdown menu (downward arrow). You should see Ubuntu (or whatever Linux distribution you installed) listed here
- Click on Ubuntu. You have now opened your first Linux terminal (shell)
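Once the terminal is open, a quick check can confirm that you really are inside the Linux distribution running under WSL. The snippet below is only a sketch: the function name `describe_system` is made up for illustration, and the checks are heuristics (they rely on the WSL kernel identifying itself as "microsoft" and on the default behaviour of mounting the Windows C: drive at `/mnt/c`).

```shell
# Hypothetical helper: prints what kind of system this shell is running on.
describe_system() {
    # /etc/os-release is a standard file that names the installed distribution.
    if [ -r /etc/os-release ]; then
        # Read it in a subshell so its variables do not leak into this shell.
        ( . /etc/os-release && echo "Distribution: ${PRETTY_NAME:-unknown}" )
    fi

    # Under WSL the kernel version string contains "microsoft" (any capitalisation).
    if grep -qi microsoft /proc/version 2>/dev/null; then
        echo "Running inside WSL"
    else
        echo "Not running inside WSL"
    fi

    # By default, WSL mounts the Windows C: drive at /mnt/c.
    if [ -d /mnt/c ]; then
        echo "Windows C: drive is available at /mnt/c"
    fi
}

describe_system
```

If the last line is printed, `ls /mnt/c` will list your Windows files from inside Linux, which is the file sharing mentioned earlier.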
I highly suggest you play around here to get used to navigating your file system from the CLI. For example, the command `ls` lists all of the files in your current directory. To make a new directory (folder), type `mkdir new_dir`, where "new_dir" is the name of the new directory. If you type `ls` again, you will see your new directory. You can then use `cd new_dir` to change into that directory.
In this new directory, `ls` will not show any files because we haven't added any yet. To make a file, use `touch new_file` to create an empty file named "new_file" and `nano new_file` to edit it. We can change the name of the file using `mv new_file new_name`; running `ls` again will show that the file now has a new name. Note that the `mv` command is actually used to move files, and renaming is just moving a file to a new name. So, we can use `mv new_name ~` to move the file to the home directory (`~` is shorthand for your home directory). Then `ls ~` will show the file in our home directory.
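The walkthrough above can be collected into one short session. This is a sketch rather than a required procedure. To avoid cluttering your real home directory it works inside a throwaway directory created with `mktemp -d`, and the final step moves the file to the parent directory (`..`) instead of `~`. The variable name `scratch` is made up for illustration.

```shell
# Work in a throwaway directory so nothing in your real home is touched.
scratch="$(mktemp -d)"
cd "$scratch"

mkdir new_dir          # make a new directory
ls                     # shows new_dir
cd new_dir             # change into it
ls                     # prints nothing: the directory is still empty

touch new_file         # create an empty file
mv new_file new_name   # rename it
ls                     # shows new_name

mv new_name ..         # move it up one level (".." means the parent directory)
ls ..                  # shows new_dir and new_name
```

Replacing `..` with `~` in the last two commands reproduces the move to the home directory described above.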
See the link below for a more in-depth tutorial on basic shell commands.
- Use the command `sudo apt update` to refresh the list of available packages and their versions
- Use the command `sudo apt upgrade` to update all installed packages
- To run these commands together, use `sudo apt update && sudo apt upgrade`. You should do this before installing any new packages to ensure you have the most up-to-date versions available.
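The `&&` used to join the two apt commands is worth a quick illustration: it runs the second command only if the first one succeeds, so the upgrade is skipped if the update fails. The sketch below uses the built-in commands `true` (always succeeds) and `false` (always fails) as stand-ins for the apt commands, so it runs without `sudo`. The function name `demo_chaining` is made up for illustration.

```shell
# Hypothetical demo of how "&&" chains commands.
demo_chaining() {
    # "true" always succeeds, so the command after "&&" runs.
    true && echo "update ok, upgrade runs"

    # "false" always fails, so the command after "&&" is skipped.
    false && echo "this is never printed"

    # Report success so the skipped step above does not look like an error.
    return 0
}

demo_chaining
```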
Miniconda is a minimal installer for Conda, Python, their dependencies, and a small collection of packages. Conda is a widely used environment manager for Python-based programs, and many bioinformatics tools rely on it. We will be installing Miniconda3 to manage our Python virtual environments. To install Miniconda, follow these instructions provided by the University of British Columbia: Miniconda installation tutorial by UBC.
Conda is very well documented. For a list of common conda commands, see this cheatsheet.
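After installation, it can help to confirm that the shell can find conda before creating any environments. The following is a minimal sketch, not part of the UBC tutorial: the function names `check_conda` and `create_example_env`, the environment name `example_env`, and the Python version are all placeholders chosen for illustration.

```shell
# Hypothetical helper: report whether conda is available in this shell.
check_conda() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda found: $(conda --version)"
    else
        echo "conda not found on PATH; try opening a new terminal or re-running the installer"
    fi
}

# Hypothetical helper: create a small environment and list all environments.
# "example_env" and the Python version are placeholders; adjust them as needed.
create_example_env() {
    conda create --yes --name example_env python=3.11
    conda env list
}

check_conda
# Run create_example_env yourself once conda is found.
# Afterwards, "conda activate example_env" enters the environment
# and "conda deactivate" leaves it.
```

Keeping each pipeline in its own environment is the usual reason for using conda here, since different tools often need conflicting package versions.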
Text editors are essential for writing scripts. I prefer Sublime Text, but any code-oriented text editor will do (e.g. Notepad++, VSCode, Atom, etc.). Avoid word processors, which add formatting that breaks scripts.