pagetitle |
---|
SARS-CoV-2 Genomics |
:::highlight
Questions
- How can I combine existing commands to do new things?
- How can I save and re-use commands?
Learning Objectives
- Construct command pipelines with two or more stages.
- Create files from the command line using a text editor.
- Write a shell script that runs a command or series of commands for a fixed set of files.
- Run a shell script from the command line.
- Customise shell scripts to work with inputs defined by the user.
:::
:::note This section has an accompanying slide deck. :::
Now that we know a few basic commands, we can finally look at the shell's most powerful feature: the ease with which it lets us combine existing programs in new ways.
If we wanted to count how many files we have in the data/sequencing_run1/
directory, we could do it in two steps:
$ ls data/sequencing_run1/ > sequencing_files.txt
$ wc -l sequencing_files.txt
- Used the
ls
command to list the files and save the output in a text file (with the>
redirect operator). - And then count the lines in the resulting file.
But what if we wanted to do this without creating the file?
It turns out we can send the output of one command and pass it on to another using a special operator |
called the pipe.
Like this:
$ ls data/sequencing_run1/ | wc -l
4
What happened there is that the output of ls
was sent "through the pipe" to the wc
command.
:::exercise
Let's go back to our primer files:
$ ls artic_primers_pool*.bed
Using the |
pipe, write a chain of commands that does the following:
- Combine (or "concatenate") both primer files.
Hint
Use thecat
command. - Search for lines with the pattern "LEFT".
Hint
Use thegrep
command. - Count the number of lines of the output.
Answer
The three commands we want to use to achieve this are:
cat
to concatenate the files.grep
to only print the lines that match "LEFT".wc -l
to count the number of lines.
We can chain all the commands together like this:
$ cat artic_primers_pool*.bed | grep "LEFT" | wc -l
109
:::
:::exercise
If you had the following two text files:
cat animals_1.txt
deer
rabbit
raccoon
rabbit
cat animals_2.txt
deer
fox
rabbit
bear
What would be the result of the following command?
cat animals*.txt | head -n 6 | tail -n 1
Answer
The result would be "fox". Let's go through this step-by-step.
cat animals*.txt
would combine the content of both files:
deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
head -n 6
would then print the first six lines of the combined file, so:
deer
rabbit
raccoon
rabbit
deer
fox
And finally tail -n 1
would return the last line of this output:
fox
So far we have been running commands directly on the console, in an interactive way. However, to re-run a series of commands (or an analysis), we can save the commands in a file so that we can re-run all those operations again later by typing a single command. The file containing the commands is usually called a shell script (you can think of them as small programs).
For example, let's create a shell script that counts the number of sequences in one of our FASTA files in the data
directory.
We could achieve this with the following command:
$ cat data/envelope_protein.fa | grep ">" | wc -l
To write a shell script we save this command within a text file. But how do we create a text file from within the command line?
There are many text editors available for programming, but we will use a simple one that can be used from the command line, called nano
.
First let's create a directory to save our script:
$ mkdir scripts/
Now we can create a file with Nano in the following way:
$ nano scripts/count_fasta_sequences.sh
This opens a text editor, where you can type the commands you want to save in the file.
Once we're happy with our text, we can press Ctrl+O (press the Ctrl or Control key and, while holding it down, press the O key) to write our data to disk.
We'll be asked what file we want to save this to: press Return to accept the suggested default of scripts/count_fasta_sequences.sh
.
Once our file is saved, we can use Ctrl+X to quit the editor and return to the shell.
We can check with ls scripts/
that our new file is there.
Note that because we saved our file with .sh
extension (the conventional extension used for shell scripts), Nano does some colouring of our commands (this is called syntax highlighting) to make it easier to read the code.
:::note Text Editors
When we say, "nano
is a text editor," we really do mean "text": it can only work with plain character data, not tables, images, or any other human-friendly media.
We use it in examples because it is one of the least complex text editors.
However, because of this trait, it may not be powerful enough or flexible enough for the work you need to do after this workshop.
On Unix systems (such as Linux and Mac OS X), many programmers use Emacs or Vim. Both of these run from the terminal and have very advanced features, but require more time to learn.
Alternatively, programmers also use graphical editors, such as Visual Studio Code. This software offers many advanced capabilities and extensions and works on Windows, Mac OS and Linux. :::
Now that we have our script, we can run it using the program bash
:
$ bash scripts/count_fasta_sequences.fa
Which will print the result on our screen.
The script we wrote so far works on a specific FASTA file (in our example data/envelope_protein.fa
).
But what if we wanted to give it as input a file of our choice?
We can make our script more versatile by using a special shell variable that means "the first argument on the command line".
Here is our modified script:
#!/bin/bash
echo "Processing file: $1"
cat "$1" | grep ">" | wc -l
We have done two things:
- We use the variable
$1
to indicate the file that we want to process will be given from the command line. - We use the
echo
command to print an informative message to the user.
If we run our new script, this is the result:
$ bash scripts/count_fasta_sequences.sh data/envelope_protein.fa
Processing file: data/envelope_protein.fa
4
:::highlight
Key Points
- The
|
pipe allows to chain several commands together. The output of the command on the left of the pipe is sent as input to the command on the right. - The
nano
text editor can be used to create or edit files from the command line. - We can save commands in a text file, which we call a shell script. Shell scripts have extension
.sh
. - We can run a shell script using the program
bash
. - We can use custom inputs to our script using the special variable
$1
.
:::