Practical Computing for Biologists: Chapters 2, 5, 6, 16, Appendices 2, 3.
Unix “Basics” and “Finding Things” from UConn CBC: http://bioinformatics.uconn.edu/unix-basics/
Software Carpentry Shell Novice lesson: Episodes 5-7: https://swcarpentry.github.io/shell-novice/
Review basic commands and server access from UConn_Unix_basics
1. pwd - print working directory
2. ls - list directory contents
3. mkdir - create a directory
4. unzip - decompression
5. mv - move file
6. cp - copy file
7. cat - print contents of file
8. touch - create empty file
9. rm - remove file
10. wc - count lines/words/characters in file
11. > - redirects output to new file
12. >> - redirects output to append to existing file
13. * - wildcard that matches any string of characters
grep
We can search for patterns inside of files and print them using the grep command.
Let’s head over to the writing directory and try using grep:
Have a look at haiku.txt:
# first we must navigate to the correct folder:
cd ~/MEDS5420/data-shell/writing/leisure/
cat haiku.txt
## Errors teach, not taunt,
## Each bug a lesson hidden,
## Growth in every line.
##
## With searching comes loss
## and the presence of absence:
## "My Thesis" not found.
##
## Yesterday it worked
## Today it is not working
## Software is like that.
When was haiku.txt last modified?
ls -l haiku.txt
## -rw-r--r-- 1 runner staff 220 Jan 28 23:05 haiku.txt
grep bug haiku.txt
## Each bug a lesson hidden,
In the above command, the first argument “bug” is the pattern we are searching for. The default action for grep is to return the entire line in which the pattern was found.
Let’s instead search for the word day:
grep day haiku.txt
## Yesterday it worked
## Today it is not working
In this case, grep returns lines in which “day” appears inside larger words.
We might instead want to see only whole-word matches, not matches that are part of larger words.
To impose word boundaries, we use the -w flag:
grep -w day haiku.txt
There are no results because “day” only occurs as part of larger words in haiku.txt.
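To see the effect of -w in isolation, here is a minimal sketch that builds its own two-line sample file rather than using the course data. The sample text and the temporary file created by mktemp are assumptions made for illustration only.

```shell
# Build a small throwaway sample file (hypothetical content, not course data).
sample=$(mktemp)
printf 'Today is a good day\nYesterday it worked\n' > "$sample"

# Substring match: both lines contain "day" somewhere.
substring_hits=$(grep -c day "$sample")

# Whole-word match: only the first line has "day" as a standalone word.
word_hits=$(grep -c -w day "$sample")

echo "substring: $substring_hits, whole word: $word_hits"
rm "$sample"
```

The -c flag is used here only to turn each search into a count that is easy to compare; drop it to see the matching lines themselves.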
Sometimes we want to search for more than a single word.
To search for a phrase, we need to use double quotes so that grep treats the pattern as a single argument.
grep -w "is not" haiku.txt
## Today it is not working
Other very useful grep flags are -n, -i and -v:
grep -n "it" haiku.txt
## 5:With searching comes loss
## 9:Yesterday it worked
## 10:Today it is not working
grep -n -w -i "the" haiku.txt
## 6:and the presence of absence:
As you might have guessed:
-n prints the line number of each matching line.
-i ignores capitalization (also called “case”; the “i” comes from case-insensitive).
-v inverts the match, printing only the lines that do not contain the pattern.
You can learn more about grep flags using grep --help
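The flags can be combined freely. The sketch below tries -n, -i and -v (which inverts the match, keeping only non-matching lines) on a three-line file made up for this example; the file content and variable names are hypothetical.

```shell
# Throwaway sample file (hypothetical content).
sample=$(mktemp)
printf 'The cat\nthe dog\nA bird\n' > "$sample"

# -i: match "the" regardless of case; -n: prefix each hit with its line number.
numbered=$(grep -n -i the "$sample")
echo "$numbered"

# -v: invert the match, keeping only lines that do NOT contain the pattern.
non_matching=$(grep -v -i the "$sample")
echo "$non_matching"

rm "$sample"
```

The first search reports lines 1 and 2; the inverted search returns the only remaining line, "A bird".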
The real power of grep is using a special class of wildcards known as “regular expressions” (the “re” in grep).
Let’s use regular expressions to find lines where the second letter is “o”:
grep '^.o' haiku.txt
## Today it is not working
## Software is like that.
Explanation of the pattern:
^ tells grep to match only at the start of a line rather than anywhere in the line.
. tells grep to match any single character (letter, number, or symbol); it is basically a single-character wildcard.
Some other useful expressions in grep:
$ anchors the match to the end of a line.
* is a repetition operator in grep. It is commonly coupled to . as .* to act as a wildcard of unspecified length.
Learning the full power of regular expressions takes time, but for now just know that they exist. If you want to make use of them, check out these cheat sheets and other online resources.
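As a rough illustration of the anchors ^ and $ and the .* combination, the following sketch searches a short list of made-up file names. The names and the temporary file are assumptions for this example, not part of data-shell.

```shell
# Throwaway list of hypothetical file names, one per line.
sample=$(mktemp)
printf 'sample1.fastq\nsample2.fastq.gz\nnotes.txt\n' > "$sample"

# ^ anchors the pattern to the start of the line.
starts_with_sample=$(grep -c '^sample' "$sample")

# $ anchors the pattern to the end of the line.
ends_with_gz=$(grep -c 'gz$' "$sample")

# .* matches any run of characters (including none) between the two anchors.
s_to_q=$(grep '^s.*q$' "$sample")

echo "start: $starts_with_sample, end: $ends_with_gz, both: $s_to_q"
rm "$sample"
```

Single quotes around the pattern keep the shell from interpreting $ and * before grep sees them.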
haiku.txt:
Look in the song_lyrics folder inside the data-shell folder and you should see a single file: TS_example.txt
The TS_example.txt file contains lyrics to a song by a well-known contemporary artist. Using the command line utilities you have learned, try the following:
1. Print the number of lines in the file.
2. Print the lines, with their line numbers, that have the word ‘shake’ in them to a new file called shake-lines.txt.
3. Print the number of lines that have the word ‘shake’ in them.
4. Devise a way to print the number of times ‘shake’ appears in the song. Be sure to include all instances or forms of the word.
Hint: use the manual pages for the different commands to see what your options could be.
find
find .
find shows us files and directories.
The power of find is in specifying “tests” or “filters”.
find . -type f
find . -type d
The above filters search for files and directories, respectively.
Search depth:
One can specify how far down the file hierarchy to go by controlling depth (first navigate to the parent of the data-shell directory, i.e. one directory closer to root):
find ./data-shell -maxdepth 2 -mindepth 2 -type f
The above command lists all files that sit exactly two directory levels below the data-shell folder.
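How find counts depth is easiest to see on a tiny tree. The sketch below builds one in a temporary directory (all directory and file names are hypothetical) and assumes a find that supports -mindepth and -maxdepth, as the GNU and BSD versions do.

```shell
# Build a tiny throwaway directory tree (hypothetical names).
root=$(mktemp -d)
mkdir -p "$root/a/b"
touch "$root/top.txt" "$root/a/mid.txt" "$root/a/b/deep.txt"

# Depth is counted from the starting point:
#   $root = 0, top.txt = 1, a/mid.txt = 2, a/b/deep.txt = 3
# Files exactly two levels down: only a/mid.txt qualifies.
level2=$(find "$root" -mindepth 2 -maxdepth 2 -type f | wc -l)

# Directories at any depth, excluding the starting point itself.
dirs=$(find "$root" -mindepth 1 -type d | wc -l)

echo "files at level 2: $level2, directories: $dirs"
rm -r "$root"
```

Setting -mindepth and -maxdepth to the same value selects a single level; using different values selects a range of levels.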
Quick try: Try other combinations of levels and types and verify by counting number of items in output.
Let’s try matching by name:
find ./data-shell -name "*.txt"
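The quotes around the pattern matter: they pass *.txt to find unchanged instead of letting the shell expand it first. A minimal sketch on a throwaway tree (all names hypothetical) shows -name combined with -type:

```shell
# Throwaway tree with two .txt files and one .csv file (hypothetical names).
root=$(mktemp -d)
mkdir "$root/sub"
touch "$root/notes.txt" "$root/sub/more.txt" "$root/sub/data.csv"

# Run find from inside the tree so the reported paths are relative.
# The quoted pattern is matched by find against each file name.
matches=$(cd "$root" && find . -type f -name "*.txt" | sort)

echo "$matches"
rm -r "$root"
```

Only the two .txt files are reported; data.csv is filtered out by -name, and the sub directory itself is filtered out by -type f.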
Quick try: Combine find and grep to find the number of text files within 2 and 3 levels inside of the data-shell folder.
Variables are names that can be assigned values. To create a variable, use the following format:
var=variable # do not put spaces around the = when setting a variable
To see what the variable is you can print it to the screen with echo:
echo $var # the '$' designates that this is a variable
*Try using echo without the dollar sign.
Whole sentences or lists can be assigned to variables:
fileList=*.txt
echo $fileList
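It may help to know when the wildcard is actually expanded. In bash, the assignment stores the literal pattern, and the expansion to file names happens later, when the variable is used without quotes. The sketch below assumes bash and uses a temporary directory with two made-up files.

```shell
# Throwaway directory with two hypothetical text files.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"

# The assignment stores the pattern itself, not a list of files.
fileList=*.txt

# Unquoted use: the shell expands the pattern in the current directory.
unquoted=$(cd "$workdir" && echo $fileList)

# Quoted use: the pattern is printed literally.
quoted=$(cd "$workdir" && echo "$fileList")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -r "$workdir"
```

Because the expansion happens at the point of use, the same variable can list different files depending on the directory you are in.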
If your variable is going to be joined directly to another string, surround the variable name with curly braces so the shell knows where the name ends. For instance:
School=MEDS
echo listing: $School
echo this class is $School_5420 # something's wrong
echo this class is ${School}_5420 # properly inserts variable
## listing: MEDS
## this class is
## this class is MEDS_5420
In short, double quotes allow variables within strings to be interpreted, whereas single quotes make them literal.
Try out this example:
instructor="Michael Guertin"
echo "My instructor for MEDS5420 is $instructor"
## My instructor for MEDS5420 is Michael Guertin
or
instructor="Michael Guertin"
echo 'My instructor for MEDS5420 is $instructor'
## My instructor for MEDS5420 is $instructor
Uses for backticks - the key usually just under the escape key.
Backticks allow one to insert the result of a command in place within a command line.
One nice use for this is to set variables as outputs of commands.
Here’s an example with the command date that prints the date and time to the screen:
Compare the two following examples:
info=date
echo The date and time is: $info
## The date and time is: date
vs.
info=`date`
echo The date and time is: $info
## The date and time is: Wed Jan 28 23:06:11 UTC 2026
Backticks can cause problems when combined with other quotation marks, so there is another way to run a command in place, using $( ):
echo The date and time is: $(date)
## The date and time is: Wed Jan 28 23:06:11 UTC 2026
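One practical advantage of $( ) is that it nests cleanly and combines well with curly braces. The following sketch uses a hypothetical path (which does not need to exist) and a made-up log file name to illustrate both.

```shell
# Hypothetical path used only for illustration; it does not need to exist.
path=/home/user/project/data/reads.txt

# Nested command substitution: dirname strips the file name,
# then basename keeps only the last remaining directory.
parent=$(basename "$(dirname "$path")")

# Store formatted command output and build a file name from it.
today=$(date +%Y-%m-%d)
logname="run_${today}.log"

echo "parent directory: $parent"
echo "log file name: $logname"
```

Writing the nested call with backticks would require escaping the inner backticks, which is one reason $( ) is usually easier to read.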
Certain text editors are designed for scripting and can recognize code. Importantly, they do not embed hidden formatting or font information in the document, which can cause problems when running programs on your computer. There are three features that you should look for in an editor:
1. language specific highlighting
2. line number display
3. version control
MAC USERS: Download BBEdit here: http://www.barebones.com/products/bbedit/download.html?ref=tw_alert and then install it:
Open your text editor BBEdit on Mac.
PC USERS:
Download Visual Studio: https://visualstudio.microsoft.com/downloads/
or
Download Notepad++ here: https://notepad-plus-plus.org/
Note: You can also use emacs or other command line editors such as nano or vim. These command line editors are the functional equivalent of opening a file in BBEdit, Visual Studio, or Notepad++. The interface is a bit clunky and requires keyboard shortcuts to save, write, and exit. We will be using emacs or nano when we work on the server next time.
Figure 1: XKCD: Real programmers
A quick primer for emacs is:
# generate the file
touch filename.sh
# open the emacs command line editor
emacs filename.sh
# you are now in emacs
# write some code
# ctrl-X ctrl-W to save under another name
# make edits
# ctrl-X ctrl-S to save
# ctrl-X ctrl-C to exit
# you are back in the Terminal
A quick primer for nano is:
touch filename_nano.sh
nano filename_nano.sh
# you are now in nano
# write some code
# ctrl-O, then <Enter/Return> to save
# ctrl-O, then backspace to write as a new file name, then <Enter/Return> to save
# make edits
# ctrl-O, then <Enter/Return> to save
# ctrl-X to exit
haiku.txt:
grep -i '^s' haiku.txt
grep -i 'd$' haiku.txt
grep -i '^t.*g$' haiku.txt
Word boundaries can be matched with \< (start of word) and \> (end of word).
grep -w -i "n.*" haiku.txt or grep -i '\<n' haiku.txt
The TS_example.txt file contains lyrics to a song by a well-known contemporary artist. Using the command line utilities you just learned, try the following:
1. Print the number of lines in the file: wc -l TS_example.txt
2. Print the lines, with their line numbers, that have the word ‘shake’ in them to a new file called shake-lines.txt: grep -n -i shake TS_example.txt > shake-lines.txt
This answer writes every line containing shake, regardless of upper- or lowercase letters, because of the -i option.
3. Print the number of lines that have the word ‘shake’ in them: grep -c shake TS_example.txt
As written, this count is case-sensitive; add -i to count matching lines regardless of upper- or lowercase letters.
4. Devise a way to print the number of times ‘shake’ appears in the song: grep -o -i shake TS_example.txt | wc -w
Here, -o prints only the matched shake part of each line, one match per output line, so we can pipe the result to wc and count the number of words.