Unix is an operating system invented in the early 1970s at AT&T Bell Labs. Today there are many variants of Unix in wide use around the world, including the Linux operating systems and macOS.
The key elements provided by a Unix-like operating system are
- a file system, consisting of folders which can nest and store files,
- a set of programs, each serving a limited function,
- a shell which provides mechanisms for constructing workflows involving multiple programs and files.
Several Unix shells are available, but the most popular ones provide approximately the same functionality and interface. The most popular shell is called bash. Bash is the default shell in macOS (Mojave and earlier) and some Linux distributions. As of 2016, you can also run bash natively on Windows. If you are a Windows user, it is recommended that you go ahead and install the Windows Subsystem for Linux so you can use the same commands as Linux and Mac users.
If you want to follow along below before you figure out your local setup, you can use the executable cells you see in this page (which are bash cells, not Python cells) or launch a Binder instance (select Terminal or bash from the New pull-down menu in the top right). The latter approach is recommended, because that environment provides some shortcuts that will be helpful to practice (like completing commands and file names when you hit the tab key).
When you first open the shell, you'll be in your home directory. You can print the path of the directory you are currently in with
pwd (which stands for print working directory).
On Linux, the users' home directories are in a directory called
/home/, while on macOS they're in
/Users/. Since your user name on Binder is
jovyan (a sci-fi reference to a term that means an inhabitant of Jupiter), the directory printed when you run the cell above is called
/home/jovyan. The character
~ has a special meaning: it is automatically expanded to the path for your home directory.
~ refers to the user's home directory
pwd prints the path of the current working directory
/home/jovyan is called a path. The forward slashes in a path separate directories, and each directory or file in the path is in the directory immediately to its left. For example,
The very first slash is the root directory, and all of the files and directories on the machine are nested in this directory.
You can view the contents of a directory with
ls, and you can change directory using the
cd command. If the initial slash is omitted in a directory name, the name is interpreted relative to the current directory. For example, you can navigate to
/Users/jovyan from the
/Users directory by running
cd jovyan. Note that arguments are supplied to Unix commands by separating them with spaces following the name of the command. You can also navigate to containing folders using
.. (two dots). For example,
cd ../../ navigates to the grandparent directory of the current directory.
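The navigation commands above can be tried safely in a scratch directory. The following is a minimal sketch: the directory names projects and demo are made up for illustration, and mktemp -d is used only to create a throwaway location.

```shell
# Build a small practice tree in a temporary directory (hypothetical names).
practice=$(mktemp -d)
mkdir -p "$practice/projects/demo"

cd "$practice/projects/demo"   # absolute path: it starts at the root directory /
pwd                            # prints the full path of the current directory

cd ../../                      # up two levels, to the grandparent directory
grandparent=$(pwd)             # remember where we ended up
ls                             # lists the contents: projects

cd projects                    # relative path: no leading slash
cd ~                           # ~ expands to your home directory
pwd
```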
Write three lines of Unix code in the cell below which change directory into
my-data-science-project, list the contents of that directory, and then change back to the original directory.
Solution. Here's an example solution
cd my-data-science-project
ls
cd ../
Since the original directory was the user's home directory, we could have used
cd ~ instead in the last step.
List the files in the subdirectory
bin of the root directory.
Solution. The simplest way to do it in one line is
ls /bin
The mkdir command makes a new directory. So we can make a new directory, check that it's there, and navigate into it as follows:
mkdir example-directory # won't return anything!
cd example-directory # won't return anything!
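Here is the same sequence as a self-contained sketch, including the ls step that checks the new directory exists. It runs inside a temporary directory (an assumption made here so that nothing in your own files is touched).

```shell
workdir=$(mktemp -d)      # scratch location for the experiment
cd "$workdir"

mkdir example-directory   # prints nothing on success
ls                        # example-directory now appears in the listing
cd example-directory      # prints nothing on success
pwd                       # the path now ends in /example-directory
```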
One extremely useful shortcut is to type an initial part of the file or directory name and hit the tab key to get the rest to pop up (note that this does not work in the cells above, but it will work on your own computer or on mybinder.org). You can also hit the tab key twice to get a list of possible completions. Using this tab completion feature is advised, for two reasons: (1) it saves typing time, and (2) it reduces spelling errors. If the shell is still completing directory names in your path as you type it, you can be sure that those directories are actually present in the operating system. If you insist on typing out the path in full, it takes significantly
Another time-saving device is the use of the up and down arrow keys to access previously used commands. You can see a list of what you've run in the shell with the history command.
The position of the cursor in the shell cannot be controlled with your mouse or trackpad. Therefore, it is essential to master a few keyboard shortcuts to avoid having to press the forward and backward arrow keys dozens of times when you need to navigate the text at the prompt.
ctrl-a: Move the cursor to the beginning of the line
ctrl-e: Move the cursor to the end of the line
ctrl-l: Clear the screen
ctrl-c: Quit the command that is currently running
alt-f: Move the cursor forward one word
alt-b: Move the cursor backward one word
Note that you can't directly use a space character in a Unix path name, because it would be interpreted by bash as an argument separator. To accommodate a file with a space in its name, escape the space by putting a backslash in front of it. For example,
cd My\ Essays changes directory into a folder called "My Essays".
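A short sketch of both ways to handle a space in a name. Quoting is an alternative to the backslash that is not mentioned above; the scratch directory is an assumption made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir "My Essays"   # quotes keep the name together as a single argument
cd My\ Essays       # the backslash escapes the space
pwd                 # the path ends in /My Essays
cd ..
cd "My Essays"      # quoting the whole name works as well
```

Without the backslash or the quotes, cd would receive two separate arguments, My and Essays.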
Here are some other important commands:
mv: Move a file from one directory to another
rm: Remove a file
cp: Copy a file from one directory to another
touch: Create a file or update its last-modified time
open: Open a file
cat: Print the contents of a file to the terminal
less: View the contents of a file in a viewer
man: Show the documentation for a command
head: Print the first 10 lines of a file
tail: Print the last 10 lines of a file
wc: Count the number of words, lines, and characters in a file
grep: Find specific text in file contents
vim: Open an editor for making changes to a file
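To see several of these commands working together, here is a small sketch that runs in a scratch directory. The file names are made up, and seq (which prints a sequence of numbers) together with the > operator is used only to produce a 12-line practice file.

```shell
workdir=$(mktemp -d)
cd "$workdir"

touch numbers.txt             # create an empty file
seq 1 12 > numbers.txt        # fill it with the numbers 1 to 12, one per line

cp numbers.txt copy.txt       # copy the file
mkdir archive
mv copy.txt archive/          # move the copy into a subdirectory

cat numbers.txt               # prints all 12 lines
head numbers.txt              # prints lines 1 through 10
tail numbers.txt              # prints lines 3 through 12
wc numbers.txt                # reports 12 lines, 12 words, 27 characters
grep 12 numbers.txt           # prints the one line containing 12

rm archive/copy.txt           # delete the copy
ls archive                    # prints nothing: the directory is empty again
```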
Many commands in bash take options, which modify the command's behavior. For example,
rm -i gives you an interactive session where you can say for each file whether you want to delete it. Some options can themselves take arguments, in which case those arguments are listed directly after the option. For example,
head -n 20 data.txt prints the first 20 lines of the file
data.txt. You can read about the options a command takes by viewing its man page (for example,
Navigate into the
my-data-science-project directory and then use the
grep command to figure out which file contains the text find_packages.
Some helpful information: (i)
grep -r text directory searches recursively for
text in the
directory, and (ii)
. is an alias for the current directory.
Solution. Running the commands below, we find that
setup.py contains the text find_packages:
cd my-data-science-project
grep -r find_packages .
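If you don't have the course project at hand, the recursive search can be reproduced on a stand-in. The sketch below builds a tiny hypothetical project (its file names and contents are invented for illustration) and searches it; the -l option, which prints only the names of matching files, is an extra not covered above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A stand-in project with two files, only one of which mentions find_packages.
mkdir -p project/src
echo "from setuptools import find_packages" > project/setup.py
echo "print('hello')" > project/src/main.py

cd project
grep -r find_packages .    # prints ./setup.py:from setuptools import find_packages
grep -rl find_packages .   # prints only the file name: ./setup.py
```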
Vim is the command line text editor which is most consistently available on Unix systems. As a result, you will sometimes find yourself needing some basic familiarity with it, even if you use another editor for the bulk of your work. Furthermore, vim is designed to prioritize efficiency over intuitiveness, so it's really helpful to learn a few vim ideas before you need them. To practice with Vim, open this course's Binder page, open a new Terminal ("New", top right), and run
vim tmp.txt. Alternatively, you can run vim in your own Terminal if you have macOS or Linux, or you can download it for Windows.
The most important distinction between vim and most other text editors is that it has multiple modes, the main ones being insert mode and command mode. Insert mode is similar to what other editors provide: keystrokes you type appear as characters in the file. Command mode is for performing various actions on the file.
A vim session often opens to command mode by default. To activate insert mode, press
i. To get back to command mode, press the escape key. To save a file, type
:w while in command mode and press enter. To close the file, type
:q from command mode and press enter. To force-exit vim, type
:q! while in command mode and press enter.
To undo and redo, use u and
ctrl-r. Copy and paste are y and
p; page up and page down are ctrl-b and ctrl-f.
The single most important vim command is the one for force-exiting, because sometimes a vim editor opens automatically when you run some other command, and all you want to do is get out. If you are in insert mode, what key sequence must you enter to force-exit vim?
Solution. The correct key sequence is
[esc]:q!: the escape button switches to command mode, and then :q! (followed by enter) force-exits vim.
Bash supports variable definition using similar syntax to Python. The main differences are (1) spaces cannot be used around the equals sign, and (2) variable names are conventionally all upper case. Another distinction from Python is that a dollar sign is required to access a variable's value:
MY_FAVORITE_NUMBER=3
echo $MY_FAVORITE_NUMBER
echo simply prints its arguments.
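A few variations are worth trying. This sketch goes slightly beyond the text above by showing how quoting affects the dollar sign and how braces mark the end of a variable name.

```shell
MY_FAVORITE_NUMBER=3                                # no spaces around the equals sign
echo $MY_FAVORITE_NUMBER                            # prints 3

echo "My favorite number is $MY_FAVORITE_NUMBER"    # double quotes: the value is substituted
echo 'My favorite number is $MY_FAVORITE_NUMBER'    # single quotes: printed literally

echo "${MY_FAVORITE_NUMBER}rd"                      # prints 3rd; braces show where the name ends
```

Note that the assignment and the echo must be on separate lines (or separated by a semicolon). Written on one line with only a space between them, the variable is expanded before the assignment takes effect.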
Some special variables are available in a bash session without you having to define them yourself. For example, if you run
echo $PATH, you'll see a colon-separated list of directories. These are the directories where
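bash searches for executables when you run a command. You can find out where a command's executable is located with the
which command. For example,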
which echo prints
/bin/echo. If you look in the
/bin directory, you'll see that many of the bash commands we've discussed so far are actually executables in that directory.
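You can inspect these things on your own machine with the sketch below. It uses the pipe operator | and the tr command (which replaces one character with another) to make the output easier to read; the exact directories and locations printed will differ from system to system.

```shell
echo "$PATH" | tr ':' '\n'    # print the PATH directories one per line
which ls                      # print the location of the ls executable
ls /bin | head -n 5           # show the first five entries in /bin
```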
Utilities you install on your computer often make their executables available from the command line by adding their directory to your
PATH. This is done by inserting a line of code in your bash profile, which is a file with a special name that is read by bash every time you start a bash session. For example, if you have a directory, say
/Users/jovyan/anaconda3/bin, which contains executables that you want to be able to run from the command line, you can add the line
~ refers to your
In the command
export PATH="/Users/jovyan/anaconda3/bin:$PATH", the dollar sign is used to access the original value of
PATH (so that you're adding to the set of
PATH directories, not replacing all of the ones that were stored in
PATH previously), and the
export command makes the new value of
PATH available to the bash session (rather than just the
If you try to run a command and bash says
command not found, one strong possibility is that the executable file that should run that command is "not on your PATH" (a phrase you will see often on StackOverflow!). The solution to this problem is to locate the executable's directory (usually by searching the internet to figure out where the installer puts the executable by default) and edit your bash profile so that the directory is added to your PATH.
Write a line of bash code that adds
/Library/Frameworks/R.framework/Resources to the end of
PATH, so that directory is searched for executables last when a command is run in bash. Where should that line of code be placed?
Solution. The appropriate bash command is
export PATH="$PATH:/Library/Frameworks/R.framework/Resources", and it should go in your bash profile.
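The effect of editing PATH can be tested without touching your bash profile, because a change made at the prompt lasts only for the current session. The sketch below is illustrative: the executable mytool and its directory are made up, and chmod +x (which marks a file as executable) is not covered above.

```shell
tooldir=$(mktemp -d)                    # hypothetical directory holding a tool

# Create a tiny executable called mytool in that directory.
printf '#!/bin/sh\necho "hello from mytool"\n' > "$tooldir/mytool"
chmod +x "$tooldir/mytool"

export PATH="$tooldir:$PATH"            # prepend, so this directory is searched first
mytool                                  # prints hello from mytool
which mytool                            # prints the full path to mytool
```

Putting the new directory before $PATH means it is searched first; putting it after, as in the exercise, means it is searched last.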
The output of a command like
echo $PATH, which prints to the screen by default, may be redirected to a file using the operators
> and >>, or fed as input to another bash command on the same line using the pipe operator
|. The use of such operators in Unix is called piping, and it's a key element of bash's design.
The difference between > and
>> is that the former eliminates whatever might have been in the file previously, and the latter appends to the end of the target file's current contents.
tmp.txt will contain two lines of text after these two commands are run:
echo "This is the first line" > tmp.txt
echo "This is the second line" >> tmp.txt
You can check that this worked as expected by running cat tmp.txt.
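The sketch below runs the two commands in a scratch directory and then shows what happens when > is used on a file that already has contents. The third echo is an addition for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "This is the first line" > tmp.txt      # creates (or overwrites) tmp.txt
echo "This is the second line" >> tmp.txt    # appends to tmp.txt
cat tmp.txt                                  # prints both lines

echo "Starting over" > tmp.txt               # > discards the previous contents
cat tmp.txt                                  # prints only: Starting over
```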
The pipe operator is the mechanism for composing commands in Unix. For example,
echo "The quick brown fox jumped over the lazy dog" | wc
forwards the text returned by the first command to the
wc command, thereby counting the number of lines, words, and characters in the sentence
"The quick brown fox jumped over the lazy dog".
Write a three-command pipe, using cat, head, and
tail, which prints the portion of a document
mydoc.txt between lines 100 and 110.
Solution. If we select the first 110 lines, then the desired lines are the last 11 lines of that selection. So we can do
cat mydoc.txt | head -n 110 | tail -n 11
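To check the solution, it helps to have a document whose line numbers are easy to read off. In this sketch, seq (an assumption, not used elsewhere in this section) writes the numbers 1 to 200 into a stand-in mydoc.txt, so that line N contains the number N.

```shell
workdir=$(mktemp -d)
cd "$workdir"

seq 1 200 > mydoc.txt                                # stand-in document: line N contains N

cat mydoc.txt | head -n 110 | tail -n 11             # prints 100 through 110
cat mydoc.txt | head -n 110 | tail -n 11 | wc -l     # counts the selected lines: 11

# The earlier example, with the -w option so that only words are counted.
echo "The quick brown fox jumped over the lazy dog" | wc -w    # 9
```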
Performing actions on a single file at a time can get pretty time-consuming if there are many files involved. Consider, for example, a directory with 1000 image files, one for each frame of a short video. Suppose the images are named
img000.png, img001.png, and so on. If you want to move all of these files into a subdirectory called
frames, you can do the third and fourth lines of this block:
touch img000.png # make sure there are actually
touch img001.png # image files to move
mkdir frames
mv img*.png frames/
The asterisk in the file name is telling the command to act on every file whose name looks like
img, followed by any number of other characters, followed by
.png. We call
img*.png a glob pattern (short for global). The asterisk is a wild card. The other common wildcards are
?, which matches any single character, and expressions like
[a-e] which match any single character in a given range of characters. You can also list out the characters to match:
[aeiou] matches any lowercase vowel.
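Each kind of wildcard can be tried on a handful of empty files. The file names in this sketch are invented for illustration, and everything happens in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch img000.png img001.png img01.png notes.txt a.txt b.txt f.txt

ls img*.png       # * matches any run of characters: img000.png img001.png img01.png
ls img00?.png     # ? matches exactly one character: img000.png img001.png
ls [a-e].txt      # one character in the range a to e: a.txt b.txt
ls [aeiou].txt    # one character from the list: a.txt
```

The shell expands a glob pattern into the list of matching names before the command runs, so ls never sees the pattern itself.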
Which of the following names match the glob pattern
Solution. The first and third options match. The second one doesn't because the pattern specifies that the first character must be uppercase or lowercase