CS5600 Lab2: Shell

In this lab, you will learn how a shell is built. You will improve (or reinforce) your shell-using skills. You will also gain experience with C programming (you will interact with critical constructs, such as pointers and data structures, as well as standard idioms like linked lists). Along the way, you will use the fork(), a system call that we’ve intensively discussed in lectures.

A shell parses a command line, and then runs (executes) that command line. One can also think of GUIs as shells, in which case the “command line” is expressed by the user’s mouse clicks, for example. We’ve given you the skeleton, and nearly all of the parsing code, for a simple but surprisingly functional shell. You will complete the parsing code. You will also fill in the logic for executing the command line: you will implement support for executing commands, I/O redirection, pipes, as well as conditional, sequential, and background execution. You will also implement several internal commands, such as cd. (Puzzle: why does cd have to be built into the shell, when most other commands are executed by exec()ing an existing program?)

Two notes:

There is not much code to write, but there is a lot to absorb. We observed from prior semesters that students are eager to write code sometimes before understanding what to write! Please read lab instructions carefully and code when you know what you’re doing. Please expect to come back and read this page many times when working on your Lab2.
We recommend beginning this lab early (again, early is often earlier than you think).

Challenge: Lab2’s challenge is optional. In this challenge, you will implement an additional piece of functionality, described at the end of the lab.

Section 0: Getting started

Click the GitHub Lab2 link on Canvas homepage to create your Lab2 clone on GitHub.
Start your VM and open the VM’s terminal.
Clone Lab2 repo to VM:
```
$ cd ~
$ git clone git@github.com:NEU-CS5600-21fall/lab2-<Your-GitHub-Username>.git lab2
```
Note that the repo address git@github.com:... can be obtained by going to GitHub repo page (your cloned lab2), clicking the green “Code” button, then choose “SSH”.
- Why “21fall” in the repo address? This is an accident. See why in a Piazza question here.

Check contents:

$ cd ~/lab2
$ ls

// you should see:
Makefile  cmdparse.c  cmdparse.h  cmdrun.c
cmdrun.h  main.c      slack.txt   tester.pl

Note: all the shell commands below happen within the folder ~/lab2/.

Section 1: Shell commands and constructs (warm-up)

It will be much easier to do the coding work in this lab if you have some familiarity with shells in general; this is for two reasons. First, comfort with shells makes you more productive. Second, if you have a good handle on what a shell is supposed to do, the coding work will make more sense. This portion of the lab is intended to provide some of that background (but some of the background you will have to acquire by “playing around” with the shell on your computer). In this part of the lab, we will interact with the installed shell on your system (rather than the source that you retrieved above). We will be assuming the bash shell, which is the default shell on both of the development platforms in this class.

Run a cmd

A shell is a program whose main purpose is to run other programs. Two of the key workhorses in a shell are fork() and execve() (as we saw in class). Here is a simple shell command:

$ ls -a

The shell parses this command into two arguments, ls and -a. The ls argument names the binary that should be executed. So the shell forks a child process to execute ls with those two arguments. The ls program has a simple job: it prints all file names in the current working directory to the console. (ls -a will show all files, including hidden ones.) Meanwhile, the parent (the shell) waits for the child to finish; when it does, the parent returns to read another command.

You may be interested in a reasonable tutorial for Unix shells. You can find others by searching for, e.g., “shell tutorial” on Google. Let us know if you find one you really like.

Input/output redirection

Each program has standard input, standard output, and standard error file descriptors, whose numbers are 0, 1, and 2, respectively. The ls program writes its output to the standard output file descriptor. Normally this is the same as the shell’s standard output, which is the terminal (your screen). But the shell lets you redirect these file descriptors to point instead to other files. For example:

$ ls > files.txt

This command doesn’t print anything to the screen. But let’s use the cat program, which reads a file and prints its contents to standard output, to see what is in output.txt:

$ cat files.txt
// you should see a list of file names of the current directory

The > filename operator redirects standard output, < filename redirects standard input, and 2> filename redirects standard error. (The syntax varies from shell to shell; we generally follow the syntax of the Bourne shell or bash.)

Backgrounding

You can also execute a command in the background with the & operator. Normally, the shell will not read a new command until the previous command has exited. But the & operator tells the shell not to wait for the command.

$ echo foo &
$ foo

Note: foo is printed on top of the next shell prompt.

Command separator

Shells offer several ways to chain commands together. For example, the ; operator (also called “command separator”) says “do one command, then do another”. This shell command prints two lines:

$ echo foo ; echo bar
// you should see:
foo
bar

Conditional chaining

The && and || operators chain commands together based on their exit status. You can think of the exit status as a command’s “return value”. If a command accomplishes its function successfully, that command generally exits with status 0, by calling exit(0). (This is also what happens when a program runs off the end of its main function.) But if there is an error, most commands will exit with status 1. For example, the cat command will exit with status 0 if it reads its files successfully, and 1 otherwise:

$ cat output.txt
foo                                         // exit status 0

$ echo $?
0

$ cat NULL.txt
cat: NULL.txt: No such file or directory    // exit status 1

$ echo $?
1

(The special variable $? in bash contains the exit value of the previous command. However, the shell we build in this lab does not support this variable.)

Now, && says “execute the command on the right only if the command on the left exited with status 0”. And || says “execute the command on the right only if the command on the left exited with status NOT equal to 0”. For example:

// suppose files.txt exists and contains "foo"
$ cat files.txt && echo "files.txt exists!"
foo
files.txt exists!

// suppose NULL.txt doesn't exist
$ cat NULL.txt && echo "NULL.txt exists!"
cat: NULL.txt: No such file or directory    // Note: does not run echo!

$ cat files.txt || echo "files.txt does not exist."
foo

$ cat NULL.txt || echo "NULL.txt does not exist."
cat: NULL.txt: No such file or directory
NULL.txt does not exist.

Subshell

Parentheses ( ) allow you to run a set of commands in a subshell. This, in turn, lets you redirect the standard input, output, and error of a group of commands at once. For example:

$ ( echo foo ; echo bar ) > output.txt
$ cat output.txt
foo
bar

The exit status of a subshell is the same as the exit status of the last command executed.

Pipe

Finally, the pipe operator | sends the output of one command to the input of another. For example:

$ echo foo | rev
oof

Note: rev reverses a string.

Another example:

$ echo "foo\nbar" | shuf -n 1
// the output can be either foo or bar

Note: you have to install shuf first. (How? See command not found.)

Builtin commands

In addition to running programs from the file system, shells have builtin commands that provide functionality that could not be obtained otherwise. Three builtins commands that our shell will implement are cd, pwd, and exit.

The cd command changes the shell’s current directory, which is the default directory that the shell uses for files. So cd dir changes the current directory to dir. (You can think of the current directory as the directory that shows up by default when you use an “Open” or “Save” dialog in a GUI program.) Of course, files can also be manipulated using absolute pathnames, which do not depend on the current directory; /home/studentname/lab2/cmdparse.c is an example. The pwd command shows the current working directory.

There may also come a time when you would like to leave your shell; the exit command instructs the shell to exit with a given status. (exit alone exits with status 0.)

(Why are cd and exit part of the shell instead of standalone programs?)

Some useful commands

You may find the following commands particularly useful for testing your shell. Find out what they do by reading their manual pages. Be creative with how you combine these!

cat (print one or more files to standard output)
echo (print arguments to standard output)
true (exit with status 0)
false (exit with status 1)
sleep (wait for N seconds then exit)
sort (sort lines)

Compiling, running, and testing

Now that you’ve acquired some familiarity with shell commands and constructs, it’s time to look at the code that you’ve been given.

Compile and run your shell:

$ cd ~/lab2/
$ make
$ ./cs5600sh

You can type commands to the shell; after each command you type, it will display the parser output and then attempt to execute the command. Initially, it won’t execute anything, but as you complete the lab, more and more kinds of commands will work.

You can run the parser on its own (no commands will be actually executed) :

$ ./cs5600sh -p

To quit your shell, you can press Ctrl-C (pressing the key Ctrl and C together). After you implement the exit command, you will be able to type “exit”.

Test cases. You can and should interact with your shell by typing commands to it. You should also run a collection of test cases that are meant to stress your implementation:

$ make test

This command invokes the file tester.pl; you should read this file to see what tests your shell is being given, and what the expected output is.

Note: passing tester.pl is necessary but not sufficient. We will be using additional tests to grade and test whether your code meets the specifications. Feel free to add additional tests to tester.pl.

Section 2: Parsing command lines

At a high level, cs5600 shell does two things: parse command lines, and execute those command lines. This piece of the lab is about parsing; the next (and longer) piece is about executing.

In our shell, parsing has two aspects (which are interleaved): converting a typed command line into a sequence of tokens, and converting that sequence of tokens into an internal representation (the execution piece acts on this internal representation). Tokens can be command names, command arguments, or special shell tokens. For instance, the command $ ls -l "foo bar" > zot consists of five tokens: ls, -l, foo bar (the quotes are not part of the token), >, and zot.

To motivate the internal representation, notice that command lines can contain multiple commands, separated by various operators: ;, &, |, &&, ||, (, or ). Our internal representation is a list of commands, with information about the operator that separates each command from the next (along with other information, such as what kinds of redirection the user specified for each command). The data structure that implements this representation is (mostly) a linked list of commands, with each node in the list containing information about the command (we said “mostly” because, owing to subshells, the data structure can be thought of as a hybrid of a list and a tree). The key structure in the code is the struct command (in cmdparse.h).

Exercise 1 read and play with the skeleton code

A good way to get familiar with a code base is to look at the key data structures and interfaces (which are usually in header files). To this end:
Read the code and comments in cmdparse.h.
Familiarize yourself with cmdparse.c and main.c. The implementation of parsing is in cmdparse.c, and is driven from main.c. You will need to have a solid understanding of this code in order to do the remaining exercises.
It may help to play around with the parser for the shell that you have been given:
$ make
$ ./cs5600sh -p                 // -p means "parse only"
cs5600$ ls -l "foo bar" > zot
Inspect the output.
You can and should try typing in some of the commands from the warm-up section.

There is missing functionality in cmdparse.c. It does not handle parentheses properly, and it does not free the memory that it uses. Your job is to fill these gaps in Exercise 2 and 3.

Exercise 2 implement cmd_parse()

Current code doesn’t handles parentheses and subshells:
$ make; ./cs5600sh
cs5600$ ls (abc def)
Syntax error
In cmdparse.c, complete cmd_parse() so that it handles parentheses and subshells.
If you did this correctly, then make test should now pass the first five tests (it will by default pass some of the successive tests).

Exercise 3 implement cmd_free()

In cmdparse.c, complete cmd_free().

We do not provide testing code for this. One hint: what does strdup() do? Type man strdup to find out.

A helpful resource may be gdb. See also our review session II with gdb introduction.

Speaking of resources…in this and the next part of the lab, you will need some comfort with C. We have been building this comfort in class, but you may still feel shaky. Here are some potentially helpful resources:

Milestone If you finished this section before 02/22, you’re in good shape; otherwise, you should hurry up! There is a fake deadline on Canvas (Lab2 (milestone)), which serves as a reminder only.

Section 3: Executing command lines

In this part of the lab, you will write the code to execute the parsed command line. Now would be a good time to take another look at main.c to see the high-level loop and logic for the entire shell.

Although we have given you the necessary skeleton, most of the “work” for executing command lines is unimplemented. Your job is to fill it in:

Exercise 4 finish cmdrun.c

In cmdrun.c, implement cmd_line_exec() and cmd_exec(). The comments in cmdrun.c give specifications and hints.
It may be best to implement cmd_line_exec() first, and then to fill out cmd_exec() in pieces, testing the functionality as you go (using make test and your own testing).

Your functions will need to make heavy use of system calls. To learn more about the exact use of these system calls, you want to consult the manual pages. The interface to these pages is a program called man. Systems programmers need to type man a lot; you should too.

For example, man waitpid will give you documentation on the parameters and return values to waitpid(). Note that sometimes you will need to specify a section of the manual. For example, man open may not give you documentation on the system call open(); you would want man 2 open.

Here is a list of some man commands that will be useful in this lab (you may need others). NOTE: when you don’t know how to implement a function in Exercise 4, read this list to see if you miss anything.

// the numbers may be optional, depending on your system
$ man 2 waitpid   # note WEXITSTATUS!
$ man 2 fork
$ man 2 open
$ man 2 close
$ man 2 dup2
$ man 3 strdup
$ man 3 strcmp
$ man 2 pipe
$ man 2 execve  # or "man execvp" or "man 3 exec"
$ man 2 chdir
$ man 3 getcwd
$ man 3 perror
$ man 3 strtol  

You can learn more about the man program by $ man man.

A hint, concerning the exec variants: note that the argv array includes the command name itself as the “zeroth” argument. For example, in the command echo foo, the initial element in the array is argv[0], which is the address of memory that holds the string echo. And argv[1] is the address of memory that holds the string foo. This is not necessarily intuitive.

A word about testing: if a particular test is giving you trouble, it may be helpful to type it into the standard shell (bash, as covered in the warmup earlier), and see what happens. Then play around with (or modify) the test, to try to understand what bash expects. That may give you insight about what your code is supposed to do.

(optional) Challenge: command queues

Taking this challenge should implement an additional piece of functionality. As usual, this is optional; successfully finishing this challenge will give you 20 extra credits (out of 100 for normal Lab2).

Challenge

You will implement command queues. There’s no skeleton provided, so leave extra time for this. Like background commands, command queues let a shell user run commands in parallel. But command queues limit the number of commands that execute simultaneously. For instance, a queue with parallelism 2 will run at most 2 commands at a time. Additional commands are saved in a queue; they execute, in order, as running commands complete, with at most 2 running commands active at any time. Command queues let a user run a bunch of commands without taking up lots of machine resources. It also is an interesting way to prevent certain limited fork bombs.
Implement command queues using three special shell commands.
The makeq qname size command creates a command queue named qname with parallelism size. You can also use makeq to reset the parallelism of an existing queue.
The q qname command... command adds a command to the command queue named qname. This command will be run when there is room in the queue, possibly immediately. Much like a background command, the q command returns immediately with status 0. (This is true even if command must wait to run.)
The waitq qname command blocks until qname is empty, then returns with status 0.
There is no command to delete a command queue.

Some examples may help. This simple example shows a command queue with parallelism 1. Such a queue runs one command at a time. Here, we enter the queue commands at the cs5600$ prompt:

cs5600$ makeq myq 1
[3 args "makeq" "myq" "1"] .
cs5600$ q myq echo "This is printed immediately."
[4 args "q" "myq" "echo" "This is printed immediately."] .
cs5600$ This is printed immediately.
cs5600$ q myq echo "That emptied the queue, so this is printed immediately too."
[4 args "q" "myq" "echo" "That emptied the queue, so this is printed immediately too."] .
cs5600$ That emptied the queue, so this is printed immediately too.
cs5600$ q myq sleep 1 ; q myq echo "This should run after 1 second." ; echo "This should be printed first." ; sleep 2
[4 args "q" "myq" "sleep" "1"] ;
[4 args "q" "myq" "echo" "This should run after 1 second."] ;
[2 args "echo" "This should be printed first."] ;
[2 args "sleep" "2"] .
This should be printed first.
This should run after 1 second.           // (it appeared after 1 second)
cs5600$

This example shows a queue with parallelism 2. This time, we give the commands all at once, in a command file. When run as ./cs5600sh < commandfile, cs5600sh will print messages with the described delays.

makeq myq2 2
q myq2 sleep 1
q myq2 sleep 2
q myq2 echo This is printed 1 second after the shell began.
q myq2 sleep 3
q myq2 echo This is printed 2 seconds after the shell began.
waitq myq2
echo This is printed 4 seconds after the shell began.

If a command queue command is called with too few arguments, or bad arguments, it should write q: Syntax error (or makeq: Syntax error, or waitq: Syntax error) to standard error. For example:

cs5600$ q X
q: Syntax error
cs5600$ q
q: Syntax error

Note that queue commands can have redirections just like normal commands.

cs5600$ q X 2> /tmp/x
cs5600$ cat /tmp/x
q: Syntax error

Queued commands should “wake up” immediately after the relevant job ID completes even if the shell is waiting for another process at the time. For example:

makeq myq3 1
q myq3 sleep 50
q myq3 echo This should appear after 50 seconds
sleep 100 ; echo This should appear after 100 seconds
  =>
... waits about 50 seconds, then prints: ...
This should appear after 50 seconds
... waits another 50 seconds or so, then prints: ...
This should appear after 100 seconds

Beware of errors! Unless you are careful, problematic commands might mess up your queue.

makeq myq4 1
q myq4 sleep 10
q myq4 echo This should appear after 10 seconds
q myq4                                                   // Should cause a syntax error
q myq4 echo This should appear immediately thereafter    // ... but a common error might cause it to NEVER appear

Subshell note: Command queues in subshells (parentheses) don’t need to be supported. The queueing commands do not mean anything special inside a subshell.

Maximum queue size note: A command queue must be able to support up to 1024 enqueued processes, but it doesn’t need to support more than 1024 enqueued processes.

Current directory note: We will not test interactions between the cd command and queued commands. In the following, it is OK for the queued ls command to execute in either / or /tmp.

makeq myq5 1
q myq5 sleep 10
cd /
q myq5 ls
cd /tmp

Prompt note: You do not need to run queued commands while cs5600sh is blocked at a cs5600$ prompt. (You do need to run queued commands while cs5600sh is blocked waiting for another process, as described above.)

Implementing the q command will be a bit tricky. Questions to ponder: How does the shell know when a background process completes? How can the shell tell a waiting q command to start running?

Implementing the waitq command is also tricky. If you design your q implementation well, you should be able to reuse most of it for waitq! But note the difference: a q command will run whenever there is room in the queue, but waitq must not complete until the queue has no active processes.

It is OK if your command queue implementation creates a process for a queued command as soon as it is entered. However, those processes must block until it is their turn to run according to the queue.

Finally, submit your work

Submitting consists of three steps:

Executing this checklist:
- Fill in slack.txt with (1) your name and NUID, (2) slack hours you used, (3) if you took the challenge, and (4) the materials that you consulted.
- Make sure that your code can make without problems.
- Make sure you have committed all your changes (check with git status).
- Make sure you have added (git add) all files that you created.

Push your code to GitHub:

 $ cd ~/lab2
 $ git commit -am 'submit lab2'
 $ git push origin 

 Counting objects: ...
  ....
  To ssh://github.com/NEU-CS5600-21fall/lab2-<username>.git
   7447116..ceed758  main -> main

Actually commit your lab (with timestamp and git commit id):
1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.
2. Paste both your git repo address and the commit id (the hexadecimal string) to Canvas. In Canvas, there will be an assignment for this lab. You should paste the git repo address and the commit id in two lines:
```
 git@github.com:NEU-CS5600-21fall/lab2-<username>.git
 29dfda9c788fade33421f242b5dd8312732fd3c9
```
  Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”.
3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.

NOTE: Ground truth is what and when you submitted to Canvas.

A non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.

Acknowledgments

This lab is created by Eddie Kohler, with modifications by the CS5600 staff. Lab instructions are adapted from Mike Walfish’s cs202 lab instructions.