Lab2: CS3650 Shell

In this lab, you will learn how a shell is built. You will improve (or reinforce) your shell-using skills. You will also gain experience with C programming (you will interact with critical constructs, such as pointers and strings). Along the way, you will use fork(), a system call that we will discuss intensively in lectures.

A shell parses a command line, and then runs (executes) that command line. One can also think of GUIs as shells, in which case the “command line” is expressed by the user’s mouse clicks, for example. We’ve given you the skeleton, and all of the parsing code, for a simple shell, sh3650. You will fill in the logic for executing the command line: you will implement support for executing internal and external commands and I/O redirection.

Some notes:

There is not much code to write, but there is a lot to absorb. We have observed in prior semesters that students are sometimes eager to write code before understanding what to write! Again, labs are meant to be a learning process rather than an evaluation. You need to study what you’re asked to do first.
Please read the lab instructions carefully, and expect to come back and read this page many times while working on Lab2.
In these instructions, we will mention commands (like ls), syscalls (like chdir), and C library functions (like sprintf) that might be new to you. Please use man (you will see what this is later), Google, or ChatGPT to figure out what they are.
We recommend beginning this lab early (again, early is often earlier than you think).

Section 0: Getting started

Click the GitHub Lab2 link on the Canvas homepage to create your Lab2 clone on GitHub.
Start your VM and open the VM’s terminal.
Clone the Lab2 repo to the VM:
```
$ cd ~
$ git clone git@github.com:NEU-CS3650-labs/lab2-<Your-GitHub-Username> lab2
```
Note that the repo address git@github.com:... can be obtained by going to the GitHub repo page (your cloned lab2), clicking the green “Code” button, and then choosing “SSH”.

Check contents:

$ cd ~/lab2
$ ls

// you should see:
Makefile  parser.c  parser.h  sh3650.c  slack.txt

Part 1: Shell commands and constructs (warm-up)

It will be much easier to do the coding work in this lab if you have some familiarity with shells in general; this is for two reasons. First, comfort with shells makes you more productive. Second, if you have a good handle on what a shell is supposed to do, the coding work will make more sense. This portion of the lab is intended to provide some of that background (but some of the background you will have to acquire by “playing around” with the shell on your computer). In this part of the lab, we will interact with the installed shell on your system (rather than the source code that you retrieved above). We will be assuming the bash shell, which is the default shell on both of the development platforms in this class.

A. Basic functionality

Run a cmd

A shell is a program whose main purpose is to run other programs. Two of the key workhorses in a shell are fork() and execve(). Here is a simple shell command:

$ ls -a

The shell parses this command into two arguments, ls and -a. The ls argument names the binary (executable program) that should be executed. So the shell forks a child process to execute ls with those two arguments: the first argument is the name of the binary itself, ls (yes, the ls program will see its own name as an input), and the second argument, -a, is what we provide to the binary. The ls program has a simple job: it prints the names of the files in the current working directory to the console. (ls -a will show all files, including hidden ones.) Meanwhile, the parent process (the shell) waits for the child to finish; when it does, the parent returns to read another command.

You may be interested in a reasonable tutorial for Unix shells. You can find others by searching for, e.g., “shell tutorial” on Google. Let us know if you find one you really like.

Internal commands

In the above example, ls is a program on your file system, which you can find with $ which ls (this shows where the ls command is located in your file system). In addition to running programs from the file system, shells have internal commands (also known as builtin commands) that provide functionality that could not be obtained otherwise. Three internal commands that our shell will implement are cd, pwd, and exit.

The cd command changes the shell’s current directory, which is the default directory that the shell uses for files. So cd dir changes the current directory to dir. (You can think of the current directory as the directory that shows up by default when you use an “Open” or “Save” dialog in a GUI program.) Of course, files can also be manipulated using absolute pathnames, which do not depend on the current directory; /home/studentname/lab2/parser.c is an example. The pwd command shows the current working directory.

There may also come a time when you would like to leave your shell; the exit command instructs the shell to exit with a given status. (exit alone exits with status 0.)

(Why are cd and exit part of the shell instead of standalone programs?)

Exit status and `$?`

A command finishes with an exit status. You can think of the exit status as a command’s “return value”. If a command accomplishes its function successfully, that command generally exits with status 0, by calling exit(0). (This is also what happens when a program runs off the end of its main function.) But if there is an error, most commands will exit with status 1. For example, the cat command will exit with status 0 if it reads its files successfully, and 1 otherwise:

$ cat parser.c
...                                         // exit status 0

$ echo $?
0

$ cat donotexist.txt
cat: donotexist.txt: No such file or directory    // exit status 1

$ echo $?
1

The special variable $? in bash contains the exit value of the previous command.

Input/output redirection

Each program has standard input, standard output, and standard error file descriptors, whose numbers are 0, 1, and 2, respectively. The ls program writes its output to the standard output file descriptor. Normally this is the same as the shell’s standard output, which is the terminal (your screen). But the shell lets you redirect these file descriptors to point instead to other files. For example:

$ ls > files.txt

This command doesn’t print anything to the screen. But let’s use the cat program, which reads a file and prints its contents to standard output, to see what is in files.txt:

$ cat files.txt
// you should see a list of file names of the current directory

The > filename operator redirects standard output, < filename redirects standard input, and 2> filename redirects standard error. (The syntax varies from shell to shell; we generally follow the syntax of the Bourne Again Shell or bash.)

B. Advanced features

Backgrounding

You can also execute a command in the background with the & operator. Normally, the shell will not read a new command until the previous command has exited. But the & operator tells the shell not to wait for the command.

$ echo foo &
$ foo

Note: foo is printed on top of the next shell prompt.

Command separator

Shells offer several ways to chain commands together. For example, the ; operator (also called “command separator”) says “do one command, then do another”. This shell command prints two lines:

$ echo foo ; echo bar
// you should see:
foo
bar

Conditional chaining

Instead of always executing commands in sequence like ;, && and || allow you to conditionally execute commands based on their exit status: && says “execute the command on the right only if the command on the left exited with status 0”. And || says “execute the command on the right only if the command on the left exited with status NOT equal to 0”. For example:

// suppose files.txt exists and contains "foo"
$ cat files.txt && echo "files.txt exists!"
foo
files.txt exists!

// suppose NULL.txt doesn't exist
$ cat NULL.txt && echo "NULL.txt exists!"
cat: NULL.txt: No such file or directory    // Note: does not run echo!

$ cat files.txt || echo "files.txt does not exist."
foo

$ cat NULL.txt || echo "NULL.txt does not exist."
cat: NULL.txt: No such file or directory
NULL.txt does not exist.

Pipe

Finally, the pipe operator | sends the output of one command to the input of another. For example:

$ echo foo | rev
oof

Note: rev reverses a string.

Another example:

$ echo -e "foo\nbar" | shuf -n 1
// the output can be either foo or bar

Note: you have to install shuf first. (How? See command not found.)

Some useful commands

You may find the following commands particularly useful for testing your shell. Find out what they do by reading their manual pages. Be creative with how you combine these!

cat (print one or more files to standard output)
echo (print arguments to standard output)
true (exit with status 0)
false (exit with status 1)
sleep (wait for N seconds then exit)
sort (sort lines)

Part 2: Implementing sh3650

For simplicity, our shell will only support:

the cd, pwd, and exit built-in commands
external programs, like ls and rev
redirection of external program input and output
shell variable $? (but not others)

Other features like backgrounding, conditional chaining, and pipe are not supported.

At various points in this description you are given instructions to refer to the “man page” for a system call or library function; please do so in a terminal window at that point. Note that much of the contents of a man page can be ignored. The most important parts for what we are doing are (1) the list of include files to use and (2) the arguments and return value. (the “RETURN VALUE” section is often near the end of a long man page)

We also provide several appendices that are useful:

[A] Command Line Tokenizer explains how the parser works in our shell
[B] ASCII characters explains how a char type is interpreted as a human-understandable character for our shell.
[C] Testing and Debugging your shell gives advice to thoroughly test your shell, which hints how we will eventually (after the lab deadline) grade your lab.
Note: when you submit, the autograder on Gradescope runs only a subset of the final test cases. This means that getting a full score at submission time doesn’t necessarily mean you will get full credit in the end. We suggest you go through the testing advice we give.

Section 1: Signals

If your shell is interactive, you’ll want to ignore the signal sent by ^C (that is, hold Ctrl and press C), so that you can quit out of a running program without terminating the shell:

signal(SIGINT, SIG_IGN); /* ignore SIGINT=^C */

Later when you use fork to create a subprocess, you’ll want to set it back to its default in that subprocess, so you can terminate a running command:

signal(SIGINT, SIG_DFL);

Exercise 1 disable ^C

edit the file sh3650.c and add the line that disables the signal to the main function
make and run the shell
$ make
...
$ ./sh3650
sh3650>
when you type ^C, the shell won’t exit
it will exit properly on end of file (i.e. when you type ^D, which indicates end-of-file on a Unix terminal; again, ^D means holding Ctrl and then pressing D)

Section 2: Internal commands

As introduced, internal commands are commands like cd, pwd, and exit that are contained within the shell, literally built in. This is either for performance reasons—internal commands execute faster than external commands, which usually require forking—or because a particular builtin needs direct access to the shell internals.

Note:

the command line tokenizer is described below, in the section Command Line Tokenizer
you can compare strings for equality using strcmp (“man 3 strcmp”), which returns zero if two strings are equal.

(question: why does cd have to be implemented as a built-in command rather than an executable run in a separate process? exit?)

cd

For the cd command you will use the chdir system call (“man 2 chdir”) to change to the indicated directory. With no arguments you should use the value of the HOME environment variable, i.e. getenv("HOME").

Note that cd can fail two ways:

wrong number of arguments: print "cd: wrong number of arguments\n" to standard error - use fprintf(stderr, ...
chdir fails: print "cd: %s\n", strerror(errno) to standard error

In both cases set status to 1, and set it to 0 otherwise.

pwd

pwd will use the getcwd function (“man 3 getcwd”) to get the current directory, passing it a buffer of PATH_MAX bytes, and print the result. You can assume getcwd always succeeds and set status to 0.

exit

exit takes zero or 1 argument; with more than 1 it prints "exit: too many arguments" to stderr and sets status=1. With 0 arguments it calls exit(0); with a single argument it calls exit(atoi(arg)), using atoi (“man 3 atoi”) to convert the argument from a string to an integer.

Exercise 2 implement cd, pwd, and exit

implement the three internal commands
make and test your implementation:
run make, run your shell ./sh3650, and test:
pwd: does it print out the right current directory? does it fail if you give it arguments?
cd to directories that exist (e.g. cd /tmp), check with pwd
cd to non-existent directory, check (a) error message, (b) still in same directory
exit: does it work correctly with 0, 1, >1 argument? Try exiting with an arbitrary non-zero status and verify using the $? variable in your normal shell:
$ ./sh3650
sh3650> exit 5
$ echo $?
5
hints:
here is a list of useful library functions and syscalls: strcmp, chdir, getcwd, atoi, exit
factor cd, pwd, and exit into individual functions that each take argc and argv as arguments. Maybe return status as the return value, but more on that later.

Now that you’ve implemented your first commands, make sure that it ignores empty command lines without complaining or crashing.

Section 3: External commands

If a command isn’t an internal command, it’s an external one: you’ll fork a sub-process; in the child process you’ll use exec to run the command, while the parent will use wait to wait until it’s done.

After fork() (“man 2 fork”) you’ll want to do the following:

re-enable ^C (see section 1: signals)
use the execvp library function (“man 3 execvp”) to exec the indicated command

From the man page:

int execvp(const char *file, char *const argv[]);

The first argument is the executable name, while the second is the argv array to be passed to the newly loaded program. Instead of providing an argument count, the argv array is terminated with a NULL pointer. For example, if one runs $ ls /home, the following arguments are fed into execvp:

 argv-> +------+
        |   *--|---->"ls"
        +------+
        |   *--|---->"/home"
        +------+
        | NULL |
        +------+

execvp will load the executable ls (possibly under /usr/bin/ls) and pass it argc=2, argv={“ls”, “/home”}.

(question - how does execvp know where to find the executable ls?)

The command line parser I’ve given you makes sure that the argv[] array is terminated with a NULL pointer, so you can just pass it to execvp: execvp(argv[0], argv);

If execvp fails, you should print a message to standard error, "%s: %s\n", argv[0], strerror(errno), and then exit with EXIT_FAILURE. (question: why do you have to exit here, rather than returning?)

In the parent process you’ll need to wait for the child process to finish, using waitpid, and get its exit status (i.e. the argument passed to exit()). It’s ok to copy and paste the following code without fully understanding it:

    // the variable "pid" should be the child process's process id
    int status;
    do {
        waitpid(pid, &status, WUNTRACED);
    } while (!WIFEXITED(status) && !WIFSIGNALED(status));
    int exit_status = WEXITSTATUS(status);

Exercise 3 implement executing external cmds

implement the mentioned fork, execvp, and waitpid
run make, then test:
successful commands, e.g. ls, ls /tmp, etc.
unsuccessful ones, e.g. this-is-not-a-command
^C handling: run sleep 5 and verify you can kill it with ^C and return to your shell.
hint:
FACTORING: we suggest that you factor out the code which forks and execs, and put it in a separate function from where you call waitpid.
Debugging:
You may find the gdb command set follow-fork-mode child useful (see the gdb documentation).
Also the strace -f command can be very useful, although verbose: e.g. here’s a selection of the 140 lines it prints out for my (Peter) shell. (note that fork in Linux is actually implemented using a system call called clone)
 $ echo ls | strace -f ./sh3650
 execve("./sh3650", ["./sh3650"], 0xffffed0a7ba8 /* 24 vars */) = 0
 brk(NULL)                               = 0xaaaadecaa000
   ...
 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 119667 attached
 , child_tidptr=0xffff9934bf50) = 119667
 [pid 119667] set_robust_list(0xffff9934bf60, 24 <unfinished ...>
...
 [pid 119667] execve("/usr/local/sbin/ls", ["ls"], 0xffffe3f4c0a8 /* 24 vars */ <unfinished ...>
   ...

Section 4: The `$?` special shell variable

The basic shell has a number of built-in variables, listed under “Special Parameters” in the man page (man sh); we implement only one of these, $?: expands to the exit status of the most recent command.

To implement this you can just use sprintf to print the exit status into a buffer (e.g. define and use char qbuf[16]), and then go through your array of tokens, find any which compare equal to $?, and replace them with a pointer to that buffer.

Exercise 4 implement $?

implement $? in your sh3650.c

make and test:

 $ ./sh3650
 sh3650> false
 sh3650> echo $?
 1
 sh3650> ./sh3650
 // notice: below we're in another sh3650 shell
 sh3650> exit 5
 // now we're back to the first sh3650 shell
 sh3650> echo $?
 5

Section 5: File redirection

As mentioned in the input/output redirection part above, you can redirect a command’s input and output to files rather than the console. For example, $ ls > foo.txt will redirect the output of ls to the file foo.txt instead of printing it on your console. How this works internally is that

the shell opens the required file foo.txt (open is a syscall, see “man 2 open”);
it duplicates foo.txt’s file descriptor onto standard output, which is file descriptor 1 (dup2 is a syscall, see “man 2 dup2”);
it closes foo.txt’s original file descriptor because we don’t need it anymore (close is a syscall, see “man 2 close”); now, file descriptor 1 points to foo.txt.

For your implementation, you should scan the shell’s input tokens for “>” and “<”, and replace standard input and output appropriately. Note that “<” (or “>”) may be followed by zero, one, or multiple words before “>” (“<”) or end of line:

zero words: don’t redirect
more than one: redirect to the first one

If you’ve factored out a “launch” function which takes an argv pointer and file descriptors for stdin and stdout, you can make a “wrapper” for it which checks for file redirection and replaces the appropriate file descriptors if necessary. (make sure you close any file descriptors that aren’t needed)

Exercise 5 implement redirection

implement the mentioned redirection in sh3650.c
hints:
here are some useful syscalls: open, dup2, close
for open, these flags might be useful: O_RDONLY, and O_CREAT|O_TRUNC|O_WRONLY with mode 0777 (updated 01/29; this replaces the earlier O_CREAT|O_RDWR)
Debugging:
you can use the lsof command to list open file descriptors, to make sure you’re not leaking. E.g. from another terminal:
$ ps aux |grep sh3650
pjd       118403  0.0  0.0   2196   780 pts/4    S+   01:29   0:00 ./sh3650
$ lsof -a -d 0-999 -p 118403
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
sh3650 118403  pjd    0u   CHR  136,4      0t0    7 /dev/pts/4
sh3650 118403  pjd    1u   CHR  136,4      0t0    7 /dev/pts/4
sh3650 118403  pjd    2u   CHR  136,4      0t0    7 /dev/pts/4
(the man page for lsof is horrible. The command means: list all open “normal” files, i.e. those with file descriptors 0-999 (-d 0-999), AND (-a) that are open in a specific process (-p 118403).)
Just like before, the strace -f command can be quite useful.
 $ echo 'ls | cat' | strace -f ./sh3650 
 execve("./sh3650", ["./sh3650"], 0xffffeccb9e08 /* 24 vars */) = 0
... 240 more lines...

Finally, submit your work

Submitting consists of three steps:

Executing this checklist:
- Fill in ~/lab2/slack.txt with (1) your name, (2) your NUID, (3) slack hours you used, and (4) acknowledgements.
- Make sure that your code builds with no warnings.
  note: we will apply a 10% penalty if your code compiles with warnings.
- Make sure you have added (git add) all files that you created (if any).

Push your code to GitHub:

 $ cd ~/lab2
 $ git commit -am 'submit lab2'
 $ git push origin 

 Counting objects: ...
  ....
  To ssh://github.com/NEU-CS3650-labs/lab2-<username>.git
   7337116..ceed758  main -> main

Actually submit your lab via Gradescope:
- Navigate to https://www.gradescope.com/ and click on log in.
- Select login with “School Credentials” and select “Northeastern University”.
- Enter Northeastern SSO login information and you should be able to log in to your gradescope account.
- Now, on Canvas, go to the CS3650 course and click on “Gradescope 1.3” in the left navigation bar. You will then be asked to accept the course invitation, after which you can access the course on Gradescope.
- On Gradescope, select the lab/assignment you wish to submit and click on “Upload Submission”.
- You will then be asked to upload a zip file consisting of the files the lab/assignment specifies.
  Note: you can either zip all files within your lab folder, or zip your lab folder, whose name must start with “lab2-” (this is supposed to be your GitHub repo name, lab2-<username>). If you zip a folder named, for example, “mysubmit”, Gradescope will complain.
- After uploading the zip file, the autograder will evaluate your submission and provide a score based on it.
- After the manual grading process is performed by the TAs, your final score for the lab/assignment will be released.

This completes the lab.

Appendix

A. Command Line Tokenizer

The simplest way of tokenizing a line in C is to use the strtok library function, or the slightly less horrible strsep, which overwrite whitespace characters to split a line into multiple strings. An example: start with the line "ls | cat", zero out the whitespace characters:

['l']['s'][' ']['|'][' ']['c']['a']['t'][ 0 ]
    -> ['l']['s'][ 0 ]['|'][ 0 ]['c']['a']['t'][ 0 ]

and keep pointers to the beginning of each region of non-whitespace characters:

     argv[]         ['l']['s'][ 0 ]['|'][ 0 ]['c']['a']['t'][ 0 ]
     +-----+          ^              ^         ^
     |  *--|----------+              |         |
     +-----+                         |         |
     |  *--|-------------------------+         |
     +-----+                                   |
     |  *--|-----------------------------------+
     +-----+
     | ... |

Problem: this breaks when you don’t have any whitespace, like "ls|cat". The parser you’re given handles this by copying the input string into a second buffer, rather than modifying it in place:

        input string:                                  buffer:
['l']['s']['|']['c']['a']['t'][ 0 ]     [ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ][ 0 ]

  output:
     argv[]                          -> ['l']['s'][ 0 ]['|'][ 0 ]['c']['a']['t'][ 0 ]
     +-----+                              ^              ^         ^
     |  *--|------------------------------+              |         |
     +-----+                                             |         |
     |  *--|---------------------------------------------+         |
     +-----+                                                       |
     |  *--|-------------------------------------------------------+
     +-----+
     |  0  |  <- terminated with NULL pointer (see arg formats in "man 3 execvp")
     +-----+ 
     | ... |

The skeleton code you’re given shows an example of how to use it.

For a “real” shell you’d probably use a tokenizer and parser based on the standard compiler tools lex and yacc, creating an abstract syntax tree of linked “token” objects. That’s far too complicated for this assignment, so we have a simple tokenizer that does a pretty good job of splitting simple lines with redirection symbols and single and double quotes, and returns pointers to strings rather than more complex structures.

The parser is not guaranteed to be bug-free, but your code will only be tested against the cases we have tested.

B. ASCII characters

By default C works with 8-bit characters, of which the values 0 through 127 form the basic ASCII character set, rather than the much larger Unicode character set used in today’s user interfaces. To see the actual character set, we can print out a string containing the bytes 1 through 255, with a 256th byte as the null terminator:

$ cat > test.c <<EOF
#include <stdio.h>
int main(void) {
    char buf[256];
    for (int i = 0; i < 255; i++)
        buf[i] = (char)(i + 1);   /* bytes 1 through 255 */
    buf[255] = 0;                 /* 256th byte: null terminator */
    printf("%s", buf);
}
EOF
$ gcc test.c
$ ./a.out | od -A d -t c

You should see the following - note that offsets (left column) are in decimal, while non-printing characters are printed in octal, which no one uses anymore. (“od” = “octal dump”)

The “missing” character at the end of the second line is actually a space, ' ', and there are several backslash-style escaped characters, of which the only ones we care about are \n (newline) and sometimes \t (tab).

0000000  001 002 003 004 005 006  \a  \b  \t  \n  \v  \f  \r 016 017 020
0000016  021 022 023 024 025 026 027 030 031 032 033 034 035 036 037    
0000032    !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /   0
0000048    1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?   @
0000064    A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P
0000080    Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _   `
0000096    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p
0000112    q   r   s   t   u   v   w   x   y   z   {   |   }   ~ 177 200
0000128  201 202 203 204 205 206 207 210 211 212 213 214 215 216 217 220
0000144  221 222 223 224 225 226 227 230 231 232 233 234 235 236 237 240
0000160  241 242 243 244 245 246 247 250 251 252 253 254 255 256 257 260
0000176  261 262 263 264 265 266 267 270 271 272 273 274 275 276 277 300
0000192  301 302 303 304 305 306 307 310 311 312 313 314 315 316 317 320
0000208  321 322 323 324 325 326 327 330 331 332 333 334 335 336 337 340
0000224  341 342 343 344 345 346 347 350 351 352 353 354 355 356 357 360
0000240  361 362 363 364 365 366 367 370 371 372 373 374 375 376 377    

C. Testing and Debugging your shell

In order to test your submission properly, you need to think like someone who is trying to break it. In other words, there are two types of tests you’ll want to write:

basic 1+1=2 tests, verifying that each of the (few) functions your shell performs is working
diabolical tests that try to provoke your code into using NULL pointers and crashing

Note: for some of these tests you may be running the sh3650 executable a whole bunch of times. The -fsanitize=address compile option causes it to take about a second to start up, but you’ll probably save more time in the long run if you keep it enabled.

Below we give some advice on how to test each section of your code:

Test cases for Exercise 1

Start your shell in interactive mode, verify that ^C doesn’t kill it.

Test cases for Exercise 2

Note that a lot of these tests mention checking the value of $? inside your shell - obviously you’ll have to defer this until implementing $?. (and you’ll need external commands, so you can use echo $? to see its value)

But at this stage you can test $? outside of your shell, verifying that you called exit() with the right argument.

$ ./sh3650 <<EOF
exit 5
EOF
$ echo $?  // to see if it is 5

Here are some test cases you should test:

empty line: should not crash, should prompt for next command (in interactive mode)
end-of-file/^D: A control-D character on the console should cause your shell to exit gracefully, i.e. with $? set to 0
cd/pwd:
- no argument: cd with no argument should change the current directory (as reported by pwd) to your home directory, $HOME. Both should set $? to zero.
- valid arg: cd /tmp (or other real directory) should work, as reported by pwd, set $? to zero
- invalid: cd /not-a-directory should print cd: No such file or directory and set $? to 1
- extra args: cd a b should print cd: wrong number of arguments and set $? to 1
exit
- exit should exit, with $? set to 0
- exit 7 (or whatever) should exit, with $? set to 7
- exit 1 2 should print exit: too many arguments and set $? to 1

Test cases for Exercise 3

verify that you can run a few simple commands - e.g. ls, echo a b c, /bin/ls etc.

check that it fails correctly on bogus commands, e.g.

$ this-is-not-a-command
this-is-not-a-command: No such file or directory
$ ./this-is-not
./this-is-not: No such file or directory

Test cases for Exercise 4

Go back to the tests for Exercise 2 and verify the status after each command

verify that you handle commands returning a status of 0, 1, and another value, and report a status of 1 for command not found:

sh3650> true
sh3650> echo $?
0
sh3650> false
sh3650> echo $?
1
sh3650> sh -c 'exit 7'
sh3650> echo $?
7
sh3650> not-a-command
not-a-command: No such file or directory
sh3650> echo $?
1

Test cases for Exercise 5

You’ll need to test the “normal” cases (i.e., input and output redirection work for a single command), and some abnormal cases:

multiple > file or < file for a single command
one and multiple ‘>’ or ‘<’ without files
‘> file’, ‘< file’, ‘>’ and ‘<’ (and combinations) on a line with no commands

Finally, you need to check that you’re not “leaking” open file descriptors. The easiest way to do this is to run your shell in one terminal window, running a number of commands with redirected I/O, then go to another window, find the process ID of your program with ps, and use the lsof utility to list its open files:

$ ps aux |grep sh3650
cs3650     58863  0.0  0.0   2196  1152 pts/1    S+   15:52   0:00 ./sh3650
cs3650     58865  0.0  0.0   8492  2048 pts/3    S+   15:52   0:00 grep --color=auto sh3650

$ lsof -p 58863
COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
sh3650 58863 cs3650  cwd    DIR  259,2     4096  918097 /home/cs3650/cs3650-f23/hw1
sh3650 58863 cs3650  rtd    DIR  259,2     4096       2 /
sh3650 58863 cs3650  txt    REG  259,2   107968  917589 /home/cs3650/cs3650-f23/hw1/sh3650
sh3650 58863 cs3650  mem    REG  259,2  1641496 2359916 /usr/lib/aarch64-linux-gnu/libc.so.6
sh3650 58863 cs3650  mem    REG  259,2   187776 2359751 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
sh3650 58863 cs3650    0u   CHR  136,1      0t0       4 /dev/pts/1
sh3650 58863 cs3650    1u   CHR  136,1      0t0       4 /dev/pts/1
sh3650 58863 cs3650    2u   CHR  136,1      0t0       4 /dev/pts/1

Those three lines at the bottom are the three file descriptors: 0, 1 and 2, i.e. standard input, output, and error. The ‘u’ means they’re open for read+write, and the actual “file” is a terminal device, /dev/pts/1. If you have a bunch of higher-numbered file descriptors listed, you’re leaking them.

Acknowledgments

This lab was created by Peter Desnoyers. Part 1 of the lab instructions is adapted from Mike Walfish’s cs202 lab instructions; Part 2 is borrowed from Peter’s prior CS5600 shell lab.