CS5600 Lab0: Setup and tools
We are going to use git and a Linux virtual machine (VM) to distribute and collect assignments.
Section 0: GitHub
If you don’t have a GitHub account, sign up for one here. The free plan is sufficient for the labs.
Section 1: Set up a Linux VM
In this course, all of our programming assignments will be assessed on a Linux virtual machine (VM). You can think of a virtual machine as a way to run a particular operating system (in our case, an instance of Ubuntu) on top of another operating system (the one that controls your laptop or desktop).
We recommend VirtualBox, which runs on Windows, Linux, and macOS (on x86 CPUs) and has been used successfully in this class for many years.
You have three options to build the class VM:
Option 1: use the pre-built CS5600 VM image
The CS5600 staff have prepared a ready-to-use VM image for you. To choose this option:
- download and install VirtualBox: be sure to download the package that is appropriate for your system. Once it has downloaded, install it by double-clicking the installer and following the prompts. The default installation settings are generally the right ones.
- download the pre-built VM image from Canvas->Files->CS5600.ova.
- run the VM by double-clicking the image file or by importing the image into VirtualBox.
- Note: you can find the VM’s sudo password on the Canvas homepage.
- (optional) Watch this video starting at 13:15 to see how to open a terminal and update packages (you don’t have to update the packages). The video also explains some basics of Linux and C.
Option 2: install your own VM from scratch
- Again, you first need to download and install VirtualBox.
- Next, download the target operating system, Ubuntu 20.04.x LTS, from https://ubuntu.com/download/desktop. This will download an .ISO file to your computer. An ISO file is an image of the data on an optical disc, and it will appear to the virtual machine as a (virtual) CD/DVD-ROM drive.
- Finally, follow along with the video here to set up and configure an Ubuntu virtual machine. You will need to pause the video as you go along.
Extra credit: run uname -a in the terminal, take a screenshot, and upload it to the assignment “lab0-challenge”. (Update 1/20: This extra credit is not offered this semester. Apologies for the stale message.)
Notes:
If your Mac has an Apple M1 chip (check here), you cannot install VirtualBox. You need to find an Intel/AMD x86 machine to install VirtualBox on. If you can’t find one, see Option 3 below.
Mac users may run into a permission problem when installing VirtualBox. See the solution here. Or, if you see “The Installation Failed”, see here.
Option 3: use QEMU to run Ubuntu
This option is for students whose Macs have an Apple M1 chip. We thank the Khoury system team and the CS3650 staff for their help and sharing.
- install Homebrew (brew): if you haven’t installed brew yet, install it.
- install QEMU: open Terminal on your Mac, and run
$ brew install qemu
- download script: go to Canvas->Files->qemu.sh and download the file
- download the CS3650 VM: create an empty folder, move qemu.sh into it, and run it:
$ mkdir ~/cs5600/
$ mv YOUR_DOWNLOAD_DIR/qemu.sh ~/cs5600/
$ cd ~/cs5600/
$ chmod +x qemu.sh
$ ./qemu.sh
- (after a while) you should be able to see a window named “QEMU” with a prompt “cs3650-guest login:”
- QEMU VM login: find the username and password printed in the Terminal where you ran $ ./qemu.sh, or find them in the file qemu.sh (run $ cat qemu.sh).
The following steps are done in the terminal of the QEMU window (after you log in). Note: you cannot copy and paste there.
- install ssh by running:
$ sudo apt install ssh keychain ssh-server
- enable ssh login with a password: modify the file /etc/ssh/sshd_config and change the line PasswordAuthentication no to PasswordAuthentication yes. Here is one way to do it with nano:
$ sudo nano /etc/ssh/sshd_config
# change PasswordAuthentication no => yes, then save and quit
- restart the ssh server:
$ sudo service ssh restart
Note: the Ubuntu installed in QEMU is a minimal version. If you need other tools, install them with sudo apt install (Google is also your good friend).
Finally, you can ssh into the Ubuntu inside QEMU by running this command in your Mac’s terminal:
$ ssh -p5600 vagrant@localhost
This will ask for the password that you saw in your terminal or in qemu.sh.
This completes the installation of the QEMU VM for M1.
Of course, besides VirtualBox and QEMU, if you already have Ubuntu installed, you can use it as well. However, note that later labs may require you to install packages and update your environment. We strongly recommend using a VM so that you don’t mess up your working environment.
See some useful links and references about the Linux command line and C at the end of this page.
Section 2: Git and GitHub
What is git?
Git was developed by Linus Torvalds for development of the Linux kernel. It is a distributed version control system, which means it supports many local repositories that each track changes and can synchronize with each other in a peer-to-peer fashion. It’s the best widely available version control system, and certainly the most widely used. For information on how to use git, see:
For the workflow in GitHub:
Cloning the lab0 repository
Please click the GitHub Lab0 link on the Canvas homepage to create your own private clone of the lab0 repository; this clone lives on (is hosted by) GitHub. Once that clone exists, you will perform a further clone to get that private repository onto your devbox (your VM). You’ll do your work on your devbox, and then push your work to the GitHub-hosted private repository for us to grade.
Here’s how it should work.
- Click the GitHub Lab0 link on the Canvas homepage to create your Lab0 clone on GitHub.
- Log in to GitHub.
- Provide a name.
- The link should automatically clone the repository. For instance, if your username was foobar, you should now have a repository on GitHub called NEU-CS5600-21fall/lab0-foobar.
Update 2/2: 22spring vs. 21fall. In lab0’s instructions, the git repo URLs originally contained “22spring”; they should contain “21fall”. We have updated all the git repo URLs below.
Why “21fall” in the repo address? This is an accident. See why in a Piazza question here.
Teaching GitHub about your identity
The easiest way to access GitHub repositories is using an SSH key, a secret key stored on your CS5600 VM that defines your identity. Follow the steps below to create a key for your virtual machine.
Enter your VM: double click the image or start the VM in VirtualBox
Open Terminal.
(i) For students who use the CS5600 pre-built VM: enter your name and email when asked.
(ii) For students who build their own VMs: run ssh-keygen -t rsa -b 2048 and follow the instructions:
- Press enter to use the default file path and key name (should be ~/.ssh/id_rsa).
- Choose a password or leave it empty.
This creates your ssh keys, which live in the directory ~/.ssh. Your public key is in the file ~/.ssh/id_rsa.pub.
Run cat ~/.ssh/id_rsa.pub to display your public key. Copy your public key (that is, select the text on the screen and copy it to the clipboard).
In GitHub, go to your profile settings page (accessible via the upper-rightmost link, which looks like a bunch of pixels for new accounts). Select “SSH and GPG keys” and hit the “New SSH key” button. Then copy and paste the contents of your ~/.ssh/id_rsa.pub (from the VM) into the “Key” section. Give the key a sensible title, hit the “Add SSH key” button, and you’re good to go.
Creating a local clone
Once GitHub knows your SSH identity, you’re ready to clone your lab repository and start doing work! Here’s how to get a local clone of your private repo on your machine:
Enter your VM and open a terminal
Configure your git “identity” as it shows up in commits:
$ git config --global user.name "FIRST_NAME LAST_NAME"
$ git config --global user.email "YOUR_COLLEGE_EMAIL"
Clone your lab0 repo:
$ cd ~
$ git clone git@github.com:NEU-CS5600-21fall/lab0-<Your-GitHub-Username>.git lab0
Note that the git@github.com:... address can be obtained on GitHub by clicking the green “Code” button. You want to clone using SSH, not HTTPS, so make sure “SSH” is selected.
Look at the files in the repo:
$ cd ~/lab0/
$ ls
You should see:
Makefile debug.c hello.c slack.txt
Exercise 1 compile and run hello world.
- compile and run the hello world program:
$ cd ~/lab0/
$ gcc -o hello hello.c
$ ./hello
- Now you should see:
hello world
GCC (gcc in the above example) is a widely used compiler. In the command above, -o specifies the output file (namely, hello), and hello.c is the source file to be compiled. See a quick introduction to gcc here. In this course, you don’t have to be an expert in gcc; we will provide compilation support for the labs. (Nevertheless, having some understanding of how gcc works is helpful.)
Section 3: Debugging
This part of the lab will give you practice debugging.
Navigating to syntax errors
Try to compile the program debug:
$ cd ~/lab0/
$ make
make is a system for organizing compilation. When you run make, it looks for a file named Makefile and executes the rules within it. Take a look at our simple Makefile, which compiles the program debug (you can open it in your favorite text editor). We put comments in it to explain things. Again, you don’t have to be an expert in make; here is a quick introduction.
After running make, you will see:
debug.c:45:5: error: ...
This is because the code has a syntax error; thus, it cannot be compiled.
Exercise 2 Fix two errors.
- Use the compiler’s error message to determine what’s wrong (a syntax error). After you fix the syntax error, the code will compile. Try make again:
$ make
Now the code should compile, but with a warning message:
debug.c:10:12: warning ...
- Run debug. You will see:
$ ./debug
double a number (10) is (0)
debug: debug.c:48: main: Assertion 'num + num == doub_num' failed.
Aborted (core dumped)
Though our code compiled, it was not correct. It failed on an assertion (num + num == doub_num). Read debug.c to fix this problem and make the program pass the assertion.
Hints:
- Warning information is useful.
- This error is a typo. You should be able to fix the problem by changing only one line of code.
After fixing the two errors, you should be able to see:
double a number (10) is (20)
Segmentation fault (core dumped)
Aha! Our code passes the assertion, but it is still not correct (core dumps are bad). Specifically, the segmentation fault means that our program issued an illegal memory reference, and the operating system ended our process. To make matters worse, we have no idea where the problem in the code is. In the following section, you will learn how to use gdb to debug this kind of problem.
Debugging with gdb
Run gdb: Use the GNU debugger, gdb, to run the program:
$ gdb debug
(gdb)
Set breakpoints: One thing that you might want to do is to set a breakpoint before the program begins executing. Breakpoints are a way of telling gdb that you want it to execute your program and then stop, or break, at a place that you define. Use the following command to set a breakpoint at the main function:
(gdb) b main
Breakpoint 1 at 0x155f: file debug.c, line 42.
Run the program: Then use gdb’s command run to actually start the program (this is the general pattern in gdb: one invokes the debugger, perhaps sets a breakpoint, and then starts the program with run):
(gdb) run
Backtrace: The program will stop when it reaches the breakpoint. At this point, you will be presented with gdb’s command prompt again. To see the “call stack” (or stack trace), which is the list of functions that have called this one (literally, the stack frames on top of the current one), issue backtrace, or bt for short:
(gdb) bt
Experienced developers will often ask for a stack trace as step 0 or 1 of understanding a code problem. Get in the habit of asking gdb to give you a backtrace.
Continue running: To make the program continue running after a breakpoint, use continue, or c for short:
(gdb) c
Step through the code: Of course, if you just type c every time you hit a breakpoint, you will lose control of the program. You often want the command next, or n for short:
(gdb) n
This executes the next line of code; if that line calls a function, the entire function runs. (The command step also executes a single line of C code. There is little difference between step and next unless you are about to enter a function: step steps into the function, while next “steps over” it.)
Inspect the values of variables: While you are at gdb’s command prompt, the program is paused. You can query the program’s current global and local variables with the print command, or p for short.
Update 1/20: corrected the gdb print example.
Run gdb on debug and set a breakpoint at the function process_msg. At this breakpoint, determine the value of the argument mem:
(gdb) print mem
$1 = 0xc0000000c <error: Cannot access memory at address 0xc0000000c>
This means that the variable mem holds the value 0xc0000000c. (You may see a different value here. Why? Take a look at the type of mem.)
Aside: you can check local variables’ names using:
(gdb) info local
Core dump: If a program terminates abnormally (for example, debug), the state of the program is recorded by the OS and (if core dumps are enabled) saved in a so-called core dump. gdb can use a core dump to inspect the crash.
To debug using core dumps, you must first enable core dumps, and then point gdb at the relevant file. We’ll do this in several steps:
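# specify the core dump file name
$ sudo sysctl -w kernel.core_pattern=core
# enable core dumps (no size limit) in this shell
$ ulimit -c unlimited
# run the crashing program, check that a core file appeared, then load it in gdb
$ ./debug
$ ls -l core
$ gdb ./debug core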
The idea here is that the core file gives gdb enough information to recover the memory and CPU state of the program at the moment of the crash. This allows you to determine which instruction caused the error.
Exercise 3 Fix segfault.
- Use gdb and the core dump file to study the segfault and fix the bug.
- Again, the bug is a typo. You should be able to fix it by changing one line of code.
After fixing the typo, you should be able to see:
$ ./debug
double a number (10) is (20)
processed msg: HELLO WORLD!
Section 4: C string and memory overflow
Although debug now seems to work fine, there is a deeper bug related to manipulating strings and memory. Run the AddressSanitizer version of debug:
$ make
$ ./debug-mem-check
double a number (10) is (20)
=================================================================
==4236==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000001c at pc 0x7f616e8a8a6d bp 0x7ffed4d8ad90 sp 0x7ffed4d8a538
... // more error messages
Memory checker—AddressSanitizer
AddressSanitizer is a tool that detects buffer overflows and memory corruption bugs. debug-mem-check is compiled from debug.c with AddressSanitizer enabled (read the Makefile to see how we compile debug-mem-check).
C string
In C, a string (e.g., "hello!") is a sequence of chars followed by a trailing zero byte (\0). A C string is therefore also called a null-terminated string. The ending \0 marks the end of the string.
For example, a string “hello!” in memory would look like:
------------------------------
| h | e | l | l | o | ! | \0 |
------------------------------
Given a string (char * str = "hello!"), we can get its length with strlen. strlen is a library function that returns the number of characters preceding the terminating null character (see the details with $ man strlen). Thus, strlen(str) returns 6.
The deeper bug
The bug detected by AddressSanitizer comes from incorrectly manipulating strings and memory together. Here is a hint: think about calling strlen on a string without a trailing \0. What will happen? strlen will keep looking for the ending \0, going past the end of the string until it either finds a \0 (which belongs to other data) or reaches illegal memory.
Exercise 4 pass debug-mem-check
Fix the deeper bug so that debug-mem-check passes. After fixing the bug, you should see:
$ make
$ ./debug-mem-check
double a number (10) is (20)
processed msg: HELLO WORLD!
Section 5: Saving changes by committing
As you modify the skeleton files to complete the labs, you should frequently save your work to protect against laptop failures and other unforeseen troubles, and to create “known good” states. You save the changes by first “committing” them to your local lab repo and then “pushing” those changes to the repo stored on github.com.
$ git commit -am "saving my changes"
$ git push origin
Note that whenever you add a new file, you need to manually tell git to “track it”. Otherwise, the file will not be committed by git commit. Make git track a new file by typing:
$ git add <your-new-file>
After you’ve pushed your changes with git push origin, they are safely stored on github.com. Even if your laptop catches fire in the future, those pushed changes can still be retrieved. However, remember that git commit by itself does not save your changes on github.com (it only saves them locally). So, don’t forget to run git push origin.
To see whether your local repo is up to date with your origin repo on github.com and vice versa, type git status.
Exercise 5 commit and push.
- go to the lab0 folder:
$ cd ~/lab0/
- commit your modifications:
$ git commit -am "my CS5600 commits"
- push to github.com:
$ git push origin
You should see something like:
Counting objects: ...
....
To ssh://github.com/NEU-CS5600-21fall/lab0-<username>.git
   7337116..ceed758  main -> main
Finally, submit your work
Submitting lab0 consists of two steps:
- Execute this checklist:
  - Fill in slack.txt with (1) your name, (2) your NUID, and (3) the slack hours you used.
  - Make sure you have finished Exercise 5, namely that you have run git commit -am 'my CS5600 commits' and git push origin. (Note: the commit message doesn’t matter.)
- Actually submit your lab (with timestamp and git commit id):
  - Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id of the last commit by running git log -1 --format=oneline.
  - Paste both your git repo address and the commit id (the hexadecimal string) into Canvas. In Canvas, there will be an assignment for this lab. Paste the git repo address and the commit id on two lines:
git@github.com:NEU-CS5600-21fall/lab0-studentid.git
29dfda9c788fade33421f242b5dd1ff5295fd3c9
  - Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on your GitHub repo page by clicking the green “Code” button and then choosing “SSH”.
  - Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
NOTE: Ground truth is what and when you submitted to Canvas.
A non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub; we will not grade it. So, please double-check your submitted commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.
This completes the lab.
Acknowledgments
Some links were borrowed from prior CS5600s (2020fall and 2021spring). A large portion of this writeup was borrowed from Mike Walfish’s CS202, which further borrowed materials from Harvard’s CS61, Jinyang Li’s CS201, and Aurojit Panda’s 3033.
- “The Linux Command Line,” 5th edition, William Shotts http://linuxcommand.org/tlcl.php.
- The Linux command line for beginners: https://ubuntu.com/tutorials/command-line-for-beginners#1-overview.
- Programming in C, Fourth Edition, Stephen G. Kochan, Addison-Wesley Professional, 2014 (available online through NEU library).
- C Tutorial: https://www.cprogramming.com/tutorial/c-tutorial.html.