CS5600 Lab0: Setup and tools

We will use git and a Linux virtual machine to distribute and collect assignments.

Section 0: GitHub

If you don’t have a GitHub account, sign up for one here. You only need a Free plan for the labs.

Section 1: Set up a Linux VM

In this course, all of our programming assignments will be assessed on a Linux virtual machine (VM). You can think of a virtual machine as a way to run a particular operating system (in our case, an instance of Ubuntu) on top of another operating system (the one that controls your laptop or desktop).

We recommend VirtualBox, which runs on Windows, Linux, and macOS (with x86 CPUs) and has been used successfully in this class for many years.

You have three options to build the class VM:

Option 1: use the pre-built CS5600 VM image.

The CS5600 staff has prepared a ready-to-use VM image for you. To choose this option:
download and install VirtualBox: be sure to download the package that is appropriate for your system. Once it has downloaded, install it by double clicking on the installer and following the prompts. The default settings for installation are generally the right ones.
download the pre-built VM image from Canvas->Files->CS5600.ova
run the VM by double clicking the image file or importing the image into VirtualBox.
Note: you can find the VM’s sudo password on the Canvas homepage.
(optional) Watch this video from 13:15 to see how to run a terminal and update packages (you don’t have to update the packages). The video also explains some basics of Linux and C.

Option 2: install your own VM from scratch

Again, you need to first download and install VirtualBox.
Next you need to download the target operating system—Ubuntu 20.04.x LTS by going to https://ubuntu.com/download/desktop. This will download an .ISO file to your computer. An ISO file is an image of the data on an optical disc, and it will appear to the virtual machine as a (virtual) CD/DVD-ROM drive.
Finally, follow along with the video here to set up and configure an Ubuntu virtual machine. You will need to pause the video as you go along.
~~Get your extra credits by running uname -a in the terminal, taking a screenshot, and upload it to the assignment “lab0-challenge”.~~ Update 1/20: There is no extra credit this semester. We apologize for the stale message.

Notes:

If your Mac has an Apple M1 chip (check here), you cannot install VirtualBox. You need to find an Intel/AMD x86 machine to install VirtualBox. If you’re unable to find one, see option 3 below.
Mac users may face a permission problem when installing VirtualBox. See the solution here. Or, if you see “The Installation Failed”, see here.

Option 3: use qemu to run Ubuntu

This option is for students whose Macs use the Apple M1 chip. We thank the Khoury system team and CS3650 staff for their help and for sharing their setup.
install brew: if you haven’t installed brew, install it.
install QEMU: open Terminal on your Mac, and run
  $ brew install qemu
download script: go to Canvas->Files->qemu.sh and download the file
download CS3650 VM: create an empty folder and run the file qemu.sh
  $ mkdir ~/cs5600/
  $ mv YOUR_DOWNLOAD_DIR/qemu.sh ~/cs5600/
  $ cd ~/cs5600/
  $ chmod +x qemu.sh
  $ ./qemu.sh
(after a while) you should be able to see a window named “QEMU” with a prompt “cs3650-guest login:”
QEMU VM login: find the username and password printed in the Terminal (the one where you ran ./qemu.sh), or find them in the file qemu.sh (by $ cat qemu.sh).
Run the following steps in the terminal of the QEMU window (after you log in). Note: you cannot copy and paste here.
install ssh by running:
  $ sudo apt install ssh keychain ssh-server
enable ssh with password: you should modify file /etc/ssh/sshd_config and change the line PasswordAuthentication no to PasswordAuthentication yes. Here is one way with nano:
  $ sudo nano /etc/ssh/sshd_config
  // change PasswordAuthentication no => yes
  // save and quit
restart the ssh server:
  $ sudo service ssh restart
Note: the Ubuntu installed in QEMU is a minimal installation. If you need any tools, install them with sudo apt install (Google is also your good friend).
Finally, you can ssh into the Ubuntu VM inside QEMU by running the following command in your Mac’s terminal:
$ ssh -p5600 vagrant@localhost
This will ask for the password that you saw in your terminal or in qemu.sh.
This completes the installation of QEMU VM for M1.

Of course, besides VirtualBox and QEMU, if you already have Ubuntu installed, you can use it as well. Note, however, that later labs may require you to install packages and update your environment. We strongly recommend using a VM so that you don’t disturb your working environment.

See some useful links/references about the Linux command line and C at the end of this page¹.

Section 2: Git and GitHub

What is git?

Git was developed by Linus Torvalds for development of the Linux kernel. It is a distributed version control system, which means it supports many local repositories that each track changes and can synchronize with each other in a peer-to-peer fashion. It’s the best widely-available version control system, and certainly the most widely used. For information on how to use git, see:

For the workflow in GitHub:

GitHub Guides: Hello World

Cloning the lab0 repository

Please click the GitHub Lab0 link on Canvas homepage to create your own private clone of the lab0 repository; this clone lives on (is hosted by) GitHub. Once that clone exists, you will perform a further clone to get that private repository onto your devbox. You’ll do your work on your devbox, and then push your work to the GitHub-hosted private repository for us to grade.

Here’s how it should work.

Click the GitHub Lab0 link on the Canvas homepage to create your Lab0 clone on GitHub.
Log in to GitHub.
Provide a name.
The link should automatically clone the repository. For instance, if your username is foobar, you should now have a repository on GitHub called ~~NEU-CS5600-22spring/lab0-foobar~~ NEU-CS5600-21fall/lab0-foobar.

Update 2/2 (22spring vs. 21fall): The git repo URLs in lab0’s instructions initially contained “22spring”; the correct name is “21fall”. All git repo URLs below have been updated.
Why “21fall” in the repo address? This is an accident. See the Piazza question here for the explanation.

Teaching GitHub about your identity

The easiest way to access GitHub repositories is using an SSH key, a secret key stored on your CS5600 VM that defines your identity. Follow the steps below to create a key for your virtual machine.

Enter your VM: double click the image or start the VM in VirtualBox
Open Terminal.
(i) for students who use the CS5600 pre-built VM: enter your name and email when asked.
(ii) for students who build their own VMs: run ssh-keygen -t rsa -b 2048 and follow the instructions.
- Press enter to use the default file path and key name (should be ~/.ssh/id_rsa).
- Choose a password or leave it empty.
This creates your ssh keys, which live in the directory ~/.ssh. Your public key is in the file ~/.ssh/id_rsa.pub.
Run cat ~/.ssh/id_rsa.pub to display your public key.
Copy your public key (that is, select the text on the screen, and copy it to the clipboard).
In GitHub, go to your profile settings page (accessible via the upper-rightmost link–this looks like a bunch of pixels for new accounts). Select “SSH and GPG keys” and hit the “New SSH key” button. Then copy and paste the contents of your ~/.ssh/id_rsa.pub (from the VM) into the “Key” section. Give the key a sensible title, hit the “Add SSH key” button, and you’re good to go.

Creating a local clone

Once GitHub knows your SSH identity, you’re ready to clone your lab repository and start doing work! Here’s how to get a local clone of your private repo on your machine:

Enter your VM and open a terminal

Configure your git “identity” as it shows up in commits:

 $ git config --global user.name "FIRST_NAME LAST_NAME"
 $ git config --global user.email "YOUR_@COLLEGE_EMAIL"

Clone your lab0 repo:
```
 $ cd ~
 $ git clone git@github.com:NEU-CS5600-21fall/lab0-<Your-GitHub-Username>.git lab0
```
Note that the git@github.com:... address can be obtained on GitHub by clicking the green “Code” button (labeled “Clone or download” in older versions of GitHub). You want to clone using SSH, not HTTPS, so you might need to select the SSH option.

Look at the files in the repo:

 $ cd ~/lab0/
 $ ls

You should see:

 Makefile        debug.c         hello.c         slack.txt

Exercise 1 compile and run hello world.

Compile and run the hello world program:
$ cd ~/lab0/
$ gcc -o hello hello.c
$ ./hello
Now you should see
hello world

GCC (gcc in the above example) is a widely used compiler. In the above command, -o specifies the output file (namely, hello) and hello.c is the source file to be compiled. See a quick introduction to gcc here. In this course, you don’t have to be an expert in gcc; we will provide build support for the labs. (Nevertheless, having some understanding of how gcc works is helpful.)

Section 3: Debugging

This part of the lab will give you practice debugging.

Navigating to syntax errors

Try to compile a program debug:

$ cd ~/lab0/
$ make

make is a system to organize compilation. When you run make, it looks for a file named Makefile and executes the rules within. Take a look at our simple Makefile, which compiles the program debug (you can open it using your favorite text editor). We added comments to explain things. Again, you don’t have to be an expert in make; here is a quick introduction.

After running make, you will see:

debug.c:45:5: error: ...

This is because the code has a syntax error; thus, it cannot be compiled.

Exercise 2 Fix two errors.

Use the compiler’s error message to determine what’s wrong (a syntax error). After you fix the syntax error, the code will compile. Try make again:
$ make
Now the code should compile, but with a warning message: debug.c:10:12: warning ....
Run debug. You will see:
$ ./debug
double a number (10) is (0)
debug: debug.c:48: main: Assertion 'num + num == doub_num' failed.
Aborted (core dumped)
Our code compiled, but it was not correct: it failed on an assertion (num + num == doub_num). Read debug.c to fix this problem and let the program pass the assertion.
Hints:
Warning information is useful.
This error is a typo. You should be able to fix the problem by changing only one line of code.

After fixing the two errors, you should be able to see:

double a number (10) is (20)
Segmentation fault (core dumped)

Aha! Our code passes the assertion, but it is still not correct (core dumps are bad). Specifically, the segmentation fault means that our program issued an illegal memory reference, and the operating system ended our process. Making matters worse, we have no idea what the problem in the code is. In the following section, you will learn how to use gdb to debug this kind of problem.

Debugging with `gdb`

Run gdb: Use the GNU debugger, or gdb to run the program:

$ gdb debug
(gdb)

Set breakpoints: One thing that you might want to do is to set a breakpoint before the program begins executing. Breakpoints are a way of telling gdb that you want it to execute your program and then stop, or break, at a place that you define. Use the following command to set a breakpoint at the main function:

(gdb) b main
Breakpoint 1 at 0x155f: file debug.c, line 42.

Run the program: Then use gdb’s command run to actually start the program (this is the general pattern in gdb: one invokes the debugger, perhaps sets a breakpoint, and then starts the program with run):

(gdb) run

Backtrace: The program will be stopped when it reaches the breakpoint. At this point, you will be presented with gdb’s command prompt again. To see the “call stack” (or stack trace), which is the list of functions that have called this one—literally, the stack frames on top of the current one—you issue backtrace or bt for short:

(gdb) bt

Experienced developers will often ask for a stack trace as step 0 or 1 of understanding a code problem. Get in the habit of asking gdb to give you a backtrace.

Continue running: To make the program continue running after a breakpoint, use continue, or c for short:

(gdb) c

Step through the code: Of course, if you just c every time you hit a breakpoint, then you will lose control of the program. You often want the command next, or n:

(gdb) n

This “executes” the next line of code, for example executing an entire function. (The command step executes a single line of C code. There is little difference between step and next unless you are about to enter a function. step steps into the function; next “steps over” the function.)

Inspect the values of variables: In gdb’s command prompt, the program is stalled. You can query the program’s current global and local variables with the print command, or p for short.

Update 1/20: we corrected the gdb print example below. Run gdb on debug. Set a breakpoint at the function ~~test_linked_list~~ process_msg.

At this breakpoint, determine the value of the ~~integer i~~ argument mem:

(gdb) print mem
$1 = 0xc0000000c <error: Cannot access memory at address 0xc0000000c>

This means that variable mem holds a value 0xc0000000c (note that you may see a different value here. Why? Take a look at the type of mem.).

Aside: you can check local variables’ names using:

(gdb) info locals

Core dump: If a program terminated abnormally (for example, debug), the state of the program will be recorded by the OS and (if core dumps are enabled) saved in a so-called core dump. gdb can use a core dump to inspect a crash situation.

To debug using core dumps, you must first enable core dumps, and then point gdb at the relevant file. We’ll do this in several steps:

# specify the core dump file
$ sudo sysctl -w kernel.core_pattern=core
# enable core dumps (applies to the current shell and programs started from it)
$ ulimit -c unlimited

$ ./debug
$ ls -l core
$ gdb ./debug core

The idea here is that the core file gives gdb enough information to recover the memory and CPU state of the program at the moment of the crash. This will allow you to determine which instructions experienced the error.

Exercise 3 Fix segfault.

Use gdb and the core dump file to study the segfault and fix the bug.
Again, the bug is a typo. You should be able to fix the bug with one line of code change.

After fixing the typo, you should be able to see:

$ ./debug
double a number (10) is (20)
processed msg: HELLO WORLD!

Section 4: C string and memory overflow

Though it seems that debug works fine, there is a deeper bug that is related to manipulating strings and memory. Run the AddressSanitizer version of debug:

$ make
$ ./debug-mem-check
double a number (10) is (20)
=================================================================
==4236==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000001c at pc 0x7f616e8a8a6d bp 0x7ffed4d8ad90 sp 0x7ffed4d8a538
... // more error messages

Memory checker—AddressSanitizer

AddressSanitizer is a tool to detect buffer overflows and memory corruption bugs. debug-mem-check is compiled from debug.c with AddressSanitizer enabled (read Makefile to see how we compile debug-mem-check).

C string

In C, a string (e.g., "hello!") is a sequence of chars with a trailing zero (\0). A C string is also called a Null-terminated string. The ending \0 is used for recognizing the end of a string.

For example, a string “hello!” in memory would look like:

------------------------------
| h | e | l | l | o | ! | \0 |
------------------------------

Given a string (char * str = "hello!"), we can find its length with strlen. strlen is a library function that returns the number of characters that precede the terminating null character (see the function details with $ man strlen). Namely, strlen(str) returns 6.

The deeper bug

The bug detected by AddressSanitizer comes from a combination of incorrect string manipulation and incorrect memory handling. Here are some hints: think of invoking strlen on a string without a trailing \0. What will happen? strlen will look for the ending \0 and go way beyond the end of the string until it finds a \0 (which belongs to other data) or reaches some illegal memory.

Exercise 4 pass debug-mem-check

Fix the deeper bug and pass debug-mem-check. After fixing the bug, you should see:
$ make
$ ./debug-mem-check
double a number (10) is (20)
processed msg: HELLO WORLD!

Section 5: Saving changes by committing

As you modify the skeleton files to complete the labs, you should frequently save your work to protect against laptop failures and other unforeseen troubles, and to create “known good” states. You save the changes by first “committing” them to your local lab repo and then “pushing” those changes to the repo stored on github.com.

$ git commit -am "saving my changes"
$ git push origin

Note that whenever you add a new file, you need to manually tell git to “track it”. Otherwise, the file will not be committed by git commit. Make git track a new file by typing:

$ git add <your-new-file>

After you’ve pushed your changes by typing git push origin, they are safely stored on github.com. Even if your laptop catches on fire in the future, those pushed changes can still be retrieved. However, you must remember that doing git commit by itself does not save your changes on github.com (it only saves your changes locally). So, don’t forget to type git push origin.

To see if your local repo is up-to-date with your origin repo on github.com and vice versa, type git status.

Exercise 5 commit and push.

go to the lab0 folder: cd ~/lab0/
commit your modifications: git commit -am "my CS5600 commits"
push to github.com: git push origin
You should see something like:
 Counting objects: ...
 ....
 To ssh://github.com/NEU-CS5600-21fall/lab0-<username>.git
  7337116..ceed758  main -> main

Finally, submit your work

Submitting lab0 consists of two steps:

Executing this checklist:
- Fill in slack.txt with (1) your name, (2) your NUID, (3) slack hours you used.
- Make sure you have finished Exercise 5, namely you have done git commit -am'my CS5600 commits' and git push origin (Note: the commit message doesn’t matter.)
Actually submit your lab (with timestamp and git commit id):
1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.
2. Paste both your git repo address and the commit id (the hexadecimal string) to Canvas. In Canvas, there will be an assignment for this lab. You should paste the git repo address and the commit id in two lines:
```
 git@github.com:NEU-CS5600-21fall/lab0-studentid.git
 29dfda9c788fade33421f242b5dd1ff5295fd3c9
```
  Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on the GitHub repo page by clicking the green “Code” button and then choosing “SSH”.
3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.

NOTE: Ground truth is what and when you submitted to Canvas.

A non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.

Acknowledgments

Some links were borrowed from prior CS5600s (2020fall and 2021spring). A large portion of this writeup was borrowed from Mike Walfish’s CS202, which further borrowed materials from Harvard’s CS61, Jinyang Li’s CS201, and Aurojit Panda’s 3033.

- “The Linux Command Line,” 5th edition, William Shotts http://linuxcommand.org/tlcl.php.
- The Linux command line for beginners: https://ubuntu.com/tutorials/command-line-for-beginners#1-overview.
- Programming in C, Fourth Edition, Stephen G. Kochan, Addison-Wesley Professional, 2014 (available online through NEU library).
- C Tutorial: https://www.cprogramming.com/tutorial/c-tutorial.html.