CS6640 Lab4: Syscall, Exception, and Memory protection

In previous labs, the OS had no protection mechanism to prevent user programs (like helloworld and ult) from sabotaging the execution of other processes. In this lab, you will implement basic OS protections and better system calls, including:

using ecall to implement syscalls and handle different exceptions
running user applications at user level
implementing physical memory protection (PMP)

You will have to write only a small amount of code (our reference implementation takes 80 lines). However, we expect you to spend significant effort reading the RISC-V specifications to understand what code to write. If you have little idea of how to even start on the first pass, that’s normal. It takes time to digest a lot of detailed and seemingly messy information. Read the specification again the next day; you will very likely find that you understand it better!

Fetch lab4 from upstream repo

fetch lab4 branch
```
$ git fetch upstream lab4

<you should see:>
...
 * [new branch]      lab4       -> upstream/lab4
```
If you get error: No such remote 'upstream', run git remote add upstream git@github.com:NEU-CS6640-labs/egos-upstream.git.

create a local lab4 branch

$ git checkout --no-track -b lab4 upstream/lab4
<you should see:>
Switched to a new branch 'lab4'

confirm you’re on local branch lab4

$ git status
<you should see:>
On branch lab4
nothing to commit, working tree clean

rebase your previous labs to the current branch
```
<you're now on branch lab4>
$ git rebase lab3
```
You need to resolve any conflicts caused by the rebase. Read how to resolve conflict here.
Note: rebase gives you a cleaner, linear commit history (compared with git merge), which is easier to understand and makes bugs easier to trace.
push branch lab4 to remote repo origin
```
$ git push -u origin lab4
```
In the following, you should work on this lab4 branch, and push to the origin’s lab4 branch to submit your work. CS6640 staff will grade your origin:lab4 branch. If you have any updates you want to keep from previous labs, you should merge/rebase to the lab4 branch.

Section 1: Syscalls and exceptions

Your egos implementation so far has no process isolation and is vulnerable to malicious modifications from user applications. For example, crash1 is a program that allocates a large chunk of memory that exceeds the available memory. Run crash1, and you will see:

➜ /home/cs6640 crash1
_sbrk: heap grows too large (0x8201740 + 33554456 > 0xa000000)
_sbrk: heap grows too large
[FATAL] fatal exception (pid=6) 7

This is not ideal, as we should not panic the entire OS simply because a process allocates too much memory. Instead, the OS should kill this process, as most of today’s OSes do.
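
The numbers in the _sbrk message can be checked by hand: 33554456 is 0x2000018 (32 MiB plus 24 bytes), so the requested break would be 0x8201740 + 0x2000018 = 0xa201758, which exceeds the limit 0xa000000. A minimal sketch of such a bounds check follows. The function name heap_would_overflow is hypothetical and is not taken from the egos source; it only illustrates the comparison that the message reports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from egos): returns 1 if growing the heap by
 * incr bytes from brk would pass limit, else 0. Comparing against
 * limit - brk avoids wrap-around in 32-bit arithmetic. */
static int heap_would_overflow(uint32_t brk, uint32_t incr, uint32_t limit) {
    assert(brk <= limit);  /* precondition of this sketch */
    return incr > limit - brk;
}
```

With the values from the crash1 output, heap_would_overflow(0x8201740, 33554456, 0xa000000) reports an overflow.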

To implement this, we need to do two things: (1) implement syscalls with ecall (a RISC-V instruction) and (2) handle different exceptions accordingly.

Implementing synchronous syscalls

The current syscall implementation uses a software interrupt to trap to the kernel. See syscall workflow and read sys_invoke() in grass/syscall.c. During a system call, sys_invoke() triggers a software interrupt, which is handled by trap_handler() in grass/kernel.c. In trap_handler(), the kernel then decides to call either intr_entry() (for handling interrupts) or excp_entry() (for handling exceptions).

Currently, syscalls are handled within intr_entry() because they are triggered by software interrupts. What you should do next is replace the software interrupt with a mechanism designed for synchronous syscalls to trap to the kernel.
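
The interrupt-versus-exception decision described above rests on the mcause CSR: its most significant bit is set for interrupts and clear for exceptions, and the remaining bits hold the cause code. The sketch below models that split on a 32-bit mcause value. It is an illustration only, not the code in grass/kernel.c; the names MCAUSE_INTR_BIT, mcause_is_interrupt, and mcause_code are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* On RV32, bit 31 of mcause is the Interrupt flag. */
#define MCAUSE_INTR_BIT 0x80000000u

/* Returns 1 if the trap was an interrupt, 0 if it was an exception. */
static int mcause_is_interrupt(uint32_t mcause) {
    return (mcause & MCAUSE_INTR_BIT) != 0;
}

/* Returns the cause code with the Interrupt flag stripped. */
static uint32_t mcause_code(uint32_t mcause) {
    return mcause & ~MCAUSE_INTR_BIT;
}
```

For example, a machine software interrupt arrives as mcause 0x80000003 (interrupt, code 3), while an exception with code 9 arrives as mcause 0x00000009.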

Exercise 1 implementing syscalls with ecall

Read ecall and mret (Ch3.3.1 and Ch3.3.2) to understand how ecall works.
In Makefile, change the line SYSCALLFUNC=SOFTTIMER to SYSCALLFUNC=ECALL
Use ecall to trap to kernel in sys_invoke(), grass/syscall.c.
You will need to write assembly instructions; take a look at handout week04a, panel 2
When finished, make and run your code; then, you should see an error message:
[FATAL] fatal exception (pid=1) 9
Hint: take a look at the trap reason table to see what this “9” exception is.
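
As a starting point for the assembly in sys_invoke(), GCC and Clang accept inline assembly through the __asm__ keyword. The sketch below wraps a bare ecall. The wrapper name trap_to_kernel and the counter host_ecall_count are hypothetical, and the non-RISC-V branch exists only so that the snippet compiles on a development host. It does not show how egos passes syscall arguments, which you should take from the existing sys_invoke() code.

```c
#include <assert.h>

/* Counts calls on hosts that are not RISC-V (illustration only). */
static int host_ecall_count = 0;

/* Hypothetical wrapper: trap synchronously into the kernel. */
static inline void trap_to_kernel(void) {
#if defined(__riscv)
    /* ecall raises an environment-call exception; the "memory" clobber
     * keeps the compiler from reordering memory accesses across it. */
    __asm__ volatile("ecall" ::: "memory");
#else
    /* Stand-in so the sketch builds and runs on other architectures. */
    host_ecall_count++;
#endif
}
```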

Handling exceptions

The reason why you see an error message above is that you don’t have an exception handler yet. Next, you will implement the exception handler in grass/kernel.c.

Exercise 2 handling exceptions

To understand the trap prologue before excp_entry(), read:
trap_entry() in earth/cpu_intr.c,
trap_handler() in grass/kernel.c,
and ctx_entry() in grass/kernel.c.
Implement excp_entry() in grass/kernel.c
Here are some hints:
there are multiple exceptions about ecall (see the trap reason table).
You should capture them all.
How to handle syscalls?
Read intr_entry() to see how we did that before.
How to kill a process?
Read intr_entry() to see how a process is killed when it receives ctrl+C.
Be careful about which “mepc” you “mret” to.
When finished, your kernel should boot normally and you should see:
...
➜ /home/cs6640 crash1
_sbrk: heap grows too large (0x8201740 + 33554456 > 0xa000000)
_sbrk: heap grows too large
[INFO] process 6 killed by exception 7
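
The hints above can be summarized as a classification step. In the RISC-V privileged specification, exception codes 8, 9, and 11 are environment calls from U-, S-, and M-mode respectively, and on an ecall trap mepc holds the address of the ecall instruction itself, so returning to it unchanged would execute the ecall again. The sketch below models this on plain integers. The type excp_action_t and the functions classify_exception and next_mepc are hypothetical names for this example, not part of egos. Treating every non-ecall exception as fatal to the process is an assumption that matches the expected output above, and the real excp_entry() must also perform the syscall and the process bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ACTION_SYSCALL, ACTION_KILL } excp_action_t;

/* Environment-call exception codes from the privileged spec. */
enum { EXCP_ECALL_U = 8, EXCP_ECALL_S = 9, EXCP_ECALL_M = 11 };

/* Hypothetical policy: ecalls are syscalls, anything else kills. */
static excp_action_t classify_exception(uint32_t code) {
    switch (code) {
    case EXCP_ECALL_U:
    case EXCP_ECALL_S:
    case EXCP_ECALL_M:
        return ACTION_SYSCALL;
    default:
        return ACTION_KILL;
    }
}

/* mepc to resume at after a syscall: skip the 4-byte ecall. */
static uint32_t next_mepc(uint32_t mepc, uint32_t code) {
    assert(classify_exception(code) == ACTION_SYSCALL);
    return mepc + 4;
}
```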

Section 2: RISC-V privilege levels

Run user apps in U-Mode

Right now, all user apps have the privilege to do whatever the kernel could do. To see this, run:

➜ /home/cs6640 crash2
[SUCCESS] Crash2 succeeds in running high-privileged instructions

Here, crash2 successfully runs an instruction that should only be executed by the kernel. This is problematic.

RISC-V processors provide privilege levels and related protection mechanisms to address this problem. In a RISC-V CPU, there are three privilege levels:

Machine (M-Mode): the highest privilege level; earth runs in Machine mode.
Supervisor (S-Mode): the second highest privilege level; grass and system processes run in this mode.
User (U-Mode): the most limited level; user applications run in this mode.

Read Introduction to RISC-V privilege levels (Ch1). Notice the “encoding” of each mode in “Table 1.1”: for those who are very familiar with x86 ring levels, the “level 0” in RISC-V is the lowest privileged level, which is the opposite of “ring0” in x86.

Next, you will make user apps run in U-Mode.

Exercise 3 run user apps in U-Mode

Read mstatus.MPP (Ch3.1.6.1) to understand how mret changes the privilege levels.
Fill in ctx_entry() in grass/kernel.c
Fill in proc_yield() in grass/scheduler.c
When finished, you should see:
➜ /home/cs6640 crash2
[INFO] process 6 killed by exception 2
...
➜ /home/cs6640 crash2 yield
[INFO] process 6 killed by exception 2
...
Both crash2 and crash2 yield should be killed.
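
To make the role of mstatus.MPP concrete: the field occupies bits 12:11 of mstatus and uses the privilege encodings from Table 1.1 (U = 0, S = 1, M = 3), and mret switches to whatever level MPP holds. The sketch below manipulates the field on a plain 32-bit integer. The macro and function names are hypothetical; in the kernel the value would be read from and written back to the mstatus CSR, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Privilege-level encodings (Table 1.1 of the privileged spec). */
#define PRIV_U 0u
#define PRIV_S 1u
#define PRIV_M 3u

/* mstatus.MPP occupies bits 12:11. */
#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  (3u << MSTATUS_MPP_SHIFT)

/* Returns mstatus with MPP replaced by priv; other bits are kept. */
static uint32_t mstatus_with_mpp(uint32_t mstatus, uint32_t priv) {
    assert(priv == PRIV_U || priv == PRIV_S || priv == PRIV_M);
    return (mstatus & ~MSTATUS_MPP_MASK) | (priv << MSTATUS_MPP_SHIFT);
}

/* Returns the privilege level that mret would switch to. */
static uint32_t mstatus_mpp(uint32_t mstatus) {
    return (mstatus & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT;
}
```

For instance, mstatus_with_mpp(0x1888, PRIV_U) clears both MPP bits and yields 0x0088, leaving the other bits untouched.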

Section 3: Physical memory protection

After running user apps in U-Mode, the user-level programs cannot run high-privileged instructions. But they can still touch all memory, some of which contains important kernel data structures (e.g., the PCB array). To see this, run crash3:

➜ /home/cs6640 crash3
[SUCCESS] Crash3 succeeds in modifying earth's code
[SUCCESS] Crash3 succeeds in modifying disk contents
[SUCCESS] Crash3 succeeds in corrupting the memory of other processes

You will see that crash3 can modify critical memory.

To address this problem, we need to protect memory from invalid accesses. There are multiple ways to do this. In this lab, you will use a hardware feature provided by RISC-V CPUs, named Physical Memory Protection or PMP. Read physical memory protection (Ch3.7) to understand how PMP works.

Exercise 4 protect memory with PMP

Implement PMP in pmp_init() of earth/cpu_mmu.c
update 10/19: in pmp_init comments, regions (e.g., “0x00000000 - 0x20000000”) are “start inclusive to end exclusive” (including 0x00000000 and excluding 0x20000000).
When finished, you should see:
➜ /home/cs6640 crash3 disk
[INFO] process 6 killed by exception 7
...
➜ /home/cs6640 crash3 earth
[INFO] process 6 killed by exception 7
Both the crash3 disk and crash3 earth should be properly killed.
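
When reading Ch3.7, it helps to see how a byte range turns into PMP register values. A pmpaddr register holds an address shifted right by 2. In TOR mode the entry covers addresses from the previous pmpaddr (or 0 for the first entry) up to, but not including, its own address, which matches the “start inclusive to end exclusive” convention noted above. In NAPOT mode the low bits of pmpaddr encode the size of a naturally aligned power-of-two region. The sketch below computes both encodings on plain integers. The macro and function names are hypothetical, it does not write any CSR, and it says nothing about which regions or permissions pmp_init() should configure.

```c
#include <assert.h>
#include <stdint.h>

/* pmpcfg permission bits and address-matching modes (A field, bits 4:3). */
#define PMP_R     0x01u
#define PMP_W     0x02u
#define PMP_X     0x04u
#define PMP_TOR   0x08u  /* A = 1: top of range */
#define PMP_NAPOT 0x18u  /* A = 3: naturally aligned power of two */

/* TOR: pmpaddr holds the exclusive end address shifted right by 2. */
static uint32_t pmp_tor_addr(uint32_t end) {
    assert((end & 3u) == 0);  /* PMP granularity is at least 4 bytes */
    return end >> 2;
}

/* NAPOT: region of size bytes (a power of two, at least 8) starting at
 * base, which must be aligned to size. */
static uint32_t pmp_napot_addr(uint32_t base, uint32_t size) {
    assert(size >= 8 && (size & (size - 1)) == 0);
    assert((base & (size - 1)) == 0);
    return (base >> 2) | ((size >> 3) - 1);
}
```

For the example region 0x00000000 - 0x20000000, the TOR encoding of the end address is 0x08000000 and the NAPOT encoding of the whole region is 0x03FFFFFF.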

Exercise 5 protecting other processes’ memory

Read TODO in apps/user/crash3.c
Read questions in slack/lab4.txt and write down your answers.

Finally, submit your work

Submitting consists of three steps:

Executing this checklist:
- Fill in ~/cs6640/egos/slack/lab4.txt.
- Make sure that your code builds with no warnings.

Push your code to GitHub:

 $ cd ~/cs6640/egos/
 $ git commit -am 'submit lab4'
 $ git push origin lab4

 Counting objects: ...
  ....
  To github.com/NEU-CS6640-labs/egos-<YOUR_ID>.git
     c1c38e6..59c0c6e  lab4 -> lab4

Actually commit your lab (with timestamp and git commit id):
1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.
2. Submit a file named git.txt to Canvas. (there will be an assignment for this lab on Canvas.) The file git.txt contains two lines: the first line is your github repo url; the second line is the git commit id that you want us to grade. Here is an example:
```
 git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git
 29dfdadeadbeefe33421f242b5dd8312732fd3c9
```
  Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”.
3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
  Notice: if you submit multiple times, the file name (git.txt) changes to git-N.txt where N is an integer and represents how many times you’ve submitted. We will grade the file with the largest N.

NOTE: Ground truth is what and when you submitted to Canvas.

A non-existent repo address or a non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub; we will not grade it. So, please double check your submitted repo and commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.