Link Search Menu Expand Document

CS6640 Lab3: Kernel scheduler and MLFQ

In this lab, you will explore how to better schedule processes. The current egos-2k+ uses a naive scheduler that runs the next available process in an array (similar to Round-Robin), which is inefficient for latency-sensitive processes like the shell.

In this lab, your tasks are:

  1. collecting scheduling metrics (like turnaround time),
  2. implementing a Multi-level Feedback Queue scheduler,
  3. and enabling processes to sleep.

Your modifications will be within two files grass/scheduler.c and library/queue2.h.

Fetch lab3 from upstream repo

Lab3 is pushed to the upstream branch upstream/lab3. Note that upstream points to git@github.com:NEU-CS6640-labs/egos-upstream.git, while your private GitHub repo origin points to git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git. Next, you will fetch the branch and create a lab3 branch in your local repo.

Here are steps to fetch lab3:

  • fetch lab3 branch
    $ git fetch upstream lab3
    
    <you should see:>
    ...
     * [new branch]      lab3       -> upstream/lab3
    

    If you get error: No such remote 'upstream', run git remote add upstream git@github.com:NEU-CS6640-labs/egos-upstream.git.

  • create a local lab3 branch
    $ git checkout --no-track -b lab3 upstream/lab3
    <you should see:>
    Switched to a new branch 'lab3'
    
  • confirm you’re on local branch lab3
    $ git status
    <you should see:>
    On branch lab3
    nothing to commit, working tree clean
    

    update 09/29: make sure your repo has commit cce552d23125d7e.... The commit message is “thanks Micah for finding the bug”. There was a bug in the CPU timer; see details in this GitHub issue.

  • rebase your previous labs to the current branch
    <you're now on branch lab3>
    $ git rebase lab2
    

    You need to resolve any conflicts caused by the rebase. Read how to resolve conflicts here.
    Note: rebase gives you a cleaner linear commit history (than git merge), which is easier to understand and makes it easier to trace bugs.

  • push branch lab3 to remote repo origin
    $ git push -u origin lab3
    
  • In the following, you should work on this lab3 branch, and push to origin’s lab3 branch to submit your work. CS6640 staff will grade your origin:lab3 branch, so if you have any updates you want to keep from previous labs, you should merge/rebase them into the lab3 branch.

Tune QUANTUM on your machine

Since qemu’s performance highly depends on the machine it runs on, you need to tune QUANTUM (defined in library/egos.h) so that timer interrupts arrive at a reasonable rate.

Here is how to tune your QUANTUM:

  • fetch quantum_tuning branch
    <in your lab repo>
    $ git fetch upstream quantum_tuning
    
  • create a local quantum_tuning branch
    $ git checkout --no-track -b quantum_tuning upstream/quantum_tuning
    
  • run egos-2k+
    $ make && make qemu
    ...
    [CRITICAL] ------------- Booting -------------
    [CRITICAL] This is a simple timer example
    [CRITICAL] Got a timer interrupt!
    [CRITICAL] Got a timer interrupt!
    ...
    

    We hope you see the message “Got a timer interrupt!” 1–3 times per second.

  • If the frequency is too high or too low, go to library/egos.h and modify the value of QUANTUM accordingly.
  • Finally, apply your tuned version of QUANTUM in your lab3 branch.

Section 1: Kernel scheduler and scheduling metrics

A. Process organization

egos-2k+ manages processes mainly in file grass/process.h and grass/process.c. Below we introduce the basic design of egos-2k+ processes.

Process status

The status is defined in process.h, including:

    PROC_UNUSED,
    PROC_LOADING, /* allocated and wait for loading elf binary */
    PROC_READY,   /* finished loading elf and wait for first running */
    PROC_RUNNING,
    PROC_RUNNABLE,
    PROC_WAIT_TO_SEND,
    PROC_WAIT_TO_RECV,
    PROC_SLEEPING,

For a scheduler, the status can be classified as:

  • the current running process: PROC_RUNNING
  • available for scheduling: PROC_READY, PROC_RUNNABLE
  • unavailable for scheduling: PROC_LOADING, PROC_WAIT_TO_SEND, PROC_WAIT_TO_RECV, and PROC_SLEEPING (you will implement sleeping processes later)

Process life cycle

The life cycle of a process starts with (1) being allocated (proc_alloc() in process.c), continues with (2) being scheduled to run multiple times (proc_yield() in grass/scheduler.c), and ends with (3) being freed (proc_free() in process.c).

Accordingly, the following functions in scheduler.c will be called:

  1. proc_on_arrive(int pid): when creating pid
  2. proc_yield(): when deciding next running process
  3. proc_on_stop(int pid): when destroying pid

Later, in your Exercise 1, you will need to collect scheduling information in the above functions.

Process control block (PCB)

Each process is associated with one process control block (PCB). The PCB contains the information necessary for a process to context switch, plus metadata for syscalls and scheduling. The PCB is the struct process defined in grass/process.h:

struct process{
    int pid;
    int status;
    int receiver_pid; /* used when waiting to send a message */
    void *sp, *mepc;  /* process context = stack pointer (sp)
                       * + machine exception program counter (mepc) */
    // scheduling attributes
    union {
        unsigned char      chars[64];
        unsigned int       ints[16];
        float              floats[16];
        unsigned long long longlongs[8];
        double             doubles[8];
    } schd_attr;
};

In this lab, you will design how to use union schd_attr to save necessary information for your scheduler.
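As a concrete (hypothetical) example of such a design, you might carve the 64 bytes into slots for the Exercise 1 metrics. The slot choices and macro names below are our assumptions for illustration, not part of the egos-2k+ source:

```c
#include <assert.h>
#include <string.h>

/* A copy of the 64-byte schd_attr union from grass/process.h
 * (float/double views omitted for brevity). */
union schd_attr {
    unsigned char      chars[64];
    unsigned int       ints[16];
    unsigned long long longlongs[8];
};

/* One possible slot assignment (illustrative names, our assumption): */
#define ATTR_ARRIVE_TIME(a)   ((a)->longlongs[0]) /* gettime() at proc_on_arrive */
#define ATTR_FIRST_RUN(a)     ((a)->longlongs[1]) /* gettime() at first schedule  */
#define ATTR_CPU_TIME(a)      ((a)->longlongs[2]) /* accumulated running cycles   */
#define ATTR_NUM_SCHEDULES(a) ((a)->ints[12])     /* times set to PROC_RUNNING    */
```

Any layout works as long as all readers and writers agree on which union view owns which bytes; here longlongs[0..2] use bytes 0–23 and ints[12] uses bytes 48–51, so the slots do not overlap.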

PCB array

To manage PCBs, egos-2k+ uses a simple array, struct process proc_set[MAX_NPROCESS], defined in grass/kernel.c. This means egos-2k+ can have at most MAX_NPROCESS processes; beyond that, the kernel runs out of empty PCBs.

Process id (pid) vs. index of PCB array

The pid and the index into the PCB array have a deterministic one-to-one mapping: the pid is the index plus one. For example, the first process, sys_proc, has pid=1 and is stored at proc_set[0]; the second process, sys_file, has pid=2 and is stored at proc_set[1].

By design, process ids start at 1 and the first four processes belong to the OS. Processes with pid >= 5 are user applications.
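This mapping can be captured in two one-line helpers (illustrative; these macros are not necessarily defined in the egos-2k+ source):

```c
#include <assert.h>

/* pid <-> PCB-array-index mapping described above: pid = index + 1. */
#define PID_TO_IDX(pid) ((pid) - 1)
#define IDX_TO_PID(idx) ((idx) + 1)
```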

Current running process

The scheduler keeps track of the current running process: proc_curr_idx is the index into the PCB array pointing at the current running process. We also provide two helper macros, curr_pid (the current pid) and curr_status (the current process’s status). Both are defined in process.h.

B. Scheduling metrics

In the following, you will calculate four scheduling metrics and print them when a process exits:

  1. turnaround time: the time between process arrival and stop
  2. response time: the time between process arrival and the first time being scheduled
  3. CPU time: the actual CPU time used by the process
  4. the number of schedules: how many times this process has been scheduled; or, to be specific, how many times the process’s status has been set to PROC_RUNNING before exit

Exercise 1 Calculate scheduling metrics

  • collect scheduling information in multiple places within grass/scheduler.c
  • calculate the four metrics above
  • hints:
    • search “[lab3-ex1]” and read the corresponding comments
    • you need to plan how to use the 64B schd_attr in PCB
    • you can get the current time by calling earth->gettime(); it returns a 64-bit timestamp (unsigned long long, i.e., m_uint64)
    • the 64-bit timestamp counts CPU cycles since the CPU started. It will overflow (wrapping from 0xffffffffffffffff to 0x0), and you need to handle the overflow.
    • we measure turnaround/response/CPU time in units of the scheduling time slice QUANTUM, defined in egos.h.
  • when finished, you should see:
    ➜ /home/cs6640 ls
    ./     ../     README
    [INFO] proc 5 died after 2 yields, turnaround time: 0.29, response time: 0.23, cputime: 0.03
    ➜ /home/cs6640 echo
    
    [INFO] proc 5 died after 1 yields, turnaround time: 0.14, response time: 0.13, cputime: 0.01
    
  • Note: the actual numbers for turnaround/response/CPU time may vary a lot, depending on your local environment and your machine.
  • update 10/06: if you see “3 yields” for ls and “2 yields” for echo, that is also correct.
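On the overflow hint: because the timestamp is an unsigned 64-bit cycle counter, plain unsigned subtraction already yields the correct span across a wraparound (unsigned arithmetic in C is defined modulo 2^64). A minimal sketch, assuming the m_uint64 type from the hints (the elapsed name is ours):

```c
#include <assert.h>

typedef unsigned long long m_uint64; /* the 64-bit timestamp type from the hints */

/* Elapsed cycles between two timestamps. (now - then) is computed modulo
 * 2^64, so the result is correct even when the counter has wrapped from
 * 0xffffffffffffffff to 0x0, as long as the true span is below 2^64. */
static m_uint64 elapsed(m_uint64 then, m_uint64 now) {
    return now - then;
}
```

Dividing such a cycle count by QUANTUM then gives the time in scheduling-slice units, as the hints request.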

Section 2: Multi-level Feedback Queue

The current scheduler is inefficient for latency-sensitive applications. To see this, run loop & first and then run ls (to see what loop does, check apps/user/loop.c):

<run "loop" in the background>
➜ /home/cs6640 loop &
[INFO] process 5 running in the background
➜ /home/cs6640 ls
<you will wait for a while to see the result>

The reason ls responds so slowly is that the scheduler doesn’t prioritize system processes (which provide essential services) and short-lived processes (like ls) over long-running jobs (like loop).

In the following, you will implement a Multi-level Feedback Queue (MLFQ) to overcome this shortcoming and make ls respond faster. If you want to review MLFQ basics, read OSTEP Chapter 8.

You will implement a simple three-level MLFQ:

  1. There will be three queues: high-/mid-/low-priority queues.
  2. The scheduler will always run processes in higher priority queues.
  3. Within each queue, the processes run in round-robin fashion.
  4. When a process is created, it is placed at the high-priority queue.
  5. Once a process uses up a QUANTUM of CPU time, the scheduler downgrades it to the next lower queue. So, a process is placed in the mid-priority queue when its CPU time is between 1 and 2 QUANTUMs.
  6. Processes with CPU time > 2 QUANTUMs stay in the low-priority queue until they finish.
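Rules 1–3 amount to: scan the levels from high to low and take the front of the first non-empty queue. Below is a minimal sketch of that selection step, with an illustrative circular-queue type standing in for whatever you build in library/queue2.h (all names and sizes here are our assumptions):

```c
#include <assert.h>

#define NLEVELS 3   /* high = 0, mid = 1, low = 2 */
#define QCAP    16  /* illustrative capacity */

struct queue { int items[QCAP]; int head, tail, size; };

static void q_push(struct queue *q, int pid) {
    q->items[q->tail] = pid;
    q->tail = (q->tail + 1) % QCAP;
    q->size++;
}

static int q_pop(struct queue *q) {
    int pid = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->size--;
    return pid;
}

/* Pick the next pid: scan from the high-priority queue downward; popping
 * from the front of a level yields round-robin order within that level. */
static int mlfq_pick(struct queue levels[NLEVELS]) {
    for (int lv = 0; lv < NLEVELS; lv++)
        if (levels[lv].size > 0) return q_pop(&levels[lv]);
    return -1; /* no runnable process */
}
```

Rules 4–6 then decide which level q_push targets when a process arrives or is downgraded.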

Exercise 2 Implement your MLFQ

  • update Makefile to start using your MLFQ: change the first line of Makefile from SCHEDULER=NAIVE to SCHEDULER=MLFQ (this defines the MLFQ macro in C code).
  • implement your queue in library/queue2.h (you should reuse functions in your apps/user/queue.h)
  • implement mlfq_init(), mlfq(), and other places with #ifdef MLFQ. You can also search “[lab3-ex2]”.
  • when finished, you should see ls runs much faster:
    ➜ /home/cs6640 loop &
    [INFO] process 5 running in the background
    ➜ /home/cs6640 ls
    <you should see results immediately>
    

update 10/15: a clarification of “shell will block low-priority processes”

  • In our labs, sys_shell uses polling to continuously check whether the user has typed anything. So, the sys_shell process is always PROC_RUNNABLE while waiting for input.
  • Also, as requested, the process of sys_shell should have the highest priority.
  • Therefore, there is a starvation problem: a long-running process like loop, once it leaves the high-priority queue, will never be executed again.
  • To fix this, the scheduler should intentionally pick any runnable process other than sys_shell when the current running process is sys_shell.
  • Or, another “natural” approach is to ignore the current running process when picking the next one to run; that is, when a timer interrupt arrives, pick one process from the MLFQ to run before putting the current process back into the queues.
  • Note: whether or not you handle starvation will not affect your grade in lab3. Both implementations should pass the tests.

Section 3: Enabling sleeping processes

Next, you will allow processes to sleep. Now, if you run:

➜ /home/cs6640 loop sleep
Sleep for 10 QUANTUM...
                   ...and waked up
<the two lines are printed instantly>

After implementing sleeping processes, we would expect loop to stall for a while and then print the second line.

Exercise 3 Implement process sleep

  • you will implement sleeping processes in proc_on_sleep() of scheduler.c.
  • the semantics of grass->sys_sleep(10) is to sleep for at least 10 QUANTUMs; the process will eventually be woken up. You don’t have to wake the process exactly after 10 QUANTUMs (which is hard in the current implementation).
  • the kernel should not panic (call FATAL) when there are sleeping processes but no runnable ones. The scheduler should wait until a sleeping process wakes up.
  • When you finish, you should see:
    ➜ /home/cs6640 loop sleep
    Sleep for 10 QUANTUM...
    <wait a while, then see>
                       ...and waked up
    
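One way to implement the bookkeeping for proc_on_sleep() is to record the earliest wake-up time when the process goes to sleep, then check on each scheduling pass whether it is due. The helper below is a sketch under that assumption; the sleep_due name and the wrap-safe comparison are ours, not the actual egos-2k+ interface:

```c
#include <assert.h>

typedef unsigned long long m_uint64; /* the 64-bit timestamp type */

/* A process is due once `now` has passed `wake_time`. Using unsigned
 * subtraction and comparing the forward distance against 2^63 keeps the
 * check correct even if the cycle counter wrapped between the two
 * timestamps. */
static int sleep_due(m_uint64 wake_time, m_uint64 now) {
    return (m_uint64)(now - wake_time) < ((m_uint64)1 << 63);
}
```

The scheduler can then move a due process back to PROC_RUNNABLE; when every process is sleeping, it can spin until some process becomes due instead of calling FATAL.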

Finally, submit your work

Submitting consists of three steps:

  1. Executing this checklist:
    • Fill in ~/cs6640/egos/slack/lab3.txt.
    • Make sure that your code builds with no warnings.
  2. Push your code to GitHub:

     $ cd ~/cs6640/egos/
     $ git commit -am 'submit lab3'
     $ git push origin lab3
    
     Counting objects: ...
      ....
      To github.com/NEU-CS6640-labs/egos-<YOUR_ID>.git
         c1c38e6..59c0c6e  lab3 -> lab3
    
  3. Actually commit your lab (with timestamp and git commit id):

    1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.

    2. Submit a file named git.txt to Canvas. (there will be an assignment for this lab on Canvas.) The file git.txt contains two lines: the first line is your github repo url; the second line is the git commit id that you want us to grade. Here is an example:
       git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git
       29dfdadeadbeefe33421f242b5dd8312732fd3c9
      

      Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”.

    3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
      Notice: if you submit multiple times, the file name (git.txt) changes to git-N.txt where N is an integer and represents how many times you’ve submitted. We will grade the file with the largest N.

NOTE: Ground truth is what and when you submitted to Canvas.

  • A non-existent repo address or a non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted repo and commit id!

  • The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.