OSI Lab3: Kernel scheduler and MLFQ
In this lab, you will explore how to better schedule processes. The current egos uses a naive scheduler that simply runs the next available process in an array (similar to Round-Robin), which is inefficient for latency-sensitive processes such as the shell.
In this lab, your tasks are:
- collecting scheduling metrics (like turnaround time),
- and implementing a Multi-level Feedback Queue scheduler.
Your modifications will be within three files: library/queue.h, grass/scheduler.c, and grass/mlfq.c.
Fetch lab3 from upstream repo
Lab3 is pushed to the upstream branch upstream/lab3. Note that upstream points to git@github.com:NEU-CS6640-labs/egos-upstream.git, while origin is your private GitHub repo, pointing to git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git. Next, you will fetch the branch and create a lab3 branch in your local repo.
Here are steps to fetch lab3:
- fetch the lab3 branch:
$ git fetch upstream lab3
you should see:
... * [new branch]      lab3       -> upstream/lab3
If you get error: No such remote 'upstream', run git remote add upstream git@github.com:NEU-CS6640-labs/egos-upstream.git.
- create a local lab3 branch:
$ git checkout --no-track -b lab3 upstream/lab3
you should see:
Switched to a new branch 'lab3'
- confirm you're on the local branch lab3:
$ git status
you should see:
On branch lab3
nothing to commit, working tree clean
- rebase your previous labs onto the current branch:
<you're now on branch lab3>
$ git rebase lab2
You need to resolve any conflicts caused by the rebase. Read how to resolve conflicts here.
Note: rebase gives you a cleaner linear commit history (than git merge), which is easier to understand and makes it easier to trace bugs.
- push branch lab3 to the remote repo origin:
$ git push -u origin lab3
- In the following, you should work on this lab3 branch and push to origin's lab3 branch to submit your work. OSI staff will grade your origin:lab3 branch, so if you have any updates you want to keep from previous labs, you should merge/rebase them into the lab3 branch.
Section 1: Tune QUANTUM on your machine
Since qemu's performance highly depends on the machine it runs on, you need to tune QUANTUM (defined in library/egos.h) so that timer interrupts arrive at a reasonable rate.
Exercise 1 Tune QUANTUM on your machine
- fetch the quantum_tuning branch:
<in your lab repo>
$ git fetch upstream quantum_tuning
- create a local quantum_tuning branch:
$ git checkout --no-track -b quantum_tuning upstream/quantum_tuning
- run egos:
$ make && make qemu
...
[CRITICAL] ------------- Booting -------------
[CRITICAL] This is a simple timer example
[CRITICAL] Got a timer interrupt!
[CRITICAL] Got a timer interrupt!
...
We hope you see the message "Got a timer interrupt!" 1–3 times per second.
- If the frequency is too high or too low, go to library/egos.h and modify the value of QUANTUM accordingly.
- Finally, apply your tuned value of QUANTUM in your lab3 branch.
Section 2: Kernel scheduler and scheduling metrics
A. Process organization
egos manages processes mainly in the files grass/process.h and grass/process.c. Below, we introduce the basic design of egos processes.
Process status
The process status is defined in process.h and includes:
PROC_UNUSED,
PROC_LOADING, /* allocated and wait for loading elf binary */
PROC_READY, /* finished loading elf and wait for first running */
PROC_RUNNING,
PROC_RUNNABLE,
PROC_PENDING_SYSCALL,
PROC_SLEEPING
For a scheduler, the status can be classified as:
- the current running process: PROC_RUNNING
- available for scheduling: PROC_READY, PROC_RUNNABLE
- unavailable for scheduling: PROC_LOADING, PROC_PENDING_SYSCALL, and PROC_SLEEPING (you will implement sleeping processes later).
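For example, a scheduler might use a small helper to test whether a process can be picked. The helper below is only a sketch, and its name is ours, not part of egos:

/* Sketch: a hypothetical helper (not part of egos) that tests whether a
 * process with the given status may be picked by the scheduler. */
static int is_schedulable(enum proc_status status) {
    return status == PROC_READY || status == PROC_RUNNABLE;
}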
Process life cycle
A life cycle of a process starts with (1) being allocated (proc_alloc() in process.c), then (2) being scheduled to run multiple times (proc_yield() in grass/scheduler.c), and finally (3) being freed (proc_free() in process.c).
Accordingly, the following state–transition callback functions (in scheduler.c) will be called:
- proc_on_arrive(int pid): when pid is created
- proc_on_sched_in(int pid): when pid is scheduled to run
- proc_on_sched_out(int pid): when pid is descheduled
- proc_on_stop(int pid): when pid is destroyed
Later, in your Exercise 2, you will need to collect scheduling information in the above functions.
Process control block (PCB)
Each process is associated with one process control block (PCB). The PCB contains the information necessary for the process to context switch, as well as metadata for syscalls and scheduling. The PCB is the struct process defined in grass/process.h:
struct process {
int pid;
struct syscall syscall;
enum proc_status status;
uint mepc, saved_registers[32];
// scheduling attributes
union {
unsigned char chars[64];
unsigned int ints[16];
float floats[16];
unsigned long long longlongs[8];
double doubles[8];
} schd_attr;
};
In this lab, you will design how to use union schd_attr to save necessary information for your scheduler.
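One possible plan (a sketch, not the required design) is to declare your own struct of scheduling fields and overlay it on the 64-byte schd_attr. All field names below are hypothetical:

/* Sketch: a hypothetical layout for per-process scheduling data. The only
 * requirement is that the struct fits within the 64-byte schd_attr union. */
struct sched_info {
    unsigned long long arrival_time;    /* timestamp when the process arrived     */
    unsigned long long first_run_time;  /* timestamp of its first time on the CPU */
    unsigned long long last_sched_in;   /* timestamp of the latest sched-in       */
    unsigned long long cpu_time;        /* accumulated time spent on the CPU      */
    unsigned int       num_schedules;   /* transitions into PROC_RUNNING          */
};

/* Access it by reinterpreting the union's storage: */
#define SCHED_INFO(p) ((struct sched_info*)&(p)->schd_attr)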
PCB array
To manage PCBs, egos uses a simple array, struct process proc_set[MAX_NPROCESS + 1], defined in grass/kernel.c. By design, egos can have at most MAX_NPROCESS processes, and slot 0 is a placeholder.
Process id (pid) vs. index of PCB array
The pid and the index into the PCB array have a deterministic one-to-one mapping: the pid equals the array index. For example, the first process, sys_proc, has pid=1 and is stored at proc_set[1]; the second process, sys_file, has pid=2 and is stored at proc_set[2].
By design, process ids start at 1, and the first four processes belong to the OS. Processes starting from pid=5 are user applications.
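Because of this invariant, translating a pid into its PCB is a single array lookup. A minimal sketch, assuming proc_set is visible in your file (e.g., via an extern declaration); the helper name is ours:

/* Sketch: a hypothetical helper mapping a pid to its PCB, relying on the
 * invariant that the pid equals the array index. */
static struct process* pcb_of(int pid) {
    return &proc_set[pid];
}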
Current running process
The scheduler keeps track of the current running process: curr_proc_idx is an index into the PCB array and identifies the current running process. We also provide two helper macros, curr_pid (the current pid) and curr_status (the status of the current process). Both are defined in process.h.
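Given the pid-equals-index invariant, these macros are effectively simple lookups into proc_set. The definitions below are our own equivalent sketch, not necessarily the exact code in process.h:

/* Sketch: plausible definitions of the helper macros; the real ones in
 * process.h may differ, but the effect is the same. */
#define curr_pid    (proc_set[curr_proc_idx].pid)
#define curr_status (proc_set[curr_proc_idx].status)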
B. Scheduler and scheduling metrics
OS scheduler
proc_yield() in grass/scheduler.c determines the next process to run. It performs bookkeeping and then invokes schedule(), which executes the scheduling algorithm to select the next runnable process. Currently, the system uses a naive round-robin scheduler.
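For reference, a naive round-robin schedule() can be sketched as a scan over the PCB array starting just after the current index. This is our illustration, not necessarily the exact egos code:

/* Sketch: naive round-robin (illustration only). Pick the next READY or
 * RUNNABLE slot after the current one, wrapping around the PCB array. */
static void schedule(void) {
    for (int i = 1; i <= MAX_NPROCESS; i++) {
        int idx = (curr_proc_idx + i) % (MAX_NPROCESS + 1);
        if (idx == 0) continue;   /* slot 0 is a placeholder */
        if (proc_set[idx].status == PROC_READY ||
            proc_set[idx].status == PROC_RUNNABLE) {
            curr_proc_idx = idx;
            return;
        }
    }
    /* If nothing else is runnable, keep running the current process. */
}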
Scheduling metrics
In the following, you will calculate four scheduling metrics and print them when a process exits:
- turnaround time: the duration from the process's arrival to its completion
- response time: the duration from the process's arrival to its first scheduling on the CPU
- CPU time: the actual time the process spends executing on the CPU
- the number of schedules: the count of times the process is scheduled for execution, specifically the number of transitions to the PROC_RUNNING state before the process exits.
Exercise 2 Calculate scheduling metrics
- collect scheduling information in the state-transition functions within grass/scheduler.c
- calculate the four metrics above
- hints:
  - search "[lab3-ex2]" and read the corresponding comments
  - you need to plan how to use the 64-byte schd_attr in the PCB (one possible plan is sketched after this exercise)
  - you can get the current time with earth->gettime(); this is a 64-bit timestamp (an unsigned long long)
  - we measure the turnaround/response/CPU time in units of the scheduling time slice QUANTUM, defined in egos.h
- when finished, you should see:
➜ /home/cs6640 % ls
./ ../ README
[INFO] proc[5] finished after 4 yields, turnaround time: 0.44, response time: 0.41, cputime: 0.01
➜ /home/cs6640 % echo
[INFO] proc[5] finished after 2 yields, turnaround time: 0.40, response time: 0.39, cputime: 0.00
- Note: the actual numbers for turnaround/response/CPU time may vary a lot, depending on your local environment and your machine.
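To make the hints concrete, here is a minimal sketch of one way the callbacks could record timestamps and derive the metrics. It reuses the hypothetical struct sched_info, SCHED_INFO, and pcb_of() from the PCB section above, assumes earth->gettime() and QUANTUM use the same tick unit, and prints with printf (your code may use egos's logging facility instead). It is one possible plan, not the required implementation:

/* Sketch only: assumes the hypothetical sched_info/SCHED_INFO/pcb_of above. */
void proc_on_arrive(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    si->arrival_time   = earth->gettime();
    si->first_run_time = 0;
    si->cpu_time       = 0;
    si->num_schedules  = 0;
}

void proc_on_sched_in(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    if (si->num_schedules == 0) si->first_run_time = earth->gettime();
    si->last_sched_in = earth->gettime();
    si->num_schedules++;
}

void proc_on_sched_out(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    si->cpu_time += earth->gettime() - si->last_sched_in;
}

void proc_on_stop(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    unsigned long long now = earth->gettime();
    /* Report the times in units of QUANTUM (assumes gettime() ticks and
     * QUANTUM share the same unit). */
    double turnaround = (double)(now - si->arrival_time) / QUANTUM;
    double response   = (double)(si->first_run_time - si->arrival_time) / QUANTUM;
    double cputime    = (double)si->cpu_time / QUANTUM;
    printf("[INFO] proc[%d] finished after %u yields, turnaround time: %.2f, "
           "response time: %.2f, cputime: %.2f\r\n",
           pid, si->num_schedules, turnaround, response, cputime);
}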
Section 3: Multi-Level Feedback Queue
The current scheduler is inefficient for latency-sensitive applications. To see this, try running loop & first and then running ls (to see what loop does, check apps/user/loop.c):
<run "loop" in the background>
➜ /home/cs6640 % loop &
[INFO] process 5 running in the background
➜ /home/cs6640 % ls
<you will wait for a while to see the result>
The reason ls responds so slowly is that the scheduler doesn't prioritize system processes (which provide essential services) and short-lived processes (like ls) over long-running jobs (like loop).
In the following, you will implement a Multi-Level Feedback Queue (MLFQ) to overcome this shortcoming and make ls respond faster. If you want to learn MLFQ basics, read OSTEP Chapter 8.
You will implement a simple three-level MLFQ:
- There will be three queues: high-/mid-/low-priority queues.
- The scheduler will always run processes in higher priority queues.
- Within each queue, the processes run in round-robin fashion.
- When a process is created, it is placed in the high-priority queue.
- Once a process uses up a QUANTUM of CPU time, the scheduler downgrades it to a lower-level queue. So, a process will be placed in the mid-priority queue when its CPU time is > 1 and < 2 QUANTUMs.
- Processes with CPU time > 2 QUANTUMs will stay in the low-priority queue until they finish.
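One possible shape for the data structures and selection logic in grass/mlfq.c (which you will implement in Exercise 3 below) is sketched here. The queue helpers (queue_init, queue_empty, queue_push, queue_pop) stand in for your lab2 queue API in library/queue.h, and the function signatures are illustrative, not the required ones:

/* Sketch: three priority queues of pids. All helper names and signatures
 * here are hypothetical stand-ins for your lab2 queue API. */
#define NLEVELS 3
static struct queue mlfq_queues[NLEVELS];   /* 0 = high, 1 = mid, 2 = low */

void mlfq_init(void) {
    for (int i = 0; i < NLEVELS; i++) queue_init(&mlfq_queues[i]);
}

/* Pick the next pid to run: scan from the highest-priority non-empty queue;
 * within a level, FIFO order gives round-robin behavior. */
int mlfq(void) {
    for (int level = 0; level < NLEVELS; level++)
        if (!queue_empty(&mlfq_queues[level]))
            return queue_pop(&mlfq_queues[level]);
    return -1;   /* nothing runnable */
}

/* Reinsert a process after it has run: its level is the number of full
 * QUANTUMs of CPU time it has used, capped at the lowest level. */
void mlfq_requeue(int pid, int used_quanta) {
    int level = used_quanta < NLEVELS ? used_quanta : NLEVELS - 1;
    queue_push(&mlfq_queues[level], pid);
}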
Exercise 3 Implement your MLFQ
- update the Makefile to start using your MLFQ: change the first line of the Makefile from SCHEDULER=NAIVE to SCHEDULER=MLFQ (this enables the macro MLFQ in C code; see the sketch after this exercise).
- update your queue in library/queue.h (you should reuse functions from your lab2)
- in grass/mlfq.c, implement mlfq_init(), mlfq(), and any additional helper functions as needed.
- when finished, you should see ls run much faster:
➜ /home/cs6640 % loop &
[INFO] process 5 running in the background
➜ /home/cs6640 % ls
<you should see results immediately>
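For context, SCHEDULER=MLFQ typically just passes a -DMLFQ flag to the compiler so the scheduling code can switch on the macro. The snippet below is our guess at how such a compile-time switch might look inside the scheduler, not the exact egos code:

/* Sketch: choosing the scheduling algorithm at compile time via the MLFQ
 * macro (enabled by SCHEDULER=MLFQ in the Makefile). Illustration only. */
static void schedule(void) {
#ifdef MLFQ
    int next = mlfq();               /* your MLFQ picks the next pid        */
    if (next != -1) curr_proc_idx = next;
    /* (reinserting the current process is discussed in the next section) */
#else
    /* ... the naive round-robin scan shown earlier ... */
#endif
}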
Discussion: What happens if the shell blocks low-priority processes?
- In our labs, sys_shell uses "polling" to continuously check for user input. As a result, the sys_shell process is always PROC_RUNNABLE while waiting for input.
- Also, as specified, the sys_shell process should have the highest priority.
- This creates a starvation problem: long-running processes like loop, once they leave the high-priority queue, are never executed again.
- To fix this, a scheduler could explicitly pick any runnable process other than sys_shell when sys_shell is the current running process.
- A better approach is to exclude the currently running process when selecting the next process. Specifically, upon a timer interrupt, the scheduler should pick a process from the MLFQ to run before reinserting the current process back into the queues (see the sketch below).
Finally, submit your work
Submitting consists of three steps:
- Executing this checklist:
  - Fill in ~/osi/egos/slack/lab3.txt.
  - Make sure that your code builds with no warnings.
- Push your code to GitHub:
$ cd ~/osi/egos/
$ git commit -am 'submit lab3'
$ git push origin lab3
Counting objects: ...
....
To github.com/NEU-CS6640-labs/egos-<YOUR_ID>.git
   c1c38e6..59c0c6e  lab3 -> lab3
- Actually commit your lab (with timestamp and git commit id):
  - Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.
  - Submit a file named git.txt to Canvas (there will be an assignment for this lab on Canvas). The file git.txt contains two lines: the first line is your GitHub repo url; the second line is the git commit id that you want us to grade. Here is an example:
git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git
29dfdadeadbeefe33421f242b5dd8312732fd3c9
Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on the GitHub repo page by clicking the green "Code" button, then choosing "SSH".
  - Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
  - Notice: if you submit multiple times, the file name (git.txt) changes to git-N.txt, where N is an integer representing how many times you've submitted. We will grade the file with the largest N.
NOTE: Ground truth is what and when you submitted to Canvas.
A non-existent repo address or a non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted repo and commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.
This completes the lab.