OSI Lab3: Kernel scheduler and MLFQ
In this lab, you will explore how to better schedule processes. The current egos uses a naive scheduler that simply runs the next available process in an array (similar to Round-Robin), which is inefficient for latency-sensitive processes such as the shell.
In this lab, your tasks are:
- collecting scheduling metrics (like turnaround time),
- and implementing a Multi-level Feedback Queue scheduler.
Your modifications will be within three files: library/queue.h, grass/scheduler.c, and grass/mlfq.c.
Fetch lab3 from upstream repo
Lab3 is pushed to the upstream branch upstream/lab3. Note that upstream points to git@github.com:NEU-CS6640-labs/egos-upstream.git, while origin is your private GitHub repo, pointing to git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git. Next, you will fetch the branch and create a lab3 branch in your local repo.
Here are steps to fetch lab3:
- fetch the lab3 branch:
$ git fetch upstream lab3
you should see:
... * [new branch]      lab3       -> upstream/lab3
If you get error: No such remote 'upstream', run git remote add upstream git@github.com:NEU-CS6640-labs/egos-upstream.git.
- create a local lab3 branch:
$ git checkout --no-track -b lab3 upstream/lab3
you should see:
Switched to a new branch 'lab3'
- confirm you're on the local branch lab3:
$ git status
you should see:
On branch lab3
nothing to commit, working tree clean
- rebase your previous labs onto the current branch:
<you're now on branch lab3>
$ git rebase lab2
You need to resolve any conflicts caused by the rebase. Read how to resolve conflicts here.
Note: rebase gives you a cleaner linear commit history (than git merge), which is easier to understand and makes it easier to trace bugs.
- push branch lab3 to the remote repo origin:
$ git push -u origin lab3
- In the following, you should work on this lab3 branch and push to origin's lab3 branch to submit your work. OSI staff will grade your origin:lab3 branch, so if you have any updates you want to keep from previous labs, you should merge/rebase them into the lab3 branch.
Section 1: Tune QUANTUM on your machine
Since qemu's performance highly depends on the machine it runs on, you need to tune QUANTUM (defined in library/egos.h) so that timer interrupts arrive at a reasonable rate.
Exercise 1 Tune QUANTUM on your machine
- fetch the quantum_tuning branch:
<in your lab repo>
$ git fetch upstream quantum_tuning
- create a local quantum_tuning branch:
$ git checkout --no-track -b quantum_tuning upstream/quantum_tuning
- run egos:
$ make && make qemu
...
[CRITICAL] ------------- Booting -------------
[CRITICAL] This is a simple timer example
[CRITICAL] Got a timer interrupt!
[CRITICAL] Got a timer interrupt!
...
We hope you see the message "Got a timer interrupt!" 1–3 times per second.
- If the frequency is too high or too low, go to library/egos.h and modify the value of QUANTUM accordingly.
- Finally, apply your tuned value of QUANTUM in your lab3 branch.
Section 2: Kernel scheduler and scheduling metrics
A. Process organization
egos manages processes mainly in the files grass/process.h and grass/process.c. Below, we introduce the basic design of egos processes.
Process status
The process status is defined in process.h and includes:
PROC_UNUSED,
PROC_LOADING, /* allocated and wait for loading elf binary */
PROC_READY, /* finished loading elf and wait for first running */
PROC_RUNNING,
PROC_RUNNABLE,
PROC_PENDING_SYSCALL,
PROC_SLEEPING
For a scheduler, the status can be classified as:
- the current running process: PROC_RUNNING
- available for scheduling: PROC_READY, PROC_RUNNABLE
- unavailable for scheduling: PROC_LOADING, PROC_PENDING_SYSCALL, and PROC_SLEEPING (you will implement sleeping processes later).
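For example, a scheduler might use a small helper to test whether a process can be picked. The helper below is only a sketch, and its name is ours, not part of egos:

/* Sketch: a hypothetical helper (not part of egos) that tests whether a
 * process with the given status may be picked by the scheduler. */
static int is_schedulable(enum proc_status status) {
    return status == PROC_READY || status == PROC_RUNNABLE;
}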
Process life cycle
A life cycle of a process starts with (1) being allocated (proc_alloc() in process.c), then (2) being scheduled to run multiple times (proc_yield() in grass/scheduler.c), and finally (3) being freed (proc_free() in process.c).
Accordingly, the following state–transition callback functions (in scheduler.c) will be called:
- proc_on_arrive(int pid): when pid is created
- proc_on_sched_in(int pid): when pid is scheduled to run
- proc_on_sched_out(int pid): when pid is descheduled
- proc_on_stop(int pid): when pid is destroyed
Later, in your Exercise 2, you will need to collect scheduling information in the above functions.
Process control block (PCB)
Each process is associated with one process control block (PCB). The PCB contains the information necessary for the process to context switch, as well as metadata for syscalls and scheduling. The PCB is the struct process defined in grass/process.h:
struct process {
int pid;
struct syscall syscall;
enum proc_status status;
uint mepc, saved_registers[32];
// scheduling attributes
union {
unsigned char chars[64];
unsigned int ints[16];
float floats[16];
unsigned long long longlongs[8];
double doubles[8];
} schd_attr;
};
In this lab, you will design how to use union schd_attr to save necessary information for your scheduler.
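One possible plan (a sketch, not the required design) is to declare your own struct of scheduling fields and overlay it on the 64-byte schd_attr. All field names below are hypothetical:

/* Sketch: a hypothetical layout for per-process scheduling data. The only
 * requirement is that the struct fits within the 64-byte schd_attr union. */
struct sched_info {
    unsigned long long arrival_time;    /* timestamp when the process arrived     */
    unsigned long long first_run_time;  /* timestamp of its first time on the CPU */
    unsigned long long last_sched_in;   /* timestamp of the latest sched-in       */
    unsigned long long cpu_time;        /* accumulated time spent on the CPU      */
    unsigned int       num_schedules;   /* transitions into PROC_RUNNING          */
};

/* Access it by reinterpreting the union's storage: */
#define SCHED_INFO(p) ((struct sched_info*)&(p)->schd_attr)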
PCB array
To manage PCBs, egos uses a simple array, struct process proc_set[MAX_NPROCESS + 1], defined in grass/kernel.c. By design, egos can have at most MAX_NPROCESS processes, and slot 0 is a placeholder.
Process id (pid) vs. index of PCB array
The pid and the index into the PCB array have a deterministic one-to-one mapping: the pid equals the array index. For example, the first process, sys_proc, has pid=1 and is stored at proc_set[1]; the second process, sys_file, has pid=2 and is stored at proc_set[2].
By design, process ids start at 1, and the first four processes belong to the OS. Processes starting from pid=5 are user applications.
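Because of this invariant, translating a pid into its PCB is a single array lookup. A minimal sketch, assuming proc_set is visible in your file (e.g., via an extern declaration); the helper name is ours:

/* Sketch: a hypothetical helper mapping a pid to its PCB, relying on the
 * invariant that the pid equals the array index. */
static struct process* pcb_of(int pid) {
    return &proc_set[pid];
}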
Current running process
The scheduler keeps track of the current running process: curr_proc_idx is an index into the PCB array and identifies the current running process. We also provide two helper macros, curr_pid (the current pid) and curr_status (the status of the current process). Both are defined in process.h.
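Given the pid-equals-index invariant, these macros are effectively simple lookups into proc_set. The definitions below are our own equivalent sketch, not necessarily the exact code in process.h:

/* Sketch: plausible definitions of the helper macros; the real ones in
 * process.h may differ, but the effect is the same. */
#define curr_pid    (proc_set[curr_proc_idx].pid)
#define curr_status (proc_set[curr_proc_idx].status)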
B. Scheduler and scheduling metrics
OS scheduler
proc_yield() in grass/scheduler.c determines the next process to run. It performs bookkeeping and then invokes schedule(), which executes the scheduling algorithm to select the next runnable process. Currently, the system uses a naive round-robin scheduler.
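For reference, a naive round-robin schedule() can be sketched as a scan over the PCB array starting just after the current index. This is our illustration, not necessarily the exact egos code:

/* Sketch: naive round-robin (illustration only). Pick the next READY or
 * RUNNABLE slot after the current one, wrapping around the PCB array. */
static void schedule(void) {
    for (int i = 1; i <= MAX_NPROCESS; i++) {
        int idx = (curr_proc_idx + i) % (MAX_NPROCESS + 1);
        if (idx == 0) continue;   /* slot 0 is a placeholder */
        if (proc_set[idx].status == PROC_READY ||
            proc_set[idx].status == PROC_RUNNABLE) {
            curr_proc_idx = idx;
            return;
        }
    }
    /* If nothing else is runnable, keep running the current process. */
}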
Scheduling metrics
In the following, you will calculate four scheduling metrics and print them when a process exits:
- turnaround time: the duration from the process's arrival to its completion
- response time: the duration from the process's arrival to its first scheduling on the CPU
- CPU time: the actual time the process spends executing on the CPU
- the number of schedules: the count of times the process is scheduled for execution, specifically the number of transitions to the PROC_RUNNING state before the process exits.
Exercise 2 Calculate scheduling metrics
- collect scheduling information in the state-transition functions within grass/scheduler.c
- calculate the four metrics above
- hints:
  - search "[lab3-ex2]" and read the corresponding comments
  - you need to plan how to use the 64-byte schd_attr in the PCB (one possible plan is sketched after this exercise)
  - you can get the current time with earth->gettime(); this is a 64-bit timestamp (an unsigned long long)
  - we measure the turnaround/response/CPU time in units of the scheduling time slice QUANTUM, defined in egos.h
- when finished, you should see:
➜ /home/cs6640 % ls
./ ../ README
[INFO] proc[5] finished after 4 yields, turnaround time: 0.44, response time: 0.41, cputime: 0.01
➜ /home/cs6640 % echo
[INFO] proc[5] finished after 2 yields, turnaround time: 0.40, response time: 0.39, cputime: 0.00
- Note: the actual numbers for turnaround/response/CPU time may vary a lot, depending on your local environment and your machine.
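To make the hints concrete, here is a minimal sketch of one way the callbacks could record timestamps and derive the metrics. It reuses the hypothetical struct sched_info, SCHED_INFO, and pcb_of() from the PCB section above, assumes earth->gettime() and QUANTUM use the same tick unit, and prints with printf (your code may use egos's logging facility instead). It is one possible plan, not the required implementation:

/* Sketch only: assumes the hypothetical sched_info/SCHED_INFO/pcb_of above. */
void proc_on_arrive(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    si->arrival_time   = earth->gettime();
    si->first_run_time = 0;
    si->cpu_time       = 0;
    si->num_schedules  = 0;
}

void proc_on_sched_in(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    if (si->num_schedules == 0) si->first_run_time = earth->gettime();
    si->last_sched_in = earth->gettime();
    si->num_schedules++;
}

void proc_on_sched_out(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    si->cpu_time += earth->gettime() - si->last_sched_in;
}

void proc_on_stop(int pid) {
    struct sched_info* si = SCHED_INFO(pcb_of(pid));
    unsigned long long now = earth->gettime();
    /* Report the times in units of QUANTUM (assumes gettime() ticks and
     * QUANTUM share the same unit). */
    double turnaround = (double)(now - si->arrival_time) / QUANTUM;
    double response   = (double)(si->first_run_time - si->arrival_time) / QUANTUM;
    double cputime    = (double)si->cpu_time / QUANTUM;
    printf("[INFO] proc[%d] finished after %u yields, turnaround time: %.2f, "
           "response time: %.2f, cputime: %.2f\r\n",
           pid, si->num_schedules, turnaround, response, cputime);
}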
Section 3: Multi-Level Feedback Queue
The current scheduler is inefficient for latency-sensitive applications. To see this, try running loop & first and then running ls (to see what loop does, check apps/user/loop.c):
<run "loop" in the background>
➜ /home/cs6640 % loop &
[INFO] process 5 running in the background
➜ /home/cs6640 % ls
<you will wait for a while to see the result>
The reason ls responds so slowly is that the scheduler doesn't prioritize system processes (which provide essential services) and short-lived processes (like ls) over long-running jobs (like loop).
In the following, you will implement a Multi-Level Feedback Queue (MLFQ) to overcome this shortcoming and make ls respond faster. If you want to learn MLFQ basics, read OSTEP Chapter 8.
You will implement a simple three-level MLFQ:
- There will be three queues: high-/mid-/low-priority queues.
- The scheduler will always run processes in higher priority queues.
- Within each queue, the processes run in round-robin fashion.
- When a process is created, it is placed in the high-priority queue.
- Once a process uses up a QUANTUM of CPU time, the scheduler downgrades it to a lower-level queue. So, a process will be placed in the mid-priority queue when its CPU time is > 1 and < 2 QUANTUMs.
- Processes with CPU time > 2 QUANTUMs will stay in the low-priority queue until they finish.
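One possible shape for the data structures and selection logic in grass/mlfq.c (which you will implement in Exercise 3 below) is sketched here. The queue helpers (queue_init, queue_empty, queue_push, queue_pop) stand in for your lab2 queue API in library/queue.h, and the function signatures are illustrative, not the required ones:

/* Sketch: three priority queues of pids. All helper names and signatures
 * here are hypothetical stand-ins for your lab2 queue API. */
#define NLEVELS 3
static struct queue mlfq_queues[NLEVELS];   /* 0 = high, 1 = mid, 2 = low */

void mlfq_init(void) {
    for (int i = 0; i < NLEVELS; i++) queue_init(&mlfq_queues[i]);
}

/* Pick the next pid to run: scan from the highest-priority non-empty queue;
 * within a level, FIFO order gives round-robin behavior. */
int mlfq(void) {
    for (int level = 0; level < NLEVELS; level++)
        if (!queue_empty(&mlfq_queues[level]))
            return queue_pop(&mlfq_queues[level]);
    return -1;   /* nothing runnable */
}

/* Reinsert a process after it has run: its level is the number of full
 * QUANTUMs of CPU time it has used, capped at the lowest level. */
void mlfq_requeue(int pid, int used_quanta) {
    int level = used_quanta < NLEVELS ? used_quanta : NLEVELS - 1;
    queue_push(&mlfq_queues[level], pid);
}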
Exercise 3 Implement your MLFQ
- update the Makefile to start using your MLFQ: change the first line of the Makefile from SCHEDULER=NAIVE to SCHEDULER=MLFQ (this enables the macro MLFQ in C code; see the sketch after this exercise).
- update your queue in library/queue.h (you should reuse functions from your lab2)
- in grass/mlfq.c, implement mlfq_init(), mlfq(), and any additional helper functions as needed.
- when finished, you should see ls run much faster:
➜ /home/cs6640 % loop &
[INFO] process 5 running in the background
➜ /home/cs6640 % ls
<you should see results immediately>
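For context, SCHEDULER=MLFQ typically just passes a -DMLFQ flag to the compiler so the scheduling code can switch on the macro. The snippet below is our guess at how such a compile-time switch might look inside the scheduler, not the exact egos code:

/* Sketch: choosing the scheduling algorithm at compile time via the MLFQ
 * macro (enabled by SCHEDULER=MLFQ in the Makefile). Illustration only. */
static void schedule(void) {
#ifdef MLFQ
    int next = mlfq();               /* your MLFQ picks the next pid        */
    if (next != -1) curr_proc_idx = next;
    /* (reinserting the current process is discussed in the next section) */
#else
    /* ... the naive round-robin scan shown earlier ... */
#endif
}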
Discussion: What happens if the shell blocks low-priority processes?
- In our labs, sys_shell uses "polling" to continuously check for user input. As a result, the sys_shell process is always PROC_RUNNABLE while waiting for input.
- Also, as specified, the sys_shell process should have the highest priority.
- This creates a starvation problem: long-running processes like loop, once they leave the high-priority queue, are never executed again.
- To fix this, a scheduler could explicitly pick any runnable process other than sys_shell when sys_shell is the current running process.
- A better approach is to exclude the currently running process when selecting the next process. Specifically, upon a timer interrupt, the scheduler should pick a process from the MLFQ to run before reinserting the current process back into the queues (see the sketch below).
Finally, submit your work
Submitting consists of three steps:
- Executing this checklist:
  - Fill in ~/osi/egos/slack/lab3.txt.
  - Make sure that your code builds with no warnings.
- Push your code to GitHub:
$ cd ~/osi/egos/
$ git commit -am 'submit lab3'
$ git push origin lab3
Counting objects: ...
....
To github.com/NEU-CS6640-labs/egos-<YOUR_ID>.git
   c1c38e6..59c0c6e  lab3 -> lab3
- Actually commit your lab (with timestamp and git commit id):
  - Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.
  - Submit a file named git.txt to Canvas (there will be an assignment for this lab on Canvas). The file git.txt contains two lines: the first line is your GitHub repo url; the second line is the git commit id that you want us to grade. Here is an example:
git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git
29dfdadeadbeefe33421f242b5dd8312732fd3c9
Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on the GitHub repo page by clicking the green "Code" button, then choosing "SSH".
  - Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
  - Notice: if you submit multiple times, the file name (git.txt) changes to git-N.txt, where N is an integer representing how many times you've submitted. We will grade the file with the largest N.
NOTE: Ground truth is what and when you submitted to Canvas.
A non-existent repo address or a non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted repo and commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.
This completes the lab.