Lab3: Concurrent Key-Value Store

In Lab3, you will write a simple concurrent Key-Value store (short as KV-store). A KV-store (sometimes called key-value database) is a data storage that saves and retrieves data by keys.

Lab3 designs a toy KV-store that

contains at most a fixed number of keys (the number is defined by MAX_TABLE in lab3.h).
the key is a short string (max length of 30).
each key has an associated integer value.
the KV-store has four operations: read, write, increase, and delete.
writes can insert new key-value pairs to the KV-store.

The KV-store has three components: a frontend server (defined in kvserver.c), a set of workers (defined in worker.c), and a backend KV-store (defined in kvstore.c). They talk through a concurrent queue. In particular, the kvserver puts an request (e.g., a write) to the queue; workers fetch the request from the queue, parse it (e.g., “oh, a write”), and execute it on the backend KV-store (e.g., write the value to the store).

You will leverage multi-threading to accelerate the KV-store. The KV-store has a main thread and multiple worker threads.

Section 0: Getting started

Click the GitHub Lab3 link on Canvas homepage to create your Lab3 clone on GitHub.
Start your VM and open the VM’s terminal.
Clone Lab3 repo to VM:
```
$ cd ~
$ git clone git@github.com:NEU-CS3650-labs/lab3-<Your-GitHub-Username>.git lab3
```
Note that the repo address git@github.com:... can be obtained by going to GitHub repo page (your cloned lab3), clicking the green “Code” button, then choose “SSH”.

Check contents:

$ cd ~/lab3
$ ls
// you should see:

Makefile    kvstore.c  lab3.h   queue.h    stats.c  worker.h  ops.txt
kvserver.c  kvstore.h  queue.c  slack.txt  stats.h  worker.c

updated (02/12) install required packages and make:
```
$ sudo apt install lib-dev
```
(see Canvas homepage for the VM’s password.)
Make sure you can make:
```
$ make
// you should be able to compile without problems
```

Note: all the shell commands below happen within the folder ~/lab3/.

Section 1. Multi-threaded key-value store

We provide you some skeleton code for the KV-store (in file kvserver.c). The KV-store server will have (by default) five threads:

The main thread. This thread reads user’s inputs from a file and has implemented four commands.
- load <f>: load workloads from a file <file>
- stats: print KV-store’s statistics
- list: dump current key-value pairs
- quit: quit KV-store server
The dispatcher thread. It reads a sequence of KV operations (called a workload) from a file and enqueues them to a global concurrent queue.
Worker threads. They wait on the queue for new requests and handle requests to write, read, increase, or delete individual database keys.

Exercise 0 Compile and play with the skeleton code.

You can compile the current KV-store and play with it.
 $ make
 $ ./kvserver
 cs3650>         // you should see this prompt
Type in help to see all three commands, and quit by typing in quit and pressing enter.

You will use pthreads (POSIX Threads) to create threads and their mutex and condition variable implementations for synchronization. You need to know pthreads to finish Lab3. Here is a pthread tutorial.

KV-store’s server code is in file kvserver.c, in which you need to create all the threads mentioned above.

Exercise 1 create dispatcher and worker threads.

Read kvserver.c and understand how it works. (note: you don’t have to fully understand every detail, but you should understand the architecture.)
Search “TODO” and read comments under “Exercise 1”.
Finish the task of creating threads by using pthread library.
hint (02/14) you should not call thread_join for worker threads because they are long-living threads and will never finish. (to see what the workers are doing, check the worker(...) function in worker.c)
Test worker threads by:
 $ make; ./kvserver
 ...
 // you should see:
 [INFO] worker thread dequeue NULL...
 [INFO] worker thread dequeue NULL...
 [INFO] worker thread dequeue NULL...
 [INFO] worker thread dequeue NULL...
 cs3650>
Test the dispatcher thread by:
 cs3650> load ops.txt
 // you should see:
 [INFO] #ops=[4] are loaded
 cs3650>
Notes:
the prompt “cs3650>” and the messages “[INFO] …” may intertwine. Why? because these are printed by different threads.
take a look at file ops.txt to understand how operations are defined. Try to add one more line and then load ops.txt to see what will happen. (the kvserver should say five ops are loaded.)

Section 2. Implement concurrent queue, statistics, and KV-store

In this section of Lab3 (Exercise 2/3/4), your coding style matters, which is different from prior labs. We (or a program) will read your code. If you violate the concurrency coding requirements below, we will deduct up to 20% of total lab points based on our subjective evaluation.

Requirement: monitor. In Lab3, you must implement shared states as monitors. We cover monitors in class, and here is a monitor chapter from OSTEP.

Notice that C language doesn’t have a native monitor support or object oriented programming. Hence, your are going to implement “manually constructed” monitors in C style, wherein:

all function calls are protected by a mutex (that is, the programmer inserts those acquire()/release() on entry and exit from every procedure that works with the shared state).
synchronization happens with condition variables whose associated mutex is the mutex that protects the functions.
Note that you do not have to use condition variables in every Monitor implementation; it’s just that whenever you use condition variables, you must follow the above point.

Lab3 has three pieces of state shared between different threads. They are (1) request queue, (2) KV-store statistics, and (3) KV-store storage. In this section, you will implement or fix them all.

Concurrent request queue

Request queue is shared between the dispatcher thread and worker threads. The dispatcher thread will read operations (read, write, increase, or delete) from a file (e.g., ops.txt), create new requests, and enqueue these requests. Meanwhile, worker threads dequeue tasks and process them concurrently.

Exercise 2 implement concurrent task queue.

Read queue’s data structure in queue.h.
Read the dispatcher thread (in kvserver.c) and worker threads (in worker.c) to understand how they use the queue.
Fill in TODO in queue.c to implement the queue. (Note: you should define your own mutexes and condition variables, and don’t forget to initialize them.)
When done, test your code by (1) running the server:
 $ make; ./kvserver
 cs3650>
 // you should not see "[INFO] worker thread dequeue NULL..." anymore.
and (2) running meaningful workloads:
 cs3650> load ops.txt
 // you should see: [updated 02/21]
 [INFO] #ops=[4] are loaded
 [INFO] read key[hello]
 [INFO] write key[hello]=val[10]
 [INFO] increase key[hello]+=val[8]
 [INFO] read key[hello]
Note: you can check the running trace by $ cat kvserver.trace to see the detailed op executions. This will help you debug your kvstore.

KV-store statistics

The KV-store keeps track of the queue length and the numbers of executed operations (reads/writes/increases/deletes). You can check these statistics by running stats in kvserver’s console:

$ ./kvserver
cs3650> stats
// you should see something like
queued:    0
writes:    0
reads:     0
deletes:   0
increases: 0

These statistics are shared across worker threads. Current implementation in stats.c is is not thread-safe.

Exercise 3 fix statistics tracking.

Read stats.h and workers.c to understand how worker threads update the statistics.
Fix functions in stats.c, and implement them to be thread-safe.

Key-value storage

Key-value storage is the core of the KV-store. Lab3 skeleton code (kvstore.h and kvstore.c) provides you a naive in-memory key-value storage. In this naive in-memory key-value storage:

keys and values are stored in two fix-sized arrays.
a key associates to a value by using the same index number. As an example, if key “KEY0” and value 1 are paired and the key “KEY0” is stored in the first slot of key array (i.e., keys[0]), then the value 1 should be stored in the first slot of value array (i.e., values[0]).
Keys are arbitrary letter-digit strings; they can be “KEY0” or “KEY-0” or “WHATSOEVER” or “xi0xw32x0sJ”; meaning, they do not have certain formats.

Exercise 4 finish the concurrent key-value storage.

Read kvstore.h and workers.c to understand how worker threads use the KV-store.
Implement functions in kvstore.c for a thread-safe concurrent KV-store.
When done, test your code by (1) running the server and loading workloads:
 $ make; ./kvserver
 cs3650> load ops.txt
(2) checking if the stats are what we expect
 cs3650> stats
 queued:    0
 writes:    1
 reads:     2
 deletes:   0
 increases: 1
If you have modified ops.txt before, you will see something different.
and (3) checking if the kv-store works as expected:
 cs3650> list
 // you will see someting like
=== kv-store ===
[0] hello: 18
Note: the list will call your implementation of kv_dump() in kvstore.c. So, the output will be different from above. But, for the original ops.txt, you should see one key “hello” with either value 18 or 10 (depending on which op runs first, the increase or the write).

Section 3. Concurrency testing

Debugging and testing a piece of concurrent code is hard. What you will do in Exercise 5 is design your concurrent test cases.

Exercise 5 write your own test cases.

Read ops.txt and parsereq() in kvserver.c to understand how operations are defined and parsed.
Implement at least two test cases:
large test cases (large.txt): the test case must be “large”. It should have about 10K operations (meaning the file has about 10K lines). The ops must include all four operations (read/write/increase/delete). Name the test case large.txt.
pressure test cases (pressure.txt): the test case must insert more than 1K keys. The test case must have writes and deletes. Name the test case pressure.txt.
You want to think about what “correctness conditions” the two test cases you will check. Namely, if your kv-store finishes, how do you know it works as expected?
You’re encouraged to implement more test cases.
You probably don’t want to write these large test cases out manually. You will need a program (not necessarily in C) to generate these test case files.

Note: you need to check in (meaning git add) your test cases, at least the above two, to your repo, and submit them to the gradescope.

(optional) Challenge: faster key-value storage

This challenge is optional; successfully finishing challenges will give you extra credits for labs. The credits are transferable to other labs to cover up any point losses in this or other labs. But the credits cannot transfer outside labs. Meaning, if you got all points of all labs, the extra credits won’t improve your overall grade.

As mentioned in Exercise 4, we provide you a naive KV-storage implementation, which is two arrays. Of course, there are better data structures—hash table, binary-search tree, trie, just to name a few. In this challenge, you will re-implement the key-value storage in kvstore.h and kvstore.c for better performance.

Several notes if you take this challenge:

we will not check your coding style in kvstore.c. (meaning, you do not have to follow the “monitor” requirements in kvstore.h/c; this will overwrite the requirement of Exercise 4.)
your faster KV-store will be used for the non-challenge test cases (for correctness) as well. After all, you will only submit one pair of kvstore.h/c files. (meaning, if your faster version has a concurrency bug, you will lose points, which is more likely to happen than the naive version.)
The extra points you will get depend on other challengers. The rules are:
1. The max extra credits are 20 points.
2. The fastest challenger will get 20 credits. (Say its running time is min sec.)
3. The baseline (our implementation) will be 0 credit. (Say baseline runs in max sec.)
4. For other challengers whose code running time sits between the two (say x sec) will get extra points of (max - x) / (max - min) x 20. For example, if min=1, max=101, and your code runs x=21, your will get (101-21) / (101-1) x 20 = 16.
5. If your implementation is slower than baseline, you get 0 extra credit.
6. If your implementation crashes or stops responding, you get 0 extra credit.
For the performance evaluation, we will discard the queue and run your code directly. So you must (1) write all your KV-storage code in kvstore.h/c, and (2) strictly follow the key-value store APIs defined in kvstore.h (namely, kv_read/kv_write/kv_increase/kv_delete).
It is a good habit to develop performance measurement code first then start your performance tuning.

Here are some details of the benchmark that we run to evaluate performance:

The evaluation benchmark is a mixed workload of read/write/increase/delete operations. The ratio #reads:#writes:#increase:#deletes is 8:1:0.5:0.5.
The evaluation metric (the “running time” mentioned above) is the total running time of all operations, which only include the running time of functions kv_read/kv_write/kv_increase/kv_delete.
We will increase MAX_TABLE in lab3.h to a much larger value. So, please do not expect there will only be 200 keys.

Challenge faster key-value store.

Write down Yes to accept the challenge in slack.txt.
Re-implement kvstore.h/c.
Don’t forget to thoroughly test your new implementation for correctness.

Finally, submit your work

Submitting consists of three steps:

Executing this checklist:
- Fill in ~/lab3/slack.txt with (1) your name, (2) your NUID, (3) slack hours you used, and (4) acknowledgements.
- Make sure that your code builds with no warnings.
  note: we will apply a 10% penalty to the compilation warnings you have.
- Make sure you have added (git add) all files that you created (if any).
- If you took the challenge, write Yes for challenge in slack.txt.

Push your code to GitHub:

 $ cd ~/lab3
 $ git commit -am 'submit lab3'
 $ git push origin 

 Counting objects: ...
  ....
  To ssh://github.com/NEU-CS3650-labs/lab3-<username>.git
   7337116..ceed758  main -> main

Actually submit your lab via Gradescope:
- Navigate to https://www.gradescope.com/ and click on log in.
- Select login with “School Credentials” and select “Northeastern University”.
- Enter Northeastern SSO login information and you should be able to log in to your gradescope account.
- Now, on Canvas, go to the CS3650 course and click on “Gradescope 1.3” from the left navigation bar. You would then be asked to accept the course invitation after which you can access the course on Gradescope.
- On Gradescope, select the lab/assignment you wish to submit and click on “Upload Submission”.
- You would then be asked to upload a zip file consisting of the files the lab/assignment specifies.
  Note: you can either zip all files within your lab folder, or zip your lab folder whose name must start with “lab3-“ (this is supposed to be your GitHub repo name, lab3-<username>). If you zip a folder named, for example, “mysubmit”, the Gradescope will complain.
- After uploading the zip file, the autograder will evaluate your submission and based on it provide a score for your submission.
- After the manual grading process is performed by the TAs, your final score for the lab/assignment will be released.

This completes the lab.

Acknowledgments

This lab is created by Peter Desnoyers, with modifications by the 24spring CS3650 staff.