CS5600 Lab3: Concurrent Key-Value Store
In Lab3, you will write a simple concurrent Key-Value store (short as KV-store). A KV-store (sometimes called key-value database) is a data storage that saves and retrieves data by keys.
Lab3 designs a toy KV-store that
- contains at most a fixed number of keys (the number is defined by
MAX_TABLE
inlab3.h
). - the key is a string.
- each key has an associated value (another string).
- the KV-store has three operations: read, write, and delete.
- writes can insert new key-value pairs to the KV-store.
The KV-store has two components: a server (called dbserver
) and a client (called dbclient
). They talk through network. In particular, the dbclient
sends an operation (e.g., a write) to the dbserver
; dbserver
parses the operation (e.g., “oh, a write”), executes it (e.g., write the value to storage), and returns the results (e.g., “okay”); finally, dbclient
receives the result and print it on screen.
You will leverage multi-threading to accelerate the KV-store server. The server has a console thread (or main thread), a network listener thread, and multiple worker threads.
Challenge: Lab3’s challenge is optional. In this challenge, you will try your best to improve the performance of the KV-store, which is described at the end of the lab.
Section 0: Getting started
- Click the GitHub Lab3 link on Canvas homepage to create your Lab3 clone on GitHub.
- Start your VM and open the VM’s terminal.
- Clone Lab3 repo to VM:
$ cd ~ $ git clone git@github.com:NEU-CS5600-labs/lab3-<Your-GitHub-Username>.git lab3
Note that the repo address
git@github.com:...
can be obtained by going to GitHub repo page (your cloned lab3), clicking the green “Code” button, then choose “SSH”. - Check contents:
$ cd ~/lab3 $ ls // you should see: Makefile dbserver.c kvstore.h queue.c slack.txt stats.h worker.h dbclient.c kvstore.c lab3.h queue.h stats.c worker.c
- install a required library: zlib
$ sudo apt install zlib1g-dev
Note: all the shell commands below happen within the folder ~/lab3/
.
Section 1. Multi-threaded key-value store
We provide you some skeleton code for the KV-store (in file dbserver.c
). The KV-store server will have (by default) six threads:
- The console (main) thread. This thread reads user’s inputs and has implemented three commands.
stats
: print KV-store’s statisticslist
: dump current key-value pairsquit
: quit KV-store server
- The listener thread. It opens a TCP socket, accept incoming connections, and add tasks to a queue for the worker threads.
- Four (by default) worker threads. They wait for new tasks and handle requests to write, read, or delete individual database keys.
Exercise 0 Compile and play with the skeleton code.
You can compile the current KV-store and play with it.
$ make $ ./dbserver cs5600> // you should see this prompt
Type in
help
to see all three commands, and quit by typing inquit
and pressing enter.Note: program
dbserver
will listen on port 5600 for TCP connections. If port 5600 has been used, the program will refuse to start.
You will use pthreads (POSIX Threads) to create threads and their mutex and condition variable implementations for synchronization. You need to know pthreads to finish Lab3. Here is a pthread tutorial.
KV-store’s server code is in file dbserver.c
, in which you need to create all the threads mentioned above.
Exercise 1 create listener and worker threads.
- Read
dbserver.c
and understand how it works. (note: you don’t have to fully understand every details.)- Search “TODO” and read comments under “Exercise 1”.
- Finish the task of creating threads by using pthread library.
- Test your code by:
$ make; ./dbserver ... // you should see: [INFO] worker thread dequeue NULL... [INFO] worker thread dequeue NULL... [INFO] worker thread dequeue NULL... [INFO] worker thread dequeue NULL... cs5600>
Note that the prompt “cs5600>” and the messages “[INFO] worker…” may intertwine. Why? because these are printed by different threads.
Section 2. Implement concurrent queue, statistics, and KV-store
In this section of Lab3 (Exercise 2/3/4), your coding style matters, which is different from prior labs. We (or a program) will read your code. If you violate the concurrency coding requirements below, we will deduct up to 20% of total lab points based on our subjective evaluation.
Requirement I: concurrency commandments (namely, six commandements). You need to re-read Mike Dahlin’s Coding Standards for Programming with Threads. You are required to follow these standards for this lab. In the real world, unclear multi-threaded code is particularly dangerous—because it is difficult or impossible to test. Moreover, if it’s not clear, then the programmer who comes after you cannot debug it, maintain it, or add new features.
Requirement II: monitor. In Lab3, you must implement shared states as monitors. We covered monitors in class (see the notes), and here is a monitor chapter from OSTEP.
Notice that C language doesn’t have native monitor supports or object oriented programming. Hence, your are going to implement “manually constructed” monitors in C style, wherein:
- all function calls are protected by a mutex (that is, the programmer inserts those acquire()/release() on entry and exit from every procedure that works with the shared state).
- synchronization happens with condition variables whose associated mutex is the mutex that protects the functions.
- Note that you do not have to use condition variables in every Monitor implementation; it’s just that whenever you use condition variables, you must follow the above point.
Lab3 has three pieces of state shared between different threads. They are (1) task queue, (2) KV-store statistics, and (3) KV-store storage. In this section, you will implement or fix them all.
Concurrent task queue
Task queue is shared between the listener thread and worker threads. The listener thread will read operations (read, write, or delete) from clients through network, create new tasks, and enqueue
these tasks. Meanwhile, worker threads dequeue
tasks and process them concurrently.
Exercise 2 implement concurrent task queue.
- Read queue’s data structure in
queue.h
.- Read the listener thread (in
dbserver.c
) and worker threads (inworker.c
) to understand how queue works.- Fill in
TODO
inqueue.c
to implement the queue. (Note: you should define your own mutexes and condition variables, and don’t forget to initialize them.)- When done, test your code by (1) running the server on one terminal:
$ make; ./dbserver cs5600> // you should not see "[INFO] worker thread dequeue NULL..." anymore.
- and (2) running client on another terminal:
$ ./dbclient --set=KEY0 ABC // you should see: val set to ABC ok
$ ./dbclient --get=KEY0 // you should see: READ: FAILED (X)
You can run ./dbclient --help
to see its detailed usage. Also notice that even the ./dbclient --set...
returns ok
, your read from the same key fails. This is because you haven’t implemented the key-value storage, which is your Exercise 4.
KV-store statistics
The KV-store keeps track of the queue length and the numbers of executed operations (reads/writes/deletes/failures). You can check these statistics by running stats
in dbserver
’s console:
$ ./dbserver
5600> stats
// you should see something like
queued: 0
writes: 0
reads: 0
deletes: 0
failures: 0
These statistics are shared across worker threads. Current implementation in stats.c
is buggy and is not thread-safe.
Exercise 3 fix statistics tracking.
- Read
stats.h
andworkers.c
to understand how worker threads update the statistics.- Fix functions in
stats.c
, and implement them to be thread-safe.
Key-value storage
Key-value storage is the core of the KV-store. Lab3 skeleton code (kvstore.h
and kvstore.c
) provides you a naive in-memory key-value storage. In this naive in-memory key-value storage:
- keys and values are stored in two fix-sized arrays.
- a key associates to a value by using the same index number. As an example, if key “KEY0” and value “ABC” are paired and the key “KEY0” is stored in the first slot of key array (i.e.,
keys[0]
), then the value “ABC” should be stored in the first slot of value array (i.e.,values[0]
). - Keys are arbitrary strings; they can be “KEY0” or “KEY-0” or “WHATSOEVER” or “xi,0xw32x0s>J%” (meaning, they do not have certain formats).
Exercise 4 finish the concurrent key-value storage.
- Read
kvstore.h
andworkers.c
to understand how worker threads use the KV-store.- Implement functions in
kvstore.c
for a thread-safe concurrent KV-store.- When done, test your code by (1) running the server on one terminal:
$ make; ./dbserver cs5600>
- and (2) running client on another terminal:
$ ./dbclient --set=KEY0 ABC // you should see: val set to ABC ok
$ ./dbclient --get=KEY0 // you should see: ="ABC"
Section 3. Concurrency testing
As mentioned in class, debugging and testing a piece of concurrent code is hard. What you will do in Exercise 5 is to write your own testing code, which is supposed to be multi-threading as well.
Your testing code should: (note: these are suggestions not requirements)
- use multiple threads,
- issue concurrent and random operations to
dbserver
, - and compare read/write/delete results to see if your
dbserver
is correct.
We provide some helper functions for you in dbclient.c
.
Exercise 5 write your own concurrency testing code.
- Read exercise 5’s description in
dbclient.c
.- Implement your test code.
Beyond dbclient
side information (which is under your control), dbserver
also produces a debug trace file dbserver.trace
(in the same folder as dbserver
). This trace file contains all operations executed by the current running dbserver
. Note that the trace file will be overwritten when running dbserver
another time.
We expect you to write reasonably comprehensive test cases. We will spend some efforts on analyzing your test cases. If your test cases are too simple (based on our subjective evaluation), you might get up to 5% penalty on your Lab3 scores.
(optional) Challenge: faster key-value storage
As always, the challenge is optional; successfully finishing challenges will give you extra credits.
As mentioned in Exercise 4, we provide you a naive KV-storage implementation, which is an array (in fact, two). Of course, there are better data structures—hash table, binary-search tree, neural networks, just to name a few. In this challenge, you will re-implement the key-value storage in kvstore.h
and kvstore.c
for better performance.
Several notes if you took this challenge:
we will not check your coding style in
kvstore.c
. (meaning, you do not have to follow “six rules” and “monitor” requirements inkvstore.h/c
; this will overwrite the requirement of Exercise 4.)your faster KV-store will be used for the non-challenge test cases (mostly for correctness) as well. After all, there is only one pair of
kvstore.h/c
files. (meaning, if your faster version has a concurrency bug, you lose points because of that, which is more likely to happen than the naive version.)The extra points you will get depend on other challengers. The rules are:
- The total extra credits are 20 points.
- The fastest challenger will get 100% of the credits. (Say its running time is
min
sec.) - The slowest challenger will get 20% of the credits. (Say baseline runs
max
sec.) - For other challengers whose code running time sits between the two (say
x
sec) will get extra points of(max - x) / (max - min) x 80% + 20%
. For example, ifmin=1
,max=101
, and your code runsx=21
, your score is84% = (101-21) / (101-1) x 80% + 20%
of the total 20 points.
For the performance evaluation, we will discard the network part and run your code directly. So you must (1) write all your KV-storage code in
kvstore.h/c
, and (2) strictly follow the key-value store APIs defined inkvstore.h
(namely,kv_read/kv_write/kv_delete
).It is a good habit to develop performance measurement code first then start your performance tuning.
Here are some details of the benchmark that we run to evaluate performance:
The evaluation benchmark is a mixed workload of read/write/delete operations. The ratio
#reads:#writes:#deletes
is8:1.5:0.5
.The evaluation metric (the “running time” mentioned above) is the total running time of all operations, which only include the running time of functions
kv_read/kv_write/kv_delete
.We will increase
MAX_TABLE
inlab3.h
. So, please do not expect200
keys at most.
Challenge faster key-value store.
- Re-implement
kvstore.h/c
.- Don’t forget to thoroughly test your new implementation.
Finally, submit your work
Submitting consists of three steps:
- Executing this checklist:
- Fill in
slack.txt
with (1) your name and NUID, (2) slack hours you used, (3) if you took the challenge, and (4) acknowledge your influences. - Make sure that your code can build without problems.
- Make sure you have added (
git add
) all files that you created. - Make sure you have committed all your changes (check with
git status
).
- Fill in
Push your code to GitHub:
$ cd ~/lab3 $ git commit -am 'submit lab3' $ git push origin Counting objects: ... .... To ssh://github.com/NEU-CS5600-labs/lab3-<username>.git 7447116..deadbeef main -> main
Actually commit your lab (with timestamp and git commit id):
Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command
git log -1 --format=oneline
.- Submit a file named
git.txt
to Canvas. (there will be an assignment for this lab on Canvas.) The filegit.txt
contains two lines: the first line is your github repo url; the second line is the git commit id that you want us to grade. Here is an example:git@github.com:NEU-CS5600-labs/lab3-<username>.git 29dfda9c788fade33421f242b5dd8312732fd3c9
Notice: the repo address must start with
git@github.com:...
(nothttps://...
). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”. - Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
Notice: if you submit multiple times, the file name (git.txt
) changes togit-N.txt
whereN
is an integer and represents how many times you’ve submitted. We will grade the file with the largestN
.
NOTE: Ground truth is what and when you submitted to Canvas.
alertA non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.
This completes the lab.
Acknowledgments
This lab is created by Peter Desnoyers, with modifications by the CS5600 staff.