Lab3: Concurrent Key-Value Store
In Lab3, you will write a simple concurrent Key-Value store (short as KV-store). A KV-store (sometimes called key-value database) is a data storage that saves and retrieves data by keys.
Lab3 designs a toy KV-store that
- contains at most a fixed number of keys (the number is defined by
MAX_TABLE
inlab3.h
). - the key is a short string (max length of 30).
- each key has an associated integer value.
- the KV-store has four operations: read, write, increase, and delete.
- writes can insert new key-value pairs to the KV-store.
The KV-store has three components: a frontend server (defined in kvserver.c
), a set of workers (defined in worker.c
), and a backend KV-store (defined in kvstore.c
). They talk through a concurrent queue. In particular, the kvserver
puts an request (e.g., a write) to the queue; workers fetch the request from the queue, parse it (e.g., “oh, a write”), and execute it on the backend KV-store (e.g., write the value to the store).
You will leverage multi-threading to accelerate the KV-store. The KV-store has a main thread and multiple worker threads.
Section 0: Getting started
- Click the GitHub Lab3 link on Canvas homepage to create your Lab3 clone on GitHub.
- Start your VM and open the VM’s terminal.
- Clone Lab3 repo to VM:
$ cd ~ $ git clone git@github.com:NEU-CS3650-labs/lab3-<Your-GitHub-Username>.git lab3
Note that the repo address
git@github.com:...
can be obtained by going to GitHub repo page (your cloned lab3), clicking the green “Code” button, then choose “SSH”. - Check contents:
$ cd ~/lab3 $ ls // you should see: Makefile kvstore.c lab3.h queue.h stats.c worker.h ops.txt kvserver.c kvstore.h queue.c slack.txt stats.h worker.c
- updated (02/12) install required packages and make:
$ sudo apt install lib-dev
(see Canvas homepage for the VM’s password.)
Make sure you can make:$ make // you should be able to compile without problems
Note: all the shell commands below happen within the folder ~/lab3/
.
Section 1. Multi-threaded key-value store
We provide you some skeleton code for the KV-store (in file kvserver.c
). The KV-store server will have (by default) five threads:
- The main thread. This thread reads user’s inputs from a file and has implemented four commands.
load <f>
: load workloads from a file<file>
stats
: print KV-store’s statisticslist
: dump current key-value pairsquit
: quit KV-store server
- The dispatcher thread. It reads a sequence of KV operations (called a workload) from a file and enqueues them to a global concurrent queue.
- Worker threads. They wait on the queue for new requests and handle requests to write, read, increase, or delete individual database keys.
Exercise 0 Compile and play with the skeleton code.
You can compile the current KV-store and play with it.
$ make $ ./kvserver cs3650> // you should see this prompt
Type in
help
to see all three commands, and quit by typing inquit
and pressing enter.
You will use pthreads (POSIX Threads) to create threads and their mutex and condition variable implementations for synchronization. You need to know pthreads to finish Lab3. Here is a pthread tutorial.
KV-store’s server code is in file kvserver.c
, in which you need to create all the threads mentioned above.
Exercise 1 create dispatcher and worker threads.
- Read
kvserver.c
and understand how it works. (note: you don’t have to fully understand every detail, but you should understand the architecture.)- Search “TODO” and read comments under “Exercise 1”.
- Finish the task of creating threads by using pthread library.
- hint (02/14) you should not call
thread_join
for worker threads because they are long-living threads and will never finish. (to see what the workers are doing, check theworker(...)
function inworker.c
)- Test worker threads by:
$ make; ./kvserver ... // you should see: [INFO] worker thread dequeue NULL... [INFO] worker thread dequeue NULL... [INFO] worker thread dequeue NULL... [INFO] worker thread dequeue NULL... cs3650>
- Test the dispatcher thread by:
cs3650> load ops.txt // you should see: [INFO] #ops=[4] are loaded cs3650>
Notes:
- the prompt “cs3650>” and the messages “[INFO] …” may intertwine. Why? because these are printed by different threads.
- take a look at file
ops.txt
to understand how operations are defined. Try to add one more line and thenload ops.txt
to see what will happen. (thekvserver
should say five ops are loaded.)
Section 2. Implement concurrent queue, statistics, and KV-store
In this section of Lab3 (Exercise 2/3/4), your coding style matters, which is different from prior labs. We (or a program) will read your code. If you violate the concurrency coding requirements below, we will deduct up to 20% of total lab points based on our subjective evaluation.
Requirement: monitor. In Lab3, you must implement shared states as monitors. We cover monitors in class, and here is a monitor chapter from OSTEP.
Notice that C language doesn’t have a native monitor support or object oriented programming. Hence, your are going to implement “manually constructed” monitors in C style, wherein:
- all function calls are protected by a mutex (that is, the programmer inserts those acquire()/release() on entry and exit from every procedure that works with the shared state).
- synchronization happens with condition variables whose associated mutex is the mutex that protects the functions.
- Note that you do not have to use condition variables in every Monitor implementation; it’s just that whenever you use condition variables, you must follow the above point.
Lab3 has three pieces of state shared between different threads. They are (1) request queue, (2) KV-store statistics, and (3) KV-store storage. In this section, you will implement or fix them all.
Concurrent request queue
Request queue is shared between the dispatcher thread and worker threads. The dispatcher thread will read operations (read, write, increase, or delete) from a file (e.g., ops.txt
), create new requests, and enqueue
these requests. Meanwhile, worker threads dequeue
tasks and process them concurrently.
Exercise 2 implement concurrent task queue.
- Read queue’s data structure in
queue.h
.- Read the dispatcher thread (in
kvserver.c
) and worker threads (inworker.c
) to understand how they use the queue.- Fill in
TODO
inqueue.c
to implement the queue. (Note: you should define your own mutexes and condition variables, and don’t forget to initialize them.)- When done, test your code by (1) running the server:
$ make; ./kvserver cs3650> // you should not see "[INFO] worker thread dequeue NULL..." anymore.
- and (2) running meaningful workloads:
cs3650> load ops.txt // you should see: [updated 02/21] [INFO] #ops=[4] are loaded [INFO] read key[hello] [INFO] write key[hello]=val[10] [INFO] increase key[hello]+=val[8] [INFO] read key[hello]
Note: you can check the running trace by
$ cat kvserver.trace
to see the detailed op executions. This will help you debug your kvstore.
KV-store statistics
The KV-store keeps track of the queue length and the numbers of executed operations (reads/writes/increases/deletes). You can check these statistics by running stats
in kvserver
’s console:
$ ./kvserver
cs3650> stats
// you should see something like
queued: 0
writes: 0
reads: 0
deletes: 0
increases: 0
These statistics are shared across worker threads. Current implementation in stats.c
is is not thread-safe.
Exercise 3 fix statistics tracking.
- Read
stats.h
andworkers.c
to understand how worker threads update the statistics.- Fix functions in
stats.c
, and implement them to be thread-safe.
Key-value storage
Key-value storage is the core of the KV-store. Lab3 skeleton code (kvstore.h
and kvstore.c
) provides you a naive in-memory key-value storage. In this naive in-memory key-value storage:
- keys and values are stored in two fix-sized arrays.
- a key associates to a value by using the same index number. As an example, if key “KEY0” and value
1
are paired and the key “KEY0” is stored in the first slot of key array (i.e.,keys[0]
), then the value1
should be stored in the first slot of value array (i.e.,values[0]
). - Keys are arbitrary letter-digit strings; they can be “KEY0” or “KEY-0” or “WHATSOEVER” or “xi0xw32x0sJ”; meaning, they do not have certain formats.
Exercise 4 finish the concurrent key-value storage.
- Read
kvstore.h
andworkers.c
to understand how worker threads use the KV-store.- Implement functions in
kvstore.c
for a thread-safe concurrent KV-store.- When done, test your code by (1) running the server and loading workloads:
$ make; ./kvserver cs3650> load ops.txt
- (2) checking if the stats are what we expect
cs3650> stats queued: 0 writes: 1 reads: 2 deletes: 0 increases: 1
If you have modified
ops.txt
before, you will see something different.- and (3) checking if the kv-store works as expected:
cs3650> list // you will see someting like === kv-store === [0] hello: 18
Note: the
list
will call your implementation ofkv_dump()
inkvstore.c
. So, the output will be different from above. But, for the originalops.txt
, you should see one key “hello” with either value18
or10
(depending on which op runs first, the increase or the write).
Section 3. Concurrency testing
Debugging and testing a piece of concurrent code is hard. What you will do in Exercise 5 is design your concurrent test cases.
Exercise 5 write your own test cases.
- Read
ops.txt
andparsereq()
inkvserver.c
to understand how operations are defined and parsed.- Implement at least two test cases:
- large test cases (
large.txt
): the test case must be “large”. It should have about 10K operations (meaning the file has about 10K lines). The ops must include all four operations (read/write/increase/delete). Name the test caselarge.txt
.- pressure test cases (
pressure.txt
): the test case must insert more than 1K keys. The test case must have writes and deletes. Name the test casepressure.txt
.- You want to think about what “correctness conditions” the two test cases you will check. Namely, if your kv-store finishes, how do you know it works as expected?
- You’re encouraged to implement more test cases.
- You probably don’t want to write these large test cases out manually. You will need a program (not necessarily in C) to generate these test case files.
Note: you need to check in (meaning git add
) your test cases, at least the above two, to your repo, and submit them to the gradescope.
(optional) Challenge: faster key-value storage
This challenge is optional; successfully finishing challenges will give you extra credits for labs. The credits are transferable to other labs to cover up any point losses in this or other labs. But the credits cannot transfer outside labs. Meaning, if you got all points of all labs, the extra credits won’t improve your overall grade.
As mentioned in Exercise 4, we provide you a naive KV-storage implementation, which is two arrays. Of course, there are better data structures—hash table, binary-search tree, trie, just to name a few. In this challenge, you will re-implement the key-value storage in kvstore.h
and kvstore.c
for better performance.
Several notes if you take this challenge:
we will not check your coding style in
kvstore.c
. (meaning, you do not have to follow the “monitor” requirements inkvstore.h/c
; this will overwrite the requirement of Exercise 4.)your faster KV-store will be used for the non-challenge test cases (for correctness) as well. After all, you will only submit one pair of
kvstore.h/c
files. (meaning, if your faster version has a concurrency bug, you will lose points, which is more likely to happen than the naive version.)The extra points you will get depend on other challengers. The rules are:
- The max extra credits are 20 points.
- The fastest challenger will get 20 credits. (Say its running time is
min
sec.) - The baseline (our implementation) will be 0 credit. (Say baseline runs in
max
sec.) - For other challengers whose code running time sits between the two (say
x
sec) will get extra points of(max - x) / (max - min) x 20
. For example, ifmin=1
,max=101
, and your code runsx=21
, your will get(101-21) / (101-1) x 20 = 16
. - If your implementation is slower than baseline, you get 0 extra credit.
- If your implementation crashes or stops responding, you get 0 extra credit.
For the performance evaluation, we will discard the queue and run your code directly. So you must (1) write all your KV-storage code in
kvstore.h/c
, and (2) strictly follow the key-value store APIs defined inkvstore.h
(namely,kv_read/kv_write/kv_increase/kv_delete
).It is a good habit to develop performance measurement code first then start your performance tuning.
Here are some details of the benchmark that we run to evaluate performance:
The evaluation benchmark is a mixed workload of read/write/increase/delete operations. The ratio
#reads:#writes:#increase:#deletes
is8:1:0.5:0.5
.The evaluation metric (the “running time” mentioned above) is the total running time of all operations, which only include the running time of functions
kv_read/kv_write/kv_increase/kv_delete
.We will increase
MAX_TABLE
inlab3.h
to a much larger value. So, please do not expect there will only be200
keys.
Challenge faster key-value store.
- Write down
Yes
to accept the challenge inslack.txt
.- Re-implement
kvstore.h/c
.- Don’t forget to thoroughly test your new implementation for correctness.
Finally, submit your work
Submitting consists of three steps:
- Executing this checklist:
- Fill in
~/lab3/slack.txt
with (1) your name, (2) your NUID, (3) slack hours you used, and (4) acknowledgements. - Make sure that your code builds with no warnings.
note: we will apply a 10% penalty to the compilation warnings you have. - Make sure you have added (
git add
) all files that you created (if any). - If you took the challenge, write
Yes
for challenge inslack.txt
.
- Fill in
Push your code to GitHub:
$ cd ~/lab3 $ git commit -am 'submit lab3' $ git push origin Counting objects: ... .... To ssh://github.com/NEU-CS3650-labs/lab3-<username>.git 7337116..ceed758 main -> main
- Actually submit your lab via Gradescope:
- Navigate to https://www.gradescope.com/ and click on log in.
- Select login with “School Credentials” and select “Northeastern University”.
- Enter Northeastern SSO login information and you should be able to log in to your gradescope account.
- Now, on Canvas, go to the CS3650 course and click on “Gradescope 1.3” from the left navigation bar. You would then be asked to accept the course invitation after which you can access the course on Gradescope.
- On Gradescope, select the lab/assignment you wish to submit and click on “Upload Submission”.
- You would then be asked to upload a zip file consisting of the files the lab/assignment specifies.
Note: you can either zip all files within your lab folder, or zip your lab folder whose name must start with “lab3-“ (this is supposed to be your GitHub repo name,lab3-<username>
). If you zip a folder named, for example, “mysubmit”, the Gradescope will complain. - After uploading the zip file, the autograder will evaluate your submission and based on it provide a score for your submission.
- After the manual grading process is performed by the TAs, your final score for the lab/assignment will be released.
This completes the lab.
Acknowledgments
This lab is created by Peter Desnoyers, with modifications by the 24spring CS3650 staff.