CS5600 Lab3: Concurrent Key-Value Store

In Lab3, you will write a simple concurrent Key-Value store (short as KV-store). A KV-store (sometimes called key-value database) is a data storage that saves and retrieves data by keys.

Lab3 designs a toy KV-store that

contains at most a fixed number of keys (the number is defined by MAX_TABLE in lab3.h).
the key is a string.
each key has an associated value (another string).
the KV-store has three operations: read, write, and delete.
writes can insert new key-value pairs to the KV-store.

The KV-store has two components: a server (called dbserver) and a client (called dbclient). They talk through network. In particular, the dbclient sends an operation (e.g., a write) to the dbserver; dbserver parses the operation (e.g., “oh, a write”), executes it (e.g., write the value to storage), and returns the results (e.g., “okay”); finally, dbclient receives the result and print it on screen.

You will leverage multi-threading to accelerate the KV-store server. The server has a console thread (or main thread), a network listener thread, and multiple worker threads.

Challenge: again, Lab3’s challenge is optional. In this challenge, you will try your best to improve the performance of the KV-store, which is described at the end of the lab.

Section 0: Getting started

Click the GitHub Lab3 link on Canvas homepage to create your Lab3 clone on GitHub.
Start your VM and open the VM’s terminal.
Clone Lab3 repo to VM:
```
$ cd ~
$ git clone git@github.com:NEU-CS5600-21fall/lab3-<Your-GitHub-Username>.git lab3
```
Note that the repo address git@github.com:... can be obtained by going to GitHub repo page (your cloned lab3), clicking the green “Code” button, then choose “SSH”.

Check contents:

$ cd ~/lab3
$ ls
// you should see:
Makefile   dbserver.c  kvstore.h   queue.c   slack.txt   stats.h   worker.h
dbclient.c kvstore.c   lab3.h      queue.h   stats.c     worker.c

install a required library: zlib
```
$ sudo apt install zlib1g-dev
```

Note: all the shell commands below happen within the folder ~/lab3/.

Section 1. Multi-threaded key-value store

We provide you some skeleton code for the KV-store (in file dbserver.c). The KV-store server will have (by default) six threads:

The console (main) thread. This thread reads user’s inputs and has implemented three commands.
- stats: print KV-store’s statistics
- list: dump current key-value pairs
- quit: quit KV-store server
The listener thread. It opens a TCP socket, accept incoming connections, and add tasks to a queue for the worker threads.
Four (by default) worker threads. They wait for new tasks and handle requests to write, read, or delete individual database keys.

Exercise 0 Compile and play with the skeleton code.

You can compile the current KV-store and play with it.
 $ make
 $ ./dbserver
 cs5600>         // you should see this prompt
Type in help to see all three commands, and quit by typing in quit and pressing enter.
Note: program dbserver will listen on port 5600 for TCP connections. If port 5600 has been used, the program will refuse to start.

You will use pthreads (POSIX Threads) to create threads and their mutex and condition variable implementations for synchronization. You need to know pthreads to finish Lab3. Here is a pthread tutorial.

KV-store’s server code is in file dbserver.c, in which you need to create all the threads mentioned above.

Exercise 1 create listener and worker threads.

Read dbserver.c and understand how it works. (note: you don’t have to fully understand every details.)
Search “TODO” and read comments under “Exercise 1”.
Finish the task of creating threads by using pthread library.
Test your code by:
 $ make; ./dbserver
 ...
 // you should see:
 [INFO] worker thread dequeue NULL...
 [INFO] worker thread dequeue NULL...
 [INFO] worker thread dequeue NULL...
 [INFO] worker thread dequeue NULL...
 cs5600>
Note that the prompt “cs5600>” and the messages “[INFO] worker…” may intertwine. Why? because these are printed by different threads.

Section 2. Implement concurrent queue, statistics, and KV-store

In this section of Lab3 (Exercise 2/3/4), your coding style matters, which is different from prior labs. We (or a program) will read your code. If you violate the concurrency coding requirements below, we will deduct up to 20% of total lab points based on our subjective evaluation.

Requirement I: concurrency commandments (namely, six rules). You need to re-read Mike Dahlin’s Coding Standards for Programming with Threads. You are required to follow these standards for this lab. In the real world, unclear multi-threaded code is particularly dangerous—because it is difficult or impossible to test. Moreover, if it’s not clear, then the programmer who comes after you cannot debug it, maintain it, or add new features.

Requirement II: monitor. In Lab3, you must implement shared states as monitors. We covered monitors in class (see note week5.a), and here is a monitor chapter from OSTEP.

Notice that C language doesn’t have native monitor supports or object oriented programming. Hence, your are going to implement “manually constructed” monitors in C style, wherein:

all function calls are protected by a mutex (that is, the programmer inserts those acquire()/release() on entry and exit from every procedure that works with the shared state).
synchronization happens with condition variables whose associated mutex is the mutex that protects the functions.
Note that you do not have to use condition variables in every Monitor implementation; it’s just that whenever you use condition variables, you must follow the above point.

Lab3 has three pieces of state shared between different threads. They are (1) task queue, (2) KV-store statistics, and (3) KV-store storage. In this section, you will implement or fix them all.

Concurrent task queue

Task queue is shared between the listener thread and worker threads. The listener thread will read operations (read, write, or delete) from clients through network, create new tasks, and enqueue these tasks. Meanwhile, worker threads dequeue tasks and process them concurrently.

Exercise 2 implement concurrent task queue.

Read queue’s data structure in queue.h.
Read the listener thread (in dbserver.c) and worker threads (in worker.c) to understand how queue works.
Fill in TODO in queue.c to implement the queue. (Note: you should define your own mutexes and condition variables, and don’t forget to initialize them.)
When done, test your code by (1) running the server on one terminal:
 $ make; ./dbserver
 cs5600>
 // you should not see "[INFO] worker thread dequeue NULL..." anymore.
and (2) running client on another terminal:
 $ ./dbclient --set=KEY0 ABC
 // you should see:
 val set to ABC
 ok
 $ ./dbclient --get=KEY0
 // you should see:
 READ: FAILED (X)

You can run ./dbclient --help to see its detailed usage. Also notice that even the ./dbclient --set... returns ok, your read from the same key fails. This is because you haven’t implemented the key-value storage, which is your Exercise 4.

KV-store statistics

The KV-store keeps track of the queue length and the numbers of executed operations (reads/writes/deletes/failures). You can check these statistics by running stats in dbserver’s console:

$ ./dbserver
5600> stats
// you should see something like
queued:   0
writes:   0
reads:    0
deletes:  0
failures: 0

These statistics are shared across worker threads. Current implementation in stats.c is buggy and is not thread-safe.

Exercise 3 fix statistics tracking.

Read stats.h and workers.c to understand how worker threads update the statistics.
Fix functions in stats.c, and implement them to be thread-safe.

Key-value storage

Key-value storage is the core of the KV-store. Lab3 skeleton code (kvstore.h and kvstore.c) provides you a naive in-memory key-value storage. In this naive in-memory key-value storage:

keys and values are stored in two fix-sized arrays.
a key associates to a value by using the same index number. As an example, if key “KEY0” and value “ABC” are paired and the key “KEY0” is stored in the first slot of key array (i.e., keys[0]), then the value “ABC” should be stored in the first slot of value array (i.e., values[0]).
Update 5/4: clarification Keys are arbitrary strings; they can be “KEY0” or “KEY-0” or “WHATSOEVER” or “xi,0xw32x0s>J%” (meaning, they do not have certain format).

Exercise 4 finish the concurrent key-value storage.

Read kvstore.h and workers.c to understand how worker threads use the KV-store.
Implement functions in kvstore.c for a thread-safe concurrent KV-store.
When done, test your code by (1) running the server on one terminal:
 $ make; ./dbserver
 cs5600>
and (2) running client on another terminal:
 $ ./dbclient --set=KEY0 ABC
 // you should see:
 val set to ABC
 ok
 $ ./dbclient --get=KEY0
 // you should see:
 ="ABC"

Section 3. Concurrency testing

As mentioned in class, debugging and testing a piece of concurrent code is hard. What you will do in Exercise 5 is to write your own testing code, which is supposed to be multi-threading as well.

Your testing code should: (note: these are suggestions not requirements)

use multiple threads,
issue concurrent and random operations to dbserver,
and compare read/write/delete results to see if your dbserver is correct.

We provide some helper functions for you in dbclient.c.

Exercise 5 write your own concurrency testing code.

Read exercise 5’s description in dbclient.c.
Implement your test code.

Beyond dbclient side information (which is under your control), dbserver also produces a debug trace file dbserver.trace (in the same folder as dbserver). This trace file contains all operations executed by the current running dbserver. Note that the trace file will be overwritten when running dbserver another time.

We expect you to write reasonably comprehensive test cases. We will spend some efforts on analyzing your test cases. If your test cases are too simple (based on our subjective evaluation), you might get up to 5% penalty on your Lab3 scores.

(optional) Challenge: faster key-value storage

As always, the challenge is optional; successfully finishing challenges will give you extra credits.

As mentioned in Exercise 4, we provide you a naive KV-storage implementation, which is an array (in fact, two). Of course, there are better data structures—hash table, binary-search tree, neural networks, just to name a few. In this challenge, you will re-implement the key-value storage in kvstore.h and kvstore.c for better performance.

Several notes if you took this challenge:

we will not check your coding style in kvstore.c. (meaning, you do not have to follow “six rules” and “monitor” requirements in kvstore.h/c; this will overwrite the requirement of Exercise 4.)
your faster KV-store will be used for the non-challenge test cases (mostly for correctness) as well. After all, there is only one pair of kvstore.h/c files. (meaning, if your faster version has a concurrency bug, you lose points because of that, which is more likely to happen than the naive version.)
The extra points you will get depend on other challengers. The rules are:
1. The total extra credits are 20 points.
2. The fastest challenger will get 100% of the credits. (Say its running time is min sec.)
3. The slowest challenger will get 20% of the credits. (Say baseline runs max sec.)
4. For other challengers whose code running time sits between the two (say x sec) will get extra points of (max - x) / (max - min) x 80%. For example, if min=1, max=101, and your code runs x=21, your score is 64% = (101-21) / (101-1) x 80% of the total 20 points.
For the performance evaluation, we will discard the network part and run your code directly. So you must (1) write all your KV-storage code in kvstore.h/c, and (2) strictly follow the key-value store APIs defined in kvstore.h (namely, kv_read/kv_write/kv_delete).
It is a good habit to develop performance measurement code first then start your performance tuning.

Here are some details of the benchmark that we run to evaluate performance:

The evaluation benchmark is a mixed workload of read/write/delete operations. The ratio #reads:#writes:#deletes is 8:1.5:0.5.
The evaluation metric (the “running time” mentioned above) is the total running time of all operations, which only include the running time of functions kv_read/kv_write/kv_delete.
We will increase MAX_TABLE in lab3.h. So, please do not expect 200 keys at most.

Challenge faster key-value store.

Re-implement kvstore.h/c.
Don’t forget to thoroughly test your new implementation.

Finally, submit your work

Submitting consists of three steps:

Executing this checklist:
- Fill in slack.txt with (1) your name and NUID, (2) slack hours you used, (3) if you took the challenge, and (4) acknowledge your influences.
- Make sure that your code can build without problems.
- Make sure you have added (git add) all files that you created.
- Make sure you have committed all your changes (check with git status).

Push your code to GitHub:

 $ cd ~/lab3
 $ git commit -am 'submit lab3'
 $ git push origin

 Counting objects: ...
  ....
  To ssh://github.com/NEU-CS5600-21fall/lab3-<username>.git
   7447116..deadbeef  main -> main

Actually commit your lab (with timestamp and git commit id):
1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.
2. Paste both your git repo address and the commit id (the hexadecimal string) to Canvas. In Canvas, there will be an assignment for this lab. You should paste the git repo address and the commit id in two lines:
```
 git@github.com:NEU-CS5600-21fall/lab3-studentid.git
 7447116c788fade33421f242b5dd8312deadbeef
```
  Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”.
3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.

NOTE: Ground truth is what and when you submitted to Canvas.

A non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.

Acknowledgments

This lab is created by Peter Desnoyers, with modifications by the CS5600 staff.