Lab1: NEU File System

In Lab1, you will implement a simplified Unix-like file system, called fs5600.

fs5600 uses the FUSE (File system in User SpacE) library, which allows us to implement a file system in the user space. This makes the development much easier than in the kernel space. FUSE exposes a few file system function interfaces that need to be instantiated and your task is to fill in these function bodies.

There is not too much code to write (if you organize your code nicely); instead, a lot of the work of the lab is understanding the system that you’ve been given.

Challenge: In these challenges, you will addresses some of the simplifications/limitations of fs5600 and make it more “advanced”. See challenges at the end of the lab.

Section 0: Getting started

We assume you have a Linux machine (or VM). If you don’t have, please contact the instructor to get one.
Click the GitHub Lab1 link on GitHub homepage to create your Lab1 clone on GitHub.
Open a Linux terminal.
Clone Lab1 repo to your local machine:
```
$ cd ~
$ git clone git@github.com:NEU-CS7670-labs/lab1-<Your-GitHub-Username>.git lab1
```
Note that the repo address git@github.com:... can be obtained by going to GitHub repo page (your cloned lab1), clicking the green “Code” button, then choose “SSH”.

Check contents:

$ cd ~/lab1; ls

// you should see:
Makefile    disk1.in  disk2.in     diskfmt.py fs5600.c  fs5600.h  gen-disk.py
lab1fuse.c  misc.c    read-img.py  slack.txt  test1.c   test2.c

Install required packages (Ubuntu):
```
 sudo apt install check
 sudo apt install libfuse-dev
 sudo apt install zlib1g-dev
 sudo apt install python2
```
(note: if you’re not using Ubuntu but other Linux distributions, please Google and install the corresponding packages in your platform.)

Section 1: FUSE

The file system that we will build is implemented as a user-level process. This file system’s storage will be a file (which is created by us called test.img) that lives in the normal file system of your devbox. Much of your code will treat this file as if it were a disk.

This entire arrangement (file system implemented in user space with arbitrary choice of storage) is due to software called FUSE (File system in User SpacE).

In order to really understand what FUSE is doing, we need to take a brief detour to describe VFS. Linux (like Unix) has a layer of kernel software called VFS; conceptually, every file system sits below this layer, and exports a uniform interface to VFS. (You can think of any potential file system as being a “driver” for VFS; VFS asks the software below it to do things like “read”, “write”, etc.; that software fulfills these requests by interacting with a disk driver, and interpreting the contents of disk blocks.)

The purpose of this architecture is to make it relatively easy to plug a new file system into the kernel: the file system writer simply implements the interface that VFS is expecting from it, and the rest of the OS uses the interface that VFS exports to it. In this way, we obtain the usual benefits of pluggability and modularity.

FUSE is just another “VFS driver”, but it comes with a twist. Instead of FUSE implementing a disk-based file system (the usual picture), it responds to VFS’s requests by asking a user-level process (which is called a “FUSE driver”) to respond to it. So the FUSE kernel module is an adapter that speaks “fuse” to a user-level process (and you will be writing your code in this user-level process) and “VFS” to the rest of the kernel.

Meanwhile, a FUSE driver can use whatever implementation it wants. It could store its data in memory, across the network, on Jupiter, whatever. In the setup in this lab, the FUSE driver will interact with a traditional Linux file (as noted above), and pretend that this file is a sector-addressable disk.

The FUSE driver registers a set of callbacks with the FUSE system (via libfuse and ultimately the FUSE kernel module); these callbacks are things like read, write, etc. A FUSE driver is associated with a particular directory, or mount point. The concept of mounting was explained in OSTEP 39 (see 39.17). Any I/O operations requested on files and directories under this mount point are dispatched by the kernel (via VFS, the FUSE kernel module, and libfuse) to the callbacks registered by the FUSE driver.

To recap all of the above: the file system user interacts with the file system roughly in this fashion:

When the file system user, Process A, makes a request to the system, such as listing all files in a directory via ls, the ls process issues one or more system calls (stat(), read(), etc.).
The kernel hands the system call to VFS.
VFS finds that the system call is referencing a file or directory that is managed by FUSE.
VFS then dispatches the request to FUSE, which dispatches it to the corresponding FUSE driver (which is where you will write your code).
The FUSE driver handles the request by interacting with the “disk”, which is implemented as an ordinary file. The FUSE driver then responds, and the responses go back through the chain.

Lab1 FUSE Diagram

Exercise 0 the first glance of the unimplemented fs5600.

Here’s an example to show how fs5600 works with FUSE, where test.img is a disk image that the lab creators created:
 $ cd ~/lab1
 $ make          // you should see:
 cc -ggdb3 -Wall -O0   -c -o misc.o misc.c
 ...

 $ mkdir fs     // create a directory to serve as a mount point

 $ df fs        // see what file system fs is associated with
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/sda1        7092728 4536616   2172780  68% /

 $ ls fs        // notice, "fs" is empty

 $ ./lab1fuse -image test.img fs    // mount test.img at fs

 $ df fs        // this ends up with an error because you haven't implemented fs5600.
                // but notice that fs's file system is different.
 df: fs: Operation not supported

 $ ls fs        // similarly, "ls" reports an error as well
 ls: cannot access 'fs': Operation not supported

 $ ls -l        // you will see that fs is in an unknown state
 ...
 d????????? ? ?      ?            ?            ? fs
 ...

 $ fusermount -u fs  // now unmount fs

 $ ls fs        // and "fs" is an empty dir again
Note that in the above example, after we run lab1fuse, the kernel is actually dispatching all the readdir(), statfs() etc. calls that ls and df make to fs5600 FUSE driver. Given that you haven’t implemented these functions, fs5600 returns error messages that these operations are not supported. Later, when fusermount is run, fs5600 is unmounted from fs, and then all I/O operations under fs return to being serviced normally by the kernel.

Section 2: NEU File System (fs5600)

How you should use fs5600 descriptions: below are the design of fs5600 and a bunch of facts about fs5600. Here is how you should use them: first, you should skim all the contents before you start coding. The point is that you should have a basic understanding of how things work and an impression of what information is where. Later, when you find something unclear or uncertain, you should come back and study the corresponding part carefully. I would expect you to come back often to understand how things work.

fs5600 disk
File system format
Superblock
Block bitmap
Block allocation (with block bitmap)
Inode
File mode
Directory

fs5600 disk

The fs5600 disk is abstracted as an array of blocks. The size of each block is 4KB (4096 bytes or FS_BLOCK_SIZE). You can read/write disk by two simple interfaces:

int block_read(void *buf, int lba, int nblks);
int block_write(void *buf, int lba, int nblks);

The buf is a pointer to memory where you want to store the data from disk (for reads) or contains the data to disk (for writes). The lba is the block id (starting from 0) you want to read/write. The nblks is the number of blocks you will read/write.

File system format

The format of fs5600 is as follows:

the first block (block 0) is the superblock;
the second block (block 1) is the bitmap block;
the third block (block 2) is the root directory;
other blocks are blocks that contain either inodes or data.

   +-------+--------+----------+------------------------+
   | super | block  | root dir |     data blocks ...    |
   | block | bitmap |   inode  |                        |
   +-------+--------+----------+------------------------+
 block     0        1         2          3 ...

Superblock

The superblock is the first block in the file system, and contains the information needed to find the rest of the file system structures. The following C structure (see fs5600.h) defines the superblock:

  struct fsx_superblock {
      uint32_t magic;             /* 0x30303635 - some magic number */
      uint32_t disk_size;         /* in 4096-byte blocks */
      char pad[4088];             /* to make size = 4096 */
  };

Note that uint32_t is a standard C type found in the <stdint.h> header file, and refers to an unsigned 32-bit integer. (similarly, uint16_t, int16_t and int32_t are unsigned/signed 16-bit ints and signed 32-bit ints)

Block bitmap

This block bitmap indicates which blocks have been used and which are free. One bit represents one block: the first bit represents the first block (block 0), the second bit represents the second block (block 1), and so forth. If a bit is 0, the corresponding block is free; otherwise, the block is used.

In fs5600, the first three bits are always 1 because they are (1) the super block, (2) the block bitmap itself, and (3) the root dir inode. (the bits for blocks 0, 1 and 2 will be set to 1 when the file system is created, so you don’t have to worry about excluding them when you search for a free block.)

Block allocation (with block bitmap)

To allocate blocks from disk, you need to manipulate block bitmap.

An easy way to achieve this is to cache the block bitmap in a 4KB memory buffer. (hint: you can make that buffer a global variable, and read it in fs_init function, and sync it back to disk after each block allocation and free.)

You’re given the following functions to handle the bitmap in memory:

  int bit_test((void*)map, int i);
  void bit_set(unsigned char *map, int i);
  void bit_clear(unsigned char *map, int i);

A side note: using a single block for the bitmap means that the maximum number of disk blocks fs5600 can track is 32768 (= 4096 * 8). Each block is 4KB. Hence the maximum size of an fs5600 is 128MB (= 32768 * 4KB).

Inodes

fs5600 simplifies the standard Unix inode.

fs5600 inode

In particular, inode is defined as follows (see also fs5600.h):

  struct fs_inode {
      uint16_t uid;      /* file owner */
      uint16_t gid;      /* group */
      uint32_t mode;     /* type + permissions (see below) */
      uint32_t ctime;    /* last status change time */
      uint32_t mtime;    /* modification time */
      int32_t  size;     /* size in bytes */
      uint32_t ptrs[FS_BLOCK_SIZE/4 - 5]; /* inode = 4096 bytes */
  };

Each inode corresponds to a file or directory; in a sense the inode is that file or directory. In fs5600, inode can be uniquely identified by its inode number, which is its block id. Again, fs5600 uses block ids as inode numbers. The root directory is always found in inode 2; inode 0 (the super block) is invalid and can be used as a NULL value.

Hint: a path lookup always begins with inode 2, the root directory.

File mode

The FUSE APIs (and Linux internals in general) mash together the concept of object type (file/directory/device/symlink…) and permissions. The result is called the file mode (the attribute mode in the fs_inode above), and looks like this:

  |<-- S_IFMT --->|           |<-- user ->|<- group ->|<- world ->|
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | F | D |   |   |   |   |   | R | W | X | R | W | X | R | W | X |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Since it has multiple 3-bit fields, it is commonly displayed in base 8 (octal). For example, permissions allowing RWX for everyone (rwxrwxrwx) are encoded as 777 (an octal number). Note that, in C, octal numbers are indicated by a leading 0, e.g., 0777, so the expression 0777 == 511 is true.

The F and D bits correspond to 0100000 and 040000, indicating if the inode is a regular file (F bit is set) or a directory (D bit is set).

A side note: since the demise of 36-bit computers in the early 70s, Unix permissions are about the only thing in the world that octal is used for.

There are a bunch of macros (in <sys/stat.h>) you can use for looking at the mode; you can see the official documentation with the command man inode, but the important ones are:

S_ISREG(m) - is it (i.e. the inode with mode m) a regular file?
S_ISDIR(m) - is it a directory?
S_IFMT - bitmap for inode type bits.
You can get just the inode type with the expression m & S_IFMT, and just the permission bits with the expression m & ~S_IFMT. (note that ~ is bitwise NOT - i.e. 0s become 1s and 1s become 0s)

Appendix: fs5600 facts

Appendix has a bunch of fs5600 facts and some implementation hints. You can use them as references.

Section 3: fs5600 implementation, Part A

Before implementing anything, run:

$ make testa
...
0%: Checks: 12, Failures: 12, Errors: 0
...

And you can see that you pass 0 test in testa.

Now, get ready to code.

Exercise 1 implement fs_init and fs_statfs.

Read fs5600.h.
Implement fs_init, initializing your fs5600. If experiencing errors, then exit(1); otherwise, return NULL.
fs_statfs in fs5600.c, reporting file system statistics. You should always return 0 (in fs5600, return 0 means okay).
After implementing these two functions, you should pass 1 test:
$ make testa
...
8%: Checks: 12, Failures: 11, Errors: 0
gradetest-1.c:284:P:read_mostly:test_statvfs:0: Passed
...

Notes on testing I

When running make testa, under the hood, you’re running unit tests in test1.c.
When failing one test case, for example,
```
test1.c:279:F:read_mostly:test_statvfs:0: Assertion 'rv == 0' failed: rv == -95, 0 == 0
```
The message describes the failure with detailed information, including:
- file (test1.c),
- line number (279),
- failing test case (read_mostly:test_statvfs)
- and why failing (Assertion rv == 0 but got rv == -95).
You can use printf or gdb to debug (take a look at Makefile to see which binary to gdb).

Exercise 2 implement fs_getattr and fs_readdir.

fs_getattr is used to get attributes of a file/directory.
fs_readdir enumerates entries in a directory (think of ls).
implement these two functions in fs5600.c, and run make testa. Now you should pass 5 tests:
$ make testa
...
41%: Checks: 12, Failures: 7, Errors: 0
you can safely ignore struct fuse_file_info *fi (also apply to later Exercises); it is a FUSE data structure, see details here.
The ctime and mtime in fs5600’s inodes are seconds since 1970/1/1 (see appendix timestamps). When implementing inode2stat(), you should fill these (ctime/mtime) into timespec’s tv_sec and leave tv_nsec to be 0.

Notes on testing II

We use a predefined disk image named test.img for make testa.
The image has known contents (see below), and you can use this information to check if your code works properly.

test.img contents:

# path uid gid mode size ctime mtime
"/", 0, 0, 040777, 4096, 1565283152, 1565283167
"/file.1k", 500, 500, 0100666, 1000, 1565283152, 1565283152
"/file.10", 500, 500, 0100666, 10, 1565283152, 1565283167
"/dir-with-long-name", 0, 0, 040777, 4096, 1565283152, 1565283167
"/dir-with-long-name/file.12k+", 0, 500, 0100666, 12289, 1565283152, 1565283167
"/dir2", 500, 500, 040777, 8192, 1565283152, 1565283167
"/dir2/twenty-seven-byte-file-name", 500, 500, 0100666, 1000, 1565283152, 1565283167
"/dir2/file.4k+", 500, 500, 0100777, 4098, 1565283152, 1565283167
"/dir3", 0, 500, 040777, 4096, 1565283152, 1565283167
"/dir3/subdir", 0, 500, 040777, 4096, 1565283152, 1565283167
"/dir3/subdir/file.4k-", 500, 500, 0100666, 4095, 1565283152, 1565283167
"/dir3/subdir/file.8k-", 500, 500, 0100666, 8190, 1565283152, 1565283167
"/dir3/subdir/file.12k", 500, 500, 0100666, 12288, 1565283152, 1565283167
"/dir3/file.12k-", 0, 500, 0100777, 12287, 1565283152, 1565283167
"/file.8k+", 500, 500, 0100666, 8195, 1565283152, 1565283167

read-img.py: This is a utility tool that prints lots of information about a disk image, which may help figure out what you messed up.
- Try python2 read-img.py test.img (note that all python scripts in Lab1 requires Python2.7).
- The “blocks used” line shows the blocks (including inodes) marked as used in the bitmap;
- the “inodes found” line shows the inodes found by recursively descending from the root.
- It’s useful for debugging some things; not useful for others.

Exercise 3 finish part A

implement fs_read, read data from a file.
implement fs_rename, rename a file.
implement fs_chmod, change file permissions.
after implementing all these functions in fs5600.c, you should pass all 12 tests:
$ make testa
...
100%: Checks: 12, Failures: 0, Errors: 0
Note that both fs_chmod and fs_rename will (obviously) modify files/dirs, hence the disk image. But you will not see the modifications when running make testa because make testa recreates the original test.img every time.

Notes on testing III

We’re using Check, a unit testing framework for C (recall sudo apt install check that you run).
You can add debugging code (e.g., the beloved printf) or more test cases in test1.c.
Below is a crash course of the Check framework. (after reading this, you should be able to understand the code in test1.c.)

Crash course: Check framework

A unit test is defined and declared with the macro START_TEST and END_TEST as follows:

    START_TEST(test_name)
    {
        code;
        code;
    }
    END_TEST

Running a unit test requires to call suite_add_tcase (see examples in main of test1.c):

    tcase_add_test(tc, test_name);

To check if something is expected, here are several useful Check functions:

ck_abort("message");      /* fail unconditionally */
ck_assert(expr);          /* fail if expression is false */
ck_assert_int_eq(i1, i2); /* check if i1 == i2; if not, alarm! */
ck_assert_int_ne/lt/le/ge/gt(i1, i2); /* not eq, less than, etc. */
ck_assert_str_eq("string1", "string2");
ck_assert_ptr_eq(ptr, NULL);
ck_assert_ptr_ne(ptr, NULL); /* assert ptr is NOT null */

Section 4: fs5600 implementation, Part B

Now, let’s run another set of tests testb and you should see:

$ make testb
...
0%: Checks: 27, Failures: 27, Errors: 0
...

Exercise 4 implement fs_create and fs_mkdir.

implement fs_create, creating a new (empty) file
fs_create errors:
bad path /a/b/c - b doesn’t exist (should return -ENOENT)
bad path /a/b/c - b isn’t directory (-ENOTDIR)
bad path /a/b/c - c exists, is file (-EEXIST)
bad path /a/b/c - c exists, is directory (-EEXIST)
too-long name (more than 27 characters) (-EINVAL)
implement fs_mkdir, creating new (empty) directory
fs_mkdir errors:
bad path /a/b/c - b doesn’t exist (-ENOENT)
bad path /a/b/c - b isn’t directory (-ENOTDIR)
bad path /a/b/c - c exists, is file (-EEXIST)
bad path /a/b/c - c exists, is directory (-EEXIST)
too-long name (more than 27 characters) (-EINVAL)
after implementing, you should pass 4 cases:
$ make testb
...
14%: Checks: 27, Failures: 23, Errors: 0
Note: we’re using a test strategy called “mocking” to deal with the fuse_get_context call. We provide a fake version of the function in our test code, and in each test we can fill in any information we want it to return.

Notes on testing IV

In make testb (testing code in test2.c), we change to an empty disk image test2.img.
You don’t have to worry about scrabbling the disk (which you certainly did in Exercise 4) because make testb rebuilds the disk every time.
- in particular, make runs $ python2 gen-disk.py -q disk2.in test2.img
To test writing logic (including create, mkdir, etc.), we have to trust your readdir and read implementations: test2.c uses your Part A implementation as the ground truth.
- That’s the primary reason why the assignment is explicitly split into two parts.
- That said, if you have a bug in Part A, then you can get confusing results…
- …but you shouldn’t worry too much because we have a lot of tests in testa…
- …however, tests cannot guarantee absence of bugs…
- …but there is not much we can do about it.

A side note: we do know one way to prove absence of bugs, namely verificaition. Check out FSCQ, a formally certified crash-proof file system.

Exercise 5 implement fs_unlink and fs_rmdir.

implement fs_unlink, removing a file
fs_unlink errors:
bad path /a/b/c - b doesn’t exist (-ENOENT)
bad path /a/b/c - b isn’t directory (-ENOTDIR)
bad path /a/b/c - c doesn’t exist (-ENOENT)
bad path /a/b/c - c is directory (-EISDIR)
implement fs_rmdir, removing a directory
fs_rmdir errors:
bad path /a/b/c - b doesn’t exist (-ENOENT)
bad path /a/b/c - b isn’t directory (-ENOTDIR)
bad path /a/b/c - c doesn’t exist (-ENOENT)
bad path /a/b/c - c is file (-ENOTDIR)
directory not empty (-ENOTEMPTY)
after implementing, you should pass another 10 cases:
$ make testb
...
51%: Checks: 27, Failures: 13, Errors: 0

Exercise 6 finish Part B.

implement fs_write, writing to a file
implement fs_truncate, deleting the contents of a file
implement fs_utime, changing access and modification times
after implementing, you should pass all cases:
$ make testb
...
100%: Checks: 27, Failures: 0, Errors: 0

Put it all together—mount fs5600 and use it!

After finishing Part A and Part B, your fs5600 is now usable, namely you can actually use it!

Remember the first glance of fs5600 (the Exercise 0)? Now let’s redo it.

Exercise 7 (or 0+) rerun Exercise 0.

Let’s revisit Exercise 0:

 $ cd ~/lab1
 $ make          // you should see:
 cc -ggdb3 -Wall -O0   -c -o misc.o misc.c
 ...

 $ mkdir fs     // create a directory to serve as a mount point

 $ df fs        // see what file system fs is associated with
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/sda1        7092728 4536616   2172780  68% /

 $ ls fs        // notice, 'fs' is empty

 $ ./lab1fuse -image test.img fs    // mount test.img at fs

 $ df fs        // below you should see the information of your fs5600.
                // this indicates that fs5600 is working.
 Filesystem     1K-blocks  Used Available Use% Mounted on
 lab1fuse            1592   172      1420  11% /home/ubuntu/lab1/fs

 $ ls fs        // you will see the files in test.img
 dir-with-long-name  dir2  dir3  file.10  file.1k  file.8k+

 $ ls -l        // you will see that fs is in an valid state
 ...
 drwxrwxrwx 1 root   root      4096 Aug  8  2019 fs
 ...

 $ cd fs
 // do whatever you want!
 // try "mkdir", "touch", "cat", "ls", etc.

 // you can umount your fs5600 by:
 $ fusermount -u fs

Now, you can actually use your file system for day-to-day work! How exciting!!

Appendix: fs5600 facts

[A] Lab1 files and their meanings:

fs5600.h: fs5600 data structure definitions
fs5600.c: the core functions of fs5600 (your code here)
test1.c, test2.c: test cases
misc.c, lab1fuse.c: support code
gen-disk.py, disk1.in, disk2.in: scripts to generates disk images
read-img.py: a utility tool to print image information

[B] Your jobs:

you will implement all the functionalities of fs5600, which are classified into two parts.

In Part A, you will need to implement the following functions in fs5600.c:

fs_init - constructor (i.e. put your init code here)
fs_statfs - report file system statistics
fs_getattr - get attributes of a file/directory
fs_readdir - enumerate entries in a directory
fs_read - read data from a file
fs_rename - rename a file
fs_chmod - change file permissions

In Part B, you will need to implement these functions:

fs_create - create a new (empty) file
fs_mkdir - create new (empty) directory
fs_unlink - remove a file
fs_rmdir - remove a directory
fs_write - write to a file
fs_truncate - delete the contents of a file
fs_utime - change access and modification times

[C] mode

File type constants (see also man 2 stat)

S_IFMT - mask for type bits
S_IFREG - regular file
S_IFDIR - directory

A lot of times it’s easier to use:

S_ISDIR(mode) - true if it’s a directory
S_ISREG(mode) - true if it’s a regular file

[D] error code

When encountering errors, you need to return error codes. The comments in fs5600.c will tell you which errors are ought to return from which methods. Here is a list of error codes that you will use:

ENOENT - file/dir not found
EIO - if block_read or block_write fail
ENOMEM - out of memory
ENOTDIR - not a directory
EISDIR - is a directory
EINVAL - invalid parameter (see comments in fs5600.c)
EOPNOTSUPP - not implemented yet

Note that you can use strerror(e) to get a string representation of an error code. Note: you’ll need to use the positive error code for strerror() (e.g., ENOENT), whereas you’ll need to return the negative version in code (e.g., return -ENOENT;).

[E] fs5600 design choices

Here are the design choices that fs5600 made:

fs5600 chooses to use a single block (block 1) as the block bitmap, which results in the maximum file system size of 128MB.
fs5600 uses a block (4KB) as an inode.
fs5600 puts all block pointers in the inode. An inode can hold 1019 32-bit block pointers (see fs_inode in fs5600.h), for a max file size of ~4MB.
Directories are not nested more than 10 times. For example, there is no /a/b/c/d/e/f/g/h/i/j/k, which have a directory depth of 11. (note: the test and grading scripts will never create directories nested that deep.)
Directories only have one data block (or only inode.ptrs[0] contains valid block ids).
fs5600 does not support links.
rename is only used within the same directory, for example,rename("/dir/f1", "/dir/f2"), and rename("/dir1/f1", "/dir2/f2") is invalid.
truncate is only ever called with len=0, meaning deleting all the data in the file.

[F] implementation hints

1. reuse code. There will be a lot of code sharing between different fs_* functions. You should factor out the code that is used multiple times. It’s bad to duplicate the same piece of code in multiple functions (sometimes known as “copy-paste bugs”; see a research paper about this topic here). An obvious drawback is that you’ll find it hard to fix bugs—fixing one bug in some of the copies does not fix the same bug in other copies!

2. performance. Don’t worry about performance. The simplest possible implementation is okay, but an incorrect implementation is NOT okay.

3. copy paths. FUSE passes a path argument to your functions as const char *, instead of char *. It means that you cannot modify the path and you need to copy the string before operating on it (e.g., using strtok). Typically, people use strdup to duplicate a string.

      char *your_path = strdup(path);
      ...         // do whatever to your_path
      free(your_path);

4. split paths. When walking the hierarchical structure of fs5600, you need to split paths (e.g., /a/b/c) to tokens (e.g., a, b, and c). Here are some hints:

strtok is useful (what is this? man strtok).
you can safely assume that the number of tokens is <=10 because the maximum depth of nested dirs is 10.
you can safely assume that the length of each token is <=27 because the name of a file is stored in char name[28] (in fs_dirent).

5. path translation Assume you’ve split your path, and you have a count (int pathc) and an array of strings (char **pathv). As an example, for a path /home/cs5600, you will have pathc=2, pathv={"home","cs5600"}.

To find the file pointed by /home/cs5600, you need to translate this path (i.e., pathc and pathv) into an inode number. You should:

go to the root dir (the inode in block 2)
try to find dir “home” (i.e., pathv[0]) in the root dir
then, fetch the dir “/home”’s inode
try to find file “cs5600” (i.e., pathsv[1]) in the dir “/home”
finally, return the inode number of the file “cs5600”, or -ENOENT/-ENOTDIR if there was an error.

6. uid and gid. Here is how fs5600 gets uid and gid:

    struct fuse_context *ctx = fuse_get_context();
    uint16_t uid = ctx->uid;
    uint16_t gid = ctx->gid;

Since we’re not running FUSE as root, you’re only going to see your own uid and gid.

7. timestamps. You can get the timestamp in the proper format with time(NULL). Although the function returns a 64-byte integer, you can assign it to the unsigned 32-bit timestamps in the inode without worrying about overflow until year 2038 (unfortunately, spring 2038 will meet the notorious year 2038 problem, or Y2K38 superbug).

Challenge: learned fs5600

Below is an exploratory task that we do not have a canonical solution.

Challenge The current fs5600 is access pattern agnostic—no matter what application uses fs5600, the fs operations work exactly the same. If we assume some prior knowledge of the application, can we do better than the current fs5600?

In fact, such prior knowledge is quite common in practice. For example, files created by PyTorch to store training data will likely be accessed periodically and sequentially because the files are read once (sequentially) during every epoch. Can you propose a learned fs5600 that behaves differently for different workloads and apps?

A basic idea is to trace fs behaviors of an app (what does behavior mean here? many choices; you should decide) running on fs5600 by instrumenting fs5600. Then, train a neural network to learn access patterns of this app. (What are the inputs and outputs of the NN? You need to design.) Finally, apply the trained network to the fs5600 such that the fs operations are optimized for this app. (What are these optimizations? and what’s the NN’s role during running learned fs5600? Think about it.) Give it a try with your fs5600 code base.

This completes the lab.

Acknowledgments

This lab is created by Peter Desnoyers, with modifications by the CS5600 21fall staff. FUSE introduction is borrowed from Mike Walfish’s cs202 lab instructions.

Lab1: NEU File System

Section 0: Getting started

Section 1: FUSE

Section 2: NEU File System (fs5600)

Contents

fs5600 disk

File system format

Superblock

Block bitmap

Block allocation (with block bitmap)

Inodes

File mode

Directory

Appendix: fs5600 facts

Section 3: fs5600 implementation, Part A

Notes on testing I

Notes on testing II

Notes on testing III

Crash course: Check framework

Section 4: fs5600 implementation, Part B

Notes on testing IV

Put it all together—mount fs5600 and use it!

Appendix: fs5600 facts

[A] Lab1 files and their meanings:

[B] Your jobs:

[C] mode

[D] error code

[E] fs5600 design choices

[F] implementation hints

Challenge: learned fs5600

Acknowledgments