Link Search Menu Expand Document

Lab4: CS5600 File System

In Lab4, you will implement a simplified Unix-like file system, called fs5600 (CS5600 File System).

fs5600 uses the FUSE (File system in User SpacE) library, which allows us to implement a file system in the user space. This makes the development much easier than in the kernel space. FUSE exposes a few file system function interfaces that need to be instantiated and your task is to fill in these function bodies.

As always, there is not too much code to write (if you organize your code nicely); instead, a lot of the work of the lab is understanding the system that you’ve been given.

Challenge: again, Lab4’s challenges are optional. In these challenges, you will addresses some of the simplifications/limitations of fs5600 and make it more “advanced”. See challenges at the end of the lab.

Section 0: Getting started

  1. Click the GitHub Lab4 link on Canvas homepage to create your Lab4 clone on GitHub.
  2. Start your VM and open the VM’s terminal.
  3. Clone Lab4 repo to VM:
    $ cd ~
    $ git clone git@github.com:NEU-CS5600-21fall/lab4-<Your-GitHub-Username>.git lab4
    

    Note that the repo address git@github.com:... can be obtained by going to GitHub repo page (your cloned lab4), clicking the green “Code” button, then choose “SSH”.

    • Why “21fall” in the repo address? This is an accident. See why in a Piazza question here.
  4. Check contents:
    $ cd ~/lab4; ls
    
    // you should see:
    Makefile    disk1.in  disk2.in     diskfmt.py fs5600.c  fs5600.h  gen-disk.py
    lab4fuse.c  misc.c    read-img.py  slack.txt  test1.c   test2.c
    
  5. Install required packages (Ubuntu):
     sudo apt install check
     sudo apt install libfuse-dev
     sudo apt install zlib1g-dev
     sudo apt install python2
    

    (note: if you’re not using Ubuntu but other Linux distributions, please Google and install the corresponding packages in your platform.)

Section 1: FUSE

The file system that we will build is implemented as a user-level process. This file system’s storage will be a file (which is created by us called test.img) that lives in the normal file system of your devbox. Much of your code will treat this file as if it were a disk.

This entire arrangement (file system implemented in user space with arbitrary choice of storage) is due to software called FUSE (File system in User SpacE).

In order to really understand what FUSE is doing, we need to take a brief detour to describe VFS. Linux (like Unix) has a layer of kernel software called VFS; conceptually, every file system sits below this layer, and exports a uniform interface to VFS. (You can think of any potential file system as being a “driver” for VFS; VFS asks the software below it to do things like “read”, “write”, etc.; that software fulfills these requests by interacting with a disk driver, and interpreting the contents of disk blocks.)

The purpose of this architecture is to make it relatively easy to plug a new file system into the kernel: the file system writer simply implements the interface that VFS is expecting from it, and the rest of the OS uses the interface that VFS exports to it. In this way, we obtain the usual benefits of pluggability and modularity.

FUSE is just another “VFS driver”, but it comes with a twist. Instead of FUSE implementing a disk-based file system (the usual picture), it responds to VFS’s requests by asking a user-level process (which is called a “FUSE driver”) to respond to it. So the FUSE kernel module is an adapter that speaks “fuse” to a user-level process (and you will be writing your code in this user-level process) and “VFS” to the rest of the kernel.

Meanwhile, a FUSE driver can use whatever implementation it wants. It could store its data in memory, across the network, on Jupiter, whatever. In the setup in this lab, the FUSE driver will interact with a traditional Linux file (as noted above), and pretend that this file is a sector-addressable disk.

The FUSE driver registers a set of callbacks with the FUSE system (via libfuse and ultimately the FUSE kernel module); these callbacks are things like read, write, etc. A FUSE driver is associated with a particular directory, or mount point. The concept of mounting was explained in OSTEP 39 (see 39.17). Any I/O operations requested on files and directories under this mount point are dispatched by the kernel (via VFS, the FUSE kernel module, and libfuse) to the callbacks registered by the FUSE driver.

To recap all of the above: the file system user interacts with the file system roughly in this fashion:

  1. When the file system user, Process A, makes a request to the system, such as listing all files in a directory via ls, the ls process issues one or more system calls (stat(), read(), etc.).

  2. The kernel hands the system call to VFS.

  3. VFS finds that the system call is referencing a file or directory that is managed by FUSE.

  4. VFS then dispatches the request to FUSE, which dispatches it to the corresponding FUSE driver (which is where you will write your code).

  5. The FUSE driver handles the request by interacting with the “disk”, which is implemented as an ordinary file. The FUSE driver then responds, and the responses go back through the chain.

Lab4 FUSE Diagram

Exercise 0 the first glance of the unimplemented fs5600.

Here’s an example to show how fs5600 works with FUSE, where test.img is a disk image that CS5600 staff created:

 $ cd ~/lab4
 $ make          // you should see:
 cc -ggdb3 -Wall -O0   -c -o misc.o misc.c
 ...

 $ mkdir fs     // create a directory to serve as a mount point

 $ df fs        // see what file system fs is associated with
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/sda1        7092728 4536616   2172780  68% /

 $ ls fs        // notice, "fs" is empty

 $ ./lab4fuse -image test.img fs    // mount test.img at fs

 $ df fs        // this ends up with an error because you haven't implemented fs5600.
                // but notice that fs's file system is different.
 df: fs: Operation not supported

 $ ls fs        // similarly, "ls" reports an error as well
 ls: cannot access 'fs': Operation not supported

 $ ls -l        // you will see that fs is in an unknown state
 ...
 d????????? ? ?      ?            ?            ? fs
 ...

 $ fusermount -u fs  // now unmount fs

 $ ls fs        // and "fs" is an empty dir again

Note that in the above example, after we run lab4fuse, the kernel is actually dispatching all the readdir(), statfs() etc. calls that ls and df make to fs5600 FUSE driver. Given that you haven’t implemented these functions, fs5600 returns error messages that these operations are not supported. Later, when fusermount is run, fs5600 is unmounted from fs, and then all I/O operations under fs return to being serviced normally by the kernel.

Section 2: CS5600 File System (fs5600)

How you should use fs5600 descriptions: below are the design of fs5600 (we will study fs5600 in class in Week13 as well) and a bunch of facts about fs5600. Here is how you should use them: first, you should skim all the contents before you start coding. The point is that you should have a basic understanding of how things work and an impression of what information is where. Later, when you find something unclear or uncertain, you should come back and study the corresponding part carefully. I would expect you to come back often to understand how things work.

Contents

fs5600 disk

The fs5600 disk is abstracted as an array of blocks. The size of each block is 4KB (4096 bytes or FS_BLOCK_SIZE). You can read/write disk by two simple interfaces:

int block_read(void *buf, int lba, int nblks);
int block_write(void *buf, int lba, int nblks);

The buf is a pointer to memory where you want to store the data from disk (for reads) or contains the data to disk (for writes). The lba is the block id (starting from 0) you want to read/write. The nblks is the number of blocks you will read/write.

File system format

The format of fs5600 is as follows:

  • the first block (block 0) is the superblock;
  • the second block (block 1) is the bitmap block;
  • the third block (block 2) is the root directory;
  • other blocks are blocks that contain either inodes or data.
   +-------+--------+----------+------------------------+
   | super | block  | root dir |     data blocks ...    |
   | block | bitmap |   inode  |                        |
   +-------+--------+----------+------------------------+
 block     0        1         2          3 ...

Superblock

The superblock is the first block in the file system, and contains the information needed to find the rest of the file system structures. The following C structure (see fs5600.h) defines the superblock:

  struct fsx_superblock {
      uint32_t magic;             /* 0x30303635 - some magic number */
      uint32_t disk_size;         /* in 4096-byte blocks */
      char pad[4088];             /* to make size = 4096 */
  };

Note that uint32_t is a standard C type found in the <stdint.h> header file, and refers to an unsigned 32-bit integer. (similarly, uint16_t, int16_t and int32_t are unsigned/signed 16-bit ints and signed 32-bit ints)

Block bitmap

This block bitmap indicates which blocks have been used and which are free. One bit represents one block: the first bit represents the first block (block 0), the second bit represents the second block (block 1), and so forth. If a bit is 0, the corresponding block is free; otherwise, the block is used.

In fs5600, the first three bits are always 1 because they are (1) the super block, (2) the block bitmap itself, and (3) the root dir inode. (the bits for blocks 0, 1 and 2 will be set to 1 when the file system is created, so you don’t have to worry about excluding them when you search for a free block.)

Block allocation (with block bitmap)

To allocate blocks from disk, you need to manipulate block bitmap.

An easy way to achieve this is to cache the block bitmap in a 4KB memory buffer. (hint: you can make that buffer a global variable, and read it in fs_init function, and sync it back to disk after each block allocation and free.)

You’re given the following functions to handle the bitmap in memory:

  int bit_test((void*)map, int i);
  void bit_set(unsigned char *map, int i);
  void bit_clear(unsigned char *map, int i);

A side note: using a single block for the bitmap means that the maximum number of disk blocks fs5600 can track is 32768 (= 4096 * 8). Each block is 4KB. Hence the maximum size of an fs5600 is 128MB (= 32768 * 4KB).

Inodes

fs5600 simplifies the standard Unix inode.

fs5600 inode

In particular, inode is defined as follows (see also fs5600.h):

  struct fs_inode {
      uint16_t uid;      /* file owner */
      uint16_t gid;      /* group */
      uint32_t mode;     /* type + permissions (see below) */
      uint32_t ctime;    /* last status change time */
      uint32_t mtime;    /* modification time */
      int32_t  size;     /* size in bytes */
      uint32_t ptrs[FS_BLOCK_SIZE/4 - 5]; /* inode = 4096 bytes */
  };

Each inode corresponds to a file or directory; in a sense the inode is that file or directory. In fs5600, inode can be uniquely identified by its inode number, which is its block id. Again, fs5600 uses block ids as inode numbers. The root directory is always found in inode 2; inode 0 (the super block) is invalid and can be used as a NULL value.

Hint: a path lookup always begins with inode 2, the root directory.

File mode

The FUSE APIs (and Linux internals in general) mash together the concept of object type (file/directory/device/symlink…) and permissions. The result is called the file mode (the attribute mode in the fs_inode above), and looks like this:

  |<-- S_IFMT --->|           |<-- user ->|<- group ->|<- world ->|
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | F | D |   |   |   |   |   | R | W | X | R | W | X | R | W | X |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Since it has multiple 3-bit fields, it is commonly displayed in base 8 (octal). For example, permissions allowing RWX for everyone (rwxrwxrwx) are encoded as 777 (an octal number). Note that, in C, octal numbers are indicated by a leading 0, e.g., 0777, so the expression 0777 == 511 is true.

The F and D bits correspond to 0100000 and 040000, indicating if the inode is a regular file (F bit is set) or a directory (D bit is set).

A side note: since the demise of 36-bit computers in the early 70s, Unix permissions are about the only thing in the world that octal is used for.

There are a bunch of macros (in <sys/stat.h>) you can use for looking at the mode; you can see the official documentation with the command man inode, but the important ones are:

  • S_ISREG(m) - is it (i.e. the inode with mode m) a regular file?
  • S_ISDIR(m) - is it a directory?
  • S_IFMT - bitmap for inode type bits.
    You can get just the inode type with the expression m & S_IFMT, and just the permission bits with the expression m & ~S_IFMT. (note that ~ is bitwise NOT - i.e. 0s become 1s and 1s become 0s)

Directory

Directory is a special file that holds an array of directory entries (fs_dirent).

fs5600 directory

Each directory entry maps a human-readable name (a string) to an inode number (again, an inode number is a block id). It is defined as follows (see also fs5600.h):

  struct fs_dirent {
      uint32_t valid : 1;  /* the first bit of uin32_t */
      uint32_t inode : 31; /* the next 31 bits of uint32_t */
      char name[28];       /* with trailing NULL */
  };

Each fs_dirent is 32 bytes, giving 128 (=4096/32) directory entries in one dir. Why? This is a design choice of fs5600: fs5600 directories always have only one data block, meaning only the first pointer (ptrs[0]) in a dir’s inode contains a valid block id; others are invalid (ptrs[i] == 0 where i>0). That said, fs5600’s dir has one data block, which is 4KB, and one direntry is 32B, hence a dir only supports a maximum of 128 files.

A directory contains (name, inode) pairs, which capture by fs_dirent. In particular,

  • the valid bit indicates if directory entries are being used (1 means yes; 0 means no).
  • the inode contains a block id on disk that is where the file’s inode locates. Note: in fs5600, an inode is a block (4KB in size) and a inode number is the block id on disk.
  • name is the file name. The maximum name length is 27 bytes, allowing entries to always have a terminating \0 so you can use string functions (e.g., strcmp, strdup) without any complications.

Appendix: fs5600 facts

Appendix has a bunch of fs5600 facts and some implementation hints. You can use them as references.

Section 3: fs5600 implementation, Part A

Before implementing anything, run:

$ make testa
...
0%: Checks: 12, Failures: 12, Errors: 0
...

And you can see that you pass 0 test in testa.

Now, get ready to code.

Exercise 1 implement fs_init and fs_statfs.

  • Read fs5600.h.
  • Implement fs_init, initializing your fs5600. If experiencing errors, then exit(1); otherwise, return NULL.
  • fs_statfs in fs5600.c, reporting file system statistics. You should always return 0 (in fs5600, return 0 means okay).
  • After implementing these two functions, you should pass 1 test:
    $ make testa
    ...
    8%: Checks: 12, Failures: 11, Errors: 0
    gradetest-1.c:284:P:read_mostly:test_statvfs:0: Passed
    ...
    

Notes on testing I

  • When running make testa, under the hood, you’re running unit tests in test1.c.
  • When failing one test case, for example,
    test1.c:279:F:read_mostly:test_statvfs:0: Assertion 'rv == 0' failed: rv == -95, 0 == 0
    

    The message describes the failure with detailed information, including:

    • file (test1.c),
    • line number (279),
    • failing test case (read_mostly:test_statvfs)
    • and why failing (Assertion rv == 0 but got rv == -95).
  • You can use printf or gdb to debug (take a look at Makefile to see which binary to gdb).


Exercise 2 implement fs_getattr and fs_readdir.

  • fs_getattr is used to get attributes of a file/directory.
  • fs_readdir enumerates entries in a directory (think of ls).
  • implement these two functions in fs5600.c, and run make testa. Now you should pass 5 tests:
    $ make testa
    ...
    41%: Checks: 12, Failures: 7, Errors: 0
    
  • you can ignore struct fuse_file_info *fi (also apply to later Exercises); it is a FUSE data structure, see details here.
  • clarify timespec (04/13) The ctime and mtime in fs5600’s inodes are seconds since 1970/1/1 (see appendix timestamps). When implementing inode2stat(), you should fill these (ctime/mtime) into timespec’s tv_sec and leave tv_nsec to be 0. (please ignore what I said in class today about milliseconds, which is incorrect; apologize.)

Notes on testing II

  • We use a predefined disk image named test.img for make testa.
  • The image has known contents (see below), and you can use this information to check if your code works properly.
  • test.img contents:
    # path uid gid mode size ctime mtime
    "/", 0, 0, 040777, 4096, 1565283152, 1565283167
    "/file.1k", 500, 500, 0100666, 1000, 1565283152, 1565283152
    "/file.10", 500, 500, 0100666, 10, 1565283152, 1565283167
    "/dir-with-long-name", 0, 0, 040777, 4096, 1565283152, 1565283167
    "/dir-with-long-name/file.12k+", 0, 500, 0100666, 12289, 1565283152, 1565283167
    "/dir2", 500, 500, 040777, 8192, 1565283152, 1565283167
    "/dir2/twenty-seven-byte-file-name", 500, 500, 0100666, 1000, 1565283152, 1565283167
    "/dir2/file.4k+", 500, 500, 0100777, 4098, 1565283152, 1565283167
    "/dir3", 0, 500, 040777, 4096, 1565283152, 1565283167
    "/dir3/subdir", 0, 500, 040777, 4096, 1565283152, 1565283167
    "/dir3/subdir/file.4k-", 500, 500, 0100666, 4095, 1565283152, 1565283167
    "/dir3/subdir/file.8k-", 500, 500, 0100666, 8190, 1565283152, 1565283167
    "/dir3/subdir/file.12k", 500, 500, 0100666, 12288, 1565283152, 1565283167
    "/dir3/file.12k-", 0, 500, 0100777, 12287, 1565283152, 1565283167
    "/file.8k+", 500, 500, 0100666, 8195, 1565283152, 1565283167
    
  • read-img.py: This is a utility tool that prints lots of information about a disk image, which may help figure out what you messed up.
    • Try python2 read-img.py test.img (note that all python scripts in Lab4 requires Python2.7).
    • The “blocks used” line shows the blocks (including inodes) marked as used in the bitmap;
    • the “inodes found” line shows the inodes found by recursively descending from the root.
    • It’s useful for debugging some things; not useful for others.


Exercise 3 finish part A

  • implement fs_read, read data from a file.
  • implement fs_rename, rename a file.
  • implement fs_chmod, change file permissions.
  • after implementing all these functions in fs5600.c, you should pass all 12 tests:
    $ make testa
    ...
    100%: Checks: 12, Failures: 0, Errors: 0
    
  • Note that both fs_chmod and fs_rename will (obviously) modify files/dirs, hence the disk image. But you will not see the modifications when running make testa because make testa recreates the original test.img every time.

Notes on testing III

  • We’re using Check, a unit testing framework for C (recall sudo apt install check that you run).
  • You can add debugging code (e.g., the beloved printf) or more test cases in test1.c.
  • Below is a crash course of the Check framework. (after reading this, you should be able to understand the code in test1.c.)

Crash course: Check framework

A unit test is defined and declared with the macro START_TEST and END_TEST as follows:

    START_TEST(test_name)
    {
        code;
        code;
    }
    END_TEST

Running a unit test requires to call suite_add_tcase (see examples in main of test1.c):

    tcase_add_test(tc, test_name);

To check if something is expected, here are several useful Check functions:

ck_abort("message");      /* fail unconditionally */
ck_assert(expr);          /* fail if expression is false */
ck_assert_int_eq(i1, i2); /* check if i1 == i2; if not, alarm! */
ck_assert_int_ne/lt/le/ge/gt(i1, i2); /* not eq, less than, etc. */
ck_assert_str_eq("string1", "string2");
ck_assert_ptr_eq(ptr, NULL);
ck_assert_ptr_ne(ptr, NULL); /* assert ptr is NOT null */

Section 4: fs5600 implementation, Part B

Now, let’s run another set of tests testb and you should see:

$ make testb
...
0%: Checks: 27, Failures: 27, Errors: 0
...

Exercise 4 implement fs_create and fs_mkdir.

  • implement fs_create, creating a new (empty) file
  • fs_create errors:
    • bad path /a/b/c - b doesn’t exist (should return -ENOENT)
    • bad path /a/b/c - b isn’t directory (-ENOTDIR)
    • bad path /a/b/c - c exists, is file (-EEXIST)
    • bad path /a/b/c - c exists, is directory (-EEXIST)
    • too-long name (more than 27 characters) (-EINVAL)
  • implement fs_mkdir, creating new (empty) directory
  • fs_mkdir errors:
    • bad path /a/b/c - b doesn’t exist (-ENOENT)
    • bad path /a/b/c - b isn’t directory (-ENOTDIR)
    • bad path /a/b/c - c exists, is file (-EEXIST)
    • bad path /a/b/c - c exists, is directory (-EEXIST)
    • too-long name (more than 27 characters) (-EINVAL)
  • after implementing, you should pass 4 cases:
    $ make testb
    ...
    14%: Checks: 27, Failures: 23, Errors: 0
    
  • Note: we’re using a test strategy called “mocking” to deal with the fuse_get_context call. We provide a fake version of the function in our test code, and in each test we can fill in any information we want it to return.

Notes on testing IV

  • In make testb (testing code in test2.c), we change to an empty disk image test2.img.
  • You don’t have to worry about scrabbling the disk (which you certainly did in Exercise 4) because make testb rebuilds the disk every time.
    • in particular, make runs $ python2 gen-disk.py -q disk2.in test2.img
  • To test writing logic (including create, mkdir, etc.), we have to trust your readdir and read implementations: test2.c uses your Part A implementation as the ground truth.
    • That’s the primary reason why the assignment is explicitly split into two parts.
    • That said, if you have a bug in Part A, then you can get confusing results…
    • …but you shouldn’t worry too much because we have a lot of tests in testa
    • …however, tests cannot guarantee absence of bugs…
    • …but there is not much we can do about it.

A side note: we do know one way to prove absence of bugs, namely verificaition. Check out FSCQ, a formally certified crash-proof file system.


Exercise 5 implement fs_unlink and fs_rmdir.

  • implement fs_unlink, removing a file
  • fs_unlink errors:
    • bad path /a/b/c - b doesn’t exist (-ENOENT)
    • bad path /a/b/c - b isn’t directory (-ENOTDIR)
    • bad path /a/b/c - c doesn’t exist (-ENOENT)
    • bad path /a/b/c - c is directory (-EISDIR)
  • implement fs_rmdir, removing a directory
  • fs_rmdir errors:
    • bad path /a/b/c - b doesn’t exist (-ENOENT)
    • bad path /a/b/c - b isn’t directory (-ENOTDIR)
    • bad path /a/b/c - c doesn’t exist (-ENOENT)
    • bad path /a/b/c - c is file (-ENOTDIR)
    • directory not empty (-ENOTEMPTY)
  • after implementing, you should pass another 10 cases:
    $ make testb
    ...
    51%: Checks: 27, Failures: 13, Errors: 0
    

Exercise 6 finish Part B.

  • implement fs_write, writing to a file
  • implement fs_truncate, deleting the contents of a file
  • implement fs_utime, changing access and modification times
  • after implementing, you should pass all cases:
    $ make testb
    ...
    100%: Checks: 27, Failures: 0, Errors: 0
    

Put it all together—mount fs5600 and use it!

After finishing Part A and Part B, your fs5600 is now usable, namely you can actually use it!

Remember the first glance of fs5600 (the Exercise 0)? Now let’s redo it.

Exercise 7 (or 0+) rerun Exercise 0.

Let’s revisit Exercise 0:

 $ cd ~/lab4
 $ make          // you should see:
 cc -ggdb3 -Wall -O0   -c -o misc.o misc.c
 ...

 $ mkdir fs     // create a directory to serve as a mount point

 $ df fs        // see what file system fs is associated with
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/sda1        7092728 4536616   2172780  68% /

 $ ls fs        // notice, 'fs' is empty

 $ ./lab4fuse -image test.img fs    // mount test.img at fs

 $ df fs        // below you should see the information of your fs5600.
                // this indicates that fs5600 is working.
 Filesystem     1K-blocks  Used Available Use% Mounted on
 lab4fuse            1592   172      1420  11% /home/ubuntu/lab4/fs

 $ ls fs        // you will see the files in test.img
 dir-with-long-name  dir2  dir3  file.10  file.1k  file.8k+

 $ ls -l        // you will see that fs is in an valid state
 ...
 drwxrwxrwx 1 root   root      4096 Aug  8  2019 fs
 ...

 $ cd fs
 // do whatever you want!
 // try "mkdir", "touch", "cat", "ls", etc.

 // you can umount your fs5600 by:
 $ fusermount -u fs

Now, you can actually use your file system for day-to-day work! How exciting!!

Appendix: fs5600 facts

[A] Lab4 files and their meanings:

  • fs5600.h: fs5600 data structure definitions
  • fs5600.c: the core functions of fs5600 (your code here)
  • test1.c, test2.c: test cases
  • misc.c, lab4fuse.c: support code
  • gen-disk.py, disk1.in, disk2.in: scripts to generates disk images
  • read-img.py: a utility tool to print image information

[B] Your jobs:

you will implement all the functionalities of fs5600, which are classified into two parts.

In Part A, you will need to implement the following functions in fs5600.c:

  • fs_init - constructor (i.e. put your init code here)
  • fs_statfs - report file system statistics
  • fs_getattr - get attributes of a file/directory
  • fs_readdir - enumerate entries in a directory
  • fs_read - read data from a file
  • fs_rename - rename a file
  • fs_chmod - change file permissions

In Part B, you will need to implement these functions:

  • fs_create - create a new (empty) file
  • fs_mkdir - create new (empty) directory
  • fs_unlink - remove a file
  • fs_rmdir - remove a directory
  • fs_write - write to a file
  • fs_truncate - delete the contents of a file
  • fs_utime - change access and modification times

[C] mode

File type constants (see also man 2 stat)

  • S_IFMT - mask for type bits
  • S_IFREG - regular file
  • S_IFDIR - directory

A lot of times it’s easier to use:

  • S_ISDIR(mode) - true if it’s a directory
  • S_ISREG(mode) - true if it’s a regular file

[D] error code

When encountering errors, you need to return error codes. The comments in fs5600.c will tell you which errors are ought to return from which methods. Here is a list of error codes that you will use:

  • ENOENT - file/dir not found
  • EIO - if block_read or block_write fail
  • ENOMEM - out of memory
  • ENOTDIR - not a directory
  • EISDIR - is a directory
  • EINVAL - invalid parameter (see comments in fs5600.c)
  • EOPNOTSUPP - not implemented yet

Note that you can use strerror(e) to get a string representation of an error code. Note: you’ll need to use the positive error code for strerror() (e.g., ENOENT), whereas you’ll need to return the negative version in code (e.g., return -ENOENT;).

[E] fs5600 design choices

Here are the design choices that fs5600 made:

  1. fs5600 chooses to use a single block (block 1) as the block bitmap, which results in the maximum file system size of 128MB.

  2. fs5600 uses a block (4KB) as an inode.

  3. fs5600 puts all block pointers in the inode. An inode can hold 1019 32-bit block pointers (see fs_inode in fs5600.h), for a max file size of ~4MB.

  4. Directories are not nested more than 10 times. For example, there is no /a/b/c/d/e/f/g/h/i/j/k, which have a directory depth of 11. (note: the test and grading scripts will never create directories nested that deep.)

  5. Directories only have one data block (or only inode.ptrs[0] contains valid block ids).

  6. fs5600 does not support links.

  7. rename is only used within the same directory, for example,rename("/dir/f1", "/dir/f2"), and rename("/dir1/f1", "/dir2/f2") is invalid.

  8. truncate is only ever called with len=0, meaning deleting all the data in the file.

[F] implementation hints

1. reuse code. There will be a lot of code sharing between different fs_* functions. You should factor out the code that is used multiple times. It’s bad to duplicate the same piece of code in multiple functions (sometimes known as “copy-paste bugs”; see a research paper about this topic here). An obvious drawback is that you’ll find it hard to fix bugs—fixing one bug in some of the copies does not fix the same bug in other copies!

2. performance. Don’t worry about performance. The simplest possible implementation is okay, but an incorrect implementation is NOT okay.

3. copy paths. FUSE passes a path argument to your functions as const char *, instead of char *. It means that you cannot modify the path and you need to copy the string before operating on it (e.g., using strtok). Typically, people use strdup to duplicate a string.

      char *your_path = strdup(path);
      ...         // do whatever to your_path
      free(your_path);

4. split paths. When walking the hierarchical structure of fs5600, you need to split paths (e.g., /a/b/c) to tokens (e.g., a, b, and c). Here are some hints:

  • strtok is useful (what is this? man strtok).
  • you can safely assume that the number of tokens is <=10 because the maximum depth of nested dirs is 10.
  • you can safely assume that the length of each token is <=27 because the name of a file is stored in char name[28] (in fs_dirent).

5. path translation Assume you’ve split your path, and you have a count (int pathc) and an array of strings (char **pathv). As an example, for a path /home/cs5600, you will have pathc=2, pathv={"home","cs5600"}.

To find the file pointed by /home/cs5600, you need to translate this path (i.e., pathc and pathv) into an inode number. You should:

  • go to the root dir (the inode in block 2)
  • try to find dir “home” (i.e., pathv[0]) in the root dir
  • then, fetch the dir “/home”’s inode
  • try to find file “cs5600” (i.e., pathsv[1]) in the dir “/home”
  • finally, return the inode number of the file “cs5600”, or -ENOENT/-ENOTDIR if there was an error.

6. uid and gid. Here is how fs5600 gets uid and gid:

    struct fuse_context *ctx = fuse_get_context();
    uint16_t uid = ctx->uid;
    uint16_t gid = ctx->gid;

Since we’re not running FUSE as root, you’re only going to see your own uid and gid.

7. timestamps. You can get the timestamp in the proper format with time(NULL). Although the function returns a 64-byte integer, you can assign it to the unsigned 32-bit timestamps in the inode without worrying about overflow until year 2038 (unfortunately, CS5600 spring 2038 will meet the notorious year 2038 problem, or Y2K38 superbug).

(optional) Challenge: bigger files and encrypted fs5600

Lab4 has two challenges. Each gives you 10 extra points if you pass all our tests.

Challenge I bigger files

As mentioned earlier, fs5600 supports a maximum file size of ~4MB. In this challenge, you will expand the maximum file size to be >=16MB.

  • We don’t dictate how to implement this, but the simplest extension is to add multiple indirect pointers (like Unix’s inode).
  • Our test will write exactly 16MB of data.

Challenge II enc-fs5600: encrypted fs5600

enc-fs5600 is a variant of fs5600 in which all file contents are encrypted. We use Caesar Cipher in Lab1 as our encryption mechanism. In particular,

  • enc-fs5600 has a special key file “/key”.
  • the content of “/key” is a string that represents a positive integer (namely, the Caesar cipher’s key in Lab1). As an example, “/key” may contain “123” (note: without tailing ‘\0’).
  • The data written to all files (except “/key”) will be encrypted by the current key (the key in “/key” when writing).
  • The data read from all files (except “/key”) will be decrypted by the current key.
  • The reads and writes from and to “/key” are normal reads and writes.
  • If “/key” doesn’t exist, reads and writes are normal, which is equivalent tokey=0.
  • Here is an example:
    // assume we're in an enc-fs5600
    $ echo 1 > /key      // set the key=1
    
    $ echo "abc" > /foo  // write "abc" to a file
    
    $ cat /foo           // the read will automatically decrypt
    abc
    
    $ rm /key            // remove the key (meaning key=0)
    
    $ cat /foo           // print what is actually stored in the file
    bcd
    

Finally, submit your work

Submitting consists of three steps:

  1. Executing this checklist:
    • Fill in slack.txt with (1) your name and NUID, (2) slack hours you used, (3) if you took the challenges, and (4) acknowledge your influences.
    • Make sure that your code can build without problems.
    • Make sure you have added (git add) all files that you created.
    • Make sure you have committed all your changes (check with git status).
  2. Push your code to GitHub:

     $ cd ~/lab4
     $ git commit -am 'submit lab4'
     $ git push origin
    
     Counting objects: ...
      ....
      To ssh://github.com/NEU-CS5600-21fall/lab4-<username>.git
       7447116..bebeebee  main -> main
    
  3. Actually commit your lab (with timestamp and git commit id):

    1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.

    2. Paste both your git repo address and the commit id (the hexadecimal string) to Canvas. In Canvas, there will be an assignment for this lab. You should paste the git repo address and the commit id in two lines:
       git@github.com:NEU-CS5600-21fall/lab4-studentid.git
       7447116c788fade33421f242b5dd8312beebeebe
      

      Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”.

    3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.

NOTE: Ground truth is what and when you submitted to Canvas.

  • A non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted commit id!

  • The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.

Acknowledgments

This lab is created by Peter Desnoyers, with modifications by the CS5600 staff. FUSE introduction is borrowed from Mike Walfish’s cs202 lab instructions.