Lab1: NEU File System
In Lab1, you will implement a simplified Unix-like file system, called fs5600.
fs5600 uses the FUSE (File system in User SpacE) library, which allows us to implement a file system in the user space. This makes the development much easier than in the kernel space. FUSE exposes a few file system function interfaces that need to be instantiated and your task is to fill in these function bodies.
There is not too much code to write (if you organize your code nicely); instead, a lot of the work of the lab is understanding the system that you’ve been given.
Challenge: In these challenges, you will addresses some of the simplifications/limitations of fs5600 and make it more “advanced”. See challenges at the end of the lab.
Section 0: Getting started
- We assume you have a Linux machine (or VM). If you don’t have, please contact the instructor to get one.
- Click the GitHub Lab1 link on GitHub homepage to create your Lab1 clone on GitHub.
- Open a Linux terminal.
- Clone Lab1 repo to your local machine:
$ cd ~ $ git clone git@github.com:NEU-CS7670-labs/lab1-<Your-GitHub-Username>.git lab1
Note that the repo address
git@github.com:...
can be obtained by going to GitHub repo page (your cloned lab1), clicking the green “Code” button, then choose “SSH”. - Check contents:
$ cd ~/lab1; ls // you should see: Makefile disk1.in disk2.in diskfmt.py fs5600.c fs5600.h gen-disk.py lab1fuse.c misc.c read-img.py slack.txt test1.c test2.c
- Install required packages (Ubuntu):
sudo apt install check sudo apt install libfuse-dev sudo apt install zlib1g-dev sudo apt install python2
(note: if you’re not using Ubuntu but other Linux distributions, please Google and install the corresponding packages in your platform.)
Section 1: FUSE
The file system that we will build is implemented as a user-level process. This file system’s storage will be a file (which is created by us called test.img
) that lives in the normal file system of your devbox. Much of your code will treat this file as if it were a disk.
This entire arrangement (file system implemented in user space with arbitrary choice of storage) is due to software called FUSE (File system in User SpacE).
In order to really understand what FUSE is doing, we need to take a brief detour to describe VFS. Linux (like Unix) has a layer of kernel software called VFS; conceptually, every file system sits below this layer, and exports a uniform interface to VFS. (You can think of any potential file system as being a “driver” for VFS; VFS asks the software below it to do things like “read”, “write”, etc.; that software fulfills these requests by interacting with a disk driver, and interpreting the contents of disk blocks.)
The purpose of this architecture is to make it relatively easy to plug a new file system into the kernel: the file system writer simply implements the interface that VFS is expecting from it, and the rest of the OS uses the interface that VFS exports to it. In this way, we obtain the usual benefits of pluggability and modularity.
FUSE is just another “VFS driver”, but it comes with a twist. Instead of FUSE implementing a disk-based file system (the usual picture), it responds to VFS’s requests by asking a user-level process (which is called a “FUSE driver”) to respond to it. So the FUSE kernel module is an adapter that speaks “fuse” to a user-level process (and you will be writing your code in this user-level process) and “VFS” to the rest of the kernel.
Meanwhile, a FUSE driver can use whatever implementation it wants. It could store its data in memory, across the network, on Jupiter, whatever. In the setup in this lab, the FUSE driver will interact with a traditional Linux file (as noted above), and pretend that this file is a sector-addressable disk.
The FUSE driver registers a set of callbacks with the FUSE system (via libfuse
and ultimately the FUSE kernel module); these callbacks are things like read, write, etc. A FUSE driver is associated with a particular directory, or mount point. The concept of mounting was explained in OSTEP 39 (see 39.17). Any I/O operations requested on files and directories under this mount point are dispatched by the kernel (via VFS, the FUSE kernel module, and libfuse
) to the callbacks registered by the FUSE driver.
To recap all of the above: the file system user interacts with the file system roughly in this fashion:
When the file system user, Process A, makes a request to the system, such as listing all files in a directory via
ls
, thels
process issues one or more system calls (stat()
,read()
, etc.).The kernel hands the system call to VFS.
VFS finds that the system call is referencing a file or directory that is managed by FUSE.
VFS then dispatches the request to FUSE, which dispatches it to the corresponding FUSE driver (which is where you will write your code).
The FUSE driver handles the request by interacting with the “disk”, which is implemented as an ordinary file. The FUSE driver then responds, and the responses go back through the chain.
Exercise 0 the first glance of the unimplemented fs5600.
Here’s an example to show how fs5600 works with FUSE, where
test.img
is a disk image that the lab creators created:$ cd ~/lab1 $ make // you should see: cc -ggdb3 -Wall -O0 -c -o misc.o misc.c ... $ mkdir fs // create a directory to serve as a mount point $ df fs // see what file system fs is associated with Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 7092728 4536616 2172780 68% / $ ls fs // notice, "fs" is empty $ ./lab1fuse -image test.img fs // mount test.img at fs $ df fs // this ends up with an error because you haven't implemented fs5600. // but notice that fs's file system is different. df: fs: Operation not supported $ ls fs // similarly, "ls" reports an error as well ls: cannot access 'fs': Operation not supported $ ls -l // you will see that fs is in an unknown state ... d????????? ? ? ? ? ? fs ... $ fusermount -u fs // now unmount fs $ ls fs // and "fs" is an empty dir again
Note that in the above example, after we run
lab1fuse
, the kernel is actually dispatching all thereaddir()
,statfs()
etc. calls thatls
anddf
make to fs5600 FUSE driver. Given that you haven’t implemented these functions, fs5600 returns error messages that these operations are not supported. Later, whenfusermount
is run, fs5600 is unmounted fromfs
, and then all I/O operations underfs
return to being serviced normally by the kernel.
Section 2: NEU File System (fs5600)
How you should use fs5600 descriptions: below are the design of fs5600 and a bunch of facts about fs5600. Here is how you should use them: first, you should skim all the contents before you start coding. The point is that you should have a basic understanding of how things work and an impression of what information is where. Later, when you find something unclear or uncertain, you should come back and study the corresponding part carefully. I would expect you to come back often to understand how things work.
Contents
- fs5600 disk
- File system format
- Superblock
- Block bitmap
- Block allocation (with block bitmap)
- Inode
- File mode
- Directory
fs5600 disk
The fs5600 disk is abstracted as an array of blocks. The size of each block is 4KB
(4096 bytes or FS_BLOCK_SIZE
). You can read/write disk by two simple interfaces:
int block_read(void *buf, int lba, int nblks);
int block_write(void *buf, int lba, int nblks);
The buf
is a pointer to memory where you want to store the data from disk (for reads) or contains the data to disk (for writes). The lba
is the block id (starting from 0
) you want to read/write. The nblks
is the number of blocks you will read/write.
File system format
The format of fs5600 is as follows:
- the first block (block 0) is the superblock;
- the second block (block 1) is the bitmap block;
- the third block (block 2) is the root directory;
- other blocks are blocks that contain either inodes or data.
+-------+--------+----------+------------------------+
| super | block | root dir | data blocks ... |
| block | bitmap | inode | |
+-------+--------+----------+------------------------+
block 0 1 2 3 ...
Superblock
The superblock is the first block in the file system, and contains the information needed to find the rest of the file system structures. The following C structure (see fs5600.h
) defines the superblock:
struct fsx_superblock {
uint32_t magic; /* 0x30303635 - some magic number */
uint32_t disk_size; /* in 4096-byte blocks */
char pad[4088]; /* to make size = 4096 */
};
Note that uint32_t
is a standard C type found in the <stdint.h>
header file, and refers to an unsigned 32-bit integer. (similarly, uint16_t
, int16_t
and int32_t
are unsigned/signed 16-bit ints and signed 32-bit ints)
Block bitmap
This block bitmap indicates which blocks have been used and which are free. One bit represents one block: the first bit represents the first block (block 0), the second bit represents the second block (block 1), and so forth. If a bit is 0
, the corresponding block is free; otherwise, the block is used.
In fs5600, the first three bits are always 1
because they are (1) the super block, (2) the block bitmap itself, and (3) the root dir inode. (the bits for blocks 0, 1 and 2 will be set to 1 when the file system is created, so you don’t have to worry about excluding them when you search for a free block.)
Block allocation (with block bitmap)
To allocate blocks from disk, you need to manipulate block bitmap.
An easy way to achieve this is to cache the block bitmap in a 4KB memory buffer. (hint: you can make that buffer a global variable, and read it in fs_init
function, and sync it back to disk after each block allocation and free.)
You’re given the following functions to handle the bitmap in memory:
int bit_test((void*)map, int i);
void bit_set(unsigned char *map, int i);
void bit_clear(unsigned char *map, int i);
A side note: using a single block for the bitmap means that the maximum number of disk blocks fs5600 can track is 32768 (= 4096 * 8)
. Each block is 4KB. Hence the maximum size of an fs5600 is 128MB (= 32768 * 4KB)
.
Inodes
fs5600 simplifies the standard Unix inode.
In particular, inode is defined as follows (see also fs5600.h
):
struct fs_inode {
uint16_t uid; /* file owner */
uint16_t gid; /* group */
uint32_t mode; /* type + permissions (see below) */
uint32_t ctime; /* last status change time */
uint32_t mtime; /* modification time */
int32_t size; /* size in bytes */
uint32_t ptrs[FS_BLOCK_SIZE/4 - 5]; /* inode = 4096 bytes */
};
Each inode corresponds to a file or directory; in a sense the inode is that file or directory. In fs5600, inode can be uniquely identified by its inode number, which is its block id. Again, fs5600 uses block ids as inode numbers. The root directory is always found in inode 2
; inode 0
(the super block) is invalid and can be used as a NULL
value.
Hint: a path lookup always begins with inode 2
, the root directory.
File mode
The FUSE APIs (and Linux internals in general) mash together the concept of object type (file/directory/device/symlink…) and permissions. The result is called the file mode
(the attribute mode
in the fs_inode
above), and looks like this:
|<-- S_IFMT --->| |<-- user ->|<- group ->|<- world ->|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| F | D | | | | | | R | W | X | R | W | X | R | W | X |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Since it has multiple 3-bit fields, it is commonly displayed in base 8 (octal). For example, permissions allowing RWX
for everyone (rwxrwxrwx
) are encoded as 777
(an octal number). Note that, in C, octal numbers are indicated by a leading 0
, e.g., 0777
, so the expression 0777 == 511
is true.
The F
and D
bits correspond to 0100000
and 040000
, indicating if the inode is a regular file (F
bit is set) or a directory (D
bit is set).
A side note: since the demise of 36-bit computers in the early 70s, Unix permissions are about the only thing in the world that octal is used for.
There are a bunch of macros (in <sys/stat.h>
) you can use for looking at the mode; you can see the official documentation with the command man inode
, but the important ones are:
S_ISREG(m)
- is it (i.e. the inode with modem
) a regular file?S_ISDIR(m)
- is it a directory?S_IFMT
- bitmap for inode type bits.
You can get just the inode type with the expressionm & S_IFMT
, and just the permission bits with the expressionm & ~S_IFMT
. (note that~
is bitwise NOT - i.e.0
s become1
s and1
s become0
s)
Directory
Directory is a special file that holds an array of directory entries (fs_dirent
).
Each directory entry maps a human-readable name (a string) to an inode number (again, an inode number is a block id). It is defined as follows (see also fs5600.h
):
struct fs_dirent {
uint32_t valid : 1; /* the first bit of uin32_t */
uint32_t inode : 31; /* the next 31 bits of uint32_t */
char name[28]; /* with trailing NULL */
};
Each fs_dirent
is 32 bytes
, giving 128 (=4096/32)
directory entries in one dir. Why? This is a design choice of fs5600: fs5600 directories always have only one data block, meaning only the first pointer (ptrs[0]
) in a dir’s inode contains a valid block id; others are invalid (ptrs[i] == 0
where i>0
). That said, fs5600’s dir has one data block, which is 4KB
, and one direntry is 32B
, hence a dir only supports a maximum of 128 files.
A directory contains (name, inode)
pairs, which capture by fs_dirent
. In particular,
- the
valid
bit indicates if directory entries are being used (1
means yes;0
means no). - the
inode
contains a block id on disk that is where the file’s inode locates. Note: in fs5600, an inode is a block (4KB in size) and a inode number is the block id on disk. name
is the file name. The maximum name length is27 bytes
, allowing entries to always have a terminating\0
so you can use string functions (e.g.,strcmp
,strdup
) without any complications.
Appendix: fs5600 facts
Appendix has a bunch of fs5600 facts and some implementation hints. You can use them as references.
Section 3: fs5600 implementation, Part A
Before implementing anything, run:
$ make testa
...
0%: Checks: 12, Failures: 12, Errors: 0
...
And you can see that you pass 0 test in testa
.
Now, get ready to code.
Exercise 1 implement fs_init
and fs_statfs
.
- Read
fs5600.h
.- Implement
fs_init
, initializing your fs5600. If experiencing errors, thenexit(1)
; otherwise, returnNULL
.fs_statfs
infs5600.c
, reporting file system statistics. You should always return0
(in fs5600,return 0
means okay).- After implementing these two functions, you should pass 1 test:
$ make testa ... 8%: Checks: 12, Failures: 11, Errors: 0 gradetest-1.c:284:P:read_mostly:test_statvfs:0: Passed ...
Notes on testing I
- When running
make testa
, under the hood, you’re running unit tests intest1.c
. - When failing one test case, for example,
test1.c:279:F:read_mostly:test_statvfs:0: Assertion 'rv == 0' failed: rv == -95, 0 == 0
The message describes the failure with detailed information, including:
- file (
test1.c
), - line number (
279
), - failing test case (
read_mostly:test_statvfs
) - and why failing (
Assertion rv == 0
but gotrv == -95
).
- file (
- You can use
printf
orgdb
to debug (take a look atMakefile
to see which binary togdb
).
Exercise 2 implement fs_getattr
and fs_readdir
.
fs_getattr
is used to get attributes of a file/directory.fs_readdir
enumerates entries in a directory (think ofls
).- implement these two functions in
fs5600.c
, and runmake testa
. Now you should pass 5 tests:$ make testa ... 41%: Checks: 12, Failures: 7, Errors: 0
- you can safely ignore
struct fuse_file_info *fi
(also apply to later Exercises); it is a FUSE data structure, see details here.- The
ctime
andmtime
in fs5600’s inodes are seconds since 1970/1/1 (see appendix timestamps). When implementinginode2stat()
, you should fill these (ctime
/mtime
) intotimespec
’stv_sec
and leavetv_nsec
to be0
.
Notes on testing II
- We use a predefined disk image named
test.img
formake testa
. - The image has known contents (see below), and you can use this information to check if your code works properly.
test.img
contents:# path uid gid mode size ctime mtime "/", 0, 0, 040777, 4096, 1565283152, 1565283167 "/file.1k", 500, 500, 0100666, 1000, 1565283152, 1565283152 "/file.10", 500, 500, 0100666, 10, 1565283152, 1565283167 "/dir-with-long-name", 0, 0, 040777, 4096, 1565283152, 1565283167 "/dir-with-long-name/file.12k+", 0, 500, 0100666, 12289, 1565283152, 1565283167 "/dir2", 500, 500, 040777, 8192, 1565283152, 1565283167 "/dir2/twenty-seven-byte-file-name", 500, 500, 0100666, 1000, 1565283152, 1565283167 "/dir2/file.4k+", 500, 500, 0100777, 4098, 1565283152, 1565283167 "/dir3", 0, 500, 040777, 4096, 1565283152, 1565283167 "/dir3/subdir", 0, 500, 040777, 4096, 1565283152, 1565283167 "/dir3/subdir/file.4k-", 500, 500, 0100666, 4095, 1565283152, 1565283167 "/dir3/subdir/file.8k-", 500, 500, 0100666, 8190, 1565283152, 1565283167 "/dir3/subdir/file.12k", 500, 500, 0100666, 12288, 1565283152, 1565283167 "/dir3/file.12k-", 0, 500, 0100777, 12287, 1565283152, 1565283167 "/file.8k+", 500, 500, 0100666, 8195, 1565283152, 1565283167
read-img.py
: This is a utility tool that prints lots of information about a disk image, which may help figure out what you messed up.- Try
python2 read-img.py test.img
(note that all python scripts in Lab1 requires Python2.7). - The “blocks used” line shows the blocks (including inodes) marked as used in the bitmap;
- the “inodes found” line shows the inodes found by recursively descending from the root.
- It’s useful for debugging some things; not useful for others.
- Try
Exercise 3 finish part A
- implement
fs_read
, read data from a file.- implement
fs_rename
, rename a file.- implement
fs_chmod
, change file permissions.- after implementing all these functions in
fs5600.c
, you should pass all 12 tests:$ make testa ... 100%: Checks: 12, Failures: 0, Errors: 0
- Note that both
fs_chmod
andfs_rename
will (obviously) modify files/dirs, hence the disk image. But you will not see the modifications when runningmake testa
becausemake testa
recreates the originaltest.img
every time.
Notes on testing III
- We’re using Check, a unit testing framework for C (recall
sudo apt install check
that you run). - You can add debugging code (e.g., the beloved
printf
) or more test cases intest1.c
. - Below is a crash course of the Check framework. (after reading this, you should be able to understand the code in
test1.c
.)
Crash course: Check framework
A unit test is defined and declared with the macro START_TEST
and END_TEST
as follows:
START_TEST(test_name)
{
code;
code;
}
END_TEST
Running a unit test requires to call suite_add_tcase
(see examples in main
of test1.c
):
tcase_add_test(tc, test_name);
To check if something is expected, here are several useful Check functions:
ck_abort("message"); /* fail unconditionally */
ck_assert(expr); /* fail if expression is false */
ck_assert_int_eq(i1, i2); /* check if i1 == i2; if not, alarm! */
ck_assert_int_ne/lt/le/ge/gt(i1, i2); /* not eq, less than, etc. */
ck_assert_str_eq("string1", "string2");
ck_assert_ptr_eq(ptr, NULL);
ck_assert_ptr_ne(ptr, NULL); /* assert ptr is NOT null */
Section 4: fs5600 implementation, Part B
Now, let’s run another set of tests testb
and you should see:
$ make testb
...
0%: Checks: 27, Failures: 27, Errors: 0
...
Exercise 4 implement fs_create
and fs_mkdir
.
- implement
fs_create
, creating a new (empty) filefs_create
errors:
- bad path
/a/b/c
-b
doesn’t exist (should return-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
exists, is file (-EEXIST
)- bad path
/a/b/c
-c
exists, is directory (-EEXIST
)- too-long name (more than 27 characters) (
-EINVAL
)- implement
fs_mkdir
, creating new (empty) directoryfs_mkdir
errors:
- bad path
/a/b/c
-b
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
exists, is file (-EEXIST
)- bad path
/a/b/c
-c
exists, is directory (-EEXIST
)- too-long name (more than 27 characters) (
-EINVAL
)- after implementing, you should pass 4 cases:
$ make testb ... 14%: Checks: 27, Failures: 23, Errors: 0
- Note: we’re using a test strategy called “mocking” to deal with the
fuse_get_context
call. We provide a fake version of the function in our test code, and in each test we can fill in any information we want it to return.
Notes on testing IV
- In
make testb
(testing code intest2.c
), we change to an empty disk imagetest2.img
. - You don’t have to worry about scrabbling the disk (which you certainly did in Exercise 4) because
make testb
rebuilds the disk every time.- in particular,
make
runs$ python2 gen-disk.py -q disk2.in test2.img
- in particular,
- To test writing logic (including
create
,mkdir
, etc.), we have to trust yourreaddir
andread
implementations:test2.c
uses your Part A implementation as the ground truth.- That’s the primary reason why the assignment is explicitly split into two parts.
- That said, if you have a bug in Part A, then you can get confusing results…
- …but you shouldn’t worry too much because we have a lot of tests in
testa
… - …however, tests cannot guarantee absence of bugs…
- …but there is not much we can do about it.
A side note: we do know one way to prove absence of bugs, namely verificaition. Check out FSCQ, a formally certified crash-proof file system.
Exercise 5 implement fs_unlink
and fs_rmdir
.
- implement
fs_unlink
, removing a filefs_unlink
errors:
- bad path
/a/b/c
-b
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-c
is directory (-EISDIR
)- implement
fs_rmdir
, removing a directoryfs_rmdir
errors:
- bad path
/a/b/c
-b
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-c
is file (-ENOTDIR
)- directory not empty (
-ENOTEMPTY
)- after implementing, you should pass another 10 cases:
$ make testb ... 51%: Checks: 27, Failures: 13, Errors: 0
Exercise 6 finish Part B.
- implement
fs_write
, writing to a file- implement
fs_truncate
, deleting the contents of a file- implement
fs_utime
, changing access and modification times- after implementing, you should pass all cases:
$ make testb ... 100%: Checks: 27, Failures: 0, Errors: 0
Put it all together—mount fs5600 and use it!
After finishing Part A and Part B, your fs5600 is now usable, namely you can actually use it!
Remember the first glance of fs5600 (the Exercise 0)? Now let’s redo it.
Exercise 7 (or 0+) rerun Exercise 0.
Let’s revisit Exercise 0:
$ cd ~/lab1 $ make // you should see: cc -ggdb3 -Wall -O0 -c -o misc.o misc.c ... $ mkdir fs // create a directory to serve as a mount point $ df fs // see what file system fs is associated with Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 7092728 4536616 2172780 68% / $ ls fs // notice, 'fs' is empty $ ./lab1fuse -image test.img fs // mount test.img at fs $ df fs // below you should see the information of your fs5600. // this indicates that fs5600 is working. Filesystem 1K-blocks Used Available Use% Mounted on lab1fuse 1592 172 1420 11% /home/ubuntu/lab1/fs $ ls fs // you will see the files in test.img dir-with-long-name dir2 dir3 file.10 file.1k file.8k+ $ ls -l // you will see that fs is in an valid state ... drwxrwxrwx 1 root root 4096 Aug 8 2019 fs ... $ cd fs // do whatever you want! // try "mkdir", "touch", "cat", "ls", etc. // you can umount your fs5600 by: $ fusermount -u fs
Now, you can actually use your file system for day-to-day work! How exciting!!
Appendix: fs5600 facts
[A] Lab1 files and their meanings:
fs5600.h
: fs5600 data structure definitionsfs5600.c
: the core functions of fs5600 (your code here)test1.c
,test2.c
: test casesmisc.c
,lab1fuse.c
: support codegen-disk.py
,disk1.in
,disk2.in
: scripts to generates disk imagesread-img.py
: a utility tool to print image information
[B] Your jobs:
you will implement all the functionalities of fs5600, which are classified into two parts.
In Part A, you will need to implement the following functions in fs5600.c
:
fs_init
- constructor (i.e. put your init code here)fs_statfs
- report file system statisticsfs_getattr
- get attributes of a file/directoryfs_readdir
- enumerate entries in a directoryfs_read
- read data from a filefs_rename
- rename a filefs_chmod
- change file permissions
In Part B, you will need to implement these functions:
fs_create
- create a new (empty) filefs_mkdir
- create new (empty) directoryfs_unlink
- remove a filefs_rmdir
- remove a directoryfs_write
- write to a filefs_truncate
- delete the contents of a filefs_utime
- change access and modification times
[C] mode
File type constants (see also man 2 stat
)
S_IFMT
- mask for type bitsS_IFREG
- regular fileS_IFDIR
- directory
A lot of times it’s easier to use:
S_ISDIR(mode)
- true if it’s a directoryS_ISREG(mode)
- true if it’s a regular file
[D] error code
When encountering errors, you need to return error codes. The comments in fs5600.c
will tell you which errors are ought to return from which methods. Here is a list of error codes that you will use:
ENOENT
- file/dir not foundEIO
- ifblock_read
orblock_write
failENOMEM
- out of memoryENOTDIR
- not a directoryEISDIR
- is a directoryEINVAL
- invalid parameter (see comments in fs5600.c)EOPNOTSUPP
- not implemented yet
Note that you can use strerror(e)
to get a string representation of an error code. Note: you’ll need to use the positive error code for strerror()
(e.g., ENOENT
), whereas you’ll need to return the negative version in code (e.g., return -ENOENT;
).
[E] fs5600 design choices
Here are the design choices that fs5600 made:
fs5600 chooses to use a single block (block 1) as the block bitmap, which results in the maximum file system size of
128MB
.fs5600 uses a block (
4KB
) as an inode.fs5600 puts all block pointers in the inode. An inode can hold
1019
32-bit block pointers (seefs_inode
infs5600.h
), for a max file size of~4MB
.Directories are not nested more than 10 times. For example, there is no
/a/b/c/d/e/f/g/h/i/j/k
, which have a directory depth of11
. (note: the test and grading scripts will never create directories nested that deep.)Directories only have one data block (or only
inode.ptrs[0]
contains valid block ids).fs5600 does not support links.
rename
is only used within the same directory, for example,rename("/dir/f1", "/dir/f2")
, andrename("/dir1/f1", "/dir2/f2")
is invalid.truncate
is only ever called withlen=0
, meaning deleting all the data in the file.
[F] implementation hints
1. reuse code. There will be a lot of code sharing between different fs_*
functions. You should factor out the code that is used multiple times. It’s bad to duplicate the same piece of code in multiple functions (sometimes known as “copy-paste bugs”; see a research paper about this topic here). An obvious drawback is that you’ll find it hard to fix bugs—fixing one bug in some of the copies does not fix the same bug in other copies!
2. performance. Don’t worry about performance. The simplest possible implementation is okay, but an incorrect implementation is NOT okay.
3. copy paths. FUSE passes a path
argument to your functions as const char *
, instead of char *
. It means that you cannot modify the path
and you need to copy the string before operating on it (e.g., using strtok
). Typically, people use strdup
to duplicate a string.
char *your_path = strdup(path);
... // do whatever to your_path
free(your_path);
4. split paths. When walking the hierarchical structure of fs5600, you need to split paths (e.g., /a/b/c
) to tokens (e.g., a
, b
, and c
). Here are some hints:
strtok
is useful (what is this?man strtok
).- you can safely assume that the number of tokens is
<=10
because the maximum depth of nested dirs is 10. - you can safely assume that the length of each token is
<=27
because the name of a file is stored inchar name[28]
(infs_dirent
).
5. path translation Assume you’ve split your path, and you have a count (int pathc
) and an array of strings (char **pathv
). As an example, for a path /home/cs5600
, you will have pathc=2
, pathv={"home","cs5600"}
.
To find the file pointed by /home/cs5600
, you need to translate this path (i.e., pathc
and pathv
) into an inode number. You should:
- go to the root dir (the inode in block 2)
- try to find dir “home” (i.e.,
pathv[0]
) in the root dir - then, fetch the dir “/home”’s inode
- try to find file “cs5600” (i.e.,
pathsv[1]
) in the dir “/home” - finally, return the inode number of the file “cs5600”, or
-ENOENT
/-ENOTDIR
if there was an error.
6. uid
and gid
. Here is how fs5600 gets uid
and gid
:
struct fuse_context *ctx = fuse_get_context();
uint16_t uid = ctx->uid;
uint16_t gid = ctx->gid;
Since we’re not running FUSE as root, you’re only going to see your own uid
and gid
.
7. timestamps. You can get the timestamp in the proper format with time(NULL)
. Although the function returns a 64-byte integer, you can assign it to the unsigned 32-bit timestamps in the inode without worrying about overflow until year 2038 (unfortunately, spring 2038 will meet the notorious year 2038 problem, or Y2K38 superbug).
Challenge: learned fs5600
Below is an exploratory task that we do not have a canonical solution.
Challenge The current fs5600 is access pattern agnostic—no matter what application uses fs5600, the fs operations work exactly the same. If we assume some prior knowledge of the application, can we do better than the current fs5600?
In fact, such prior knowledge is quite common in practice. For example, files created by PyTorch to store training data will likely be accessed periodically and sequentially because the files are read once (sequentially) during every epoch. Can you propose a learned fs5600 that behaves differently for different workloads and apps?
A basic idea is to trace fs behaviors of an app (what does behavior mean here? many choices; you should decide) running on fs5600 by instrumenting fs5600. Then, train a neural network to learn access patterns of this app. (What are the inputs and outputs of the NN? You need to design.) Finally, apply the trained network to the fs5600 such that the fs operations are optimized for this app. (What are these optimizations? and what’s the NN’s role during running learned fs5600? Think about it.) Give it a try with your fs5600 code base.
This completes the lab.
Acknowledgments
This lab is created by Peter Desnoyers, with modifications by the CS5600 21fall staff. FUSE introduction is borrowed from Mike Walfish’s cs202 lab instructions.