Lab5: CS5600 File System
In Lab5, you will implement a simplified Unix-like file system, called fs5600 (CS5600 File System).
fs5600 uses the FUSE (File system in User SpacE) library, which allows us to implement a file system in the user space. This makes the development much easier than in the kernel space. FUSE exposes a few file system function interfaces that need to be instantiated and your task is to fill in these function bodies.
As always, there is not too much code to write (if you organize your code nicely); instead, a lot of the work of the lab is understanding the system that you’ve been given.
Challenge: again, Lab5’s challenges are optional. In these challenges, you will addresses some of the simplifications/limitations of fs5600 and make it more “advanced”. See challenges at the end of the lab.
Section 0: Getting started
- Click the GitHub Lab5 link on Canvas homepage to create your Lab5 clone on GitHub.
- Start your VM and open the VM’s terminal.
- Clone Lab5 repo to VM:
$ cd ~ $ git clone git@github.com:NEU-CS5600-labs/lab5-<Your-GitHub-Username>.git lab5
Note that the repo address
git@github.com:...
can be obtained by going to GitHub repo page (your cloned lab5), clicking the green “Code” button, then choose “SSH”. - Check contents:
$ cd ~/lab5; ls // you should see: Makefile disk1.in disk2.in diskfmt.py fs5600.c fs5600.h gen-disk.py lab5fuse.c misc.c read-img.py slack.txt test1.c test2.c
- Install required packages (Ubuntu):
sudo apt install check sudo apt install libfuse-dev sudo apt install zlib1g-dev sudo apt install python2
(note: if you’re not using Ubuntu but other Linux distributions, please Google and install the corresponding packages in your platform.)
Section 1: FUSE
The file system that we will build is implemented as a user-level process. This file system’s storage will be a file (which is created by us called test.img
) that lives in the normal file system of your devbox. Much of your code will treat this file as if it were a disk.
This entire arrangement (file system implemented in user space with arbitrary choice of storage) is due to software called FUSE (File system in User SpacE).
In order to really understand what FUSE is doing, we need to take a brief detour to describe VFS. Linux (like Unix) has a layer of kernel software called VFS; conceptually, every file system sits below this layer, and exports a uniform interface to VFS. (You can think of any potential file system as being a “driver” for VFS; VFS asks the software below it to do things like “read”, “write”, etc.; that software fulfills these requests by interacting with a disk driver, and interpreting the contents of disk blocks.)
The purpose of this architecture is to make it relatively easy to plug a new file system into the kernel: the file system writer simply implements the interface that VFS is expecting from it, and the rest of the OS uses the interface that VFS exports to it. In this way, we obtain the usual benefits of pluggability and modularity.
FUSE is just another “VFS driver”, but it comes with a twist. Instead of FUSE implementing a disk-based file system (the usual picture), it responds to VFS’s requests by asking a user-level process (which is called a “FUSE driver”) to respond to it. So the FUSE kernel module is an adapter that speaks “fuse” to a user-level process (and you will be writing your code in this user-level process) and “VFS” to the rest of the kernel.
Meanwhile, a FUSE driver can use whatever implementation it wants. It could store its data in memory, across the network, on Jupiter, whatever. In the setup in this lab, the FUSE driver will interact with a traditional Linux file (as noted above), and pretend that this file is a sector-addressable disk.
The FUSE driver registers a set of callbacks with the FUSE system (via libfuse
and ultimately the FUSE kernel module); these callbacks are things like read, write, etc. A FUSE driver is associated with a particular directory, or mount point. The concept of mounting was explained in OSTEP 39 (see 39.17). Any I/O operations requested on files and directories under this mount point are dispatched by the kernel (via VFS, the FUSE kernel module, and libfuse
) to the callbacks registered by the FUSE driver.
To recap all of the above: the file system user interacts with the file system roughly in this fashion:
When the file system user, Process A, makes a request to the system, such as listing all files in a directory via
ls
, thels
process issues one or more system calls (stat()
,read()
, etc.).The kernel hands the system call to VFS.
VFS finds that the system call is referencing a file or directory that is managed by FUSE.
VFS then dispatches the request to FUSE, which dispatches it to the corresponding FUSE driver (which is where you will write your code).
The FUSE driver handles the request by interacting with the “disk”, which is implemented as an ordinary file. The FUSE driver then responds, and the responses go back through the chain.
Exercise 0 the first glance of the unimplemented fs5600.
Here’s an example to show how fs5600 works with FUSE, where
test.img
is a disk image that CS5600 staff created:$ cd ~/lab5 $ make // you should see: cc -ggdb3 -Wall -O0 -c -o misc.o misc.c ... $ mkdir fs // create a directory to serve as a mount point $ df fs // see what file system fs is associated with Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 7092728 4536616 2172780 68% / $ ls fs // notice, "fs" is empty $ ./lab5fuse -image test.img fs // mount test.img at fs $ df fs // this ends up with an error because you haven't implemented fs5600. // but notice that fs's file system is different. df: fs: Operation not supported $ ls fs // similarly, "ls" reports an error as well ls: cannot access 'fs': Operation not supported $ ls -l // you will see that fs is in an unknown state ... d????????? ? ? ? ? ? fs ... $ fusermount -u fs // now unmount fs $ ls fs // and "fs" is an empty dir again
Note that in the above example, after we run
lab5fuse
, the kernel is actually dispatching all thereaddir()
,statfs()
etc. calls thatls
anddf
make to fs5600 FUSE driver. Given that you haven’t implemented these functions, fs5600 returns error messages that these operations are not supported. Later, whenfusermount
is run, fs5600 is unmounted fromfs
, and then all I/O operations underfs
return to being serviced normally by the kernel.
Section 2: CS5600 File System (fs5600)
How you should use fs5600 descriptions: below are the design of fs5600 (which we studied in class) and a bunch of facts about fs5600. Here is how you should use them:
- First, you should skim all the contents before you start coding.
- After skimming, you should have a basic understanding of how things work and have an impression of what information is where.
- Later, when something is unclear or uncertain, you should come back and study the corresponding part carefully.
- I would expect you to come back often to understand how things work.
Contents
- fs5600 disk interface
- File system format
- Superblock
- Block bitmap
- Block allocation (with block bitmap)
- Inode
- File mode
- Directory
- Appendix introduction
fs5600 disk interface
The fs5600 disk is abstracted as an array of blocks. The size of each block is 4KB
(4096 bytes or FS_BLOCK_SIZE
). You can read/write disk by two simple interfaces:
int block_read(void *buf, int lba, int nblks);
int block_write(void *buf, int lba, int nblks);
The buf
is a pointer to memory where you want to store the data from disk (for reads) or contains the data to disk (for writes). The lba
is the block id (starting from 0
) you want to read/write. The nblks
is the number of blocks you will read/write.
File system format
The format of fs5600 is as follows:
- the first block (block 0) is the superblock;
- the second block (block 1) is the bitmap block;
- the third block (block 2) is the root directory;
- other blocks are blocks that contain either inodes or data.
+-------+--------+----------+------------------------+
| super | block | root dir | data blocks ... |
| block | bitmap | inode | |
+-------+--------+----------+------------------------+
block 0 1 2 3 ...
Superblock
The superblock is the first block in the file system, and contains the information needed to find the rest of the file system structures. The following C structure (see fs5600.h
) defines the superblock:
struct fsx_superblock {
uint32_t magic; /* 0x30303635 - some magic number */
uint32_t disk_size; /* in 4096-byte blocks */
char pad[4088]; /* to make size = 4096 */
};
Note that uint32_t
is a standard C type found in the <stdint.h>
header file, and refers to an unsigned 32-bit integer. (similarly, uint16_t
, int16_t
and int32_t
are unsigned/signed 16-bit ints and signed 32-bit ints)
Block bitmap
This block bitmap indicates which blocks have been used and which are free. One bit represents one block: the first bit represents the first block (block 0), the second bit represents the second block (block 1), and so forth. If a bit is 0
, the corresponding block is free; otherwise, the block is used.
In fs5600, the first three bits are always 1
because they are (1) the super block, (2) the block bitmap itself, and (3) the root dir inode. (the bits for blocks 0, 1 and 2 will be set to 1 when the file system is created, so you don’t have to worry about excluding them when you search for a free block.)
Block allocation (with block bitmap)
To allocate blocks from disk, you need to manipulate block bitmap.
The bitmap is cached in a 4KB memory buffer (a global variable block_bitmap
). You will need keep the cache in sync by writing it back to disk after each block allocation and free.
You’re given the following functions to handle the bitmap in memory:
int bit_test((void*)map, int i);
void bit_set(unsigned char *map, int i);
void bit_clear(unsigned char *map, int i);
A side note: using a single block for the bitmap means that the maximum number of disk blocks fs5600 can track is 32768 (= 4096 * 8)
. Each block is 4KB. Hence the maximum size of an fs5600 is 128MB (= 32768 * 4KB)
.
Inodes
fs5600 simplifies the standard Unix inode.
In particular, inode is defined as follows (see also fs5600.h
):
struct fs_inode {
uint16_t uid; /* file owner */
uint16_t gid; /* group */
uint32_t mode; /* type + permissions (see below) */
uint32_t ctime; /* last status change time */
uint32_t mtime; /* modification time */
int32_t size; /* size in bytes */
uint32_t ptrs[FS_BLOCK_SIZE/4 - 5]; /* inode = 4096 bytes */
};
Each inode corresponds to a file or directory; in a sense the inode is that file or directory. In fs5600, inode can be uniquely identified by its inode number, which is its block id. Again, fs5600 uses block ids as inode numbers. The root directory is always found in inode 2
; inode 0
(the super block) is invalid and can be used as a NULL
value.
Hint: a path lookup always begins with inode 2
, the root directory.
File mode
The FUSE APIs (and Linux internals in general) mash together the concept of object type (file/directory/device/symlink…) and permissions. The result is called the file mode
(the attribute mode
in the fs_inode
above), and looks like this:
|<-- S_IFMT --->| |<-- user ->|<- group ->|<- world ->|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| F | D | | | | | | R | W | X | R | W | X | R | W | X |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Since it has multiple 3-bit fields, it is commonly displayed in base 8 (octal). For example, permissions allowing RWX
for everyone (rwxrwxrwx
) are encoded as 777
(an octal number). Note that, in C, octal numbers are indicated by a leading 0
, e.g., 0777
, so the expression 0777 == 511
is true.
The F
and D
bits correspond to 0100000
and 040000
, indicating if the inode is a regular file (F
bit is set) or a directory (D
bit is set).
A side note: since the demise of 36-bit computers in the early 70s, Unix permissions are about the only thing in the world that octal is used for.
There are a bunch of macros (in <sys/stat.h>
) you can use for looking at the mode; you can see the official documentation with the command man inode
, but the important ones are:
S_ISREG(m)
- is it (i.e. the inode with modem
) a regular file?S_ISDIR(m)
- is it a directory?S_IFMT
- bitmap for inode type bits.
You can get just the inode type with the expressionm & S_IFMT
, and just the permission bits with the expressionm & ~S_IFMT
. (note that~
is bitwise NOT - i.e.0
s become1
s and1
s become0
s)
Directory
Directory is a special file that holds an array of directory entries (fs_dirent
).
Each directory entry maps a human-readable name (a string) to an inode number (again, an inode number is a block id). It is defined as follows (see also fs5600.h
):
struct fs_dirent {
uint32_t valid : 1; /* the first bit of uin32_t */
uint32_t inode : 31; /* the next 31 bits of uint32_t */
char name[28]; /* with trailing NULL */
};
Each fs_dirent
is 32 bytes
, giving 128 (=4096/32)
directory entries in one dir. Why? This is a design choice of fs5600: fs5600 directories always have only one data block, meaning only the first pointer (ptrs[0]
) in a dir’s inode contains a valid block id; others are invalid (ptrs[i] == 0
where i>0
). That said, fs5600’s dir has one data block, which is 4KB
, and one direntry is 32B
, hence a dir only supports a maximum of 128 files.
A directory contains (name, inode)
pairs, which capture by fs_dirent
. In particular,
- the
valid
bit indicates if directory entries are being used (1
means yes;0
means no). - the
inode
contains a block id on disk that is where the file’s inode locates. Note: in fs5600, an inode is a block (4KB in size) and a inode number is the block id on disk. name
is the file name. The maximum name length is27 bytes
, allowing entries to always have a terminating\0
so you can use string functions (e.g.,strcmp
,strdup
) without any complications.
Appendix: fs5600 facts
Appendix has a bunch of fs5600 facts and some implementation hints. You can use them as references.
Section 3: fs5600 implementation, Part A
Before implementing anything, run:
$ make testa
...
0%: Checks: 12, Failures: 12, Errors: 0
...
And you can see that you pass 0 test in testa
.
Now, get ready to code.
Exercise 1 implement fs_init
and fs_statfs
.
- Read
fs5600.h
.- Implement
fs_init
, initializing your fs5600. If experiencing errors, thenexit(1)
; otherwise, returnNULL
.fs_statfs
infs5600.c
, reporting file system statistics. You should always return0
(in fs5600,return 0
means okay).- After implementing these two functions, you should pass 1 test:
$ make testa ... 8%: Checks: 12, Failures: 11, Errors: 0 gradetest-1.c:284:P:read_mostly:test_statvfs:0: Passed ...
Notes on testing I
- When running
make testa
, under the hood, you’re running unit tests intest1.c
. - When failing one test case, for example,
test1.c:279:F:read_mostly:test_statvfs:0: Assertion 'rv == 0' failed: rv == -95, 0 == 0
The message describes the failure with detailed information, including:
- file (
test1.c
), - line number (
279
), - failing test case (
read_mostly:test_statvfs
) - and why failing (
Assertion rv == 0
but gotrv == -95
).
- file (
- You can use
printf
orgdb
to debug (take a look atMakefile
to see which binary togdb
).
Exercise 2 implement fs_getattr
and fs_readdir
.
fs_getattr
is used to get attributes of a file/directory.fs_readdir
enumerates entries in a directory (think ofls
).- implement these two functions in
fs5600.c
, and runmake testa
. Now you should pass 5 tests:$ make testa ... 41%: Checks: 12, Failures: 7, Errors: 0
- You can ignore
struct fuse_file_info *fi
(this applies to later exercises as well); it is a FUSE data structure, see details here.- The
ctime
andmtime
in fs5600’s inodes are seconds since 1970/1/1 (see appendix timestamps). When implementinginode2stat()
, you should fill these (ctime
/mtime
) intotimespec
’stv_sec
and leavetv_nsec
to be0
.
Notes on testing II
- We use a predefined disk image named
test.img
formake testa
. - The image has known contents (see below), and you can use this information to check if your code works properly.
test.img
contents:# path uid gid mode size ctime mtime "/", 0, 0, 040777, 4096, 1565283152, 1565283167 "/file.1k", 500, 500, 0100666, 1000, 1565283152, 1565283152 "/file.10", 500, 500, 0100666, 10, 1565283152, 1565283167 "/dir-with-long-name", 0, 0, 040777, 4096, 1565283152, 1565283167 "/dir-with-long-name/file.12k+", 0, 500, 0100666, 12289, 1565283152, 1565283167 "/dir2", 500, 500, 040777, 8192, 1565283152, 1565283167 "/dir2/twenty-seven-byte-file-name", 500, 500, 0100666, 1000, 1565283152, 1565283167 "/dir2/file.4k+", 500, 500, 0100777, 4098, 1565283152, 1565283167 "/dir3", 0, 500, 040777, 4096, 1565283152, 1565283167 "/dir3/subdir", 0, 500, 040777, 4096, 1565283152, 1565283167 "/dir3/subdir/file.4k-", 500, 500, 0100666, 4095, 1565283152, 1565283167 "/dir3/subdir/file.8k-", 500, 500, 0100666, 8190, 1565283152, 1565283167 "/dir3/subdir/file.12k", 500, 500, 0100666, 12288, 1565283152, 1565283167 "/dir3/file.12k-", 0, 500, 0100777, 12287, 1565283152, 1565283167 "/file.8k+", 500, 500, 0100666, 8195, 1565283152, 1565283167
read-img.py
: This is a utility tool that prints lots of information about a disk image, which may help figure out what you messed up.- Try
python2 read-img.py test.img
(note that all python scripts in Lab5 requires Python2.7). - The “blocks used” line shows the blocks (including inodes) marked as used in the bitmap;
- the “inodes found” line shows the inodes found by recursively descending from the root.
- It can be useful for debugging some certain bugs.
- Try
Exercise 3 finish part A
- implement
fs_read
, read data from a file.- implement
fs_rename
, rename a file.- implement
fs_chmod
, change file permissions.- after implementing all these functions in
fs5600.c
, you should pass all 12 tests:$ make testa ... 100%: Checks: 12, Failures: 0, Errors: 0
- Note that both
fs_chmod
andfs_rename
will (obviously) modify files/dirs, hence the disk image. But you will not see the modifications when runningmake testa
becausemake testa
recreates the originaltest.img
every time.
Notes on testing III
- We’re using Check, a unit testing framework for C (recall
sudo apt install check
that you run). - You can add debugging code (e.g., the beloved
printf
) or more test cases intest1.c
. - Below is a crash course of the Check framework. (after reading this, you should be able to understand the code in
test1.c
.)
Crash course: Check framework
A unit test is defined and declared with the macro START_TEST
and END_TEST
as follows:
START_TEST(test_name)
{
code;
code;
}
END_TEST
Running a unit test requires to call suite_add_tcase
(see examples in main
of test1.c
):
tcase_add_test(tc, test_name);
To check if something is expected, here are several useful Check functions:
ck_abort("message"); /* fail unconditionally */
ck_assert(expr); /* fail if expression is false */
ck_assert_int_eq(i1, i2); /* check if i1 == i2; if not, alarm! */
ck_assert_int_ne/lt/le/ge/gt(i1, i2); /* not eq, less than, etc. */
ck_assert_str_eq("string1", "string2");
ck_assert_ptr_eq(ptr, NULL);
ck_assert_ptr_ne(ptr, NULL); /* assert ptr is NOT null */
Section 4: fs5600 implementation, Part B
Now, let’s run another set of tests testb
and you should see:
$ make testb
...
0%: Checks: 27, Failures: 27, Errors: 0
...
Exercise 4 implement fs_create
and fs_mkdir
.
- implement
fs_create
, creating a new (empty) filefs_create
errors:
- bad path
/a/b/c
-b
doesn’t exist (should return-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
exists, is file (-EEXIST
)- bad path
/a/b/c
-c
exists, is directory (-EEXIST
)- too-long name (more than 27 characters) (
-EINVAL
)- implement
fs_mkdir
, creating new (empty) directoryfs_mkdir
errors:
- bad path
/a/b/c
-b
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
exists, is file (-EEXIST
)- bad path
/a/b/c
-c
exists, is directory (-EEXIST
)- too-long name (more than 27 characters) (
-EINVAL
)- after implementing, you should pass 4 cases:
$ make testb ... 14%: Checks: 27, Failures: 23, Errors: 0
- Note: we’re using a test strategy called “mocking” to deal with the
fuse_get_context
call. We provide a fake version of the function in our test code, and in each test we can fill in any information we want it to return.
Notes on testing IV
- In
make testb
(testing code intest2.c
), we change to an empty disk imagetest2.img
. - You don’t have to worry about scrabbling the disk (which you certainly did in Exercise 4) because
make testb
rebuilds the disk every time.- in particular,
make
runs$ python2 gen-disk.py -q disk2.in test2.img
- in particular,
- To test writing logic (including
create
,mkdir
, etc.), we have to trust yourreaddir
andread
implementations:test2.c
uses your Part A implementation as the ground truth.- That’s the primary reason why the assignment is explicitly split into two parts.
- That said, if you have a bug in Part A, then you can get confusing results…
- …but you shouldn’t worry too much because we have a lot of tests in
testa
… - …however, tests cannot guarantee absence of bugs…
- …but there is not much we can do about it.
A side note: we do know one way to prove absence of bugs, namely verificaition. Check out FSCQ, a formally certified crash-proof file system.
Exercise 5 implement fs_unlink
and fs_rmdir
.
- implement
fs_unlink
, removing a filefs_unlink
errors:
- bad path
/a/b/c
-b
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-c
is directory (-EISDIR
)- implement
fs_rmdir
, removing a directoryfs_rmdir
errors:
- bad path
/a/b/c
-b
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-b
isn’t directory (-ENOTDIR
)- bad path
/a/b/c
-c
doesn’t exist (-ENOENT
)- bad path
/a/b/c
-c
is file (-ENOTDIR
)- directory not empty (
-ENOTEMPTY
)- after implementing, you should pass another 10 cases:
$ make testb ... 51%: Checks: 27, Failures: 13, Errors: 0
Exercise 6 finish Part B.
- implement
fs_write
, writing to a file- implement
fs_truncate
, deleting the contents of a file- implement
fs_utime
, changing access and modification times- after implementing, you should pass all cases:
$ make testb ... 100%: Checks: 27, Failures: 0, Errors: 0
Put it all together—mount fs5600 and use it!
After finishing Part A and Part B, your fs5600 is now usable, namely you can actually use it!
Remember the first glance of fs5600 (the Exercise 0)? Now let’s redo it.
Exercise 7 (or 0+) rerun Exercise 0.
Let’s revisit Exercise 0:
$ cd ~/lab5 $ make // you should see: cc -ggdb3 -Wall -O0 -c -o misc.o misc.c ... $ mkdir fs // create a directory to serve as a mount point $ df fs // see what file system fs is associated with Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 7092728 4536616 2172780 68% / $ ls fs // notice, 'fs' is empty $ ./lab5fuse -image test.img fs // mount test.img at fs $ df fs // below you should see the information of your fs5600. // this indicates that fs5600 is working. Filesystem 1K-blocks Used Available Use% Mounted on lab5fuse 1592 172 1420 11% /home/ubuntu/lab5/fs $ ls fs // you will see the files in test.img dir-with-long-name dir2 dir3 file.10 file.1k file.8k+ $ ls -l // you will see that fs is in an valid state ... drwxrwxrwx 1 root root 4096 Aug 8 2019 fs ... $ cd fs // do whatever you want! // try "mkdir", "touch", "cat", "ls", etc. // you can umount your fs5600 by: $ fusermount -u fs
Now, you can actually use your file system for day-to-day work! How exciting!!
Appendix: fs5600 facts
[A] Lab5 files and their meanings:
fs5600.h
: fs5600 data structure definitionsfs5600.c
: the core functions of fs5600 (your code here)test1.c
,test2.c
: test casesmisc.c
,lab5fuse.c
: support codegen-disk.py
,disk1.in
,disk2.in
: scripts to generates disk imagesread-img.py
: a utility tool to print image information
[B] Your jobs:
you will implement all the functionalities of fs5600, which are classified into two parts.
In Part A, you will need to implement the following functions in fs5600.c
:
fs_init
- constructor (i.e. put your init code here)fs_statfs
- report file system statisticsfs_getattr
- get attributes of a file/directoryfs_readdir
- enumerate entries in a directoryfs_read
- read data from a filefs_rename
- rename a filefs_chmod
- change file permissions
In Part B, you will need to implement these functions:
fs_create
- create a new (empty) filefs_mkdir
- create new (empty) directoryfs_unlink
- remove a filefs_rmdir
- remove a directoryfs_write
- write to a filefs_truncate
- delete the contents of a filefs_utime
- change access and modification times
[C] mode
File type constants (see also man 2 stat
)
S_IFMT
- mask for type bitsS_IFREG
- regular fileS_IFDIR
- directory
A lot of times it’s easier to use:
S_ISDIR(mode)
- true if it’s a directoryS_ISREG(mode)
- true if it’s a regular file
[D] error code
When encountering errors, you need to return error codes. The comments in fs5600.c
will tell you which errors are ought to return from which methods. Here is a list of error codes that you will use:
ENOENT
- file/dir not foundEIO
- ifblock_read
orblock_write
failENOMEM
- out of memoryENOTDIR
- not a directoryEISDIR
- is a directoryEINVAL
- invalid parameter (see comments in fs5600.c)EOPNOTSUPP
- not implemented yet
Note that you can use strerror(e)
to get a string representation of an error code. Note: you’ll need to use the positive error code for strerror()
(e.g., ENOENT
), whereas you’ll need to return the negative version in code (e.g., return -ENOENT;
).
[E] fs5600 design choices
Here are the design choices that fs5600 made:
fs5600 chooses to use a single block (block 1) as the block bitmap, which results in the maximum file system size of
128MB
.fs5600 uses a block (
4KB
) as an inode.fs5600 puts all block pointers in the inode. An inode can hold
1019
32-bit block pointers (seefs_inode
infs5600.h
), for a max file size of~4MB
.Directories are not nested more than 10 times. For example, there is no
/a/b/c/d/e/f/g/h/i/j/k
, which have a directory depth of11
. (note: the test and grading scripts will never create directories nested that deep.)Directories only have one data block (or only
inode.ptrs[0]
contains valid block ids).fs5600 does not support links.
rename
is only used within the same directory, for example,rename("/dir/f1", "/dir/f2")
, andrename("/dir1/f1", "/dir2/f2")
is invalid.truncate
is only ever called withlen=0
, meaning deleting all the data in the file.
[F] implementation hints
1. reuse code. There will be a lot of code sharing between different fs_*
functions. You should factor out the code that is used multiple times. It’s bad to duplicate the same piece of code in multiple functions (sometimes known as “copy-paste bugs”; see a research paper about this topic here). An obvious drawback is that you’ll find it hard to fix bugs—fixing one bug in some of the copies does not fix the same bug in other copies!
2. performance. Don’t worry about performance. The simplest possible implementation is okay, but an incorrect implementation is NOT okay.
3. copy paths. FUSE passes a path
argument to your functions as const char *
, instead of char *
. It means that you cannot modify the path
and you need to copy the string before operating on it (e.g., using strtok
). Typically, people use strdup
to duplicate a string.
char *your_path = strdup(path);
... // do whatever to your_path
free(your_path);
4. split paths. When walking the hierarchical structure of fs5600, you need to split paths (e.g., /a/b/c
) to tokens (e.g., a
, b
, and c
). Here are some hints:
strtok
is useful (what is this?man strtok
).- you can safely assume that the number of tokens is
<=10
because the maximum depth of nested dirs is 10. - you can safely assume that the length of each token is
<=27
because the name of a file is stored inchar name[28]
(infs_dirent
).
5. path translation Assume you’ve split your path, and you have a count (int pathc
) and an array of strings (char **pathv
). As an example, for a path /home/cs5600
, you will have pathc=2
, pathv={"home","cs5600"}
.
To find the file pointed by /home/cs5600
, you need to translate this path (i.e., pathc
and pathv
) into an inode number. You should:
- go to the root dir (the inode in block 2)
- try to find dir “home” (i.e.,
pathv[0]
) in the root dir - then, fetch the dir “/home”’s inode
- try to find file “cs5600” (i.e.,
pathsv[1]
) in the dir “/home” - finally, return the inode number of the file “cs5600”, or
-ENOENT
/-ENOTDIR
if there was an error.
6. uid
and gid
. Here is how fs5600 gets uid
and gid
:
struct fuse_context *ctx = fuse_get_context();
uint16_t uid = ctx->uid;
uint16_t gid = ctx->gid;
Since we’re not running FUSE as root, you’re only going to see your own uid
and gid
.
7. timestamps. You can get the timestamp in the proper format with time(NULL)
. Although the function returns a 64-byte integer, you can assign it to the unsigned 32-bit timestamps in the inode without worrying about overflow until year 2038 (unfortunately, CS5600 spring 2038 will meet the notorious year 2038 problem, or Y2K38 superbug).
(optional) Challenge: bigger files and encrypted fs5600
Lab5 has two challenges. Each gives you 10 extra points if you pass all our tests.
Challenge I bigger files
As mentioned earlier, fs5600 supports a maximum file size of
~4MB
. In this challenge, you will expand the maximum file size to be>=16MB
.
- We don’t dictate how to implement this, but the simplest extension is to add multiple indirect pointers (like Unix’s inode).
- Our test will write exactly
16MB
of data.
Challenge II enc-fs5600: encrypted fs5600
enc-fs5600 is a variant of fs5600 in which all file contents are encrypted. We use Caesar Cipher in Lab1 as our encryption mechanism. In particular,
- enc-fs5600 has a special key file “/key”.
- the content of “/key” is a string that represents a positive integer (namely, the Caesar cipher’s key in Lab1). As an example, “/key” may contain “123” (note: without tailing ‘\0’).
- The data written to all files (except “/key”) will be encrypted by the current key (the key in “/key” when writing).
- The data read from all files (except “/key”) will be decrypted by the current key.
- The reads and writes from and to “/key” are normal reads and writes.
- If “/key” doesn’t exist, reads and writes are normal, which is equivalent to
key=0
.- Here is an example:
// assume we're in an enc-fs5600 $ echo 1 > /key // set the key=1 $ echo "abc" > /foo // write "abc" to a file $ cat /foo // the read will automatically decrypt abc $ rm /key // remove the key (meaning key=0) $ cat /foo // print what is actually stored in the file bcd
Finally, submit your work
Submitting consists of three steps:
- Executing this checklist:
- Fill in
slack.txt
with (1) your name and NUID, (2) slack hours you used, (3) if you took the challenges, and (4) acknowledge your influences. - Make sure that your code can build without problems.
- Make sure you have added (
git add
) all files that you created. - Make sure you have committed all your changes (check with
git status
).
- Fill in
Push your code to GitHub:
$ cd ~/lab5 $ git commit -am 'submit lab5' $ git push origin Counting objects: ... .... To ssh://github.com/NEU-CS5600-labs/lab5-<username>.git 7447116..bebeebee main -> main
Actually commit your lab (with timestamp and git commit id):
Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command
git log -1 --format=oneline
.- Submit a file named
git.txt
to Canvas. (there will be an assignment for this lab on Canvas.) The filegit.txt
contains two lines: the first line is your github repo url; the second line is the git commit id that you want us to grade. Here is an example:git@github.com:NEU-CS5600-labs/lab5-<username>.git 29dfdadeadbeefe33421f242b5ddeadbeeffd3c9
Notice: the repo address must start with
git@github.com:...
(nothttps://...
). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”. - Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
Notice: if you submit multiple times, the file name (git.txt
) changes togit-N.txt
whereN
is an integer and represents how many times you’ve submitted. We will grade the file with the largestN
.
NOTE: Ground truth is what and when you submitted to Canvas.
alertA non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.
This completes the lab.
Acknowledgments
This lab is created by Peter Desnoyers, with modifications by the CS5600 staff. FUSE introduction is borrowed from Mike Walfish’s cs202 lab instructions.