OSI Lab7: File System

In Lab 7, you will implement a simplified Unix-like file system. The tasks include:

initializing the file system state
managing inodes
performing file read and write operations

As always, there is not too much code to write (if you organize your code nicely), yet you need to invest time in understanding and internalizing the file system design. All your implementation will be done in library/file/rwfs.c.

Fetch lab7 from upstream repo

fetch lab7 branch

$ git fetch upstream lab7

<you should see:>
...
 * [new branch]      lab7       -> upstream/lab7

create a local lab7 branch

$ git checkout --no-track -b lab7 upstream/lab7
<you should see:>
Switched to a new branch 'lab7'

confirm you’re on local branch lab7

$ git status
<you should see:>
On branch lab7
nothing to commit, working tree clean

rebase your lab6 to the current branch
```
<you're now on branch lab7>
$ git rebase lab6
```
You need to resolve any conflict caused by the rebase. Read how to resolve conflict here.
push branch lab7 to remote repo origin
```
$ git push -u origin lab7
```

confirm everything runs as expected (note: DISK=SDCARD in Makefile):

$ make qemu
...
// everything should work
➜ /home/cs6640 %

Then, try to run fstest:

➜ /home/cs6640 % fstest
// you should run into this error message
[FATAL] rwfs is off; should not be here

note please make sure you comment out sd_test() in earth/dev_sdcard.c.

Section 1: File system design

A) Disk interface

The disk (the SD card) is abstracted as an array of blocks, where the block number serves as the array index. For example, block number 0 refers to the first block on the disk. Each block has a fixed size of 512B (BLOCK_SIZE). The system reads and writes disk blocks through a simple interface.

void block_read(int block_no, block_t* dst);
void block_write(int block_no, block_t* src);

The dst is a pointer to memory where you want to store the data from disk (for block_read); the src points to the data you want to write to disk (for block_write). Both dst and src point to type block_t, which represent a 512B block. block_no specifies the block number (starting from 0) to read/write.

The block_read() and block_write() functions rely on your SD card driver implementation from Lab6. To understand how the file system works with your SD card driver, read earth/dev_disk.c, which shows how the file system uses your driver for block-level access.

(The real implementation is more complex than this abstraction: We manually partition the SD card into a pre-existing read-only file system and your new rwfs implementation. In the code, accesses from your Lab 7 file system are offset so that they behave as if they operate on an empty disk. In reality, the first block of rwfs begins at the offset RWFS_DISK_START. See library/file/disk.h for the disk layout.)

B) File system layout

Below we introduce how your read-write file system (rwfs) uses the disk blocks, namely its layout on disk:

Block 0: superblock
Block 1: bitmap block
Blocks 2–11: inode array
Blocks 12 and onward: data blocks

                    |<-inode arr->|<-data blocks->|
   +-------+--------+-------------+---------------+
   | super | block  |  |  | ...   |  |  | ...     |
   | block | bitmap |  |  |       |  |  |         |
   +-------+--------+-------------+---------------+
block  0       1     2  3    ...11 12...

Block 0: superblock

The superblock is the first block in the file system, and contains the information needed to find the rest of the file system structures. The following C structure (see library/file/fs.h) defines the superblock:

typedef struct fs_super {
    uint magic;       /*==0x6640*/
    uint total_blks;  /* number of blocks */

    /* pad out to an entire block */
    char pad[BLOCK_SIZE - 2 * sizeof(uint)];
} super_t;

The magic should be 0x6640. The rwfs uses the magic to tell if it locates the superblock correctly.

Block 1: bitmap block

This block bitmap indicates which blocks have been used and which are free. One bit represents one block: the first bit represents the first block (block 0), the second bit represents the second block (block 1), and so forth. If a bit is 0, the corresponding block is free; otherwise, the block is used.

The bitmap is cached in a memory buffer. When initializing the rwfs, you will need to read the bitmap block into memory, go through the bitmap, and count the available blocks. You’re given a function to test the bits in bitmap:

int bit_test((void*)map, int i);

It will return non-zero values if the bit is set and return zero if the bit is 0.

Block 2–11: inode array

The inode array is an array of inodes inode_t (described below). This array is stored in the blocks starting at block 2 to block 11; ten blocks in total. Given this fixed allocation, the maximum number of inodes (i.e., files and directories) is: 10*BLOCK_SIZE/sizeof(struct fs_inode). Using a fixed-size array of inodes starting at a known block simplifies inode lookup, as you will see later.

C) Inode

Our rwfs simplifies the standard Unix inode. Here is our inode’s definition (see also fs.h):

typedef struct fs_inode {
    uint mode;
    uint size;     /* file size in bytes */
    uint pads[3];
    uint ptrs[NUM_PTRS]; /* direct data block pointer */
    uint indirect_ptr; /* indirect pointer */
} inode_t;

Here is a visualized example of an inode with 11 data blocks (so it uses the indirect_ptr):

ERROR: the figure is missing

File metadata: In inode, mode tells fs if this inode is a file or a dir and contains the permission bits. size is the size of a file. ptrs is an array of data block pointers (pointing to data blocks). indirect_ptr points to a block containing block pointers, which then point to data block. All the pointers are block numbers.

Inode numbers: Each inode represents a file or a directory. In egos, files/dirs can be uniquely identified by its inode number. By design, inode numbers are unique in a machine; that means, multiple file systems share the same inode number space with different ranges. For your fs, the inode number starts at 128 (NINODES) and ends at 128 plus the maximum number of inodes.

Note: there is a one-to-one mapping between the global inode number and the index in the inode array (stored in blocks 2–11), which we call the local inode number. Given a global inode number X (e.g., 129), the local inode number is X - 128 (i.e., 1). This value is also the index of the inode in the inode array.

Section 2: rwfs implementation

You will implement multiple fs functions in the following exercises. All your code will be in library/file/rwfs.c. The fs is mounted on /home/cs6640/rwfs.

Before that, ensure your egos runs as expected:

Turn on the rwfs: in Makefile, change the line RWFS=RWFSOFF to RWFS=RWFSON
make and run:
$ make qemu
...
// you should see:
[SUCCESS] Enter kernel process GPID_FILE
[FATAL] fs_init is not implemented
This error occurs because the initialization of the rwfs has not been implemented.

Exercise 1 Initialize rwfs

Read data structure fs_t and super_t in library/file/fs.h
Implement fs_init() in library/file/rwfs.c
After finish, you should be able to boot.
Try cd into the fs and ls:
[CRITICAL] Welcome to the egos-2k+ shell!
...
➜ /home/cs6640 % cd rwfs
...
➜ /home/cs6640/rwfs % ls
[FATAL] fs_read is not implemented

The cmd ls fails because it needs to read the contents of the directory (here, /home/cs6640/rwfs) to list files and folders under it. Next, you will implement reading files.

Exercise 2 Implementing file read

Read data structure inode_t in library/file/fs.h
Implement __load_inode(), flush_inode(), and fs_read() in library/file/rwfs.c
After finish, you should be able to run ls and cat:
➜ /home/cs6640 % (cd fs;ls)
.   ..   file1.txt   dir1
➜ /home/cs6640/rwfs % cat file1.txt
hello world
Now, try append:
➜ /home/cs6640/rwfs % append file1.txt xyz
[FATAL] fs_fsize is not implemented

The cmd append will append a given string at the end of the file. It needs to read the size of the file first, and then write the string to the end of the file.

Exercise 3 Implementing file write

Implement fs_getsize() and fs_write() in library/file/rwfs.c
After finish, try:
➜ /home/cs6640 % (cd fs;cat file1.txt;append file1.txt xyz; cat file1.txt)
...
hello world
...
hello worldxyz
you should see “xyz” is appended to file1.txt
If you successfully implement everything, you should pass all tests in fstest:
$ make qemu
...
➜ /home/cs6640 % fstest
[SUCCESS] PASS test1
[SUCCESS] PASS test2
[SUCCESS] PASS test3
[SUCCESS] PASS test4
[SUCCESS] PASS test5
[SUCCESS] PASS test6
[SUCCESS] PASS test7
[SUCCESS] PASS test8
[SUCCESS] PASS test9
[SUCCESS] PASS test10
If you fail some tests, check why by reviewing the test cases in apps/user/fstest.c.
Note: fstest expects a clean disk. make qemu should regenerate the disk image, so you don’t have to worry about unexpectedly corrupting the disk.

Finally, submit your work

Submitting consists of three steps:

Executing this checklist:
- Fill in ~/osi/egos/slack/lab7.txt.
- Make sure that your code build with no warnings.

Push your code to GitHub:

 $ cd ~/osi/egos/
 $ git commit -am 'submit lab7'
 $ git push origin lab7

 Counting objects: ...
  ....
  To github.com/NEU-CS6640-labs/egos-<YOUR_ID>.git
     c1c38e6..59c0c6e  lab7 -> lab7

Actually commit your lab (with timestamp and git commit id):
1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.
2. Submit a file named git.txt to Canvas. (there will be an assignment for this lab on Canvas.) The file git.txt contains two lines: the first line is your github repo url; the second line is the git commit id that you want us to grade. Here is an example:
```
 git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git
 29dfdadeadbeefe33421f242b5dd8312732fd3c9
```
  Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”.
3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
  Notice: if you submit multiple times, the file name (git.txt) changes to git-N.txt where N is an integer and represents how many times you’ve submitted. We will grade the file with the largest N.

NOTE: Ground truth is what and when you submitted to Canvas.

A non-existent repo address or a non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted repo and commit id!
The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.