Link Search Menu Expand Document

CS6640 Lab6: File System

In lab6, you will implement a simplified Unix-like file system:

  • initialize file system status
  • manage inodes
  • read and write files

As always, there is not too much code to write (if you organize your code nicely), yet you need to spend some time understanding the file system design. Your code will be in file library/file/fs_rw.c.

Fetch lab6 from upstream repo

  • fetch lab6 branch
    $ git fetch upstream lab6
    
    <you should see:>
    ...
     * [new branch]      lab6       -> upstream/lab6
    
  • create a local lab6 branch
    $ git checkout --no-track -b lab6 upstream/lab6
    <you should see:>
    Switched to a new branch 'lab6'
    
  • confirm you’re on local branch lab6
    $ git status
    <you should see:>
    On branch lab6
    nothing to commit, working tree clean
    
  • rebase your previous labs to the current branch
    <you're now on branch lab6>
    $ git rebase lab5
    

    You need to resolve any conflict caused by the rebase. Read how to resolve conflict here.

  • push branch lab6 to remote repo origin
    $ git push -u origin lab6
    

Update qemu

For this lab, we need a new version of qemu with SD card simulation. To deploy the new qemu:

  1. download the new qemu:
    • for Linux and WSL, download this.
    • for MacOS, dowload this.
  2. move the downloaded file (named riscv-qemu-....tar.gz) to your cs6640 folder (~/cs6640). This is the folder containing the old qemu folder (riscv-qemu-....).
  3. unzip the downloaded files in ~/cs6640
     $ cd ~/cs6640/
     // unzip the newly downloaded file
     // this will replace the old qemu with the same name
     $ tar -zxvf riscv-qemu-....tar.gz
    

  4. Update 11/22
    • We found a bug on Nov.22. (thanks, VJ!) Make sure you see this commit: 2e7360a9405db7e12c1144f9eaac2b8a133c0674.
    • Also, add setup_identity_region(0, 0x10024000, 1); to your system process mapping (in page_table_map in earth/cpu_vm.c). The page 0x10024000 is where SD card’s memory-mapped IO is located. The system processes should have the page mapped.
  5. confirm new qemu runs:
    • turn on using SD card: in Makefile, change the line IFSD=SDOFF to IFSD=SDON.
    • make and run
    • you should observe that the booting process is slower (due to loading data from the simulated SD card), and you will see:
       ...
       [SUCCESS] Enter kernel process GPID_FILE
       [FATAL] fs_init is not implemented
      

Section 1: File system design

disk interface

The disk (an SD card) is abstracted as an array of blocks. The block number is the index in this array. For example, block number 0 is the first block on disk. The size of each block is 512B (BLOCK_SIZE). You can read/write disk by a simple interface:

int block_read(int block_no, void* dst);
int block_write(int block_no, void* src);

The dst is a pointer to memory where you want to store the data from disk (for block_read); the src points to the data you want to write to disk (for block_write). The block_no is the block number (starting from 0) you want to read/write.

File system format

The format of fs is as follows:

  • the first block (block 0) is the superblock;
  • the second block (block 1) is the bitmap block;
  • the block 2–11 are the inode array;
  • other blocks (starting from block 12) are data blocks.
                    |<-inode arr->|<-data blocks->|
   +-------+--------+-------------+---------------+
   | super | block  |  |  | ...   |  |  | ...     |
   | block | bitmap |  |  |       |  |  |         |
   +-------+--------+-------------+---------------+
block  0       1     2  3    ...11 12...

Superblock

The superblock is the first block in the file system, and contains the information needed to find the rest of the file system structures. The following C structure (see library/file/fs.h) defines the superblock:

    struct fs_super {
        m_uint32 magic;       /*==0x6640*/
        m_uint32 disk_size;   /* in blocks */

        /* pad out to an entire block */
        char pad[BLOCK_SIZE - 2 * sizeof(m_uint32)];
    };

The magic should be 0x6640. The fs uses the magic to tell if it locates the superblock correctly.

Block bitmap

This block bitmap indicates which blocks have been used and which are free. One bit represents one block: the first bit represents the first block (block 0), the second bit represents the second block (block 1), and so forth. If a bit is 0, the corresponding block is free; otherwise, the block is used.

The bitmap is cached in a memory buffer. When initializing the fs, you will need to read the bitmap block into memory, go through the bitmap, and count the available blocks. You’re given a function to test the bits in bitmap:

  int bit_test((void*)map, int i);

It will return non-zero values if the bit is set and return zero if the bit is 0.

Inodes

fs simplifies the standard Unix inode. Here is an example inode:

ERROR: the figure is missing

In particular, inode is defined as follows (see also fs.h):

struct fs_inode {
    m_uint32 mode;
    m_uint32 size;     /* file size in bytes */
    m_uint32 pads[3];
    m_uint32 ptrs[NUM_PTRS]; /* direct data block pointer */
    m_uint32 indirect_ptr; /* indirect pointer */
};

File metadata: In inode, mode tells fs if this inode is a file or a dir and contains the permission bits. size is the size of a file. ptrs is an array of data block pointers (pointing to data blocks). indirect_ptr points to a block containing block pointers, which then point to data block. All the pointers are block numbers.

Inode numbers: Each inode represents a file or a directory. In egos-2k+, files/dirs can be uniquely identified by its inode number. By design, inode numbers are unique in a machine; that means, multiple file systems share the same inode number space with different ranges. For your fs, the inode number starts at 100 (FS_ROOT_INODE) and ends at 100 plus the maximum nmber of inodes.

Note: there is a simple one-to-one mapping between inode numbers and index in inode array (block 2–11). Given an inode number X (for example, 101), the inode can be located by X-100 (1) in the inode array.

Section 2: Implementation

You will implement multiple fs functions in the following exercises. All your code will be in library/file/fs_rw.c. The fs is mounted on /home/cs6640/fs.

Exercise 1 Initialize fs

  • Read data structure fs_t and super_t in library/file/fs.h
  • Implement fs_init() in library/file/fs_rw.c
  • After finish, you should be able to boot.
  • Try cd into the fs and ls:
    [CRITICAL] Welcome to the egos-2k+ shell!
    ...
    ➜ /home/cs6640 cd fs
    ...
    ➜ /home/cs6640/fs ls
    [FATAL] fs_read is not implemented
    

The cmd ls fails because it needs to read the contents of the directory (here, /home/cs6640/fs) to list files and folders under it. Next, you will implement reading files.

Exercise 2 Implementing file read

  • Read data structure inode_t in library/file/fs.h
  • Implement load_inode(), flush_inode(), and fs_read() in library/file/fs_rw.c
  • After finish, you should be able to run ls and cat:
    ➜ /home/cs6640 (cd fs;ls)
    .   ..   file1.txt   dir1
    ➜ /home/cs6640/fs cat file1.txt
    hello world
    
  • Now, try append:
    ➜ /home/cs6640/fs append file1.txt xyz
    [FATAL] fs_fsize is not implemented
    

The cmd append will append a given string at the end of the file. It needs to read the size of the file first, and then write the string to the end of the file.

Exercise 3 Implementing file write

  • Implement fs_fsize() and fs_write() in library/file/fs_rw.c
  • After finish, try:
    ➜ /home/cs6640 (cd fs;cat file1.txt;append file1.txt xyz; cat file1.txt)
    ...
    hello world
    ...
    hello worldxyz
    
    • you should see “xyz” is appended to file1.txt
  • If you successfully implement everything, you should pass all tests in fstest:
    ➜ /home/cs6640 fstest
    [SUCCESS] PASS test1
    [SUCCESS] PASS test2
    [SUCCESS] PASS test3
    [SUCCESS] PASS test4
    [SUCCESS] PASS test5
    [SUCCESS] PASS test6
    [SUCCESS] PASS test7
    [SUCCESS] PASS test8
    [SUCCESS] PASS test9
    [SUCCESS] PASS test10
    

    If you fail some tests, you can see the test case in apps/user/fstest.c.

Finally, submit your work

Submitting consists of three steps:

  1. Executing this checklist:
    • Fill in ~/cs6640/egos/slack/lab6.txt.
    • Make sure that your code build with no warnings.
  2. Push your code to GitHub:

     $ cd ~/cs6640/egos/
     $ git commit -am 'submit lab6'
     $ git push origin lab6
    
     Counting objects: ...
      ....
      To github.com/NEU-CS6640-labs/egos-<YOUR_ID>.git
         c1c38e6..59c0c6e  lab6 -> lab6
    
  3. Actually commit your lab (with timestamp and git commit id):

    1. Get the git commit id of your work. A commit id is a 40-character hexadecimal string. You can obtain the commit id for the last commit by running the command git log -1 --format=oneline.

    2. Submit a file named git.txt to Canvas. (there will be an assignment for this lab on Canvas.) The file git.txt contains two lines: the first line is your github repo url; the second line is the git commit id that you want us to grade. Here is an example:
       git@github.com:NEU-CS6640-labs/egos-<YOUR_ID>.git
       29dfdadeadbeefe33421f242b5dd8312732fd3c9
      

      Notice: the repo address must start with git@github.com:... (not https://...). You can get your repo address on GitHub repo page by clicking the green “Code” button, then choose “SSH”.

    3. Note: You can submit as many times as you want; we will grade the last commit id submitted to Canvas. Also, you can submit any commit id in your pushed git history; again, we will grade the commit id submitted to Canvas.
      Notice: if you submit multiple times, the file name (git.txt) changes to git-N.txt where N is an integer and represents how many times you’ve submitted. We will grade the file with the largest N.

NOTE: Ground truth is what and when you submitted to Canvas.

  • A non-existent repo address or a non-existent commit id in Canvas means that you have not submitted the lab, regardless of what you have pushed to GitHub—we will not grade it. So, please double check your submitted repo and commit id!

  • The time of your submission for the purposes of tracking lateness is the timestamp on Canvas, not the timestamp on GitHub.

This completes the lab.