os202

Mohammad Saddam Mashuri's OS202

Project maintained by saddamonpc Hosted on GitHub Pages — Theme by mattgraham


Top 10 List of Week 03

Why hello there! I’m thankful you have the time to read my top 10 of week 3 of this course. This week’s topic is File System and FUSE. Please keep in mind that this top 10 is my opinion and may not be the same as yours.

I grade this top 10 list on 3 categories:

* how important it is,

* how simple it is,

* and how relevant the topic is to the operating system course.

Please enjoy your stay here, and good luck to my other mates doing this course!

File Concept
Surprised to see File Concept at the top? Well, in my experience, to understand the good stuff, you need to understand the fundamentals first. I think people neglect the file concept (again, this is my assumption), which is understandable since in modern times people already interact with files both traditionally and digitally. But what is a file really?

A file is a named collection of related information that is recorded on secondary storage. To a user, the file is the smallest unit of storage on a computer system. The computer stores information on storage media, such as disks and optical drives, so the information stored in files is non-volatile. The user then performs file operations like read and write. There are many types of files, such as text files, image files, source files, and executable files. These files have attributes, like name, type, and size. The name is the attribute users usually edit, as it makes the file easier to recognize.
File Access Methods
How are we going to access the filesystem if we don’t even know the ways to do it? There are three ways to access a file in a computer system:
- Sequential Access
  This is the simplest access method. Information in the file is processed in order, one record after the other. This mode of access is by far the most common, and editors and compilers usually access the file in this way.
- Direct Access
  Also known as the relative access method, this method uses fixed-length logical records that allow the program to read and write records rapidly, in no particular order. Direct access is based on the disk model of a file, since disks allow random access to any file block. This was taught to us in the IDS / PSD course (if I remember correctly).
- Indexed Sequential Access
  This method is built on top of sequential access. First, it constructs an index for the file. The index, like an index in the back of a book, contains pointers to the various blocks. To find a record in the file, we first search the index and then use the pointer to access the file directly.
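
The three access methods above can be sketched on an in-memory "file". This is only a minimal illustration: the record layout (a 4-byte id plus an 8-byte name) and the function names are made up for this example, and the index is a plain dictionary rather than an on-disk structure.

```python
import io
import struct

# Hypothetical fixed-length record: a 4-byte integer id plus an 8-byte name.
RECORD = struct.Struct("<i8s")

def build_file(names):
    """Pack the names into an in-memory 'file' of fixed-length records."""
    buf = io.BytesIO()
    for i, name in enumerate(names):
        buf.write(RECORD.pack(i, name.encode().ljust(8, b" ")))
    return buf

def read_sequential(f):
    """Sequential access: read every record in order from the start."""
    f.seek(0)
    out = []
    while chunk := f.read(RECORD.size):
        rec_id, raw = RECORD.unpack(chunk)
        out.append((rec_id, raw.decode().strip()))
    return out

def read_direct(f, n):
    """Direct access: jump straight to record n using its byte offset."""
    f.seek(n * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.decode().strip()

def read_indexed(f, index, key):
    """Indexed access: look the key up in the index, then read directly."""
    return read_direct(f, index[key])

names = ["alpha", "bravo", "charlie", "delta"]
f = build_file(names)
index = {name: pos for pos, name in enumerate(names)}  # name -> record number
```

Direct access only works here because every record has the same size, so the offset of record n is simply n times the record size.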
FHS: Filesystem Hierarchy Standard
I’ll admit, I was quite biased in placing this topic in the top 3. Why? The FHS has a lot to cover, and general users don’t interact with 95% of the directories there (I made up this number). Still, researching this for the first time, I was fascinated that the filesystem has so many directories. I’ve always thought that, “There are only ‘/root’ and some other directories in order for the system to function.” To my surprise, there are LOTS of directories that exist there. Directories such as ‘/bin’, ‘/sbin’, ‘/home’, ‘/lib’, ‘/tmp’, and many more are inside the root filesystem. I haven’t even mentioned the other hierarchies, such as ‘/usr’ and ‘/var’.

Anyway, here’s a simple definition of the FHS. The Filesystem Hierarchy Standard (FHS) defines the directory structure and directory contents of Linux distributions.
Directory Implementation
The choice of algorithm for allocating and managing a directory is vital to the efficiency, performance, and reliability of the file system. Usually, there are two algorithms in use these days.
- Linear List
  In this algorithm, all the files in a directory are maintained as a single linear list. This is the simplest method of implementing a directory, but it is inefficient, as finding a file requires a linear search of the list.
- Hash Table
  This data structure is used to overcome the drawbacks of a linear list. A key-value pair for each file in the directory gets generated and stored in the hash table. The key is determined by applying the hash function to the file name, while the value points to the corresponding file stored in the directory.
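
To make the difference concrete, here is a small sketch of both lookups. The directory entries (file name plus a block number) are invented for the example, and Python's built-in `dict` stands in for the hash table.

```python
# Hypothetical directory entries: (file name, first block number).
entries = [("notes.txt", 12), ("a.out", 40), ("photo.png", 7)]

def lookup_linear(entries, name):
    """Linear list: compare against each entry until the name matches."""
    for entry_name, block in entries:
        if entry_name == name:
            return block
    return None  # not found after scanning the whole list

# Hash table: the dict hashes the file name to locate the entry directly.
table = dict(entries)

def lookup_hashed(table, name):
    """Hash table: one hashed lookup instead of a scan."""
    return table.get(name)
```

Both functions return the same answer; the linear version just has to walk the list to get there, which gets slower as the directory grows.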
Filesystem in Userspace (FUSE)
FUSE lets us create our own file systems without editing kernel code. THAT is interesting. Both non-privileged and privileged users can achieve this by running file system code in user space, while the FUSE module provides only a “bridge” to the actual kernel interfaces. FUSE is free software available for Linux, macOS, and other operating systems.
Virtual File System (VFS)
The virtual file system (VFS) interface, also known as the v-node interface, provides a bridge between the physical and logical file systems. The purpose of a VFS is to allow client applications to access different types of concrete file systems in a uniform way. A VFS can, for example, be used to access local and network storage devices transparently without the client application noticing the difference.
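
As a rough analogy only (this is not how a kernel VFS is written), the "uniform way" idea looks like a common interface with several implementations behind it. All class and function names below are hypothetical, and both "file systems" are just dictionaries.

```python
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """Hypothetical common interface, playing the role of the VFS layer."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the contents of the file at path."""

class LocalFS(FileSystem):
    """Stand-in for a local disk file system (contents kept in a dict)."""

    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]

class NetworkFS(FileSystem):
    """Stand-in for a remote file system; the 'server' is another dict."""

    def __init__(self, server):
        self.server = server

    def read(self, path):
        # A real implementation would send a request over the network here.
        return self.server[path]

def show(fs: FileSystem, path: str) -> str:
    """Client code: identical whichever concrete file system it is given."""
    return fs.read(path).decode()

local = LocalFS({"/etc/motd": b"hello from disk"})
remote = NetworkFS({"/etc/motd": b"hello from the network"})
```

The client function `show` never checks which kind of file system it received, which is the transparency the paragraph above describes.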
Data Integrity and Protection
Hard drives are not perfect and can fail (on occasion). Modern disks will occasionally seem to be mostly working, but have trouble successfully accessing one or more blocks. These single-block failures come in two common types: latent-sector errors (LSEs) and block corruption.

LSEs arise when a disk sector has been damaged in some way. In contrast, a disk block becomes corrupt when, for example, buggy disk firmware writes a block to the wrong location.

Handling an LSE is quite straightforward, as they are easily detected. To combat this issue, some systems add an extra degree of redundancy. When an LSE is discovered during reconstruction, the extra parity helps to reconstruct the missing block. Detecting corruption is the key problem, and the usual tool is a checksum. A checksum is simply the result of a function that takes a chunk of data (say a 4KB block) as input and computes a small summary of the contents of the data (say 4 or 8 bytes). If the checksum recomputed when the block is read differs from the stored one, the block has been corrupted.
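
Here is a minimal sketch of checksum-based corruption detection. It uses CRC32 from Python's `zlib` module as the checksum function, which is just one possible choice (it produces a 4-byte summary); the helper names are made up for this example.

```python
import zlib

BLOCK_SIZE = 4096  # a 4KB block, as in the text

def store_block(data):
    """Return the block together with its checksum, as if written to disk."""
    return data, zlib.crc32(data)

def is_intact(data, checksum):
    """Recompute the checksum on read and compare with the stored one."""
    return zlib.crc32(data) == checksum

block = bytes(BLOCK_SIZE)                # a block of zero bytes
stored, checksum = store_block(block)

corrupted = bytearray(stored)
corrupted[100] ^= 0x01                   # flip a single bit to simulate corruption
```

Calling `is_intact(stored, checksum)` succeeds, while the flipped bit in `corrupted` makes the recomputed checksum differ from the stored one.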
Network File System (NFS)
NFS? Need For Speed? (A racing video game, if you’re wondering.) Nah, it’s Network File System, which is a mechanism for storing files on a network. I’ll be honest with you, I rarely store files on a closed network, but it’s probably used very often in an enterprise or a company. Anyway, NFS is a distributed file system that allows users to access files and directories located on remote computers and treat those files and directories as if they were local. Users may use operating system commands to create, remove, read, write, and set file attributes for remote files and directories.
File Allocation Methods
This one is quite complex, so bear with me. In the context of operating systems, file allocation refers to managing files on disk such that disk space is effectively utilized and files are accessed quickly. In general, there are 3 types of allocation methods:
- Contiguous Allocation
  Contiguous allocation requires that each file occupy a set of contiguous blocks on the device. All blocks of the file are consecutive on disk, so deleting files leaves gaps. Although simple and efficient to index, a file allocated this way is difficult to grow: space must be preallocated, which wastes space.
- Linked Allocation
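  Linked allocation solves the problems of contiguous allocation. The blocks are linked, so they can be scattered anywhere on disk, and a file can keep growing as long as free blocks remain. However, random access is not easy, because reaching a block means following the pointers from the start of the file.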
- Indexed Allocation
  In indexed allocation, each file has its own index block, which is an array of storage-block addresses. The directory has a pointer to the index block, random access is easy, and space utilization is good.
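
The random-access difference between the three methods can be sketched by asking each one, "where on disk is logical block n of this file?" The block numbers and function names below are invented for the example.

```python
def contiguous_block(start, length, n):
    """Contiguous: the directory stores a start block and a length."""
    if not 0 <= n < length:
        raise IndexError(n)
    return start + n                  # simple arithmetic, no disk reads

def linked_block(next_of, first, n):
    """Linked: each block stores the number of the next block."""
    block = first
    for _ in range(n):                # must follow n pointers to reach block n
        block = next_of[block]
        if block is None:
            raise IndexError(n)
    return block

def indexed_block(index_block, n):
    """Indexed: the index block is an array of disk block addresses."""
    if not 0 <= n < len(index_block):
        raise IndexError(n)
    return index_block[n]             # one array lookup

# A four-block file stored on (hypothetical) disk blocks 9 -> 16 -> 1 -> 10.
next_of = {9: 16, 16: 1, 1: 10, 10: None}
index_block = [9, 16, 1, 10]
```

For example, `linked_block(next_of, 9, 3)` walks three pointers before returning 10, while `indexed_block(index_block, 3)` gets the same answer with a single lookup.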
Free-Space Management
The system keeps track of the free disk blocks so it can allocate space to files when they are created. Free-space management is also crucial for reusing the space released by deleted files. The system maintains a free-space list which keeps track of the disk blocks that are not allocated to any file or directory. The free-space list can be implemented mainly as:
- Bitmap or Bit vector
  A bitmap or bit vector is a series of bits where each bit corresponds to a disk block. Each bit takes one of two values: 0 indicates that the block is allocated and 1 indicates a free block.
- Linked List
  In this approach, the free disk blocks are linked together, i.e., each free block contains a pointer to the next free block. The block number of the very first free block is stored at a separate location on disk and is also cached in memory.
- Grouping
  This approach stores the addresses of free blocks in the first free block. The first free block stores the addresses of, say, n free blocks. Of these n blocks, the first n-1 are actually free, and the last one contains the addresses of the next n free blocks.
- Counting
  This approach stores the address of the first free disk block and the number n of free contiguous disk blocks that follow it. Every entry in the list contains the address of a first free disk block and a count n.
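
Two of these approaches, the bit vector and counting, can be sketched on a tiny 8-block disk. The bitmap contents and function names are made up for the example, and in this sketch each counting entry's run length includes the first block of the run, which is a choice made here for simplicity.

```python
# Bit vector using the convention above: 1 = free, 0 = allocated.
bitmap = [0, 0, 1, 1, 1, 0, 1, 0]

def first_free(bitmap):
    """Scan the bit vector for the first free block."""
    for block, bit in enumerate(bitmap):
        if bit == 1:
            return block
    return None  # disk is full

def allocate(bitmap):
    """Allocate the first free block by clearing its bit."""
    block = first_free(bitmap)
    if block is not None:
        bitmap[block] = 0
    return block

def to_counting(bitmap):
    """Counting: summarize free space as (first free block, run length) pairs."""
    runs = []
    for block, bit in enumerate(bitmap):
        if bit != 1:
            continue
        if runs and runs[-1][0] + runs[-1][1] == block:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((block, 1))                    # start a new run
    return runs
```

With the bitmap above, the counting form needs only two entries, one for the run starting at block 2 and one for block 6, which is why counting pays off when free blocks tend to be contiguous.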

Sources:
Abraham Silberschatz, Operating System Concepts, Wiley (2018)
http://pages.cs.wisc.edu/~remzi/OSTEP/
https://en.wikipedia.org/wiki/Virtual_file_system