08, 09, 10 - Input/Output, Secondary Storage, File Systems

Problem 1: Life Cycle of an I/O Request

Describe in detail the life cycle of an I/O request such as “read”. That is, explain the entire set of steps taken from the time the user app issues the “read” system call until the system call returns with the data from the kernel. Assume that the I/O request completes successfully.

Student Solution

Answer by Lauren:

(Answer given as a diagram image.)

Problem 2: Device Drivers

What’s a device driver? Why is it necessary to implement a new device driver for each OS?

Student Solution

Answer by Lauren:

A device driver translates between the OS and device hardware. Every OS has its own APIs that drivers must use to interact correctly, so you need a driver to translate to that specific OS’s environment.

Elaboration (by LLM):

Device drivers are the critical bridge between operating systems and hardware. Understanding this relationship helps explain why driver support is often the limiting factor when adopting new operating systems.

What a Device Driver Does:

A device driver is a software module that acts as a translator and controller:

The driver converts:

  • High-level OS commands → Device-specific hardware commands
  • Hardware interrupts/status → OS-compatible events/callbacks

Example: Printing “Hello”

Application: printf("Hello");
      ↓
OS write() syscall
      ↓
Printer Driver receives: write 5 bytes
      ↓
Driver translates to printer protocol:
  - Set paper size register
  - Load font table
  - Send character codes
  - Monitor status registers
  - Handle mechanical delays
      ↓
Physical Printer hardware executes

Why a New Driver Per OS?

Each OS has fundamentally different:

  1. Device Access Model:

    • Linux: /dev/lp0 (file interface)
    • Windows: \\.\LPT1 (device interface)
    • macOS: IOKit framework
  2. Interrupt Handling:

    • Linux: Uses IRQ numbers, tasklets
    • Windows: Uses ISRs, DPCs
    • Real-time OS: Preemption models differ
  3. Memory Access Patterns:

    • Linux: MMU with virtual addressing
    • Embedded OS: Direct physical memory access
    • NUMA systems: Multi-socket complexities
  4. Concurrency Model:

    • Single-threaded: Driver can be simpler
    • Multi-threaded: Requires locking, atomic operations
    • Kernel-level threads vs. user threads
  5. API/Calling Conventions:

    • Function signatures differ
    • Error codes differ (Linux errno vs. Windows NTSTATUS)
    • Power management interfaces completely different

Real-World Impact:

Device | Linux | Windows | macOS
NVIDIA GPU | Separate drivers | Separate drivers | Separate drivers
WiFi Card | ath10k driver | Qualcomm driver | AppleWifi driver
Printer | CUPS + ppd | Windows Print Spooler | Apple Print System

Each requires independent development, testing, and certification.

API Differences Example:

Linux (interrupt handler):
irqreturn_t handler(int irq, void *dev_id) {
    // Hardware communication
    return IRQ_HANDLED;
}

Windows (interrupt handler):
BOOLEAN InterruptHandler(PKINTERRUPT Interrupt, PVOID ServiceContext) {
    // Hardware communication
    return TRUE;
}

Why This Matters:

When a new OS is released or a new device is created:

  1. Driver source code for one OS cannot be ported directly to another
  2. Core logic might be 80% the same, but the OS interface layer (20%) is completely different
  3. Driver development typically takes months per OS

Vendors’ Solution:

  • Hardware abstraction layers (HALs) reduce per-OS code
  • Open standards (USB, PCIe, SATA) reduce driver complexity
  • Driver frameworks (Windows WDF, Linux driver framework) provide templates
  • POSIX standards help minimize differences between Unix-like OSes

Key Insight:

Device drivers are why hardware doesn’t automatically “just work” on a new OS. The driver is where hardware meets OS philosophy—and those philosophies are fundamentally different.

Problem 3: Linux I/O Device Classification

Briefly explain how Linux classifies all I/O devices. Give one example of an I/O device under each category.

Student Solution

Answer by Lauren:

Character devices: Anything that’s a bytestream of characters (e.g., printers, keyboards, mice)

Block devices: Disks that store information in fixed-sized addressed blocks

Network devices: NICs like Ethernet and WiFi adapters that can deliver or accept TCP/IP packets

Elaboration (by LLM):

Linux’s device classification is elegant because it maps the fundamental I/O abstraction to different hardware characteristics. Understanding this helps explain why different devices need different system calls and interfaces.

Why This Classification Matters:

The three categories reflect how each device fundamentally differs in how data flows and is accessed.

Character Devices (Stream-Based):

Data flows as an endless sequence of characters with no fixed positions:

Keyboard: a b c → [handler] → Application receives: abc

Properties:

  • No fixed addressing (can’t ask for “byte at position 1000”)
  • Sequential access only
  • Variable-length transfers
  • Often interactive (waiting for human input)

Examples:

  • /dev/tty* – Terminal/console
  • /dev/lp0 – Printer
  • /dev/null – Null device (discards all data)
  • /dev/random – Random number generator
  • /dev/zero – Infinite stream of zeros

Key Characteristic: Once you read a byte, it’s gone.

Block Devices (Position-Based):

Data is organized in fixed-size blocks with specific addresses:

Disk: [Block 0: 512B] [Block 1: 512B] [Block 2: 512B] ...
Request: "Give me block #42"

Properties:

  • Random access – can request any block
  • Fixed block size – typically 512B, 4KB, or 8KB
  • Persistent – read the same block again, get same data
  • Seekable – can access blocks in any order

Examples:

  • /dev/sda, /dev/sdb – Hard drives
  • /dev/nvme0 – NVMe SSD
  • /dev/mmcblk0 – SD card
  • /dev/loop0 – Loopback device

Key Characteristic: Can seek to any position and read/write repeatedly.

Network Devices (Packet-Based):

Data flows as discrete packets over network protocols:

Network: [Packet 1: 1500B] [Packet 2: 1500B] [Packet 3: 512B]
         Ethernet frame → IP datagram → TCP segment

Properties:

  • Packet-oriented – discrete units, not streams
  • Bidirectional – send and receive
  • Protocol-driven – follow TCP/IP, UDP, etc.
  • Remote – data crosses networks

Examples:

  • eth0, eth1 – Ethernet interfaces
  • wlan0 – WiFi interfaces
  • tun0, tap0 – VPN tunnels

Note: unlike character and block devices, network interfaces do not get /dev nodes; the kernel names them directly (eth0, wlan0, ...) and applications reach them through the socket API.

Key Characteristic: Packets, not bytes; protocol-aware.

How Linux Uses This Classification:

Aspect | Character | Block | Network
Syscalls | read(), write() | read(), write(), lseek() | send(), recv(), select()
Buffering | Usually unbuffered | Heavily buffered | Buffered with TCP windows
Addressing | Sequential only | Random access | Address = IP + port
Driver API | TTY layer, misc | Block I/O layer | Network stack

Why This Design?

Instead of “one interface for all devices,” Linux provides specialized interfaces matching each device type:

  • No seeking on keyboard? Character interface doesn’t support it
  • Packets on network? Use network syscalls optimized for packets
  • Random block access? Block interface is optimized for this

Practical Example: Why It Matters

# Character device (sequential)
$ cat /dev/urandom | head -c 10
# Works: reads sequentially until 10 bytes
 
# Block device (random access)
$ dd if=/dev/sda bs=512 skip=1000 count=1
# Works: seeks to block 1000, reads 1 block
 
# Can you seek on keyboard?
$ dd if=/dev/stdin bs=1 skip=1000 count=1
# No real seek: dd must read and discard 1000 bytes first (stdin is sequential)
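
A small C sketch of how a program can ask the kernel which class a device node belongs to, using stat(2) and the S_ISCHR/S_ISBLK macros (the paths in main are just examples and may not all exist on a given machine):

/* Classify a path as a character device, block device, or neither. */
#include <stdio.h>
#include <sys/stat.h>

void classify(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) { perror(path); return; }
    if (S_ISCHR(st.st_mode))      printf("%s: character device\n", path);
    else if (S_ISBLK(st.st_mode)) printf("%s: block device\n", path);
    else                          printf("%s: not a device node\n", path);
}

int main(void) {
    classify("/dev/null");   /* character device */
    classify("/dev/sda");    /* block device (if present on this machine) */
    return 0;
}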

Key Insight:

This classification isn’t arbitrary—it maps directly to hardware realities. Character devices are serial in nature, block devices support random access, and network devices operate at the protocol level. Knowing which type your device is tells you exactly what operations are possible and efficient.

Problem 4: Device Naming in Linux vs Windows

How does Linux “name” I/O devices? Is it different than Windows? If so, how?

LLM Solution

Linux names I/O devices as special files in the /dev directory, using descriptive names based on device type and number. For example: /dev/sda (first disk detected), /dev/sdb (second disk detected), /dev/sdc (third disk detected), etc. Each disk (internal, USB, or external) gets the next available device name in sequence based on the order it is detected by the system. Other examples include /dev/tty0 (a terminal) and /dev/null (the null device). Network interfaces are the main exception: they are named eth0, wlan0, etc. by the kernel but do not appear under /dev.

Windows uses a different approach with drive letters (like C: and D:) for storage devices and UNC paths for network resources, rather than a unified /dev directory structure.

Problem 5: I/O Call Semantics in Linux

Briefly explain the I/O call semantics available in Linux. If you have a single threaded application but multiple file descriptors to read from, which I/O call semantic makes more sense? Would your answer change if you have a multi-threaded application?

LLM Solution

I/O Call Semantics Available in Linux:

Linux provides several I/O semantics:

  • Blocking I/O: The call blocks until data is available or the operation completes
  • Non-blocking I/O: The call returns immediately even if data is not available
  • Multiplexing: Using select(), poll(), or epoll() to monitor multiple file descriptors simultaneously

For a single-threaded application with multiple file descriptors:

Non-blocking I/O or multiplexing (via select(), poll(), or epoll()) makes more sense. Blocking I/O would cause the thread to get stuck waiting on one descriptor while others have data ready.
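
As a minimal sketch of the multiplexing approach, the single thread below watches two already-open descriptors with poll() and reads from whichever one becomes ready (the function name watch_two_fds and the parameters fd1/fd2 are illustrative):

/* Single-threaded multiplexing over two file descriptors using poll(). */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

void watch_two_fds(int fd1, int fd2) {
    struct pollfd fds[2] = {
        { .fd = fd1, .events = POLLIN },
        { .fd = fd2, .events = POLLIN },
    };
    char buf[256];

    for (;;) {
        /* Sleep until at least one descriptor has data (or an error). */
        if (poll(fds, 2, -1) < 0) { perror("poll"); return; }

        for (int i = 0; i < 2; i++) {
            if (fds[i].revents & POLLIN) {
                ssize_t n = read(fds[i].fd, buf, sizeof buf);
                if (n <= 0) return;               /* EOF or error: stop */
                printf("fd %d: got %zd bytes\n", fds[i].fd, n);
            }
        }
    }
}

The same structure scales to many descriptors; on Linux, epoll is usually preferred once the set becomes large.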

For a multi-threaded application:

Blocking I/O would be the simpler option with less overhead. Different threads can block on different file descriptors simultaneously, allowing the application to handle multiple I/O sources concurrently without the complexity of non-blocking calls or multiplexing.

Problem 6: Device Independent Software Layer Functions

List 4 functions of the Device Independent Software layer.

Student Solution

Answer by Lauren:

  1. Scheduling: I/O request queues
  2. Buffering: Storing data in memory while transferring between devices
  3. Caching: Fast memory holding copy of data
  4. Process I/O state: Kernel keeps state information for processes (open files, network connections, etc.)

Elaboration (by LLM):

The Device Independent Software layer is the OS’s abstraction boundary—it hides hardware complexity while providing unified services to applications. Think of it as “what the OS guarantees” independent of which specific device you’re using.

1. Scheduling: I/O Request Queues

When multiple processes want I/O, requests can’t all happen simultaneously. The OS queues them:

Application 1: read from disk
Application 2: write to disk
Application 3: read from network
        ↓
[OS Scheduler]
        ↓
Queue: [App1 read] → [App3 read] → [App2 write]
Order optimized by disk scheduling algorithm (SCAN, LOOK, FCFS)

Why it matters:

  • Prevents I/O chaos (all devices screaming at once)
  • Optimizes order (can reorder disk requests to minimize head movement)
  • Ensures fairness (prevents process starvation)

Example: SCAN algorithm might reorder disk requests to sweep across tracks instead of thrashing back and forth.

2. Buffering: Memory-Based Staging

Data is temporarily held in RAM during transfers:

Slow Disk (ms scale) → [RAM Buffer] → Fast App (ns scale)
                       (staging area)

Why it matters:

  • Speed mismatch: Disk reads in 5-10ms, but app wants data in microseconds
  • Decouples producer/consumer: Disk keeps filling buffer while app drains it
  • Allows partial transfers: Can read disk sector-by-sector into buffer

Example: Line Buffering

printf("x");  // Writes to 1KB buffer (fast)
printf("y");  // Adds to buffer (fast)
printf("z");  // Adds to buffer (fast)
fflush(stdout);  // ONE system call sends all 3 bytes (efficient)

Without buffering: 3 system calls. With buffering: 1 system call. 3× improvement!

3. Caching: Smart Memory Reuse

The OS keeps “hot” data in fast RAM instead of re-reading from slow disk:

First access:  /home/user/file.txt → [Disk read: 5ms] → RAM cache
Second access: /home/user/file.txt → [Cache hit: 0.1μs] ← 50,000× faster!

Why it matters:

  • Most programs reuse recent data (temporal locality)
  • Cache hits are nearly free (nanoseconds vs. milliseconds)
  • Dramatically improves performance for repeated access

Example:

$ time cat file.txt  # First time: 100ms (disk read)
$ time cat file.txt  # Second time: 1ms (cache hit)

Real-world: OS caches 30-50% of RAM for frequently accessed files.

4. Process I/O State: Bookkeeping

OS maintains state for each process’s I/O:

Process A:
  - File descriptor 3 → /home/user/data.txt (opened for reading)
  - Current file position: byte 1024
  - Outstanding I/O: 1 pending read request
  - Buffered data: 256 bytes waiting in buffer

Process B:
  - File descriptor 7 → /dev/sda (disk device)
  - Current position: sector 50000
  - Outstanding I/O: 3 pending requests
  - Error state: none

Why it matters:

  • Tracks which files processes have open
  • Maintains file positions (so each process has independent “file pointer”)
  • Remembers pending I/O (what’s in flight)
  • Enables proper cleanup (close all files on process exit)

How They Work Together:

A single read request typically touches all four: the kernel first checks the cache, stages the data through a buffer, queues the request with the I/O scheduler, and tracks it in the process's I/O state until it completes.

Key Insight:

All four functions work together to hide the messy reality of hardware from applications. Without them:

  • Apps would deadlock waiting for I/O
  • Every duplicate read would hit slow disk
  • Files couldn’t have independent positions per process
  • System would be chaotic and slow

The Device Independent layer is “what makes the OS an OS”—not just passing requests through, but intelligently managing them.

Problem 7: High-level I/O Libraries vs System Calls

Given that the OS already provides “read” and “write” system calls to read from/write to an I/O device, why do most programmers still prefer using a high-level I/O library to perform I/O? Briefly explain.

Student Solution

Answer by Lauren:

High-level I/O libraries help to format I/O (like printf) and also buffer data before making system calls to improve performance. This reduces the overhead of frequent system calls and provides more convenient functionality.

Elaboration (by LLM):

This gets at a fundamental systems design principle: system calls are expensive, so minimize them. Understanding why requires looking at the cost of crossing the user-kernel boundary.

The Cost of System Calls

System calls switch CPU from user mode to kernel mode—a context switch with significant overhead:

Direct Hardware Access: 1 ns
Read from L1 cache:     0.5 ns
Read from RAM:          100 ns
System call:            1,000-10,000 ns ← Why? See below

Why System Calls Are Expensive (Simplified):

  1. Context switch (save/restore registers, memory state): ~1,000 ns
  2. Permission checks (is this process allowed?): ~100 ns
  3. Mode transition (kernel mode vs. user mode): ~500 ns
  4. Cache invalidation (security: clear sensitive data): ~500 ns
  5. Return to user mode (restore state): ~500 ns

Total: ~3,000-5,000 ns per system call

The Problem Without Buffering

// Bad: Direct system calls
for (int i = 0; i < 100; i++) {
    write(fd, "x", 1);  // 1 char = 1 syscall
}
// Cost: 100 syscalls × 5,000 ns = 500,000 ns = 0.5 ms

The Solution: Buffering

// Good: Buffered I/O
char buffer[1024];
int pos = 0;
for (int i = 0; i < 100; i++) {
    buffer[pos++] = 'x';
    if (pos >= 1024) {
        write(fd, buffer, 1024);  // 1 syscall for 1024 chars
        pos = 0;
    }
}
// Cost: 1 syscall × 5,000 ns ≈ 0.005 ms ← 100× faster!

Real-World Example: printf

// Without buffering (hypothetical direct syscalls)
printf("a");  // syscall
printf("b");  // syscall
printf("c");  // syscall
// Cost: 3 syscalls
 
// With standard buffering (typical case)
printf("a");  // buffered
printf("b");  // buffered
printf("c");  // buffered
printf("\n"); // Forces flush, 1 syscall
// Cost: 1 syscall ← 3× improvement

Formatting Benefits

Beyond performance, high-level libraries provide convenience:

// What you write:
printf("User %d logged in at %s", user_id, time_str);
 
// What the system call library does:
// 1. Format the string ("%d" → convert int to string, etc.)
// 2. Build output: "User 42 logged in at 14:30:00"
// 3. Buffer it
// 4. Maybe flush if full or newline encountered
// 5. Call write() syscall once
 
// Equivalent with raw syscalls would be:
char buffer[256];
sprintf(buffer, "User %d logged in at %s", user_id, time_str);
write(fd, buffer, strlen(buffer));
// Much more tedious!

Why Not Just Use System Calls?

Approach | Performance | Code Complexity
Direct syscalls | Slow (many calls) | Simple but tedious
Buffered library | Fast (batched calls) | Convenient, automatic
Manual buffering | Fast (batched calls) | Complex, error-prone

The Buffering Trade-offs

  • Advantage: 10-100× performance improvement
  • Disadvantage: Data sits in buffer until flushed (latency)

printf("Processing...");  // Buffered, not yet written
// User sees nothing until buffer fills or \n arrives
fflush(stdout);  // Force write, user sees text immediately

When Buffering Matters

  • Text output: Buffered by default (line buffering)
  • Binary files: Often fully buffered (larger buffer)
  • Interactive: Line buffering (flush on newline)
  • Pipes: Fully buffered (no newlines to trigger flush)

Example: Why Interactive Programs Need Line Buffering

# Line-buffered output (default for terminals)
$ ./program
Please enter name: [user waits here—prompt visible]
 
# Fully buffered output (if stdout is piped)
$ ./program | cat
[nothing visible until buffer fills, user confused!]

Modern Complications

Modern systems add more layers:

  1. Application buffering (your code)
  2. libc buffering (C library)
  3. Kernel buffering (page cache)
  4. Disk controller buffering (hardware cache)

Each layer trades latency for throughput.

Key Insight:

System calls are the “expensive bridge” between user programs and kernel services. Any chance to cross the bridge fewer times is worth it. This is why buffering is ubiquitous—it’s one of the simplest, most effective performance optimizations in systems.

This principle extends beyond I/O:

  • Batching: collect operations, execute once
  • Coalescing: combine multiple requests into one
  • Queuing: accumulate work, process in bulk

All are variations on “fewer expensive transitions = better performance.”

Problem 8: Cylinder Definition in Disks

What is a cylinder (as it applies to disks)? Briefly explain.

LLM Solution

Cylinder Definition:

A cylinder in disk terminology is the set of tracks with the same track number on all surfaces (platters) of a multi-platter hard disk.

Physical Structure:

In a hard disk with multiple spinning platters and read/write heads:

  • Each platter has concentric circular tracks (like rings on a tree)
  • Each head reads/writes from one track on one platter
  • When all heads are positioned at the same distance from the center, they form a cylinder

Example with 3 Platters:

Platter 1: Track 100 ←─┐
Platter 2: Track 100 ←─┼─ Cylinder 100
Platter 3: Track 100 ←─┘

Platter 1: Track 101 ←─┐
Platter 2: Track 101 ←─┼─ Cylinder 101
Platter 3: Track 101 ←─┘

Why Cylinders Matter:

  • Seek Time Reduction: All heads move together. Once heads reach a cylinder, all tracks in that cylinder can be accessed without additional head movement
  • Performance: Accessing all data in a cylinder is fastest since only one seek is needed
  • Disk Scheduling: Many scheduling algorithms (SCAN, C-SCAN) minimize head movement by accessing cylinders sequentially

Advantages of Cylinder Access:

  • Data layout can take advantage of cylinder structure
  • Minimize expensive mechanical head positioning
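
To make the geometry concrete, here is the classic CHS-to-LBA conversion, which shows how a cylinder, a head (surface), and a sector together name one block (the geometry numbers in main are made up):

/* Classic cylinder/head/sector -> linear block address formula. */
#include <stdio.h>

int chs_to_lba(int cyl, int head, int sector, int heads_per_cyl, int sectors_per_track) {
    /* All tracks of one cylinder (one per head) come before the next cylinder,
       which is why staying within a cylinder avoids any extra seeking. */
    return (cyl * heads_per_cyl + head) * sectors_per_track + (sector - 1);
}

int main(void) {
    /* e.g., 3 surfaces (heads) and 63 sectors per track */
    printf("cylinder 100, head 2, sector 1 -> LBA %d\n", chs_to_lba(100, 2, 1, 3, 63));
    return 0;
}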

Problem 9: Rotational Latency

What's rotational latency (as it applies to disks)? Briefly explain.

LLM Solution

Rotational Latency Definition:

Rotational latency (also called rotational delay) is the time it takes for the desired sector of a rotating disk platter to move under the read/write head after the head has been positioned over the correct track.

Components of Disk Access Time:

Total Access Time = Seek Time + Rotational Latency + Transfer Time

Example with 7200 RPM disk:
- Seek time: 5 ms (head movement)
- Rotational latency: 0-4.17 ms (sector positioning)
- Transfer time: 0.1 ms (reading data)

How Rotational Latency Works:

  1. Head positions at correct track (seek time passes)
  2. Disk continues rotating
  3. Desired sector gradually moves under the head
  4. When sector arrives, reading/writing begins

Average Rotational Latency:

In worst case, need to wait full rotation. Average = half rotation:

For 7200 RPM: rotation time = 60 s / 7200 = 8.33 ms
Average latency = 8.33 ms / 2 = 4.17 ms

Key Points:

  • Unavoidable for mechanical disks (sectors don’t move instantly)
  • Depends on disk rotation speed (RPM)
  • Average = 50% of one full rotation
  • Faster rotating disks have lower rotational latency
  • Cannot be eliminated, only minimized by increasing RPM

Problem 10: Average Rotational Latency at 7200 RPM

What’s the average rotational latency for a disk that rotates at 7200rpm?

LLM Solution

Calculation:

RPM = 7200 revolutions per minute

Step 1: Convert to rotations per second

Rotations per second = 7200 / 60 = 120 revolutions/second

Step 2: Calculate time for one full rotation

Time for one rotation = 1 / 120 = 0.00833 seconds = 8.33 milliseconds

Step 3: Calculate average rotational latency

Average = (1/2) × rotation time = 0.5 × 8.33 ms = 4.17 ms

Explanation:

Average rotational latency is half of one full rotation because:

  • Best case: Desired sector is already under the head (0 ms wait)
  • Worst case: Desired sector just passed the head (full rotation needed = 8.33 ms)
  • Average: Half the disk must rotate on average = 4.17 ms

Answer: 4.17 milliseconds (or approximately 4.2 ms)
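
The same calculation as a tiny C helper, so it can be reused for other rotation speeds (assumes the average wait is half a rotation, as argued above):

/* Average rotational latency in milliseconds for a given RPM. */
#include <stdio.h>

double avg_rotational_latency_ms(double rpm) {
    double rotation_ms = 60.0 * 1000.0 / rpm;   /* time for one full rotation */
    return rotation_ms / 2.0;                   /* average wait = half a rotation */
}

int main(void) {
    printf("7200 RPM  -> %.2f ms\n", avg_rotational_latency_ms(7200));    /* ~4.17 ms */
    printf("15000 RPM -> %.2f ms\n", avg_rotational_latency_ms(15000));   /* ~2.00 ms */
    return 0;
}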

Problem 11: Seek Time in Disks

What’s seek time (as it applies to disk)? Briefly explain.

LLM Solution

Seek Time Definition:

Seek time is the time required for the read/write head of a hard disk to move from its current position to a new track where data is located.

Physical Process:

  1. Current head position: Track X
  2. Requested data on: Track Y
  3. Head moves radially inward or outward
  4. Seeks to correct track Y
  5. Stabilizes and begins reading/writing

Seek Time Components:

Total Seek Time = Start time + Acceleration + Travel + Deceleration + Settling time

Typical seek time: 2-15 milliseconds (depends on distance)
Average seek time: ~5-7 milliseconds (for random access)

Factors Affecting Seek Time:

  • Distance: Longer distance = longer seek time
  • Track density: More tightly packed tracks = potentially longer seek
  • Head mechanics: Acceleration/deceleration capabilities
  • Settling time: Time for head to stabilize on target track

Example with 7200 RPM disk:

Minimum seek (adjacent track): ~0.5 ms
Full seek (track 0 to last track): ~14 ms
Average seek: ~7 ms (for random access across disk)

Why Seek Time Matters:

  • Major component of total disk access time
  • Total Access = Seek + Rotational Latency + Transfer
  • Seeking is the slowest component of disk I/O
  • Disk scheduling algorithms focus on minimizing total seek time

Problem 12: SSDs vs Hard Disks

Briefly explain the pros and cons of using Non-Volatile Memory (SSDs) compared to hard disks (HDDs)?

LLM Solution

Solid State Drives (SSDs) vs Hard Disk Drives (HDDs):

Aspect | SSDs | HDDs
Speed | ✅ Much faster (100-550 MB/s) | Slower (100-200 MB/s)
Seek Time | ✅ Minimal (<1 ms) | Significant (5-10 ms)
Power | ✅ Lower power consumption | Higher power (moving parts)
Noise | ✅ Silent (no moving parts) | Audible (mechanical)
Durability | ✅ No moving parts (more robust) | Mechanical failure risk
Cost | ❌ Expensive per GB | ✅ Cheaper per GB
Capacity | ❌ Limited (1-8 TB common) | ✅ Larger (2-20+ TB)
Lifespan | ❌ Limited write cycles (wear out) | ✅ Longer lifespan (mechanical wear)
Write Amplification | ❌ 2-10x overhead | ✅ 1x (direct write)
Fragmentation | ✅ No impact on speed | ❌ Degrades performance

Advantages of SSDs:

  1. Performance: 10-100x faster than HDDs for random access
  2. Reliability: No mechanical failure risk from drops/vibration
  3. Energy: Lower power consumption (mobile devices)
  4. Noise: Silent operation
  5. Responsiveness: Superior system responsiveness

Disadvantages of SSDs:

  1. Cost: 5-10x more expensive per gigabyte
  2. Capacity: Maximum size smaller than HDDs
  3. Write Endurance: Limited write cycles (100,000-1,000,000)
  4. Wear Leveling: Complex firmware needed
  5. Full Drive: Performance degrades when full
  6. Write Amplification: Physical writes exceed logical writes (firmware overhead)

Disadvantages of HDDs:

  1. Speed: Much slower random access
  2. Mechanical Issues: Moving parts can fail
  3. Noise: Audible operation
  4. Power: Higher consumption
  5. Fragmentation: Impacts performance significantly

Best Use Cases:

SSDs: OS drives, databases, performance-critical applications, laptops, mobile devices

HDDs: Large-capacity storage, archival data, backup systems, cost-sensitive applications

Current Trend: SSDs for primary storage, HDDs for secondary/backup storage (hybrid approach)

Problem 13: SSD Physical Structure

Briefly describe the physical structure of an SSD. That is, what is a page, what is a block as they apply to SSDs. How does an SSD perform the “modify” operation?

LLM Solution

SSD Physical Structure:

Basic Components:

SSDs use NAND flash memory organized in a hierarchical structure:

SSD Memory Hierarchy:

Flash Memory Array
├── Die (4-8 dies per package)
│   ├── Plane (2-4 planes per die)
│   │   ├── Block (128-512 MB per block)
│   │   │   ├── Page (4-16 KB per page)
│   │   │   ├── Page
│   │   │   └── ...
│   │   ├── Block
│   │   └── ...

Page:

  • Smallest readable/writable unit (4 KB - 16 KB typical)
  • Can be read and written
  • Multiple pages per block
  • Basic unit for data access

Block:

  • Smallest erasable unit (128 KB - 4 MB typical)
  • Contains 128-256 pages
  • Must erase entire block before rewriting
  • This asymmetry causes “write amplification”

The Modify Problem:

Unlike HDDs, SSDs cannot directly overwrite data. Modifying a page requires:

Because pages cannot be overwritten in place, the SSD performs an out-of-place update:

1. Write the new version of the page to an empty page elsewhere
2. Mark the old page as invalid
3. Erase the old block later (via garbage collection), once its remaining valid pages have been copied out

These extra physical writes are what cause write amplification (writing more than the application asked for).

SSD Modify Operation:

Original Block:
┌─────────────────────────────────┐
│ Page 0: [Valid data]            │
│ Page 1: [Data to modify]        │  Read
│ Page 2: [Valid data]            │
│ ...                             │
└─────────────────────────────────┘
                    ↓
        Read entire block into buffer
                    ↓
         Modify target page in buffer
                    ↓
        Write modified block to new location
                    ↓
     Mark original block for garbage collection
                    ↓
┌─────────────────────────────────┐
│ Page 0: [Valid data]            │
│ Page 1: [Modified data]  ← New  │  Write
│ Page 2: [Valid data]            │
│ Page 3: [Empty]                 │
│ ...                             │
└─────────────────────────────────┘

Key Points:

  1. Read: Entire block must be read from NAND flash
  2. Modify: Change target page in RAM buffer
  3. Write: Entire modified block written to new location
  4. Erase: Old block marked for garbage collection, erased later
  5. Result: All pages in block rewritten (write amplification)

Write Amplification Factor (WAF):

If modifying 1 page requires rewriting entire block of 256 pages:

  • WAF = 256:1 (256 physical writes for 1 logical write)
  • Actually: WAF = 2-10 in practice due to optimization

Problem 14: Flash Translation Layer in SSDs

What’s the purpose of the Flash Translation Layer (FTL) in SSDs? How does the SSD update the FTL during garbage collection?

LLM Solution

Flash Translation Layer (FTL) Purpose:

The FTL is firmware that manages the SSD at a low level. It acts as an intermediary between the operating system and raw flash memory.

Key Responsibilities:

  1. Logical to Physical Address Mapping

    • Maps logical block addresses (LBAs) from OS to physical NAND locations
    • OS thinks it’s writing to sequential locations (LBA 0, 1, 2, …)
    • FTL translates to actual NAND blocks (scattered locations)
    • Hides wear-out patterns from OS
  2. Wear Leveling

    • Distributes writes evenly across all NAND blocks
    • Prevents some blocks from wearing out faster
    • Extends SSD lifespan
  3. Bad Block Management

    • Tracks and avoids defective blocks
    • Reallocates to good blocks
  4. Garbage Collection

    • Reclaims space from invalidated pages
    • Erases empty blocks
    • Manages invalid page lists

FTL Structure:

┌─────────────────────────────────────┐
│  Mapping Table (FTL)               │
├─────────────────┬─────────────────┤
│ Logical Address │ Physical Address│
├─────────────────┼─────────────────┤
│ LBA 0           │ Block 47, Page 3│
│ LBA 1           │ Block 12, Page 8│
│ LBA 2           │ Block 89, Page 0│
│ ...             │ ...             │
└─────────────────┴─────────────────┘

FTL Update During Garbage Collection:

Scenario: Block 47 has invalid pages, needs garbage collection

Step 1: Identify Block for Garbage Collection
┌─────────────────────────────────┐
│ Block 47 (Candidate)            │
├─────────────────────────────────┤
│ Page 0: Invalid ❌               │
│ Page 1: Invalid ❌               │
│ Page 2: Valid (LBA 100) ✅       │
│ Page 3: Valid (LBA 0) ✅         │
│ ...                             │
└─────────────────────────────────┘

Step 2: Read Valid Pages into Buffer
┌──────────────────┐
│ Buffer           │
├──────────────────┤
│ LBA 100 data     │
│ LBA 0 data       │
└──────────────────┘

Step 3: Write Valid Pages to New Block
┌─────────────────────────────────┐
│ Block 55 (New, Empty)           │
├─────────────────────────────────┤
│ Page 0: LBA 100 data            │
│ Page 1: LBA 0 data              │
│ Page 2-255: Empty               │
└─────────────────────────────────┘

Step 4: Update FTL Mapping Table
OLD Mapping:
│ LBA 100 → Block 47, Page 2      │
│ LBA 0   → Block 47, Page 3      │

NEW Mapping:
│ LBA 100 → Block 55, Page 0      │  ← Updated
│ LBA 0   → Block 55, Page 1      │  ← Updated

Step 5: Mark Old Block for Erasure
Block 47: Mark as "Free" (all pages invalid now)
         Schedule for erase operation
         After erase: Available for new writes

Step 6: Update Valid Page Counts
Block 47: Valid pages: 2 → 0 (mark for erase)
Block 55: Valid pages: 0 → 2 (new data)

FTL Optimization:

  • Lazy Erase: Don’t erase immediately; queue for background erase
  • Batch Updates: Group FTL updates to reduce overhead
  • Incremental GC: Collect one block at a time vs. all blocks
  • Hotness Tracking: Separate hot/cold data to reduce GC impact

Implications:

  • FTL adds firmware overhead but essential for performance
  • Complex mapping table requires significant RAM
  • GC can cause write amplification (excessive NAND writes)
  • GC operations may cause latency spikes
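
A minimal sketch of the mapping-table update in Step 4: when garbage collection copies a still-valid page to a new location, the FTL simply repoints that LBA's entry (the structure and function names are illustrative, not a real FTL):

/* Toy logical-to-physical map and the update done when GC relocates a page. */
#include <stdio.h>

#define NUM_LBAS 1024

struct phys_addr { int block; int page; };
struct phys_addr ftl_map[NUM_LBAS];   /* LBA -> physical NAND location */

/* Called after GC has copied the data for `lba` to (new_block, new_page).
   The old page becomes invalid by definition: nothing maps to it anymore. */
void ftl_relocate(int lba, int new_block, int new_page) {
    ftl_map[lba].block = new_block;
    ftl_map[lba].page  = new_page;
}

int main(void) {
    ftl_map[100] = (struct phys_addr){47, 2};   /* LBA 100 lives in block 47, page 2 */
    ftl_relocate(100, 55, 0);                   /* GC copies it to block 55, page 0 */
    printf("LBA 100 -> block %d, page %d\n", ftl_map[100].block, ftl_map[100].page);
    return 0;
}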

Problem 15: TRIM Command

Briefly describe the role of the TRIM command sent from the OS to an SSD.

LLM Solution

TRIM Command Purpose:

The TRIM command allows the operating system to inform the SSD which data blocks are no longer needed, enabling the SSD firmware to proactively manage garbage collection and maintain performance.

Why TRIM is Needed:

Without TRIM:

File System Level:              SSD FTL Level:
┌──────────────────┐           ┌──────────────────┐
│ File deleted     │           │ Pages marked     │
│ from directory   │           │ "valid" still    │
│ (OS thinks it's  │           │                  │
│ gone)            │    ╳      │ SSD doesn't know │
│                  │           │ data is no longer│
│                  │           │ needed!          │
└──────────────────┘           └──────────────────┘

Result: SSD keeps "invalid" data until overwritten
        Wastes space, impacts GC performance

With TRIM:

File System Level:              SSD FTL Level:
┌──────────────────┐           ┌──────────────────┐
│ File deleted     │           │ Pages marked     │
│ from directory   │           │ "invalid"        │
│                  │           │                  │
│ TRIM command:    │    ✅       │ SSD knows data   │
│ "Blocks X-Y      │──────────→ │ can be garbage  │
│  are no longer   │           │ collected!      │
│  needed"         │           │                  │
└──────────────────┘           └──────────────────┘

Result: SSD can immediately reclaim space
        Better GC planning
        Improved long-term performance

TRIM Operations:

  1. OS sends TRIM command specifying LBAs (logical block addresses) of deleted data
  2. SSD FTL marks pages as invalid (not waiting for overwrite)
  3. Garbage collection can immediately use these blocks
  4. Performance maintained - avoid write amplification

Performance Impact:

Without TRIM (SSD fills up):

Days 1-5:   Fast performance (lots of free space)
Days 10-15: Slower performance (less free space for GC)
Days 20+:   Very slow (FTL must work harder)

With TRIM (enabled on Windows/Linux):

Days 1-5:   Fast performance
Days 10-15: Fast performance (space maintained)
Days 20+:   Fast performance (maintained)

TRIM Support:

  • Operating Systems: Windows 7+, Linux kernel 2.6.33+, macOS 10.6.8+
  • File Systems: NTFS, ext4, APFS, HFS+
  • SSDs: Most modern SSDs support TRIM
  • Command: fstrim (Linux), automatic (Windows, macOS)
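
For reference, a minimal sketch of how an fstrim-style tool issues TRIM on Linux, using the FITRIM ioctl on a mounted file system (error handling is minimal, the mount point "/" is just an example, and the call typically requires root and file-system support):

/* Ask the kernel to TRIM all free space of the file system mounted at "/". */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    struct fstrim_range range;
    memset(&range, 0, sizeof range);
    range.len = (unsigned long long)-1;    /* cover all free space */

    int fd = open("/", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    if (ioctl(fd, FITRIM, &range) < 0)     /* kernel passes TRIM down to the SSD */
        perror("FITRIM");
    else
        printf("trimmed %llu bytes\n", (unsigned long long)range.len);

    close(fd);
    return 0;
}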

Historical Note:

Before TRIM, SSDs experienced significant performance degradation over time as deleted blocks couldn’t be distinguished from valid data. TRIM was crucial for making SSDs practical for consumer use.

Problem 16: Write Leveling in SSDs

What’s “write leveling” as it applies to SSDs and why is it needed?

LLM Solution

Write Leveling (Wear Leveling) Definition:

Write leveling is a technique used by SSD firmware to distribute write/erase operations evenly across all NAND flash blocks to prevent premature wear-out of specific blocks.

Why Write Leveling is Needed:

NAND flash cells have a finite number of write/erase cycles before they fail:

  • Typical endurance: 10,000 - 1,000,000 cycles per cell
  • Different blocks wear at different rates if writes are concentrated

Problem Without Wear Leveling:

SSD with Naive Write Strategy:

Block 0:  [Written 500,000 times] ← WORN OUT!
Block 1:  [Written 500,000 times] ← WORN OUT!
Block 2:  [Written 2,000 times]   ← Fresh
Block 3:  [Written 1,500 times]   ← Fresh
Block 4:  [Written 2,100 times]   ← Fresh
...

Result: Blocks 0-1 fail after 1 year
        Blocks 2-4 still have 99% lifespan left
        SSD dies prematurely!

Solution With Wear Leveling:

SSD with Write Leveling Enabled:

Block 0:  [Written 100,000 times]  ← Balanced
Block 1:  [Written 101,000 times]  ← Balanced
Block 2:  [Written 99,500 times]   ← Balanced
Block 3:  [Written 100,200 times]  ← Balanced
Block 4:  [Written 100,800 times]  ← Balanced
...

Result: All blocks wear evenly
        SSD lasts full rated lifespan

How Wear Leveling Works:

  1. Track Block Age: FTL maintains counter for write cycles per block

  2. Dynamic Data Movement:

    • When hot data (frequently written) exists on old block
    • Move it to younger block to balance wear
    • Update FTL mapping table
  3. Static Data Rotation:

    • Even static (rarely written) data gets moved periodically
    • Prevents some blocks from staying fresh
  4. Wear Counter Update:

    When writing to Block X:
    wear_count[X] += 1
    
    If wear_count[X] is significantly higher than average:
        - Move fresh data to Block X
        - Move old data to younger blocks
    

Example:

Block Age: 0-100,000 writes

Initial:
Block 0:  [Hot data, 95,000 writes] ← Nearly worn out
Block 1:  [Cold data, 5,000 writes] ← Very fresh

After Wear Leveling:
Block 0:  [Cold data, 95,000 writes] ← Stop using for writes
Block 1:  [Hot data, 5,000 writes]   ← Can handle more writes

Result: Extends life of both blocks
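
A small sketch of the wear-counter check described above: track per-block write/erase counts and flag any block that has drifted far above the average (the threshold and names are illustrative, not from a real FTL):

/* Decide whether a block is wearing out faster than its peers. */
#include <stdio.h>

#define NUM_BLOCKS 8
#define WEAR_SLACK 1000    /* allowed gap above the average before acting */

int wear[NUM_BLOCKS];      /* write/erase count per block */

/* Returns 1 if `block` should have hot data moved off it (and cold data
   parked on it) to even out wear. */
int needs_leveling(int block) {
    long total = 0;
    for (int i = 0; i < NUM_BLOCKS; i++) total += wear[i];
    long avg = total / NUM_BLOCKS;
    return wear[block] > avg + WEAR_SLACK;
}

int main(void) {
    for (int i = 0; i < NUM_BLOCKS; i++) wear[i] = 2000;
    wear[0] = 95000;   /* one block has absorbed most of the writes */
    printf("block 0 needs leveling: %s\n", needs_leveling(0) ? "yes" : "no");
    return 0;
}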

Techniques:

  1. Dynamic Wear Leveling:

    • Only moves hot data when wear discrepancy detected
    • Lower overhead, less GC activity
  2. Static Wear Leveling:

    • Periodically redistributes all data
    • More balanced wear, more overhead
  3. Hybrid Wear Leveling:

    • Dynamic for hot data + periodic static rotations
    • Best balance for performance and longevity

Performance Implications:

  • Background Operations: GC + wear leveling run during idle time
  • Write Amplification: Extra moves increase physical writes
  • Lifespan Trade-off: Better longevity requires slightly slower writes
  • OS Support: TRIM helps by freeing space, reducing data movement

Example Lifespan Impact:

Without wear leveling:     2-3 years (early failure)
With wear leveling:        5-10 years (rated spec)

Problem 17: Device Independent Layer vs File System

Given that the device independent layer allows a disk to be read in blocks, why is it still necessary to implement a file system?

LLM Solution

What the Device Independent Layer Provides:

The device independent layer (or block I/O layer) allows:

  • Reading/writing fixed-size blocks (512B - 4KB)
  • Block-level abstraction over raw disk hardware
  • Buffering, caching, disk scheduling

Why This Isn’t Enough - File System Necessity:

The device layer only handles where data goes physically; it doesn’t handle how users organize their data.

Problem Without File System:

User wants to:
1. Create a file "document.txt"
2. Store it somewhere on disk
3. Retrieve it later by name
4. Know how much space used
5. Prevent unauthorized access

Device layer can only do:
- Write block 512 to disk ✅
- Read block 512 from disk ✅
- But WHERE is my file? ❌
- WHAT data is mine? ❌
- HOW do I find "document.txt"? ❌

File System Responsibilities (Beyond Block Layer):

Function | Device Layer | File System
Read/write blocks | ✅ | ✅ (uses device layer)
Organize data logically | ❌ | ✅
Name-based access | ❌ | ✅
Directory hierarchy | ❌ | ✅
File metadata (size, permissions) | ❌ | ✅
Space allocation | ❌ | ✅
Access control | ❌ | ✅
Free space management | ❌ | ✅

File System Layers:

┌─────────────────────────────────┐
│ User Application                │ "Save document.txt"
├─────────────────────────────────┤
│ File System (ext4, NTFS, APFS)  │ Maps filename → blocks
├─────────────────────────────────┤
│ Device Independent Layer (VFS)  │ Read/write blocks
├─────────────────────────────────┤
│ Device Drivers                  │ Hardware-specific
├─────────────────────────────────┤
│ Physical Disk Hardware          │ Raw NAND/magnetism
└─────────────────────────────────┘

Example: Writing a File

Without File System (raw blocks only):

User: "Save this 5KB file"
System: "I can write blocks to disk, but:
         - Which blocks? (1000-1003 available, but what about other files?)
         - How to find it later?
         - How to prevent accidental overwrite?
         → Chaos!"

With File System:

User: "Save to document.txt"
FS:   1. Allocate blocks (1000-1003) for this file
      2. Create inode with metadata
      3. Create directory entry mapping "document.txt" → inode
      4. Update bitmap of free blocks
      5. Write data blocks using device layer
      6. Track file size (5KB), permissions, dates
      → Organized and manageable!

Why File System is Essential:

  1. Naming: Users think in terms of filenames, not block numbers
  2. Organization: Hierarchical directories organize files logically
  3. Metadata: Permissions, timestamps, file size
  4. Allocation: Which blocks belong to which file?
  5. Sharing: Multiple users, multiple files without conflicts
  6. Durability: Recovery from crashes

Analogy:

Device Layer:    Like storage bins with numbers (Bin 1, Bin 2, ...)
File System:     Like a postal delivery system with addresses and names
                 Maps "123 Main St" → Bin 47

Without postal system, you can store things in bins, but finding them later is impossible.

Problem 18: Shortest Seek Time First Scheduling

Assume that the disk head is currently over cylinder 20 and you have disk requests for the following cylinders: 35, 22, 15, 55, 72, 8. In what order would you serve these requests if you were to employ Shortest Seek Time First (SSTF) scheduling algorithm?

Problem 19: C-SCAN Disk Scheduling

Suppose that a disk drive has 5000 cylinders, numbered 0 to 4999. The drive is currently serving a request at cylinder 143, and the previous request was at cylinder 125. The queue of pending requests, in FIFO order, is 86, 1470, 913, 1774, 948, 1509, 1022, 1750, 130. In what order would you serve these requests if you were to employ C-SCAN scheduling algorithm.
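
As a way to check hand-worked answers for these scheduling exercises, here is a small C sketch that prints the service order under SSTF and under C-SCAN with the head moving toward higher cylinders (the function names and hard-coded queues are only for illustration; substitute any problem's numbers):

/* Toy disk-scheduling order generators: SSTF and C-SCAN (upward sweep). */
#include <stdio.h>
#include <stdlib.h>

/* SSTF: repeatedly serve the pending request closest to the current head.
   Assumes at most 64 requests. */
void sstf(int head, int *req, int n) {
    int served[64] = {0};
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (served[i]) continue;
            if (best < 0 || abs(req[i] - head) < abs(req[best] - head)) best = i;
        }
        served[best] = 1;
        head = req[best];
        printf("%d ", head);
    }
    printf("\n");
}

static int cmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

/* C-SCAN, head moving toward higher cylinders: serve requests >= head in
   ascending order, then jump to the lowest pending request and continue up. */
void cscan_up(int head, int *req, int n) {
    qsort(req, n, sizeof(int), cmp);
    for (int i = 0; i < n; i++) if (req[i] >= head) printf("%d ", req[i]);
    for (int i = 0; i < n; i++) if (req[i] < head)  printf("%d ", req[i]);
    printf("\n");
}

int main(void) {
    int q18[] = {35, 22, 15, 55, 72, 8};                             /* Problem 18's queue */
    sstf(20, q18, 6);

    int q19[] = {86, 1470, 913, 1774, 948, 1509, 1022, 1750, 130};   /* Problem 19's queue */
    cscan_up(143, q19, 9);
    return 0;
}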

Problem 20: Disk Scheduling Algorithms

Assume that the disk request queue consists of the following disk requests each specified by the cylinder number: 111, 97, 123, 90, 102. Further assume that the disk arm is currently located over cylinder number 100. In what order would the requests be scheduled by this File System if the File System employs:

(a) First Come First Served (FCFS) policy?

(b) Shortest Seek Time First (SSTF) policy?

(c) SCAN policy? Assume the disk head is moving towards increasing cylinder numbers.

(d) C-SCAN policy? Assume the disk head is moving towards increasing cylinder numbers.

Problem 21: SSTF and C-SCAN Scheduling

Consider a disk with the disk head currently located on cylinder number 100 and moving outside (towards bigger numbered cylinder numbers). Assume that the current disk requests are for the following cylinders of the disk: 30, 12, 150, 120, 105, 160, 55, 77, 88. For the following disk scheduling algorithms, show in what order the requests will be served.

(a) Shortest Seek Time First

(b) C-SCAN

Problem 22: Disk Scheduling with Multiple Algorithms

Assume that the disk request queue consists of the following disk requests specified by a cylinder number: 45, 24, 123, 56, 124, 100, 80. Further assume that the disk arm is currently located over the cylinder number 88. In what order would the requests be scheduled by this file system if the file system employs:

(a) First Come First Served (FCFS) policy?

(b) Shortest Seek Time First (SSTF) policy?

(c) SCAN policy? Assume the disk head is moving towards increasing cylinder numbers.

(d) C-SCAN policy? Assume the disk head is moving towards increasing cylinder numbers.

Problem 23: SSD Lifespan Estimation

An SSD has the following characteristics:

(a) Write amplification factor (WAF): 2, i.e., each write causes a total of 2 writes on the SSD

(b) Endurance: 1000 write cycles per block

(c) Total capacity: 500 GB

(d) A workload writes 100 GB of data to the SSD every day.

Estimate the SSD’s lifespan in years, assuming all blocks are written evenly (wear leveling is perfect)

Instructor Solution

Total writes per day: 2 × 100 GB = 200 GB

Total endurance: 500 GB × 1000 cycles = 500,000 GB

Lifespan = Total endurance ÷ Total writes per day = 500,000 GB ÷ 200 GB/day = 2,500 days ≈ 6.85 years
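
The same arithmetic as a short C program, with the problem's values hard-coded so each intermediate quantity is visible:

/* SSD lifespan estimate from WAF, per-block endurance, capacity, and daily writes. */
#include <stdio.h>

int main(void) {
    double waf = 2.0;            /* write amplification factor */
    double endurance = 1000.0;   /* write cycles per block */
    double capacity_gb = 500.0;  /* SSD capacity */
    double daily_gb = 100.0;     /* logical writes per day */

    double physical_per_day = waf * daily_gb;          /* 200 GB/day */
    double total_writes_gb = capacity_gb * endurance;  /* 500,000 GB */
    double days = total_writes_gb / physical_per_day;  /* 2,500 days */

    printf("lifespan: %.0f days = %.2f years\n", days, days / 365.0);  /* ~6.85 years */
    return 0;
}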

Elaboration (by LLM):

This problem ties together several SSD concepts: write amplification, endurance ratings, and wear leveling. Let me walk you through the reasoning step-by-step.

Step 1: Understanding Write Amplification

Write amplification factor (WAF) = 2 means:

  • Logical write: 100 GB (what the OS thinks it’s writing)
  • Physical write: 100 GB × 2 = 200 GB (what actually gets written to NAND flash)

Why? Because SSDs can’t overwrite directly. They must:

  1. Read entire block into buffer
  2. Modify data
  3. Write entire block to new location
  4. Mark old block for garbage collection

This causes more physical writes than logical writes.

Step 2: Calculate Total Physical Writes Per Day

Physical writes per day = 100 GB of logical writes × WAF of 2 = 200 GB/day

Step 3: Calculate Total Endurance

The SSD can sustain a total number of writes before wearing out:

Total endurance = 500 GB capacity × 1000 cycles per block = 500,000 GB of writes

Why “cycles per block”? Each block in the SSD can be written to ~1000 times before the NAND flash cells degrade. With wear leveling (perfect distribution), all 500 GB worth of blocks get equal wear, so the total is capacity × endurance.

Step 4: Calculate Lifespan

Lifespan = 500,000 GB ÷ 200 GB/day = 2,500 days

Convert to Years:

2,500 days ÷ 365 days/year ≈ 6.85 years

Real-World Context:

This is the theoretical maximum lifespan assuming:

  • ✅ Perfect wear leveling (all blocks wear evenly)
  • ✅ 100 GB writes every single day (weekends, holidays, too)
  • ✅ No blocks fail before endurance limit
  • ❌ Reality: Typically 5-7 years for consumer SSDs with similar specs

Why the Real Lifespan is Often Better:

  1. Not all days have 100 GB writes - average user writes 10-50 GB/day
  2. NAND flash is conservative - rated 1000 cycles but often lasts longer
  3. SSD controller has error correction - can extend life beyond minimum
  4. Thermal factors - cooler operation extends lifespan

Key Insight:

SSD lifespan is fundamentally about total data written divided by per-block endurance. This is why:

  • High write workloads (databases, servers) wear out SSDs faster
  • Consumer use (mostly reading) extends lifespan
  • WAF matters - higher WAF reduces lifespan proportionally

Practical Example Comparisons:

Workload | Writes/Day | WAF | SSD Lifespan
Light (web browsing) | 10 GB | 2 | ~68 years
Medium (office work) | 50 GB | 2 | ~14 years
Heavy (video editing) | 100 GB | 2 | ~6.85 years
Extreme (data center) | 500 GB | 3 | ~0.9 years

Another Way to Think About It:

Lifespan (days) = (Capacity × Endurance) ÷ (Daily logical writes × WAF)

This formula shows lifespan is:

  • ↑ Better with larger capacity (more blocks to distribute wear)
  • ↑ Better with higher endurance rating
  • ↓ Worse with higher write workload
  • ↓ Worse with higher WAF

Problem 24: File System Superblock Contents

List 2 items that are stored in the superblock of a file system.

LLM Solution

Superblock Contents:

The superblock is a special block (usually at block 0 or 1) that stores critical metadata about the entire file system.

Two Key Items Stored:

  1. File System Size Information:

    • Total number of blocks in the file system
    • Number of inodes
    • Block size (512B, 1KB, 4KB, etc.)
    • Inode size
  2. Free Space Information:

    • Pointer to free block list (or bitmap of free blocks)
    • Pointer to free inode list
    • Number of free blocks remaining
    • Number of free inodes remaining

Other Important Superblock Contents:

  • File System Type: ext4, NTFS, APFS, etc.
  • Creation Date/Time: When FS was formatted
  • Last Mount Time: When last mounted
  • Mount Count: How many times mounted
  • Block Group Descriptors: Pointers to block group metadata
  • Magic Number: Identifier (0xEF53 for ext4)
  • Journal Pointer: For journaling file systems
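
An illustrative sketch of the kind of structure a superblock corresponds to on disk; the field names here are made up for clarity (the real thing for ext4 is struct ext4_super_block in the kernel sources):

/* Toy superblock layout covering the items listed above. */
#include <stdint.h>
#include <stdio.h>

struct superblock {
    uint32_t magic;          /* file-system identifier, e.g. 0xEF53 for ext4 */
    uint32_t block_size;     /* bytes per block */
    uint64_t total_blocks;   /* size of the file system in blocks */
    uint64_t total_inodes;   /* total number of inodes */
    uint64_t free_blocks;    /* free-space accounting */
    uint64_t free_inodes;
    uint64_t free_list;      /* location of the free-block bitmap/list */
    uint64_t mount_time;     /* last mount timestamp */
};

int main(void) {
    struct superblock sb = { .magic = 0xEF53, .block_size = 4096,
                             .total_blocks = 262144, .total_inodes = 65536 };
    printf("file system size = %llu MB\n",
           (unsigned long long)(sb.total_blocks * sb.block_size) / (1024 * 1024));
    return 0;
}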

Problem 25: External Fragmentation in Contiguous Files

What is external fragmentation in a system with contiguous files?

LLM Solution

External Fragmentation in Contiguous File Allocation:

External fragmentation occurs when free space is scattered into many small, non-contiguous fragments, making it difficult to allocate space for new files even though total free space may be sufficient.

Visual Example:

Initial State (empty disk):
┌─────────────────────────────────────────────────────────┐
│ [Free: 1000 KB]                                         │
└─────────────────────────────────────────────────────────┘

After some files:
┌─────────────────────────────────────────────────────────┐
│ File A │ Free:100│ File B │ Free:150│ File C │ Free:120 │
│ 200 KB │  100 KB │ 300 KB │  150 KB │ 200 KB │  120 KB  │
└─────────────────────────────────────────────────────────┘

Total free: 100 + 150 + 120 = 370 KB (seems OK)

Try to allocate new 200 KB file:
- Fragment 1: Only 100 KB ❌ (too small)
- Fragment 2: Only 150 KB ❌ (too small)
- Fragment 3: Only 120 KB ❌ (too small)

Result: Allocation FAILS even though 370 KB is free!
        Cannot allocate 200 KB contiguous space.

Why External Fragmentation Happens:

  1. Files are created and deleted randomly
  2. Each deletion leaves a hole (free space)
  3. If file sizes vary, holes become scattered
  4. Over time, many small holes accumulate
  5. Even if total free space is large, no contiguous region is large enough

Impact on Performance:

  • Allocation: May require data relocation or rejection
  • Access Time: Fragmented files require multiple seeks (for files spanning fragments)
  • System Performance: Increased disk I/O, slower access

Comparison with Internal Fragmentation:

Type | Location | Cause | Example
External | Between files | Deletion patterns | File A deleted leaves a hole between B and C
Internal | Within blocks | Block size | 4 KB block holds a 3.5 KB file, wastes 0.5 KB

Solutions:

  1. Defragmentation: Rearrange files to consolidate free space
  2. Paging/Segmentation: Don’t require contiguous allocation
  3. Best-Fit Allocation: Choose smallest fragment that fits
  4. Compaction: Periodically reorganize disk

Problem 26: Internal Fragmentation in File Systems

What’s internal fragmentation as it applies to a file system?

LLM Solution

Internal Fragmentation in File Systems:

Internal fragmentation occurs when a file uses less space than the block it’s allocated, wasting the unused portion of the block.

Example:

Block Size: 4 KB (4096 bytes)

File 1: 3.5 KB
Allocated: 4 KB block
Wasted: 0.5 KB (512 bytes) ← Internal fragmentation

File 2: 1 KB
Allocated: 4 KB block
Wasted: 3 KB (3072 bytes) ← Internal fragmentation

File 3: 4.2 KB
Allocated: 8 KB (2 blocks)
Wasted: 3.8 KB ← Internal fragmentation

Total wasted: 0.5 + 3 + 3.8 = 7.3 KB
Total allocated: 16 KB (4 blocks)
Fragmentation ratio: 7.3/16 ≈ 46%
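
The per-file waste is just "allocated space minus file size", with the allocation rounded up to whole blocks; a small sketch reproducing the three files above:

/* Internal fragmentation = (blocks needed × block size) - file size. */
#include <stdio.h>

long waste_bytes(long file_size, long block_size) {
    long blocks = (file_size + block_size - 1) / block_size;  /* round up */
    return blocks * block_size - file_size;
}

int main(void) {
    long bs = 4096;
    printf("3.5 KB file: %ld bytes wasted\n", waste_bytes(3584, bs));   /*  512 */
    printf("1 KB file:   %ld bytes wasted\n", waste_bytes(1024, bs));   /* 3072 */
    printf("4.2 KB file: %ld bytes wasted\n", waste_bytes(4301, bs));   /* 3891 */
    return 0;
}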

Why It Happens:

  1. Fixed Block Size: File system allocates in fixed-size units (4 KB, etc.)
  2. File Size Mismatch: Files rarely align perfectly with block boundaries
  3. Rounding Up: Must allocate entire block even if file doesn’t fill it

Impact:

  • Wasted Space: Dead storage capacity
  • Not Performance Related: Unlike external fragmentation, doesn’t affect access speed (file still in one block)
  • Capacity Related: Affects usable disk space

Internal vs External Fragmentation:

Aspect | Internal | External
Location | Within allocated blocks | Between allocated blocks
Cause | Block size vs. file size | Deletion/allocation patterns
Impact | Wasted space | Access time + wasted space
Solution | Smaller blocks | Defragmentation/paging
Severity | Usually <50% | Can be >50%

Real-World Impact:

Hard Drive: 1 TB
Block Size: 4 KB
10 million files (average 10 KB)

With 4KB blocks:
Average waste per file: ~2 KB
Total internal fragmentation: ~20 GB (2% loss)

With 64 KB blocks:
Average waste per file: ~32 KB
Total internal fragmentation: ~320 GB (32% loss)

With 512 B blocks:
Average waste per file: ~256 B
Total internal fragmentation: ~2.5 GB (0.25% loss)
BUT: More blocks, more overhead, slower I/O

Block Size Trade-offs:

Small Blocks | Large Blocks
Less internal fragmentation | More internal fragmentation
More overhead | Less overhead
More seeks | Fewer seeks
Slower for large files | Slower for small files

Problem 27: Contiguous Allocation

What is contiguous allocation? What difficulty occurs with contiguous allocation?

LLM Solution

Contiguous Allocation Definition:

Contiguous allocation is a file storage method where all blocks of a file are stored in consecutive positions on the disk. Files occupy one contiguous region of disk space.

How It Works:

File A: 5 blocks
┌───┬───┬───┬───┬───┐
│ 1 │ 2 │ 3 │ 4 │ 5 │  blocks 100-104
└───┴───┴───┴───┴───┘
      Contiguous region

File B: 3 blocks
        ┌───┬───┬───┐
        │ 1 │ 2 │ 3 │  blocks 110-112
        └───┴───┴───┘
          Contiguous region

Advantages:

  1. Fast Access: Sequential file access requires minimal seeking
  2. Simple: Easy to implement, easy to understand
  3. Low Overhead: Minimal metadata (just start block + length)

Directory Entry Example:

File A: Start block = 100, Length = 5 blocks
File B: Start block = 110, Length = 3 blocks

Difficulties with Contiguous Allocation:

1. External Fragmentation:

Disk layout after some allocations/deletions:

┌──────┬────────────┬──────┬──────┬──────┬──────┐
│ File │   Free     │ File │ Free │ File │ Free │
│  A   │ (8 blocks) │  B   │  3   │  C   │ 10   │
└──────┴────────────┴──────┴──────┴──────┴──────┘

Want to allocate new 6-block file:
- Free space 1: 8 blocks ✅ (fits)
- Free space 2: 3 blocks ❌ (too small)
- Free space 3: 10 blocks ✅ (fits)

Problem: Even with 21 blocks free, allocation difficult
if multiple files require exact sizes.

2. File Growth Problem:

Original:
File A: blocks 100-104 (5 blocks, currently using 4)

Now need to add 2 more blocks:

┌───────────────┬──┬────────┐
│   File A      │XX│ File B │
│ 100-104       │105│ 106-110
└───────────────┴──┴────────┘

Cannot grow in-place! Must:
1. Allocate new space (blocks 120-126, 7 blocks)
2. Copy File A data to new location
3. Update directory entry
4. Release old blocks (external fragmentation)

Result: High overhead, slow

3. Allocation Difficulty:

During allocation, must find contiguous free space large enough:

Worst case: Scan entire disk looking for contiguous region
            Expensive operation

Pre-allocation required: May allocate more than needed
                         Wastes space

4. Space Underutilization:

If file grows beyond pre-allocated space:

  • Must move entire file to larger contiguous region
  • Old space becomes fragmented
  • Can’t use partial regions
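
The allocation difficulty boils down to searching for a long enough run of free blocks; a toy first-fit search shows how it can fail even when plenty of space is free in total (the free map and sizes are made up):

/* First-fit search for `need` contiguous free blocks in a toy free map. */
#include <stdio.h>

#define DISK_BLOCKS 32

int free_map[DISK_BLOCKS];   /* 1 = free, 0 = in use */

/* Return the first block of a free run of length `need`, or -1 if none exists. */
int find_contiguous(int need) {
    int run = 0;
    for (int b = 0; b < DISK_BLOCKS; b++) {
        run = free_map[b] ? run + 1 : 0;
        if (run == need) return b - need + 1;
    }
    return -1;
}

int main(void) {
    /* 7 blocks are free, but only in runs of 3 and 4 */
    for (int b = 5; b < 8; b++)   free_map[b] = 1;
    for (int b = 20; b < 24; b++) free_map[b] = 1;

    printf("need 6 contiguous blocks -> start = %d\n", find_contiguous(6));  /* -1: fails */
    return 0;
}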

Solutions (used in practice):

  1. Extent-Based Allocation: File composed of multiple contiguous extents
  2. Linked Allocation: Blocks linked via pointers (eliminates fragmentation)
  3. Indexed Allocation: Index blocks point to file blocks
  4. Paging: Virtual memory style (modern approach)

Modern File Systems:

Modern systems don’t use pure contiguous allocation because difficulties outweigh benefits. Instead:

  • ext4/NTFS: Use extent-based (multiple contiguous regions)
  • Benefits: Fewer seeks than fully scattered, easier growth than pure contiguous

Problem 28: Linked Allocation Fragmentation

Can linked allocation in a file system have external fragmentation? Internal fragmentation?

LLM Solution

Analysis of Fragmentation in Linked Allocation:

Linked Allocation Method:

Each block contains:

  • Data (512 B - 4 KB)
  • Pointer to next block (4-8 B)

Example:

File A: blocks [100→105→110]
         ↓       ↓       ↓
    [Data...→105] [Data...→110] [Data...→-1(end)]

External Fragmentation: NO ❌

Linked allocation does NOT have external fragmentation because:

  1. Any free block can be used - blocks don’t need to be contiguous
  2. Scattered allocation is OK - pointers link them together
  3. No “large enough contiguous region” requirement - allocation always succeeds (until disk full)

Example:

Free blocks: scattered at positions 5, 47, 100, 200, 305, ...

Allocate new 5-block file:
Use any 5 free blocks: [5→47→100→200→305]

Disk layout:
┌────┬───┬─────┬───────┬───┬───┬─────┬────┐
│ FA │ * │ FB  │ Free  │ FA│ FC│ Free│ FA │
│ 1  │5  │ 2-3 │ 6-46  │100│47 │48-99│200 │
└────┴───┴─────┴───────┴───┴───┴─────┴────┘

*Pointer to next block
No external fragmentation!

Internal Fragmentation: YES ✅

Linked allocation DOES have internal fragmentation because:

  1. Fixed block size - still allocate in fixed-size blocks
  2. Last block may not fill completely - partial block waste

Example:

Block size: 4 KB (4096 bytes)
File size: 10.5 KB (10752 bytes)

Allocation:
Block 1: 4 KB (full)
Block 2: 4 KB (full)
Block 3: 2.5 KB of data (plus a few bytes for the next-block pointer)
       But allocated 4 KB → ~1.5 KB wasted

Internal fragmentation: ~1.5 KB for this file (on average, about half a block, 2 KB here, is wasted per file)

Summary:

Type | Linked Allocation | Reason
External | NO | Any free block can be used; contiguity not required
Internal | YES | Fixed block size; the last block may be partially full

Impact:

  • Better than contiguous: No external fragmentation (major advantage)
  • Still has waste: Internal fragmentation similar to other methods
  • Trade-off: Slower access (more seeks) for better allocation flexibility

Performance Comparison:

Contiguous:     Fast access, external fragmentation problem
Linked:         Slower access (more seeks), no external fragmentation
Indexed:        Better access patterns, no external fragmentation

Problem 29: Advantages and Disadvantages of Linked Allocation

List 1 advantage and 1 disadvantage of using linked allocation in a file system?

LLM Solution

Linked Allocation Characteristics:

Advantage:

No External Fragmentation - The biggest advantage.

Files can be stored in any available blocks scattered across the disk. Unlike contiguous allocation, there’s no need for large contiguous regions. Allocation always succeeds as long as free blocks exist. This avoids the fragmentation problems that plague contiguous allocation and eliminates the need for periodic defragmentation.

Disadvantage:

Random Access is Slow - The biggest disadvantage.

To access block N of a file, must follow the chain of pointers:

  • Block 0 → Block 1 → Block 2 → … → Block N

For large files, this becomes very slow. Reading block 1000 requires 1000 disk accesses (one per pointer follow). Sequential files are OK (decent caching), but random access patterns kill performance.

Example: Access byte at position 4,000,000 in a 5 MB file (4 KB blocks)

Linked Allocation:
Block 0→Block 1→Block 2→...→Block 976 (~1000 block reads, one per pointer follow!)
Time: ~1000 × 5 ms = 5 seconds!

Indexed Allocation:
Look up index → go directly to Block 976 (1 seek)
Time: ~5ms

Problem 30: Single Directory Advantages and Disadvantages

List 1 advantage and 1 disadvantage of using a single directory.

LLM Solution

Single Directory File System:

All files stored in one flat directory (no subdirectories).

Advantage:

Simple Implementation - Easy to design and implement.

Only one directory to manage, linear search for files, minimal metadata overhead. No need for complex tree traversal algorithms or recursive directory operations. Users don’t need to remember complex path hierarchies.

Implementation:
Directory = { (filename, inode) }
Linear search: O(n) time

No need for:
- Path parsing
- Directory traversal
- Recursive operations

Disadvantage:

Naming Conflicts and Poor Organization - Multiple users can’t have files with same name.

With only one directory, all files share the same namespace. Two users can’t both create a file named “data.txt”. Files aren’t organized hierarchically, so thousands of files become unmanageable. Also, different file types (documents, images, programs) are all mixed together.

Single Directory:
data.txt          ← User A's data
data.txt          ← User B's data (CONFLICT! name already taken)
image.jpg         ← Image file
program.exe       ← Program file
temp.tmp          ← Temporary file
(1000 more files) ← Chaos!

vs. Hierarchical:
/users/alice/documents/data.txt
/users/bob/documents/data.txt    ← same filename, no conflict
/images/photo.jpg
/programs/app.exe
/temp/tempfile.tmp
(organized!)

Problem 31: DAG vs Tree Directory Issues

List two issues that must be addressed to handle Directed Acyclic Graph (DAG) directories vs. simple tree directories.

Instructor Solution

(1) Backups need to be carefully written to not backup files with multiple names twice.

(2) Need to maintain reference counts in the i-nodes to know when the underlying file should be deleted.

Elaboration (by LLM):

Let me explain why these two issues are critical when you move from a simple tree directory structure to a DAG (Directed Acyclic Graph) structure.

Background: Tree vs DAG Directory Structure

In a simple tree directory:

  • Each file has exactly ONE parent directory
  • Each file has exactly ONE path
  • Deletion is straightforward: delete the directory entry
Tree Structure:
/
├── alice/
│   └── thesis.pdf
└── bob/
    └── project.txt

Each file has ONE parent, ONE path

In a DAG directory structure:

  • Each file can have MULTIPLE parents (hard links)
  • Each file can be accessed via MULTIPLE paths
  • Deletion becomes complicated
DAG Structure:
/
├── alice/
│   └── thesis.pdf ──┐
├── bob/             ├─→ [inode 5000] → actual data
│   └── backup.pdf ──┘
│
Same file, TWO paths:
  /alice/thesis.pdf
  /bob/backup.pdf

Issue (1): Backup Problem

When you backup a filesystem with hard links (DAG), you must be careful not to backup the same file data twice.

Why this is a problem:

Scenario: Backup /alice/ and /bob/

Naive approach:
  1. Backup /alice/thesis.pdf → Save inode 5000 data
  2. Backup /bob/backup.pdf → Save inode 5000 data AGAIN!

Result: Backup has duplicate data, wasting backup space!

The solution:

Backup software must track inode numbers and ensure each inode is backed up only once:

Smart backup approach:
  1. Encounter /alice/thesis.pdf (inode 5000)
     → Back up data once
     → Record: "inode 5000 backed up"

  2. Encounter /bob/backup.pdf (inode 5000)
     → Already in backup set
     → Skip data, just record hard link

Result: Data saved once, backup efficient

Real-world impact:

Without this consideration, a filesystem with many hard links could double or triple backup size. Example:

Filesystem: 1 TB actual data
With many hard links, naive backup: 3 TB (3 copies!)
With smart backup: 1 TB (1 copy, hard link references)

Issue (2): Reference Counting for Deletion

In a DAG structure, you can’t simply delete a file when a user runs rm. You must track how many paths lead to that file.

Why reference counting is necessary:

Scenario: Multiple hard links to same file

/alice/thesis.pdf → [inode 5000] → [data]
/bob/backup.pdf ──→ [inode 5000] → [data]
/charlie/copy.pdf → [inode 5000] → [data]

Reference count: 3

When user deletes files:

Step 1: User deletes /alice/thesis.pdf
  → Remove directory entry
  → Decrement reference count: 3 → 2
  → Inode still exists (other paths depend on it!)
  → Data still on disk

Step 2: User deletes /bob/backup.pdf
  → Remove directory entry
  → Decrement reference count: 2 → 1
  → Inode still exists
  → Data still on disk

Step 3: User deletes /charlie/copy.pdf
  → Remove directory entry
  → Decrement reference count: 1 → 0
  → NOW reference count = 0!
  → Delete inode and data from disk

Without reference counting:

If we deleted data immediately on first `rm`:

User deletes /alice/thesis.pdf
  → File deleted from disk

But /bob/backup.pdf and /charlie/copy.pdf STILL POINT TO DELETED DATA!
  → "Dangling pointers"
  → Reading those paths: CORRUPTION or error

How reference count is stored:

In the inode itself, the OS maintains a link count:

struct inode {
    int mode;           // File permissions
    int uid;            // Owner user ID
    long size;          // File size
    int link_count;     // ← HOW MANY PATHS POINT HERE
    char data_blocks[]; // Actual file data location
}

When creating a hard link:

ln /alice/thesis.pdf /bob/backup.pdf
 
OS Action:
  1. Create new directory entry in /bob/: "backup.pdf" inode 5000
  2. Increment link_count in inode 5000: 1 → 2
  3. File now has 2 paths

When deleting:

rm /alice/thesis.pdf
 
OS Action:
  1. Remove directory entry from /alice/: "thesis.pdf" deleted
  2. Decrement link_count in inode 5000: 2 → 1
  3. link_count > 0, so keep inode + data
  4. /bob/backup.pdf still works!

Verification in Unix/Linux:

$ echo "important" > file.txt
$ ls -l file.txt
-rw-r--r-- 1 user group 10 Jan 4 12:00 file.txt

         link_count = 1
 
$ ln file.txt link1.txt
$ ln file.txt link2.txt
 
$ ls -l file.txt link1.txt link2.txt
-rw-r--r-- 3 user group 10 Jan 4 12:00 file.txt
-rw-r--r-- 3 user group 10 Jan 4 12:00 link1.txt
-rw-r--r-- 3 user group 10 Jan 4 12:00 link2.txt

         All show link_count = 3
 
$ rm file.txt
$ rm link1.txt
$ rm link2.txt   ← Only NOW is the data actually deleted
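
The same link-count behavior can be observed from C. The sketch below is illustrative (hypothetical filenames in the current directory) and uses the standard POSIX calls link(), stat(), and unlink():

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Print the current hard-link count of a path. */
static void show_nlink(const char *path) {
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%s: link count = %ld\n", path, (long)st.st_nlink);
}

int main(void) {
    /* Create a file (hypothetical name, for illustration only). */
    int fd = open("file.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "important\n", 10);
    close(fd);

    show_nlink("file.txt");              /* 1 */

    link("file.txt", "link1.txt");       /* add a hard link: count 1 -> 2 */
    link("file.txt", "link2.txt");       /* add another:     count 2 -> 3 */
    show_nlink("file.txt");              /* 3 */

    unlink("file.txt");                  /* count 3 -> 2; data still reachable */
    show_nlink("link1.txt");             /* 2 */

    unlink("link1.txt");
    unlink("link2.txt");                 /* count reaches 0: inode and data are freed */
    return 0;
}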

Summary:

Issue      | Tree Directories       | DAG Directories
-----------|------------------------|----------------
Backup     | Simple: each file once | Must check inodes to avoid duplicates
Deletion   | Delete immediately     | Must use reference count to track when actual deletion happens
Complexity | Low                    | High - must manage reference counts
Benefit    | Simple                 | Allows hard links, shared files, efficient storage

These constraints are why modern filesystems with DAG support (ext4, NTFS, APFS) must implement reference counting and why backup tools need to be “inode-aware” to avoid data duplication.

Explain the difference between a hard-link and a soft-link.

LLM Solution

Hard Links vs Soft (Symbolic) Links:

Aspect            | Hard Link                   | Soft Link
------------------|-----------------------------|----------
Points to         | Same inode                  | Filename/path string
Type              | Directory entry             | Special file
Link count        | Increments inode link count | Doesn't affect original
Survives deletion | YES - inode still exists    | NO - dangling link
Filesystem        | Same filesystem only        | Can cross filesystems
Circular refs     | Can't create cycle          | Can create cycles
Overhead          | Minimal                     | Extra inode + file

Hard Link:

Original File:
/home/alice/data.txt → [inode 5000] → actual data

Hard Link:
/home/alice/backup.txt → [inode 5000] → same data

Link count: 2

Both point to SAME inode!
Deleting one file leaves the other intact
(data only deleted when link count = 0)

Creation:
ln original.txt hardlink.txt

Soft (Symbolic) Link:

Original File:
/home/alice/data.txt → [inode 5000] → actual data

Symbolic Link:
/home/alice/shortcut.txt → [inode 6001] → "/home/alice/data.txt" (path string)

Inode 6001 is a special "symlink" file containing path string
Dereferencing:
1. Read symlink content → get path "/home/alice/data.txt"
2. Look up path in filesystem
3. Access original inode 5000

If original deleted:
Symlink becomes "dangling" (broken)

Creation:
ln -s original.txt symlink.txt

Visual Comparison:

Hard Link Example:
User A's File:     /home/alice/thesis.pdf → [Inode 5000] ← /home/alice/backup_thesis.pdf
                                                          ↓
                                                    [PDF Data]

Both paths point to same inode
Delete thesis.pdf: backup_thesis.pdf still works

Soft Link Example:
User A's File:     /home/alice/thesis.pdf → [Inode 5000] → [PDF Data]
User B's Link:     /home/bob/view_thesis → [Inode 6001] → "/home/alice/thesis.pdf"
                                            (symlink file)

Symlink contains path string, not direct inode reference
Delete thesis.pdf: symlink becomes broken link
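
As a small illustration (hypothetical filenames), the C sketch below shows that a symlink is just a tiny file holding a path string: lstat() identifies it as a link, readlink() returns the stored path, and stat() fails once the target is gone.

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
    /* Assumes "thesis.pdf" exists; create a symlink pointing at it. */
    symlink("thesis.pdf", "view_thesis");

    struct stat st;

    /* lstat() examines the link itself: a small special file. */
    if (lstat("view_thesis", &st) == 0 && S_ISLNK(st.st_mode))
        printf("view_thesis is a symlink, %ld bytes (length of the stored path)\n",
               (long)st.st_size);

    /* readlink() returns the stored path string, not the target's contents. */
    char target[256];
    ssize_t n = readlink("view_thesis", target, sizeof(target) - 1);
    if (n >= 0) {
        target[n] = '\0';
        printf("stored path: %s\n", target);
    }

    /* stat() follows the link; it fails if the target was deleted (dangling link). */
    if (stat("view_thesis", &st) != 0)
        perror("target missing");

    return 0;
}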

Practical Differences:

Hard Links:

  • Both names are equally valid
  • Can’t tell which is “original”
  • Useful for backups (transparent)
  • Can’t span filesystems
  • Can’t link to directories (filesystem integrity)

Soft Links:

  • One is original, one is link
  • Can see it’s a link (ls -l)
  • Useful for shortcuts
  • Can span filesystems
  • Can link to directories

Assume that user A owns a file named “file.txt” and user B has a hard link to this file named “file1.txt”. Assume user A removes “file.txt” by issuing the Linux command “rm file.txt”. Now user B tries to access “file1.txt” by executing the command “file1.txt”. Will user B be able to access this file? Why or why not? Briefly explain.

LLM Solution

Answer: YES, user B can still access file1.txt

Explanation:

Hard links point to the same inode, not through filenames. When user A deletes “file.txt”:

  1. OS removes the directory entry “file.txt”
  2. Link count decreases from 2 to 1 (file1.txt still references it)
  3. Inode and data remain because link count > 0
  4. User B’s “file1.txt” still points to the same inode
  5. Data is still readable through file1.txt

Mechanism:

Before deletion:
Directory /home/alice/:
  "file.txt"  → [inode 5000]

Directory /home/bob/:
  "file1.txt" → [inode 5000]

Inode 5000:
  Link count: 2
  Data: [actual file content]

User A: rm file.txt

After deletion:
Directory /home/alice/:
  (file.txt entry removed)

Directory /home/bob/:
  "file1.txt" → [inode 5000]

Inode 5000:
  Link count: 1  ← Decreased by 1
  Data: [actual file content]  ← Still exists!

User B: access file1.txt
Result: SUCCESS - can read data through inode 5000

Key Point:

Hard links are at the inode level, not the filename level. The data is only deleted when:

  • Link count becomes 0 (all directory entries removed)

Since file1.txt still references inode 5000, the data persists.

Contrast with Soft Link:

If file1.txt were a soft link:

Before deletion:
file.txt → [inode 5000] → [data]
file1.txt → [inode 6001] → "/home/alice/file.txt"

After deletion of file.txt:
file1.txt → [inode 6001] → "/home/alice/file.txt" (BROKEN PATH!)

Result: user B gets "file not found" error

Assume that user A owns a file named “file.txt” and user B has a soft link to this file named “file1.txt”. Assume user A removes “file.txt” by issuing the Linux command “rm file.txt”. Now user B tries to access “file1.txt” by executing the command “file1.txt”. Will user B be able to access this file? Why or why not? Briefly explain.

LLM Solution

Answer: NO, user B cannot access file1.txt - it becomes a dangling/broken link

Explanation:

Soft links contain the pathname of the original file, not the inode. When user A deletes “file.txt”, the pathname becomes invalid:

  1. Soft link file1.txt contains: “/home/alice/file.txt”
  2. User A removes file.txt → directory entry deleted
  3. Inode becomes unreachable
  4. When user B accesses file1.txt:
    • OS reads symlink → gets path “/home/alice/file.txt”
    • Tries to look up path → “file not found”
    • Error: “file1.txt: No such file or directory”

Mechanism:

Before deletion:
/home/alice/file.txt → [inode 5000] → [data]
/home/bob/file1.txt → [inode 6001] → "/home/alice/file.txt" (symlink)

User A: rm file.txt

After deletion:
/home/alice/file.txt (REMOVED)
/home/bob/file1.txt → [inode 6001] → "/home/alice/file.txt" (BROKEN!)

Inode 5000: Unreachable (no directory entry points to it)
           Link count: 0 (if no other links)
           Data: Deleted

User B: access file1.txt
Step 1: Read symlink content → "/home/alice/file.txt"
Step 2: Try to look up path → Not found!
Step 3: Error! "file1.txt: No such file or directory"

Visual Diagram:

Soft Link (Pathname-based):

Before:
file1.txt ──→ [symlink inode 6001]
              ↓
              "/home/alice/file.txt"  ✅ Valid
              ↓
              [inode 5000] ✅
              ↓
              [data]

After rm file.txt:
file1.txt ──→ [symlink inode 6001]
              ↓
              "/home/alice/file.txt"  ❌ Invalid!
              ↓
              [doesn't exist]

Result: Dangling link (broken)

Verification in Linux:

# Create original file
$ echo "content" > file.txt
 
# Create soft link
$ ln -s file.txt link.txt
 
# Access soft link - works
$ cat link.txt
content
 
# Delete original
$ rm file.txt
 
# Try to access soft link - fails!
$ cat link.txt
cat: link.txt: No such file or directory
 
# But symlink file itself still exists!
$ ls -l link.txt
lrwxrwxrwx  1 user user 8 Jan  4 12:34 link.txt -> file.txt
 
Notice the arrow still points to deleted file

Key Difference from Hard Links:

Operation       | Hard Link        | Soft Link
----------------|------------------|----------
Delete original | Still accessible | Broken link
Works because   | Same inode       | Contains path string
If inode exists | Data persists    | Path invalid

Problem 35: UNIX File System Protection Bits

What are the 9 protection bits used in a UNIX filesystem?

Instructor Solution

Read, write, and execute for each of user, group, and other/world

Elaboration (by LLM):

Let me break down these 9 protection bits and show you how they work in practice.

The 9 Protection Bits Structure

UNIX file permissions are organized as 3 groups of 3 bits each:

  User    Group    Other
  ─────   ─────    ─────
  r w x | r w x  | r w x
  4 2 1 | 4 2 1  | 4 2 1

r = read (4)    w = write (2)    x = execute (1)

Breaking it down:

  1. User (Owner): 3 bits

    • r (read): Can owner read the file? (4)
    • w (write): Can owner modify the file? (2)
    • x (execute): Can owner execute the file? (1)
  2. Group: 3 bits

    • r (read): Can group members read the file? (4)
    • w (write): Can group members modify the file? (2)
    • x (execute): Can group members execute the file? (1)
  3. Other (World): 3 bits

    • r (read): Can everyone else read the file? (4)
    • w (write): Can everyone else modify the file? (2)
    • x (execute): Can everyone else execute the file? (1)

Practical Examples

$ ls -l
-rw-r--r-- alice staff 1024 Jan 4 thesis.pdf
 ^ ^^^^^^^ ^^^^^ ^^^^^
 | ||||||| ||||| |||||
 | ||||||| ||||| └─ Other: r-- (read only)
 | ||||||| └──────────── Group: r-- (read only)
 | └────────────────────── User: rw- (read+write)
 └─ File type: - (regular file)
 
Permissions: 644 (rw- r-- r--)

How to read octal notation (644):

644 → 6 (user), 4 (group), 4 (other), one octal digit per class

6 = 4 (read) + 2 (write) + 0 (no execute) = rw-
4 = 4 (read) + 0 (no write) + 0 (no execute) = r--
4 = 4 (read) + 0 (no write) + 0 (no execute) = r--

Result: rw- r-- r--

Common Permission Patterns

Octal | Symbolic  | File Type    | Meaning
------|-----------|--------------|--------
644   | rw-r--r-- | Regular file | Owner: read+write, Group: read, Other: read
755   | rwxr-xr-x | Program      | Owner: full access, Others: read+execute
700   | rwx------ | Sensitive    | Owner only (privacy)
777   | rwxrwxrwx | Script       | Everyone: full access
600   | rw------- | Private      | Owner reads+writes, nobody else

Understanding Each Permission

1. Read (r = 4)

On regular file:
  -rw-r--r-- file.txt
  User can read? YES (r)
  Group can read? YES (r)
  Other can read? YES (r)

  $ cat file.txt  (owner) ✅
  $ cat file.txt  (group member) ✅
  $ cat file.txt  (stranger) ✅

On directory:
  drwxr-xr-x folder/
  User can list contents? YES (r on dir)
  $ ls folder/  ✅

2. Write (w = 2)

On regular file:
  -rw-r--r-- file.txt
  User can modify? YES (w)
  Group can modify? NO (-)
  Other can modify? NO (-)

  $ echo "new" > file.txt  (owner) ✅
  $ echo "new" > file.txt  (group) ❌ Permission denied!
  $ echo "new" > file.txt  (stranger) ❌ Permission denied!

On directory:
  drwxr-xr-x folder/
  User can add/delete files? YES (w on dir)
  $ touch folder/newfile  ✅
  $ rm folder/oldfile     ✅

3. Execute (x = 1)

On regular file (script):
  -rwxr-xr-x script.sh
  User can execute? YES (x)
  Group can execute? YES (x)
  Other can execute? YES (x)

  $ ./script.sh  (owner) ✅
  $ ./script.sh  (group member) ✅
  $ ./script.sh  (stranger) ✅

On regular file (data):
  -rw-r--r-- data.txt
  User can execute? NO (-) - data file, no execution

  $ ./data.txt  (owner) ❌ Permission denied!

On directory:
  drwxr-xr-x folder/
  User can enter directory? YES (x on dir)
  User can access files inside? YES

  $ cd folder/  ✅
  $ ls folder/  ✅ (need x permission)

The Directory Permission Subtlety

For directories, permissions work differently:

Directory permissions:
  drwxr-xr-x scripts/

  r (read):    Can list directory contents? (ls)
  w (write):   Can create/delete files inside? (touch, rm)
  x (execute): Can enter directory? (cd)

Example:
  d--x--x--x scripts/
  Permissions: r is missing, only x is set
  $ ls scripts/       ❌ Can't list (no r)
  $ cd scripts/       ✅ Can enter (have x)
  $ ls scripts/*.sh   ❌ Can't list files

Practical Scenarios

Scenario 1: Create a private document

$ echo "secret data" > secret.txt
$ chmod 600 secret.txt
$ ls -l secret.txt
-rw------- alice staff secret.txt

      Only owner (alice) can read/write
  Group can't read, other can't read

Scenario 2: Share a program with your team

$ chmod 755 program.sh
$ ls -l program.sh
-rwxr-xr-x alice staff program.sh
       ↑↑↑
  User: rwx (full control)
  Group: r-x (can read and execute)
  Other: r-x (can read and execute)

Scenario 3: Shared team directory

$ mkdir team_project
$ chmod 770 team_project
$ ls -ld team_project
drwxrwx--- alice staff team_project
        ↑↑↑
  User: rwx (can add/delete files)
  Group: rwx (team members can add/delete)
  Other: --- (nobody else can access)

How the OS Checks Permissions

When user alice tries to read file.txt:

1. Check if alice is the owner
   YES → Check user bits (rwx------)
        Can read? YES ✅
        Allow access

   NO → Check if alice is in the file's group
        YES → Check group bits (---rwx---)
              Can read? YES ✅
              Allow access

        NO → Check other bits (------rwx)
             Can read? YES ✅
             Allow access

Only one class is consulted: once a class matches (owner, then group, then other), its bits alone decide.
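
A simplified model of this check can be written directly against the 9 bits. The C sketch below is illustrative only: it ignores root's override, supplementary groups, and ACLs, and uses made-up UIDs/GIDs.

#include <stdio.h>
#include <stdbool.h>
#include <sys/types.h>

/* Rough model of the UNIX permission check for one access type.
 * 'want' uses the usual bit weights: 4 = read, 2 = write, 1 = execute. */
static bool may_access(mode_t mode, uid_t file_uid, gid_t file_gid,
                       uid_t uid, gid_t gid, int want)
{
    int bits;
    if (uid == file_uid)
        bits = (mode >> 6) & 7;   /* user (owner) triad: bits 6-8 */
    else if (gid == file_gid)
        bits = (mode >> 3) & 7;   /* group triad: bits 3-5 */
    else
        bits = mode & 7;          /* other triad: bits 0-2 */
    return (bits & want) == want;
}

int main(void) {
    mode_t mode = 0644;  /* rw-r--r-- ; UIDs/GIDs below are invented */
    printf("owner write:    %d\n", may_access(mode, 1000, 1000, 1000, 1000, 2));  /* 1 */
    printf("stranger write: %d\n", may_access(mode, 1000, 1000, 2000, 2000, 2));  /* 0 */
    printf("stranger read:  %d\n", may_access(mode, 1000, 1000, 2000, 2000, 4));  /* 1 */
    return 0;
}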

The Inode Storage

These 9 bits are stored in the file’s inode:

struct inode {
    mode_t mode;        // 16 bits total
                        // 4 bits: file type (-, d, l, etc.)
                        // 3 bits: special (setuid, setgid, sticky)
                        // 9 bits: permissions (rw-rw-r--)
    uid_t uid;          // Owner user ID
    gid_t gid;          // Group ID
    off_t size;         // File size
    // ... more fields
}
 
Permission bits (lower 9 bits of mode):
User: bits 6-8   (rwx)
Group: bits 3-5  (rwx)
Other: bits 0-2  (rwx)

Remember This

UNIX Permissions = 3 × 3 bits

┌──────────────────────────────────────────┐
│ User (Owner)   Group      Other (World)  │
│  r w x          r w x      r w x         │
│  ─────          ─────      ─────         │
│  4 2 1          4 2 1      4 2 1         │
│  (octal bit weights)                     │
└──────────────────────────────────────────┘

Once you master these 9 bits, you understand
the foundation of UNIX file security!

Problem 36: Write and Close System Calls

Explain why the write and close system calls may not result in data immediately being written to disk.

LLM Solution

Buffering and Caching in Write Operations:

Both write() and close() system calls involve buffering/caching that can delay actual disk writes:

Write() System Call:

write(fd, buffer, 1000);  // Write 1000 bytes

This does NOT guarantee immediate disk write:

  1. Page Cache: Data goes into OS buffer cache (RAM)
  2. Returns immediately with bytes written to cache
  3. Actual disk write happens later (asynchronously)

Why delayed:

  • Performance: Batching multiple writes is faster than each write immediately going to disk
  • Efficiency: Can combine related writes into fewer disk operations
  • Throughput: User process doesn’t wait for slow disk I/O
Timeline:
User Process:    write() → [Returns after 1µs]
OS:              [Buffer in RAM cache]
Disk:            [Silent... waiting]
                 [Background: ~5ms later] Write to disk

Close() System Call:

close(fd);  // Close file

Even closing doesn’t guarantee the data is on disk:

  1. close() releases the file descriptor; dirty pages can remain in the kernel page cache
  2. Application-level (stdio) buffers are flushed by fclose()/fflush(), not by close()
  3. fsync(fd) (before close) is required for a guaranteed sync to disk
C Standard I/O:
FILE *fp = fopen("file.txt", "w");
fprintf(fp, "data");  // Buffered in stdio buffer
fclose(fp);           // Flushes stdio buffer into the kernel page cache (not necessarily to disk)

vs.

Raw System Calls:
int fd = open("file.txt", O_WRONLY);
write(fd, buffer, 1000);  // Copied into the kernel page cache
close(fd);                // Releases the fd; dirty pages may still be waiting to be written

But what if process crashes? Data lost if not yet on disk!

Multiple Levels of Buffering:

Application Buffer → Kernel Buffer → Disk Cache → Disk Hardware
    (stdio)          (page cache)    (device)     (NAND/Platters)

write() copies:       Application buffer → Kernel page cache

fclose() flushes:     stdio buffer → Kernel page cache

fsync()/sync() force: Kernel page cache → Device

But a system crash before the data reaches the device = data loss!

Solutions for Guaranteed Writes:

  1. fsync(fd): Force sync to disk
  2. fdatasync(fd): Sync data (not metadata)
  3. sync(): Sync all buffered data
  4. O_SYNC flag: Open file in synchronous mode
// Guarantee write to disk:
write(fd, buffer, 1000);
fsync(fd);  // Now actually on disk
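
Putting the layers together, here is a minimal C sketch (illustrative filename) of the full flush chain: fflush() moves the stdio buffer into the kernel page cache, and fsync() on the underlying descriptor forces the page cache out to the device.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE *fp = fopen("file.txt", "w");
    if (!fp) { perror("fopen"); return 1; }

    fprintf(fp, "data\n");        /* sits in the stdio (application) buffer */

    fflush(fp);                   /* application buffer -> kernel page cache */
    fsync(fileno(fp));            /* kernel page cache  -> disk (blocks until durable) */

    fclose(fp);                   /* releases the stream; data is already durable */
    return 0;
}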

Real-World Impact:

Databases use fsync() for ACID compliance - critical data must be on disk immediately. General applications can tolerate delayed writes for performance.

Problem 37: Seek System Call and Disk Head Movement

Does the seek system call result in a disk head moving (explain your answer)?

LLM Solution

Answer: NO, seek() does not immediately move the disk head

Explanation:

The seek() system call only updates the file position in memory. It doesn’t initiate any disk I/O operation.

int fd = open("largefile.bin", O_RDONLY);
lseek(fd, 1000000, SEEK_SET);  // Move to byte 1,000,000
                               // ↓
                               // Updates the file offset in the kernel
                               // ↓
                               // Disk head DOES NOT move yet!

Why Seek Doesn’t Move the Head:

  1. Lazy Evaluation: OS doesn’t know if next operation will be read or write
  2. No Immediate I/O: Seek is a logical operation, not physical
  3. Efficiency: Unnecessary head movement if no subsequent read/write

Timeline:

lseek(fd, 1000000, SEEK_SET)
↓
OS updates file offset: 1,000,000
↓
[Disk head continues wherever it was]

Nothing happens on disk!

Contrast with Read/Write:

After seek():
read(fd, buffer, 100)  ← This DOES cause head movement!
                       ← Seek + Read together trigger I/O

Timeline:
1. seek() sets offset → no action
2. read() called → THEN disk head moves to correct position
3. Rotational latency → positioning
4. Transfer data

Why This Design:

Example: Process seeks back and forth many times

Without lazy seek:
seek(0)     → head move (5 ms)
seek(1000)  → head move (5 ms)
seek(0)     → head move (5 ms)
seek(1000)  → head move (5 ms)
read()      → head move (5 ms) + read (1 ms)
Total: 21 ms (20 ms of it pure wasted head movement)!

With lazy seek (only move for actual I/O):
seek(0)           → no action
seek(1000)        → no action
seek(0)           → no action
seek(1000)        → no action
read() → move once → 1 seek (5 ms) + read (1 ms)
Total: 6 ms (much better!)

Key Point:

Seek is logical positioning, not physical disk operation. The actual head movement happens when:

  • read() is called
  • write() is called
  • Actual data transfer needed

Problem 38: I-node Contents

List 2 items that are stored in an i-node?

LLM Solution

I-node (Index Node) Contents:

The i-node is a data structure that stores metadata and pointers about a file.

Two Key Items:

  1. File Size: Total number of bytes in the file

    • Allows OS to know file extent
    • Used in read operations to prevent reading beyond EOF
  2. File Permissions and Owner Information:

    • User ID (UID) - who owns the file
    • Group ID (GID) - group ownership
    • Permission bits (read/write/execute for user/group/others)
    • File type (regular file, directory, symlink, etc.)

Other Typical I-node Contents:

  • Link Count: Number of hard links to this inode
  • Timestamps: Last access, modification, change times
  • Block Pointers: Direct pointers (10-15) and indirect pointers to file blocks
  • Device Number: Which device/filesystem
  • Inode Number: Unique ID within filesystem

Example I-node Structure (ext4):

Inode 5000:
├─ File size: 4096 bytes
├─ Permissions: 644 (rw-r--r--)
├─ Owner UID: 1000 (alice)
├─ Owner GID: 1000 (alice group)
├─ Link count: 1
├─ Timestamps:
│  ├─ Access time: 2024-01-04 12:30:00
│  ├─ Modify time: 2024-01-04 12:25:00
│  └─ Change time: 2024-01-04 12:25:00
├─ Block pointers:
│  ├─ Direct: [block 1000, 1001, ...] (10 direct)
│  ├─ Single indirect: [block 2000]
│  └─ Double indirect: [block 3000]
└─ File type: Regular file

Problem 39: Directory Entry Information in I-node File Systems

In a file system that uses i-nodes, what information do we store in each directory entry?

LLM Solution

Directory Entry in I-node File Systems:

Each directory is itself a special file containing directory entries. Each entry maps a filename to its inode.

Information Stored per Entry:

Field                | Size                 | Purpose
---------------------|----------------------|--------
Filename/Name        | Variable (max 255 B) | Human-readable name of file/directory
Inode Number         | 4 bytes              | Index into inode table; points to file's metadata
Filename Length      | 2 bytes              | Length of filename (for variable-length names)
File Type (optional) | 1 byte               | Regular file, directory, symlink, etc.

Example Directory Entry:

Directory: /home/alice/
├─ ["file.txt", inode: 5000]      ← filename + inode number
├─ ["documents", inode: 5001]     ← subdirectory
├─ ["photo.jpg", inode: 5002]     ← another file
└─ ["link.txt", inode: 5003]      ← symlink

Each line is a directory entry!
Only stores: name and inode number (small!)

Actual file metadata (size, permissions, etc.)
is stored in the inode, NOT in directory entry!

Key Point:

Directory entry is MINIMAL because i-nodes handle metadata:

Directory Entry: Small
  filename + inode number

I-node: Large
  All metadata (size, permissions, timestamps, blocks, etc.)

Benefit:
- Multiple directory entries can point to same inode (hard links)
- I-node updated once, affects all directory entries

Why Separate Directory Entries from I-nodes:

Without i-nodes (FAT):

Directory Entry:
filename + size + permissions + timestamps + block pointers
= Large directory entries (hard to manage)
= Can't have hard links (name not separated from metadata)

With i-nodes (ext4, APFS):

Directory Entry:
filename + inode number
= Small directory entries
= Can have hard links (multiple names → same inode)
= Efficient metadata management
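
You can see this minimal (name, inode number) pairing directly with POSIX readdir(): struct dirent exposes d_ino and d_name, and nothing about size or permissions (those require a separate stat() on the inode). A short sketch:

#include <stdio.h>
#include <dirent.h>

int main(void) {
    DIR *dir = opendir(".");          /* any directory works */
    if (!dir) { perror("opendir"); return 1; }

    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Each entry carries just the name and the inode number;
         * size, permissions, timestamps live in the inode (use stat() for those). */
        printf("inode %8lu  name %s\n", (unsigned long)ent->d_ino, ent->d_name);
    }
    closedir(dir);
    return 0;
}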

Problem 40: Block Size Advantages and Disadvantages

List 1 advantage and 1 disadvantage of making the block size big in a file system.

LLM Solution

Large Block Size (e.g., 64 KB vs 4 KB):

Advantage:

Better Throughput for Large Files

Larger blocks mean fewer I/O operations needed to transfer data. For sequential access patterns, larger blocks reduce the number of disk seeks and improve data transfer rates. Fewer block accesses = faster bulk data transfers.

Reading a 1 MB file:

With 4 KB blocks:
  Number of reads: 1 MB / 4 KB = 256 reads
  Seeks: 256 × 5ms = 1280 ms (overhead)
  Transfer: 256 × 0.01ms = 2.56 ms (actual data)
  Total: ~1280 ms

With 64 KB blocks:
  Number of reads: 1 MB / 64 KB = 16 reads
  Seeks: 16 × 5ms = 80 ms (overhead)
  Transfer: 16 × 0.16ms = 2.56 ms (actual data)
  Total: ~82 ms

Speedup: 1280/82 ≈ 15.6x faster!

Disadvantage:

Greater Internal Fragmentation

Larger blocks increase the average wasted space within the last block of each file. Since most files don’t fill the last block exactly, the waste per file increases with block size.

Average internal fragmentation:

4 KB blocks:
  Expected waste per file: ~2 KB (half a 4 KB block)

64 KB blocks:
  Expected waste per file: ~32 KB (half a 64 KB block)

Example with 1 million small files (1 KB each):

4 KB blocks:  1,000,000 × 2 KB = 2 GB wasted
64 KB blocks: 1,000,000 × 32 KB = 32 GB wasted

16x more waste!

Trade-off:

Large blocks: ✅ Fast I/O, ❌ Wasted space
Small blocks: ✅ Efficient space, ❌ Slow I/O

Modern filesystems use 4-8 KB as sweet spot, with dynamic adjustments.

Problem 41: Block Cache Purpose and Management

What’s the purpose of having “block cache” in I/O systems. How is it managed by the OS? Briefly explain.

LLM Solution

Block Cache (Buffer Cache) Purpose:

The block cache is a region of RAM that caches recently accessed disk blocks, avoiding repeated disk I/O for the same data.

Key Purposes:

  1. Performance: RAM access (microseconds) vs disk access (milliseconds) - 1000x faster
  2. Reduce Disk I/O: Avoid repeated reads from disk
  3. Write Buffering: Batch writes for efficiency
  4. Shared Resource: Multiple processes share cached blocks

Block Cache Mechanism:

User Process:     read() → "I want block 100"
                          ↓
OS Check Cache:   Is block 100 in cache?
                  ├─ YES (Cache hit) → Return from RAM ✅ Fast
                  └─ NO (Cache miss) → Read from disk, cache it

Next read of block 100:
                  Cache hit → Return from RAM immediately

How OS Manages Block Cache:

1. Allocation & Replacement:

When cache is full and new block needed:

  • Use LRU (Least Recently Used) algorithm
  • Evict block unused for longest time
  • Load new block into freed space
  • Simple and effective

2. Write-back vs Write-through:

Write-back (Delayed):
write(block) → [Cache] → [return to user]
              → [background: eventually write to disk]
              ✅ Fast writes, ❌ Risk of data loss on crash

Write-through (Immediate):
write(block) → [Cache] → [Disk] → [return to user]
              ❌ Slow, ✅ Safe from crashes

Modern systems use write-back with journaling.

3. Cache Coherency:

Ensure consistency:

Process A reads block 100 from cache (v1)
Process B reads block 100 from cache (v1)
Process C writes block 100 to disk (v2)

Without coherency: A and B still see v1 (stale!)
With coherency: Invalidate cache, next read gets v2

4. Dirty Block Tracking:

Cached blocks and their clean/dirty state:
[Cache]
├─ Block 100: clean (matches disk)
├─ Block 101: dirty (modified, not yet written)
├─ Block 102: clean
└─ Block 103: dirty

Periodically flush dirty blocks to disk

5. Flushing Strategy:

Writeback occurs:

  • Periodically: Every 5-30 seconds (configurable)
  • On sync(): User-requested sync
  • On demand: When cache pressure requires eviction
  • On fsync(): Process-requested sync

Real-World Example:

File write sequence:

write() 1:
  Block 100: "data1" → [Cache, dirty]
  Returns in 1 µs

write() 2:
  Block 100: updated with "data1+" → [Cache, dirty]
  Returns in 1 µs

[5 seconds later, OS background process]
  Flush dirty blocks:
  Block 100 → Disk (5 ms)

Total time: ~5 seconds to disk
User perceived: Instant (1 µs × 2 = 2 µs)

Benefit: 2500x speedup through caching!

Cache Configuration (Linux):

cat /proc/sys/vm/dirty_writeback_centisecs   # Flush interval
cat /proc/sys/vm/dirty_ratio                 # Max dirty % before forced flush
echo 3 > /proc/sys/vm/drop_caches            # Manual cache clear (written as root; cat only reads it)

Problem 42: Sequential Read Operations with Caching

Consider a file system that uses 4096 byte blocks. Assume that a program opens a file stored in this file system and issues the following system calls in the given sequence: read(100); seek(2000); read(700); read(100); seek(5300); read(100); read(200). Assume that seek(location) moves the file pointer to the given location on the file. In response to this sequence of operations, how many blocks are read from the disk? Justify your answer.

LLM Solution

Analysis:

Block size = 4096 bytes Each block contains bytes at positions: [block# × 4096, block# × 4096 + 4095]

Block Mapping:

Block 0: bytes 0-4095
Block 1: bytes 4096-8191
Block 2: bytes 8192-12287
...

Tracing Each Operation:

Op         | Current Pos | Target          | Block # | Disk Read?           | Cumulative
-----------|-------------|-----------------|---------|----------------------|-----------
read(100)  | 0           | bytes 0-99      | 0       | YES (miss)           | 1 block
seek(2000) | 100         | offset 2000     | 0       | no I/O (same block)  | 1 block
read(700)  | 2000        | bytes 2000-2699 | 0       | no (already cached)  | 1 block
read(100)  | 2700        | bytes 2700-2799 | 0       | no (already cached)  | 1 block
seek(5300) | 2800        | offset 5300     | 1       | no I/O               | 1 block
read(100)  | 5300        | bytes 5300-5399 | 1       | YES (miss)           | 2 blocks
read(200)  | 5400        | bytes 5400-5599 | 1       | no (already cached)  | 2 blocks

Detailed Breakdown:

  1. read(100): Read first 100 bytes

    • File pointer at 0
    • Bytes needed: 0-99
    • Blocks needed: Block 0 (contains bytes 0-4095)
    • Disk read: 1 block (Block 0 from cache MISS)
  2. seek(2000): Move to byte 2000

    • File pointer now at 2000
    • Block 0 already in cache (no action)
  3. read(700): Read 700 bytes starting at 2000

    • Bytes needed: 2000-2699
    • All within Block 0 (0-4095)
    • Disk read: 0 blocks (CACHE HIT - Block 0 already cached)
  4. read(100): Read 100 bytes starting at 2700

    • Bytes needed: 2700-2799
    • All within Block 0 (0-4095)
    • Disk read: 0 blocks (CACHE HIT)
  5. seek(5300): Move to byte 5300

    • File pointer now at 5300
    • Which block? 5300 / 4096 = 1.29 → Block 1 (4096-8191)
    • Block 1 NOT in cache
  6. read(100): Read 100 bytes starting at 5300

    • Bytes needed: 5300-5399
    • Blocks needed: Block 1
    • Disk read: 1 block (Block 1 from cache MISS)
  7. read(200): Read 200 bytes starting at 5400

    • Bytes needed: 5400-5599
    • All within Block 1 (4096-8191)
    • Disk read: 0 blocks (CACHE HIT - Block 1 already loaded)

Answer: 2 blocks read from disk

Justification: Block cache retains recently accessed blocks. Only when accessing a new block not yet cached (Block 0, then Block 1) does a disk read occur. Subsequent reads within the same block hit the cache.
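
The count can also be checked mechanically. The C sketch below (illustrative; it assumes an initially empty cache that never evicts) replays the Problem 42 sequence with 4096-byte blocks and counts the distinct blocks fetched from disk.

#include <stdio.h>
#include <stdbool.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 64

static bool cached[MAX_BLOCKS];   /* cached[b] is true once block b has been read */
static long pos = 0;              /* current file offset */
static int  disk_reads = 0;

/* Read 'n' bytes at the current offset, loading any uncached blocks. */
static void do_read(long n) {
    long first = pos / BLOCK_SIZE;
    long last  = (pos + n - 1) / BLOCK_SIZE;
    for (long b = first; b <= last; b++)
        if (!cached[b]) { cached[b] = true; disk_reads++; }
    pos += n;
}

static void do_seek(long offset) { pos = offset; }   /* logical only: no I/O */

int main(void) {
    do_read(100);  do_seek(2000); do_read(700); do_read(100);
    do_seek(5300); do_read(100);  do_read(200);
    printf("blocks read from disk: %d\n", disk_reads);   /* prints 2 */
    return 0;
}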

Problem 43: Block Reads for Multiple Operations

Consider a file system that uses 1024 byte blocks. Assume that a program opens a file stored in this file system and issues the following system calls in the given sequence: read(500); seek(700); read(200); read(100); seek(3000); read(200). Assume that seek(location) moves the file pointer to the given location. In response to this sequence of operations, how many blocks are read from the disk? Justify your answer.

LLM Solution

Analysis:

Block size = 1024 bytes

Block Mapping:

Block 0: bytes 0-1023
Block 1: bytes 1024-2047
Block 2: bytes 2048-3071
Block 3: bytes 3072-4095
...

Tracing Each Operation:

Op         | Current Pos | Target Bytes    | Block(s) | Disk Read?     | Total Blocks Read
-----------|-------------|-----------------|----------|----------------|------------------
read(500)  | 0           | 0-499           | 0        | YES (miss)     | 1
seek(700)  | 500         | offset 700      | 0        | no I/O         | 1
read(200)  | 700         | 700-899         | 0        | NO (cached)    | 1
read(100)  | 900         | 900-999         | 0        | NO (cached)    | 1
seek(3000) | 1000        | offset 3000     | 2        | no I/O         | 1
read(200)  | 3000        | 3000-3199       | 2 and 3  | YES (2 misses) | 3

Detailed Breakdown:

  1. read(500): Read first 500 bytes

    • File pointer at 0
    • Bytes needed: 0-499
    • Block needed: Block 0 (bytes 0-1023)
    • Disk read: 1 block (Block 0 MISS)
  2. seek(700): Move to byte 700

    • File pointer now at 700
    • No I/O operation occurs (logical seek only)
    • Block 0 still in cache
  3. read(200): Read 200 bytes starting at 700

    • Bytes needed: 700-899
    • All within Block 0 (0-1023)
    • Disk read: 0 blocks (Block 0 CACHE HIT)
  4. read(100): Read 100 bytes starting at 900

    • Bytes needed: 900-999
    • All within Block 0 (0-1023)
    • Disk read: 0 blocks (Block 0 CACHE HIT)
  5. seek(3000): Move to byte 3000

    • File pointer now at 3000
    • Which block? 3000 / 1024 = 2.93 → Block 2 (bytes 2048-3071)
    • No I/O occurs
  6. read(200): Read 200 bytes starting at 3000

    • Bytes needed: 3000-3199
    • Blocks needed: Block 2 (bytes 2048-3071) covers 3000-3071, and Block 3 (bytes 3072-4095) covers 3072-3199
    • Neither Block 2 nor Block 3 is in the cache (only Block 0 is)
    • Disk read: 2 blocks (Block 2 MISS, Block 3 MISS)

Answer: 3 blocks read from disk

Justification:

  • Block 0 is read for the first read(500); the operations at positions 700-999 stay within Block 0 (cache hits)
  • The seeks themselves cause no I/O
  • The final read of bytes 3000-3199 spans Block 2 (2048-3071) and Block 3 (3072-4095), both cache misses

Total: Block 0 (1st read) + Blocks 2 and 3 (last read) = 3 blocks from disk

Problem 44: Minimum Disk Blocks to Access File Byte

A UNIX system has just been rebooted (i.e., the file buffer and name translation caches are empty). What is the minimum number of disk blocks that must be read to get to the millionth byte of the file “/a/big/file” assuming the file-system uses 4096 byte blocks and 32-bit i-node numbers?

Instructor Solution

Read root i-node, read root directory and find “a”, read i-node for “/a”, read directory chunk and find “big”, read i-node for “/a/big”, read directory chunk and find “file”, read i-node for “/a/big/file”, read indirect block for the file (since a 4 KB block holds 1024 block pointers, files up to ~4 MB are reachable via the direct or single indirect blocks), read the data block.

Elaboration (by LLM):

Let me walk through this path lookup step-by-step, explaining what the instructor’s solution is really doing.

The Challenge

We need to access the millionth byte (byte 1,000,000) in file “/a/big/file”. The system is cold-started, meaning:

  • No caches
  • No i-node caches
  • No name translation caches
  • Every lookup requires disk reads

With 4096 byte blocks and 32-bit i-node numbers, let me trace through the path.

Step-by-Step Breakdown

Step 1: Read Root I-node (1 block)

The root directory i-node location is hardcoded in UNIX systems. We must read it first to know where the root directory is stored.

Disk Read: Root i-node
Contains: Block pointers for root directory "/"
Block pointers tell us where to find directory entries

Step 2: Read Root Directory Data (1 block)

The root i-node has block pointers. Following the first direct pointer, we read the root directory block containing all entries (files/directories in ”/”).

Disk Read: Root directory block
Content: [("a", inode_num_a), ("bin", inode_num_bin), ...]
Goal: Find entry "a"
Result: Found! inode_num_a = some number (e.g., 5000)

Step 3: Read “/a” I-node (1 block)

Now we know the i-node number for directory “a”. We read its i-node to get block pointers.

Disk Read: I-node for "/a"
Contains: Block pointers for contents of "/a" directory

Step 4: Read “/a” Directory Data (1 block)

Follow the i-node’s block pointers to read the actual directory data containing entries in “/a”.

Disk Read: Directory block for "/a"
Content: [("big", inode_num_big), ("small", inode_num_small), ...]
Goal: Find entry "big"
Result: Found! inode_num_big = some number (e.g., 5001)

Step 5: Read “/a/big” I-node (1 block)

Disk Read: I-node for "/a/big"
Contains: Block pointers for contents of "/a/big" directory

Step 6: Read “/a/big” Directory Data (1 block)

Disk Read: Directory block for "/a/big"
Content: [("file", inode_num_file), ...]
Goal: Find entry "file"
Result: Found! inode_num_file = some number (e.g., 5002)

Step 7: Read “/a/big/file” I-node (1 block)

Now we have the i-node for the actual file. This is critical because it contains the block pointers to the file’s data.

Disk Read: I-node for "/a/big/file"
Contains:
  - Direct pointers (usually 12-15)
  - Single indirect pointer
  - Double indirect pointer
  - File size, permissions, timestamps, etc.

For 4KB blocks and 32-bit block numbers:
Each block holds 4096 / 4 = 1024 block pointers

Step 8: Calculate Block Location for Byte 1,000,000

We need byte 1,000,000. Let’s see where it is:

Block size: 4096 bytes

Byte 1,000,000 is in which block?
1,000,000 / 4096 = 244.14... → Block 244

So we need to access Block 244 of the file.

Question: Can Block 244 be accessed via direct pointers?

Most UNIX systems have 12-15 direct pointers per i-node. These point to blocks 0-11 or 0-14.

Block 244 is BEYOND the direct pointers
So we need the indirect pointer!

Step 9: Read Single Indirect Block (1 block)

Since block 244 > 14 (max direct), we use the single indirect pointer.

Single indirect pointer (in i-node) → Block X
Block X contains 1024 block pointers (for 4KB blocks)
These pointers point to file blocks 12-1035 (approximately)

Block 244 falls within this range!
Specifically, in the single indirect block:
  Position = 244 - 12 = 232 (approximately, depends on direct count)

Disk Read: Single indirect block
Content: [block_ptr_0, block_ptr_1, ..., block_ptr_232, ...]
Goal: Find block pointer at position 232
Result: block_ptr_232 points to the actual data block

Step 10: Read the Data Block (1 block)

Finally, follow the block pointer we found to read the actual data block containing byte 1,000,000.

Disk Read: Data block containing byte 1,000,000

Total Count

Let me count all the blocks read:

1. Root i-node
2. Root directory block
3. "/a" i-node
4. "/a" directory block
5. "/a/big" i-node
6. "/a/big" directory block
7. "/a/big/file" i-node
8. Single indirect block
9. Data block

Total: 9 blocks

Why This Many?

3 directories on the path (/, /a, /a/big) × 2 blocks each = 6 blocks
  (1 i-node + 1 directory data block per directory)

+ 1 file i-node
+ 1 indirect block (to reach block 244)
+ 1 data block

= 9 blocks total

Why Single Indirect, Not Double Indirect?

With single indirect (assuming 12 direct pointers):

  • The single indirect block covers file blocks 12 through 1035 (12 + 1024 - 1)
  • Byte 1,000,000 / 4096 = block 244 ✅ Well within range

Why not double indirect?

Double indirect would be needed only if:
  Block number >= 12 + 1024 = 1036
  i.e., byte offset >= 1036 × 4096 ≈ 4.24 MB

Block 244 (byte 1,000,000) is far below that limit,
so the single indirect block is enough.
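
The same reasoning can be written as a few lines of C (the 12 direct pointers are an assumption; the exact count varies by filesystem):

#include <stdio.h>

int main(void) {
    const long block_size   = 4096;
    const long ptrs_per_blk = block_size / 4;     /* 4-byte block numbers -> 1024 per block */
    const long n_direct     = 12;                 /* assumption; varies by filesystem */

    long offset = 1000000;                        /* the millionth byte */
    long block  = offset / block_size;            /* logical block 244 */

    printf("logical block: %ld\n", block);
    if (block < n_direct)
        printf("reachable via a direct pointer\n");
    else if (block < n_direct + ptrs_per_blk)
        printf("needs the single indirect block (1 extra read)\n");
    else
        printf("needs the double indirect blocks (2 extra reads)\n");
    return 0;
}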

Key Insight

The instructor’s solution correctly identifies that:

  1. Path traversal requires 6 blocks (3 directories on the path × 2)
  2. The millionth byte is in a block reachable via single indirect
  3. We need 1 indirect block + 1 data block = 2 more reads
  4. Total: 6 + 1 + 1 + 1 = 9 blocks

The “1 + 1” for indirect and data comes from the fact that to access block 244 of the file, we must:

  • Read the single indirect block to find the pointer
  • Read the actual data block

This is a great example of understanding filesystem internals and indirect pointer mechanics!

Problem 45: Bit Vector vs Free Block List

Explain why a bit vector implementation of a free block list can provide increased reliability and performance compared with keeping a list of free blocks where the first few bytes of each free block provide the logical sector number of the next free block.

Instructor Solution

Performance: Bit vectors provide fast access to find clusters of adjacent free blocks.

Reliability: If an item in a linked list is lost, the rest of the list is lost. With a bit vector only the items are lost. Also, it’s possible to have multiple copies of the bit vector since it is a more compact representation.

Elaboration (by LLM):

Let me explain both the linked list approach and the bit vector approach, and why bit vectors are superior.

The Problem: Tracking Free Blocks

A filesystem needs to track which blocks are free (available for allocation) and which are occupied. There are two main approaches:

Approach 1: Linked List of Free Blocks (The Problematic Way)

In this method, free blocks form a chain where each free block contains a pointer to the next free block in its first few bytes.

Disk Layout:

Allocated Block:
┌──────────────────┐
│   File Data      │
│   (in use)       │
└──────────────────┘

Free Block (Link Node):
┌──────────────────┐
│ Next block: 500  │  ← First 4 bytes store pointer
│ Free space...    │     to next free block
└──────────────────┘

Free Block:
┌──────────────────┐
│ Next block: 700  │  ← Point to next free block
│ Free space...    │
└──────────────────┘

Free Block:
┌──────────────────┐
│ Next block: -1   │  ← -1 means end of list
│ Free space...    │
└──────────────────┘

Superblock tracks: Head of list = Block 200
Free list: 200 → 500 → 700 → END

Approach 2: Bit Vector (The Better Way)

A bit vector uses one bit per block to track free/allocated status.

Bit Vector in Superblock/Inode Table:

Bit position:  0 1 2 3 4 5 6 7 8 9 10 11 ...
Bit value:     1 0 1 1 0 0 1 0 1 0  1  1 ...
               │ │ │ │ │ │ │ │ │ │  │  │
Meaning:    alloc free alloc alloc free free alloc ...

Legend:
  1 = Block allocated (in use)
  0 = Block free (available)

Free blocks: 1, 5, 6, 8, 10, ...

Comparison: Performance

Linked List Performance (Slow):

To find free blocks for a 4-block file:

Scenario: Need 4 contiguous free blocks

Start at head: Block 200
  Pointer says next free: Block 500
  Are 200-203 contiguous? NO

Follow chain: Block 500
  Pointer says next free: Block 700
  Are 500-503 contiguous? NO

Follow chain: Block 700
  Pointer says next free: Block 1000
  Are 700-703 contiguous? NO

Continue searching...

Time: Have to follow many pointers!
Each pointer follow = disk read
O(n) time to search for free blocks

Bit Vector Performance (Fast):

Scenario: Need 4 contiguous free blocks

Scan bit vector: 1 0 1 1 0 0 1 0 1 0 1 1 ...
Look at bits: [0 0 0 0] ← 4 contiguous zeros!
Found free blocks: 5, 6, 7, 8
Allocate immediately!

Time: Single scan through an in-memory bitmap,
checked a machine word (32 or 64 bits) at a time
Much faster than chasing on-disk pointers!
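
To illustrate the scan, here is a toy C sketch (the bitmap contents are invented) that walks a bit vector once looking for a run of free blocks long enough to satisfy a request:

#include <stdio.h>

/* Find the first run of 'need' consecutive free blocks (bit == 0).
 * Returns the starting block number, or -1 if no such run exists. */
static int find_free_run(const unsigned char *bitmap, int nblocks, int need) {
    int run_start = -1, run_len = 0;
    for (int b = 0; b < nblocks; b++) {
        int allocated = (bitmap[b / 8] >> (b % 8)) & 1;
        if (!allocated) {
            if (run_len == 0) run_start = b;
            if (++run_len == need) return run_start;
        } else {
            run_len = 0;
        }
    }
    return -1;
}

int main(void) {
    /* Illustrative bitmap for 16 blocks (1 = allocated, 0 = free), LSB = lowest block:
     * blocks 0..15 = 1 0 1 1 0 0 0 0   1 0 1 1 0 0 1 1 */
    unsigned char bitmap[2] = { 0x0D, 0xCD };
    int start = find_free_run(bitmap, 16, 4);
    printf("first run of 4 free blocks starts at block %d\n", start);  /* block 4 */
    return 0;
}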

Real-World Impact:

Disk with 1 TB (250 million blocks):

Linked List Free Block Lookup:
  Average: Follow 100,000 pointers
  Each follow: 5ms disk access
  Total: 500 seconds = 8+ minutes! (Unacceptable)

Bit Vector Free Block Lookup:
  Vector size: 250 million bits = 31 MB
  Scan time: Milliseconds
  Total: < 0.1 seconds

Speed difference: 5000x faster!

Comparison: Reliability

Linked List Reliability (Fragile):

Free blocks are linked in a chain. Each free block contains a pointer to the next.

Chain: Block 200 → Block 500 → Block 700 → Block 1000 → ...

Scenario: Disk corruption hits Block 500

Before corruption:
  Block 200: Next = 500 ✅
  Block 500: Next = 700 ✅
  Block 700: Next = 1000 ✅

Disk error corrupts Block 500!

After corruption:
  Block 200: Next = 500 ✅
  Block 500: Next = 99999 ❌ (corrupted pointer)
  Block 700: Next = 1000 (unreachable!)

Result: Lost blocks 700, 1000, and everything after!
Entire rest of free list is gone!

Cascade Failure:

Linked List free pool has 1,000,000 blocks

One pointer corrupted:
  → Only 1 block is bad
  → But 500,000 subsequent blocks become inaccessible
  → 50% data loss potential!

This is a CASCADE FAILURE:
Single error propagates to destroy large parts of list

Bit Vector Reliability (Robust):

Bit vector: [1 0 1 1 0 0 1 0 1 0 1 1 ...]

Scenario: Bit corruption at position 5

Before corruption:
  Bit 5: 0 (block 5 is free) ✅

Corruption:
  Bit 5: 1 (block 5 now marked allocated)

After corruption:
  Block 5 appears allocated, can't be allocated again
  But only Block 5 is affected!
  Blocks 6, 8, 10 are still correctly marked free

Result: Only 1 block lost, 999,999 others unaffected

Localized Failure:

Bit vector with 1,000,000 blocks

One bit corrupted:
  → Only 1 block is affected
  → Other blocks are independent
  → 0.0001% data loss

This is GRACEFUL DEGRADATION:
Single corruption affects only that bit

Comparison: Compactness & Redundancy

Linked List Representation:

Total space needed:
  1,000,000 blocks × 4 bytes per pointer = 4 MB
  (Just to store pointers in free blocks)

Cannot easily create backup:
  Would need to allocate backup blocks elsewhere
  Uses disk space
  Slow to create and maintain

Bit Vector Representation:

Total space needed:
  1,000,000 bits = 125 KB (1 million bits, 1 byte = 8 bits)

Much more compact!

Easy to create redundant copies:
  125 KB × 3 = 375 KB for triple redundancy
  Still tiny compared to other fs overhead
  Can store in superblock + two other locations
  On corruption, restore from backup copy!

Real Example: Filesystem Recovery

Scenario: Filesystem crashes, disk may be corrupted

Linked List Approach:
  Read superblock
  Get head of free list: Block 1000
  Follow chain: 1000 → 1500 → 2000 → ...
  Hit corrupted pointer at block 5000
  Entire free list lost! Cannot allocate anything!
  Recovery: Need filesystem repair tool (fsck) to rebuild
  Time: Hours for large filesystem

Bit Vector Approach:
  Read superblock
  Bit vector location: Blocks 10-20
  Read primary copy (corrupted)
  Read backup copy #1 (good!)
  Use backup copy immediately
  Continue operations
  Recovery time: Seconds!

Summary Table

Aspect                    | Linked List                        | Bit Vector
--------------------------|------------------------------------|-----------
Finding contiguous blocks | Slow O(n) - follow many pointers   | Fast in-memory bitmap scan
Space efficiency          | 4 MB (for 1M blocks)               | 125 KB (for 1M blocks)
Corruption impact         | CASCADE FAILURE (entire list lost) | LOCALIZED (only affected bit)
Backup/Recovery           | Difficult, space-intensive         | Easy, can store multiple copies
Performance               | Requires disk I/O per pointer      | Fits in memory, fast access
Fragmentation tracking    | Can't easily find clusters         | Easy to find adjacent free blocks

Why Modern Systems Use Bit Vectors

Linux ext4: Uses bitmap blocks for free space tracking
Windows NTFS: Uses bitmap
macOS APFS: Uses bitmaps with optimizations
BSD FFS: Uses bitmaps

All modern filesystems: BIT VECTORS
(Linked lists abandoned in practice)

The Key Insight

Bit vectors win on:

  1. Performance: 5000x faster for finding free space
  2. Reliability: Graceful degradation vs cascade failure
  3. Compactness: 32x smaller representation
  4. Recoverability: Easy to maintain redundant copies

This is why the linked list approach (though mentioned in older textbooks) is no longer used in production systems!

Problem 46: I-node Based File System with Indirect Pointers

Consider an i-node based file system that uses 512 byte blocks and that a block number occupies 4 bytes. Assume that the i-node has 4 direct pointers, 1 indirect pointer and 1 double indirect pointer.

(a) List 2 items that are stored in the directory entry for this file.

(b) What’s the maximum possible file size in this file system?

(c) Assuming the i-node is in memory, if an application opens the file, reads 600 bytes and closes the file, how many disk blocks will be read from the disk. You must justify your answer.

(d) Assuming the i-node is in memory, if an application opens a 5000 byte sized file and reads it sequentially, how many disk blocks will be read from the disk. You must justify your answer.
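
No solution is given above; as a rough aid for part (b) only, the sketch below computes the maximum file size implied by the stated parameters (512-byte blocks, 4-byte block numbers, 4 direct + 1 single indirect + 1 double indirect pointers). Treat it as my arithmetic under those assumptions, not an official answer.

#include <stdio.h>

int main(void) {
    const long block_size = 512;
    const long ptrs       = block_size / 4;   /* 4-byte block numbers -> 128 per block */
    const long n_direct   = 4;

    long max_blocks = n_direct            /* 4 direct pointers              */
                    + ptrs                /* 1 single indirect: 128 blocks  */
                    + ptrs * ptrs;        /* 1 double indirect: 16384 blocks */

    printf("max blocks: %ld\n", max_blocks);                  /* 16516 */
    printf("max file size: %ld bytes (~%.2f MB)\n",
           max_blocks * block_size,
           max_blocks * block_size / (1024.0 * 1024.0));      /* 8,456,192 bytes, ~8.06 MB */
    return 0;
}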

Problem 47: Reconstructing Free-Space List

Consider a system where free space is kept in a free-space list. Suppose that the pointer to the free-space list, stored in the superblock, is lost. Can the system reconstruct the free-space list? Explain your answer.

Instructor Solution

Yes. Since the location of the root directory is still known (inferred from the superblock), you can start from the root directory and determine all blocks occupied by all files and directories in the system by recursively traversing all directories in the file system. Recall that the blocks occupied by a file or directory is reachable from the i-node of the file (in a FS which uses i-nodes). Once all occupied blocks are determined, the remaining blocks are put to the free list, thus completing the reconstruction.

Elaboration (by LLM):

Let me walk you through this disaster recovery scenario and explain why we can rebuild the free-space list even when the pointer is lost.

The Disaster

Imagine this scenario:

  • A filesystem crash or corruption
  • The superblock is partially readable, but the pointer to the head of the free-space list is corrupted/lost
  • Without this pointer, the system can’t allocate new blocks
  • The filesystem is in a dire state

Why We Can Recover

The key insight is that the root directory location is hardcoded in UNIX. This is a deliberate design decision for exactly this scenario.

Superblock:
├─ Pointer to free list: CORRUPTED ❌
├─ Magic number: OK ✅
├─ Block size: OK ✅
├─ Root directory location: OK ✅ (hardcoded!)
└─ Other metadata: OK ✅

The Recovery Algorithm

Step 1: Traverse All Files and Directories

Start from the root directory (location known) and recursively traverse:

Start at: Root directory (hardcoded location)

/ (root)
├─ /home/
│  ├─ /home/alice/
│  │  ├─ thesis.pdf (i-node 5000)
│  │  ├─ notes.txt (i-node 5001)
│  │  └─ /home/alice/projects/ (i-node 5002)
│  └─ /home/bob/
│     └─ code.c (i-node 5003)
├─ /usr/
│  ├─ /usr/bin/
│  │  ├─ ls (i-node 5004)
│  │  └─ cat (i-node 5005)
│  └─ /usr/lib/
└─ /etc/
   └─ passwd (i-node 5006)

Step 2: For Each File/Directory, Read Its I-node

Each i-node contains block pointers:

I-node 5000 (thesis.pdf):
├─ Direct pointer 0: Block 100
├─ Direct pointer 1: Block 101
├─ Direct pointer 2: Block 102
├─ Single indirect: Block 200
└─ ... (more pointers)

Record: Blocks 100, 101, 102, 200, ... are OCCUPIED

Step 3: Mark All Blocks as Occupied

Following all i-node pointers:

As we traverse, we build a set of occupied blocks:
Occupied = {100, 101, 102, 110, 111, 200, 201, ...}

(Could be thousands or millions of blocks)

Step 4: Compute Free Blocks

Total blocks in filesystem: N
Occupied blocks: O (computed in step 3)
Free blocks: N - O

Free = {103, 104, 105, 106, 109, 112, ..., N}

Step 5: Rebuild Free-Space List

Insert all free blocks back into the free-space list structure:

If using linked list:
  Block 103 → Block 104 → Block 105 → ...

If using bit vector:
  Bit 0: 1 (occupied)
  Bit 1: 1 (occupied)
  ...
  Bit 103: 0 (free)
  Bit 104: 0 (free)
  ...

Update superblock pointer to head of rebuilt list

Why This Works

Key Principle: Block Ownership Chain

Every block either:

  1. Contains filesystem metadata (superblock, i-node table, etc.)
  2. Is pointed to by an i-node (file data or directory data)
  3. Is free
All blocks in filesystem:

Metadata blocks (known):
├─ Superblock (block 0)
├─ I-node table (blocks 1-10)
├─ Bitmap (blocks 11-12)
└─ Direct pointers

Data blocks (found via i-node traversal):
├─ File data
├─ Directory data
└─ Indirect blocks

Everything else: FREE

Example: Small Filesystem

Disk: 20 blocks total (0-19)

Block allocation:
0: Superblock
1-2: I-node table
3-4: Root directory
5: File A data
6-7: File B data
8: Indirect block (for File C)
9-10: File C data
11-15: Free
16-19: Free

Recovery process:
1. Start at root (block 3-4)
2. Find File A (i-node 100) → points to block 5
3. Find File B (i-node 101) → points to blocks 6-7
4. Find File C (i-node 102) → points to indirect block 8 → points to blocks 9-10
5. Occupied blocks: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
6. Free blocks: {11, 12, 13, 14, 15, 16, 17, 18, 19}
7. Rebuild free-space list from {11, 12, 13, 14, 15, 16, 17, 18, 19}
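
The recovery pass amounts to a mark-and-sweep over the block space. The C sketch below is a toy model of the 20-block example above (all structures are invented for illustration): mark every block reachable from known metadata and from the files' i-nodes, then everything unmarked goes back on the free list.

#include <stdio.h>
#include <stdbool.h>

#define N_BLOCKS 20

/* Toy i-node: a handful of direct block pointers (-1 = unused).
 * Real i-nodes also have indirect pointers; omitted to keep the sketch short. */
struct toy_inode { int blocks[4]; };

int main(void) {
    bool occupied[N_BLOCKS] = { false };

    /* Metadata whose location is known a priori (superblock, i-node table, root dir). */
    int metadata[] = { 0, 1, 2, 3, 4 };
    for (int i = 0; i < 5; i++) occupied[metadata[i]] = true;

    /* I-nodes discovered by walking the directory tree from the root
     * (File A, File B, File C's indirect block + data, matching the example above). */
    struct toy_inode files[] = {
        { { 5, -1, -1, -1 } },        /* File A */
        { { 6,  7, -1, -1 } },        /* File B */
        { { 8,  9, 10, -1 } },        /* File C (8 = its indirect block) */
    };
    for (int f = 0; f < 3; f++)
        for (int p = 0; p < 4; p++)
            if (files[f].blocks[p] >= 0) occupied[files[f].blocks[p]] = true;

    /* Whatever is not reachable is free: rebuild the free list from it. */
    printf("free blocks:");
    for (int b = 0; b < N_BLOCKS; b++)
        if (!occupied[b]) printf(" %d", b);
    printf("\n");                      /* 11 12 13 14 15 16 17 18 19 */
    return 0;
}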

Why Root Directory Location is Special

UNIX filesystems hardcode the root i-node number:

UNIX Convention: Root i-node is ALWAYS i-node number 2

Why not i-node 0 or 1?
  i-node 0: Reserved (invalid)
  i-node 1: Historical reasons (reserved)
  i-node 2: Root directory (by convention)

This is standardized across all UNIX systems!

Time Complexity

For filesystem with:
  N total blocks
  M files/directories

Traversal time:
  O(N) - visit every block in worst case
  O(M × D) - better estimate (M files, D avg depth)

For typical filesystem: Minutes to hours
  (Even 1 million files takes reasonable time)

Why This Matters

Real-world scenario: a crash or malware corrupts filesystem metadata

Corrupted:
  ✅ Free-space list
  ✅ Superblock pointers
  ✅ Bitmap blocks

Still intact (scattered or at well-known locations):
  ❌ Root directory (fixed, well-known i-node)
  ❌ The i-nodes themselves
  ❌ The file data blocks

Recovery: Still possible with this algorithm!
The filesystem survives partial corruption.

Tools That Use This

fsck (filesystem check):
  1. Reads root directory
  2. Traverses all i-nodes
  3. Identifies lost/orphaned blocks
  4. Rebuilds metadata

e2fsck (the ext2/3/4 fsck):
  Rebuilds the block and inode bitmaps from the i-nodes

Windows chkdsk:
  Similar algorithm for NTFS

macOS Disk Utility:
  Repair uses similar recovery

Limitations

This recovery fails if:
  1. Root directory completely corrupted
     → Can't start traversal
  2. I-node table corrupted
     → Can't find i-node locations
  3. Multiple cascading failures
     → May lose data

But: Works great for single component failure!

This is a beautiful example of filesystem robustness design - by making the root directory location hardcoded and having all file locations reachable from i-nodes, the system can recover even from significant metadata corruption!

Problem 48: Contiguous Allocation with Read and Write Operations

Consider a file system that employs contiguous allocation with a block size of 1024 bytes, and assume you have the following directory:

File     Start   Length
a.txt    5       2
b.txt    2       3

Code block (a)

char buffer[2048];
int fd = open("a.txt", READ_ONLY);
read(fd, buffer, 100);    // [1]
read(fd, buffer, 900);    // [2]
read(fd, buffer, 1000);   // [3]
close(fd);

Code block (b)

char buffer[2048];
int fd = open("b.txt", READ_WRITE);
seek(fd, 1000);           // [1]
read(fd, buffer, 1100);   // [2]
write(fd, buffer, 2000);  // [3]
close(fd);

(a) For each of the read calls in code block (a), which physical disk blocks are actually read from the disk (if any)? Justify your answer.

(b) For each of the seek/read calls in code block (b), which physical disk blocks are actually read from the disk (if any)? Justify your answer.

(c) Would the last write call in code block (b) succeed? Why or why not? Justify your answer.

Problem 49: File System with Contiguous Allocation

Consider a file system that uses contiguous allocation, i.e., files are stored contiguously on the disk. Assume that the disk partition over which the file system resides contains 15 blocks, labeled 1 through 15. The block size is 512 bytes, the first block contains the superblock, and the rest of the blocks are initially empty. The file system uses a single level directory system, i.e., only the root directory exists.

File     Size (bytes)
a.txt    600
b.dat    300
c.avi    1100
d.txt    1600

(a) Show the contents of the root directory.

LLM Solution

Root Directory Contents:

With contiguous allocation, directory entries store filename, starting block, and length.

Block calculations (512-byte blocks, so each file needs ⌈size / 512⌉ blocks):

- a.txt: 600 bytes = ⌈600/512⌉ = 2 blocks
- b.dat: 300 bytes = ⌈300/512⌉ = 1 block
- c.avi: 1100 bytes = ⌈1100/512⌉ = 3 blocks
- d.txt: 1600 bytes = ⌈1600/512⌉ = 4 blocks

Total: 2 + 1 + 3 + 4 = 10 blocks, which fits easily in the 14 data blocks (blocks 2-15) left after the superblock. The same rounding is shown in the short snippet below.
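
A minimal C sketch of that rounding (the names and sizes are simply the values from the problem statement):

#include <stdio.h>

/* Ceiling division: how many block_size-byte blocks `size` bytes need. */
static unsigned blocks_needed(unsigned size, unsigned block_size)
{
    return (size + block_size - 1) / block_size;
}

int main(void)
{
    const char *names[] = { "a.txt", "b.dat", "c.avi", "d.txt" };
    const unsigned sizes[] = { 600, 300, 1100, 1600 };
    unsigned total = 0;

    for (int i = 0; i < 4; i++) {
        unsigned b = blocks_needed(sizes[i], 512);
        printf("%s: %u bytes -> %u blocks\n", names[i], sizes[i], b);
        total += b;
    }
    printf("total data blocks: %u\n", total);   /* prints 10 */
    return 0;
}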

Root Directory (Block allocation shown below):

Filename   Starting Block   Length (blocks)   Size (bytes)
a.txt      2                2                 600
b.dat      4                1                 300
c.avi      5                3                 1100
d.txt      8                4                 1600

(b) Show the contents of the disk. In particular show the blocks that each file occupies on the disk.

LLM Solution

Disk Layout (15 blocks total):

Block 1:  [Superblock]
Block 2:  [a.txt data (part 1: bytes 0-511)]
Block 3:  [a.txt data (part 2: bytes 512-599)] [padding 100 bytes]
Block 4:  [b.dat data (bytes 0-299)] [padding 212 bytes]
Block 5:  [c.avi data (part 1: bytes 0-511)]
Block 6:  [c.avi data (part 2: bytes 512-1023)]
Block 7:  [c.avi data (part 3: bytes 1024-1099)] [padding 412 bytes]
Block 8:  [d.txt data (part 1: bytes 0-511)]
Block 9:  [d.txt data (part 2: bytes 512-1023)]
Block 10: [d.txt data (part 3: bytes 1024-1535)]
Block 11: [d.txt data (part 4: bytes 1536-1599)] [padding 368 bytes]
Blocks 12-15: [Free space - 4 blocks = 2048 bytes]

Summary:

File         Blocks   Status
Superblock   1        Used
a.txt        2-3      Used (contiguous)
b.dat        4        Used
c.avi        5-7      Used (contiguous)
d.txt        8-11     Used (contiguous)
Free         12-15    Empty (4 blocks = 2048 bytes)

Problem 50: Hierarchical Directory Simulation

Given a filesystem that supports only one directory (i.e. it’s not possible to create sub-directories) and 8 character filenames, explain how you could write a user level library that simulates hierarchical directories. That is, the given file system only supports the following file operations: creat, open, read, write, seek, close. Your library must not only support these file operations, but also support the following directory operations: mkdir, rmdir, chdir. Describe how you would implement these file and directory operations on top of the given file system and specifically discuss what happens when the user opens /a.txt and /dir1/a.txt and calls mkdir(“/dir1/dir2”);

Instructor Solution

The complication arises because, in a hierarchical file system, the user can create a file or directory with the same name in different directories. For example, you can create both “/a.txt” and “/dir1/a.txt” in a hierarchical file system, but in a single-directory file system you cannot have two files with the same name. Our idea is to implement, at the user level, what the file system would otherwise have done. Recall that a directory is nothing but a special file that contains metadata; in other words, a directory contains the names of the files and directories inside it. Since the underlying file system supports only a single-level directory, we need to make sure there are no name conflicts. To do this, we convert a user-supplied file/directory name into a random filename that is unique within the OS file system and supply that filename to the OS. We then record this mapping from the user-supplied filename to the OS-maintained filename in the directory, which is implemented as a special file. Since filenames are 8 characters long and each character can hold a value between 0 and 255, there are 256^8 possible filenames in this system.

In this system, each directory is implemented by a file in the OS file system. The contents of the directory consist of pairs (user-supplied filename, library-generated filename). So when a user opens, say, “/a.txt”, the user-level library opens the file implementing the directory “/”, searches it for a.txt, finds the mapping to the filename understood by the OS file system, and calls the OS’s open system call with that name.

When a user opens, say, “/dir1/a.txt”, the user-level library opens the file implementing the directory “/” and searches for dir1. When it finds the file implementing “/dir1” (say “ttghd5td”), the library opens that file, searches for “a.txt” within it, finds the mapping to the filename understood by the OS file system (say “4fdsd4re”), and calls the OS’s open system call with that name.

Elaboration (by LLM):

Let me walk through this clever abstraction layer approach that transforms a flat filesystem into a hierarchical one!

The Core Problem

Native filesystem: FLAT (single directory only)

User wants:
/a.txt
/dir1/a.txt         ← Same filename in different "directories"!
/dir1/dir2/file.txt

Problem:
The OS can't have two files named "a.txt" at the same time.
We need to simulate this at the user level!

The Solution: Namespace Translation

The key idea is to create a namespace translation layer:

User-level view:        OS-level view:
/a.txt            →     "a1b2c3d4" (random, unique)
/dir1/a.txt       →     "x5y6z7w8" (different random, unique)
/dir1/dir2/f.txt  →     "p9q8r7s6" (another random)

Mapping stored:
Directory "/" → file "root_dir" → contains {("a.txt", "a1b2c3d4"), ("dir1", "t2u3v4w5"), ...}
Directory "/dir1" → file "t2u3v4w5" → contains {("a.txt", "x5y6z7w8"), ("dir2", "m1n2o3p4"), ...}

Understanding the Namespace

Step 1: Random Filename Generation

With 8-character filenames and 256 possible values per character:

Possible filenames: 256^8 = 1.84 × 10^19 combinations

For practical purposes:
  Filesystem will have: millions to billions of files
  Random collision probability: Negligible

Use strong random generation to avoid collisions
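
One way the library could generate these names, as a hedged sketch: draw 8 random characters from a fixed printable alphabet. Restricting to 62 printable symbols shrinks the namespace from 256^8 to 62^8 (about 2 × 10^14), which is still ample; a real implementation would also want a stronger randomness source than rand() and a collision check against existing names.

#include <stdlib.h>

/* Sketch: fill name[0..7] with random printable characters and
 * NUL-terminate it. Collisions are unlikely, but a careful library
 * would still verify the name is not already in use. */
static void random_os_name(char name[9])
{
    static const char alphabet[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    for (int i = 0; i < 8; i++)
        name[i] = alphabet[rand() % (int)(sizeof(alphabet) - 1)];
    name[8] = '\0';
}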

Step 2: Directory as Special File

Each directory in the hierarchical view is stored as a special file in the OS:

Traditional hierarchical:
Directory structure:
/
├─ /home/
├─ /etc/
├─ /var/

Our flat simulation:
OS sees:
root_dir (file)
h4e5m2i9 (file)  ← actually /home/
e7t3c5f2 (file)  ← actually /etc/
v1a2r3d4 (file)  ← actually /var/

Step 3: Directory File Contents

Each “directory file” contains a list of mappings:

File "root_dir" (simulating /):
[
  ("a.txt", "a1b2c3d4"),
  ("dir1", "t2u3v4w5"),
  ("bin", "b3i5n2x1"),
  ...
]

File "t2u3v4w5" (simulating /dir1/):
[
  ("a.txt", "x5y6z7w8"),     ← SAME NAME, different file!
  ("dir2", "m1n2o3p4"),
  ("notes.txt", "n7o3t4e5"),
  ...
]

File "m1n2o3p4" (simulating /dir1/dir2/):
[
  ("file.txt", "p9q8r7s6"),
  ...
]

Path Traversal Example

Opening /a.txt:

User calls: fp = open("/a.txt", "r")

Library does:
  1. Current directory = "/" (root)
  2. Parse path: components = ["a.txt"]
  3. Root directory is stored in file "root_dir"
  4. Open OS file "root_dir"
  5. Search for entry "a.txt"
  6. Found! Maps to OS filename "a1b2c3d4"
  7. Call OS: open("a1b2c3d4", "r")
  8. Return file descriptor

Timeline:
User: open("/a.txt")
  ↓
Library: decompose path "/"  + "a.txt"
  ↓
Library: load directory "/" (file "root_dir")
  ↓
Library: search for "a.txt" in directory entries
  ↓
Library: find mapping "a1b2c3d4"
  ↓
Library: call OS open("a1b2c3d4")
  ↓
OS: returns fd
  ↓
User: receives fd

Opening /dir1/a.txt:

User calls: fp = open("/dir1/a.txt", "r")

Library does:
  1. Current directory = "/" (root)
  2. Parse path: components = ["dir1", "a.txt"]
  3. Process component "dir1":
     - Load directory "/" (file "root_dir")
     - Search for "dir1"
     - Found! Maps to OS filename "t2u3v4w5"
     - This is the new current directory
  4. Process component "a.txt":
     - Load directory "/dir1" (file "t2u3v4w5")
     - Search for "a.txt"
     - Found! Maps to OS filename "x5y6z7w8"
     - Call OS: open("x5y6z7w8", "r")
  5. Return file descriptor

Key insight:
  /a.txt uses file "a1b2c3d4"
  /dir1/a.txt uses file "x5y6z7w8"
  Different files! No conflict!

Creating mkdir(“/dir1/dir2”):

User calls: mkdir("/dir1/dir2")

Library does:
  1. Parse path: components = ["dir1", "dir2"]
  2. Traverse to /dir1:
     - Load "/" (root_dir)
     - Search "dir1" → found "t2u3v4w5"
  3. Create new directory entry in /dir1:
     - Generate random filename: "m1n2o3p4"
     - Create OS file "m1n2o3p4" (empty initially)
     - Add entry to "t2u3v4w5": ("dir2", "m1n2o3p4")
     - Write back "t2u3v4w5" with new mapping
  4. Return success

Result:
  /dir1/dir2/ now exists
  Backed by OS file "m1n2o3p4"

Complex Example: Three Operations

Scenario:

1. User creates /a.txt
   Library:
     Create OS file "a1b2c3d4"
     Update "/" (root_dir): add ("a.txt", "a1b2c3d4")

2. User creates /dir1/
   Library:
     Create OS file "t2u3v4w5" (initially empty directory)
     Update "/" (root_dir): add ("dir1", "t2u3v4w5")

3. User creates /dir1/a.txt
   Library:
     Create OS file "x5y6z7w8"
     Load directory "/dir1" (t2u3v4w5)
     Update it: add ("a.txt", "x5y6z7w8")
     Write back to OS file "t2u3v4w5"

Result in OS flat filesystem:
  a1b2c3d4 ← /a.txt data
  x5y6z7w8 ← /dir1/a.txt data
  t2u3v4w5 ← /dir1/ directory listing
  root_dir ← / directory listing

No naming conflicts! Both "a.txt" files can exist!

Key Innovations

1. Namespace Separation

User sees: Hierarchical paths
  /a.txt, /dir1/a.txt, /dir1/dir2/f.txt

OS sees: Flat namespace
  a1b2c3d4, x5y6z7w8, p9q8r7s6, etc.

Library translates between them

2. Directories as Special Files

Traditional approach:
  Directory = special inode with directory-specific metadata

Our approach:
  Directory = regular file containing mapping entries

Each entry: (user_name, os_name)

3. Random Name Generation

Why random?
  Avoid collisions
  Simplify implementation (no hash function needed)

With 256^8 namespace:
  Can support billions of files
  Collision probability: essentially zero

Implementation Structure

// Pseudocode sketch (C++-style). The helpers split(), load_directory(),
// find_mapping(), and os_open() are assumed to be part of the user-level
// library, wrapping the flat file system underneath.

struct DirectoryEntry {
    char user_name[256];      // User-facing name (one path component)
    char os_name[9];          // OS-facing random 8-char name + '\0'
};

// When the user calls open("/dir1/a.txt", flags):
int open_hierarchical(const char* path, int flags) {
    // 1. Decompose the path into components: ["dir1", "a.txt"]
    std::vector<std::string> components = split(path, '/');

    // 2. Start at the root directory file
    std::string current_dir = "root_dir";

    // 3. Walk every component except the last, descending one level per step
    for (size_t i = 0; i + 1 < components.size(); i++) {
        // Load this directory file's (user_name, os_name) entries
        std::vector<DirectoryEntry> entries = load_directory(current_dir);

        // Map the component to its OS-level filename
        std::string next_name = find_mapping(entries, components[i]);
        if (next_name.empty()) return -1;   // component not found

        // Descend into that directory
        current_dir = next_name;
    }

    // 4. Resolve the final component in the last directory
    std::vector<DirectoryEntry> final_dir = load_directory(current_dir);
    std::string os_filename = find_mapping(final_dir, components.back());
    if (os_filename.empty()) return -1;

    // 5. Open the underlying flat-namespace file
    return os_open(os_filename, flags);
}
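
A similarly hedged sketch of mkdir() built on the same idea, this time in C. Here resolve_dir(), random_os_name(), and append_entry() are assumed internals of the user-level library (declared only so the sketch is self-contained), while creat() and close() are the flat file system's own primitives from the problem statement.

/* Assumed library helpers (declarations only, for this sketch). */
int  resolve_dir(const char *path, char parent_os[9], char last[256]);
void random_os_name(char name[9]);
int  append_entry(const char parent_os[9], const char *user_name,
                  const char os_name[9]);

/* Flat-FS primitives from the problem statement. */
int creat(const char *name, int mode);
int close(int fd);

/* mkdir("/dir1/dir2") on top of the flat file system. */
int mkdir_hierarchical(const char *path)
{
    char parent_os[9], last[256];

    /* 1. Resolve everything except the last component ("dir1"), yielding the
     *    OS-level name of the parent directory file (e.g. "t2u3v4w5"). */
    if (resolve_dir(path, parent_os, last) < 0)
        return -1;

    /* 2. Pick a fresh OS-level name and create an empty file to hold the new
     *    directory's entry list (e.g. "m1n2o3p4"). */
    char new_os[9];
    random_os_name(new_os);
    int fd = creat(new_os, 0);
    if (fd < 0)
        return -1;
    close(fd);

    /* 3. Record ("dir2", "m1n2o3p4") in the parent directory file. */
    return append_entry(parent_os, last, new_os);
}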

Limitations and Trade-offs

Pros:
  ✅ Simulates full hierarchical filesystem
  ✅ Arbitrary depth (unlike OS limitation)
  ✅ No OS changes needed
  ✅ Portable across different OSs

Cons:
  ❌ Extra directory lookup overhead
  ❌ Path traversal is slower
  ❌ Each level requires reading a file
  ❌ Larger directory files = more I/O

Performance impact: Minimal
  (Usually not performance-critical for application layer)

Why This Design is Elegant

The instructor’s solution recognizes that:

  1. Directory = File containing metadata: By treating directories as special files, we can store them in the same flat namespace

  2. Name mapping is the solution: Using random names avoids collisions, allowing multiple “a.txt” files in different “directories”

  3. Recursive traversal works: Each directory file contains pointers to subdirectory files, enabling tree traversal

  4. User abstraction preserved: Users see the familiar /path/to/file interface without knowing about the underlying flat namespace

This is a brilliant example of abstraction layering - implementing a richer abstraction (hierarchical directories) on top of a simpler one (flat filesystem) at the user level!

Problem 51: I-node Based File System with Multiple Indirection Levels

Consider a file system (FS) that has both logical and physical block sizes of 512 bytes. Assume that the file system uses i-nodes. Within an i-node, the FS stores 10 direct pointers, 1 single indirect pointer, 1 double indirect pointer, and 1 triple indirect pointer. Further assume that disk block addresses require 4 bytes.

(a) What is the maximum size of a file stored in this file system?

Instructor Solution

Since disk block addresses are 4 bytes, each block can hold 512/4 = 128 disk block addresses. So the maximum number of blocks that a file in this FS can hold is: 10 (direct pointers) + 128 (single indirect pointers) + 128² (double indirect pointers) + 128³ (triple indirect pointers). The maximum file size is then (10 + 128 + 128² + 128³)*512 bytes.

Elaboration (by LLM):

Let me break down this calculation step-by-step so you understand the indirect pointer mechanics.

Block Capacity Calculation

With 512-byte blocks and 4-byte block addresses:

Block size: 512 bytes
Block address size: 4 bytes
Addresses per block: 512 / 4 = 128 addresses

File Block Capacity

An i-node has:

  • 10 direct pointers
  • 1 single indirect pointer
  • 1 double indirect pointer
  • 1 triple indirect pointer

Direct Pointers (10 blocks):

Direct pointers point directly to data blocks

Inode:
├─ Direct[0] → Block 100 (data)
├─ Direct[1] → Block 101 (data)
├─ ...
└─ Direct[9] → Block 109 (data)

Blocks accessible: 10

Single Indirect Pointer (128 blocks):

Single indirect pointer points to an index block containing 128 pointers

Inode:
└─ Indirect → Index Block X
             ├─ Pointer[0] → Block 110 (data)
             ├─ Pointer[1] → Block 111 (data)
             ├─ ...
             └─ Pointer[127] → Block 237 (data)

Blocks accessible: 128

Double Indirect Pointer (128² = 16,384 blocks):

Double indirect points to an index block of index blocks

Inode:
└─ Double Indirect → Index Block Y
                   ├─ Pointer[0] → Index Block Y0
                   │             ├─ Pointer[0] → Block 300 (data)
                   │             ├─ ...
                   │             └─ Pointer[127] → Block 427 (data)
                   ├─ Pointer[1] → Index Block Y1
                   │             ├─ Pointer[0] → Block 428 (data)
                   │             ├─ ...
                   │             └─ Pointer[127] → Block 555 (data)
                   ├─ ...
                   └─ Pointer[127] → Index Block Y127
                                   ├─ Pointer[0] → Block (data)
                                   ├─ ...
                                   └─ Pointer[127] → Block (data)

First level: 128 index blocks
Second level: 128 × 128 = 16,384 pointers to data blocks
Blocks accessible: 128² = 16,384

Triple Indirect Pointer (128³ = 2,097,152 blocks):

Triple indirect points to index blocks of index blocks of index blocks

This goes three levels deep:

Inode:
└─ Triple Indirect → Index Block Z
                   ├─ Pointer[0] → Index Block Z0
                   │             ├─ Pointer[0] → Index Block Z0[0]
                   │             │             ├─ Pointer[0] → Block (data)
                   │             │             ├─ ...
                   │             │             └─ Pointer[127]
                   │             ├─ ...
                   │             └─ Pointer[127] → Index Block Z0[127]
                   ├─ ...
                   └─ Pointer[127] → Index Block Z127

Three levels deep:
  Level 1: 128 second-level index blocks
  Level 2: 128 × 128 = 16,384 third-level index blocks
  Level 3: 128 × 128 × 128 = 2,097,152 data blocks
Blocks accessible: 128³ = 2,097,152

Total File Capacity

Adding all pointers:

Direct blocks:        10
Single indirect:      128
Double indirect:      16,384 (128²)
Triple indirect:      2,097,152 (128³)
                      ─────────────
Total blocks:         2,113,674

Block size: 512 bytes
Maximum file size: 2,113,674 × 512 = 1,082,201,088 bytes
                                    ≈ 1.08 GB
                                    ≈ 1.01 GiB
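
The arithmetic is easy to double-check with a few lines of C; the constants come straight from the problem statement, so nothing here is specific to a real filesystem:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t block_size = 512;              /* logical = physical block size  */
    const uint64_t ptrs = block_size / 4;         /* 4-byte addresses -> 128/block  */

    uint64_t blocks = 10                          /* direct pointers   */
                    + ptrs                        /* single indirect   */
                    + ptrs * ptrs                 /* double indirect   */
                    + ptrs * ptrs * ptrs;         /* triple indirect   */

    printf("max blocks: %llu\n", (unsigned long long)blocks);                /* 2,113,674     */
    printf("max bytes:  %llu\n", (unsigned long long)(blocks * block_size)); /* 1,082,201,088 */
    return 0;
}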

Practical Interpretation

For comparison:
  Max direct only (10 blocks):              5 KB (tiny)
  Max direct + single (138 blocks):         69 KB
  Max direct + single + double:             8 MB
  Max with all four levels:                 ~1 GB

The triple indirect pointer enables:
  Large files up to 1 GB
  Still manageable overhead (3 index reads for very large files)

Why This Design Makes Sense

Small files (< 5 KB):
  Use only direct pointers
  No extra block reads
  Super fast

Medium files (5 KB - 69 KB):
  Need single indirect
  1 extra block read for index

Large files (69 KB - 8 MB):
  Need double indirect
  2 extra block reads for nested indices

Very large files (8 MB - 1 GB):
  Need triple indirect
  3 extra block reads for deeply nested indices

The design scales gracefully with file size while minimizing overhead for typical files!

(b) Consider a 2^20 byte file stored in this file system. Assume that an application issues an open system call to open this file, located in the current directory. Further assume that the file system uses i-nodes and the i-node for the current directory is already in memory. Describe the actions taken by the file system during the open() system call for this file. (Assume that the current directory contents fit into a single disk block.) How many blocks are read from the disk? Justify your answer.

Instructor Solution

When the open system call is executed on a file, the FS first reads the contents of the current directory from the disk. To do this the FS looks at the i-node of the current directory and follows the first direct pointer (assuming the directory contents fit into a single disk block). After reading the contents of the directory to memory, the FS searches the filename within the directory and learns the i-node of the file. The FS then reads the i-node of the file to complete the open system call. Of course, the protections are checked at each step. In conclusion the FS reads a total of 2 blocks.

Elaboration (by LLM):

Let me walk through the open() sequence and explain why exactly 2 blocks are read.

What Happens During open()

When you call open("/path/to/file", O_RDONLY), the OS must:

  1. Find the file’s i-node
  2. Load the i-node into memory
  3. Perform permission checks
  4. Return a file descriptor

Step-by-Step Breakdown

Precondition: The i-node for the current directory is already in memory

In memory:
Current_dir_inode = {
    ...
    direct[0] = Block 500  (directory data block)
    ...
}

Step 1: Read Directory Data Block (1 block)

Action:
  Look at current directory i-node (already in memory)
  Follow direct pointer[0] (points to block 500)
  Read block 500 from disk

Content of block 500:
  [
    ("file1.txt", inode_num: 2000),
    ("file2.txt", inode_num: 2001),
    ("myfile", inode_num: 2002),    ← We want this!
    ...
  ]

OS does:
  Search directory entries for "myfile"
  Find: inode_num = 2002

Disk reads so far: 1 block

Step 2: Read File’s I-node (1 block)

Action:
  Now we know the file's i-node number (2002)
  I-nodes are stored in the i-node table
  Read i-node 2002 from disk

I-node 2002 contains:
  {
    file_size: 2^20 = 1,048,576 bytes
    permissions: 0644
    owner_uid: 1000
    timestamps: ...
    direct[0-9]: Block numbers for data
    single_indirect: Block number for index
    double_indirect: Block number for index
    triple_indirect: Block number for index
  }

OS does:
  Load i-node into memory
  Check permissions (uid, gid, mode bits)
  Allocate file descriptor
  Return to user

Disk reads so far: 1 + 1 = 2 blocks
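
The directory search in Step 1 is just a linear scan of (name, i-node number) entries in the directory's data block. A minimal C sketch, using a fixed-size entry layout that is an illustrative assumption rather than any real on-disk format:

#include <stdint.h>
#include <string.h>

/* Illustrative directory entry layout (not a real filesystem's format). */
struct dir_entry {
    char     name[28];      /* file name, NUL-padded        */
    uint32_t inode_num;     /* i-node number, 0 = free slot */
};

/* Scan one 512-byte directory block for `name`; return its i-node number,
 * or 0 if the name is not found. */
uint32_t lookup_in_block(const void *block, const char *name)
{
    const struct dir_entry *e = (const struct dir_entry *)block;
    size_t n = 512 / sizeof(*e);     /* 16 entries per block here */

    for (size_t i = 0; i < n; i++) {
        if (e[i].inode_num != 0 && strcmp(e[i].name, name) == 0)
            return e[i].inode_num;
    }
    return 0;
}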

Why No Data Blocks Are Read

The open() system call does NOT read file data. It only:

  1. Finds the i-node
  2. Loads the i-node

Actual data reading happens later when you call read():

open("myfile")     ← Read directory + i-node = 2 blocks
read(fd, ...)      ← Read file data blocks (happens later)

Why Exactly 2 Blocks?

Block 1: Directory data
  - Contains filename → i-node mapping
  - Needed to find which i-node belongs to the file

Block 2: File's i-node
  - Contains file metadata
  - Contains block pointers to file data
  - Needed for all subsequent operations

That's it! No data blocks needed yet.

Timeline of open()

User: open("myfile", O_RDONLY)
  ↓
OS: Check current directory i-node (already in memory)
  ↓
OS: Read directory data (block 500) ← 1st disk read
  ↓
OS: Search for "myfile" in directory
  ↓
OS: Found! inode_num = 2002
  ↓
OS: Read i-node 2002 from i-node table ← 2nd disk read
  ↓
OS: Permission check
  ↓
OS: Allocate file descriptor
  ↓
User: receives fd

Why 2^20 Byte File Still Costs Only 2 Blocks

The file size (1 MB) doesn’t matter for open()!

open(2 KB file):   2 blocks read (directory + i-node)
open(100 MB file): 2 blocks read (directory + i-node)
open(1 GB file):   2 blocks read (directory + i-node)

open() doesn't care about file size!
File size only matters for read() operations.

What Would Be Different if Directory Didn’t Fit in 1 Block?

If the directory contained thousands of entries:

Current (directory fits in 1 block):
  Read directory: 1 block

If directory needed multiple blocks:
  Read all directory blocks needed to find the entry
  Example: 3-block directory = 3 block reads

Then: Read file i-node = 1 block

Total: Could be 4+ blocks

But the problem states: “current directory contents fit into a single disk block”

So we only read 1 directory block.

Key Insight

The open() operation is designed to be fast by:

  1. Avoiding data reads: Only load metadata
  2. Minimizing i-node reads: Load only one i-node (the file’s)
  3. Assuming cached directory i-node: Directory’s i-node is already in memory
  4. Using directory cache: In real systems, directory blocks are often cached too

This is why open() is typically O(1) in performance (assuming cached directory i-node).

(c) Assume that the file described in part (b) is already open. Further assume that the process that opened the file reads the first 100 bytes of the file. How many disk blocks are read from the disk for this operation? Justify your answer.

Instructor Solution

First 100 bytes of the file resides inside the first block of the file. Since the i-node of the file is in memory (read during open), the FS simply reads the first direct block from the disk, copies the first 100 bytes to the user’s buffer and returns from the read system call. So FS reads just 1 block from the disk during this operation.

Elaboration (by LLM):

This is the simplest case: reading small data that fits in the first block.

Setup

File state:

  • File is already open
  • File i-node is in memory (loaded during open())
  • File size: 2^20 bytes
  • We want to read: first 100 bytes

Where Are First 100 Bytes?

File layout:
Block 0: bytes 0-511        ← First 100 bytes are HERE
Block 1: bytes 512-1023
Block 2: bytes 1024-1535
...

Byte range [0, 99] is entirely within Block 0.

Direct Pointer Access

The i-node has direct pointers:

I-node (in memory):
├─ Direct[0] → Block 1000 (contains bytes 0-511)
├─ Direct[1] → Block 1001 (contains bytes 512-1023)
├─ Direct[2] → Block 1002
├─ ...
└─ Direct[9] → Block 1009

Read Operation

User: read(fd, buffer, 100)

OS does:
1. File pointer is at byte 0
2. Calculate: which block contains byte 0?
   Block number = 0 / 512 = 0
   Offset in block = 0 % 512 = 0

3. Look in i-node: Direct[0] = Block 1000
   (I-node is already in memory, no disk read needed!)

4. Read Block 1000 from disk ← 1 disk read

5. Extract bytes 0-99 from the block

6. Copy to user buffer

7. Return

Total Disk Reads: 1 block

Why Only 1 Block?

Key observations:

1. I-node is already in memory
   → No need to read it again
   → Direct pointers are instantly available

2. Bytes 0-99 fit in one block (512 bytes)
   → Only need to read Block 0
   → Don't need blocks 1, 2, etc.

3. Direct pointer is immediately available
   → No need to read indirect blocks
   → Can go straight to the data

Contrast with Other Scenarios

If we read first 100 bytes (like this):
  I-node in memory: ✅
  Block 0 is direct: ✅
  Total reads: 1 block

If we read bytes 10,000-10,099 (in large file):
  Would need indirect block reads
  Total reads: 2+ blocks

If i-node wasn't in memory:
  Would need to read i-node first
  Total reads: 2+ blocks

But this case is optimal: i-node cached + small read from direct block.

(d) Consider the process in part (c) that has opened the file and read the first 100 bytes. Assume now that the same process reads the next 800 bytes of the file, that is, bytes 100-900. How many disk blocks are read from the disk for this operation? Justify your answer.

Instructor Solution

First block is already in memory. So the FS reads blocks 2 and 3, copies the appropriate number of bytes to the user buffer, and returns from the read system call. So the FS reads 2 blocks during this operation.

Elaboration (by LLM):

Now we’re reading a larger range that spans multiple blocks.

Setup

Process state:

  • File is open
  • I-node in memory
  • Just read bytes 0-99
  • File pointer now at byte 100
  • Block 0 is in memory (from previous read)
  • Now reading: bytes 100-900

Which Blocks Contain Bytes 100-900?

Byte calculations:
First byte (100): Block = floor(100 / 512) = 0 (offset 100)
Last byte  (899): Block = floor(899 / 512) = 1 (offset 387)

Blocks needed:
Block 0: Bytes 0-511    (contains bytes 100-511) ✅ Already in cache
Block 1: Bytes 512-1023 (contains bytes 512-899)
Block 2: Not needed (bytes 1024+)
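
The same floor arithmetic works for any read: the first block touched is floor(offset / 512) and the last is floor((offset + count - 1) / 512). A tiny C helper, assuming the 512-byte blocks from this problem:

#include <stdio.h>

#define BLOCK_SIZE 512u

/* First and last logical block touched by reading `count` bytes
 * starting at byte `offset`. */
static void read_span(unsigned offset, unsigned count,
                      unsigned *first, unsigned *last)
{
    *first = offset / BLOCK_SIZE;
    *last  = (offset + count - 1) / BLOCK_SIZE;
}

int main(void)
{
    unsigned first, last;
    read_span(100, 800, &first, &last);
    printf("blocks %u..%u\n", first, last);   /* prints "blocks 0..1" */
    return 0;
}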

Bytes per Block

Block 0 portion:
  Needed: bytes 100-511
  Count: 512 - 100 = 412 bytes
  Status: In memory from previous read
  New read: NO

Block 1 portion:
  Needed: bytes 512-899
  Count: 900 - 512 = 388 bytes
  Status: NOT in memory
  New read: YES ← Must read this block

Total bytes: 412 + 388 = 800 ✅

Read Operation

User: read(fd, buffer, 800)  // bytes 100-900

OS does:
1. File pointer at byte 100
2. Need to read 800 bytes
3. Identify affected blocks:
   Start block: floor(100 / 512) = 0
   End block: floor(899 / 512) = 1

4. For Block 0:
   Is it cached? YES (from part c)
   Extract bytes 100-511 (412 bytes)
   No disk read needed

5. For Block 1:
   Is it cached? NO
   Read Block 1 from disk ← 1 disk read
   Extract bytes 512-899 (388 bytes)

6. Combine bytes: 412 + 388 = 800 bytes
7. Copy to user buffer
8. Return

Total Disk Reads: 1 block (Block 1 only)

Cache Hit vs Cache Miss

Block 0:
  Status: CACHE HIT (from previous read)
  Cost: 0 disk operations

Block 1:
  Status: CACHE MISS
  Cost: 1 disk read

This demonstrates cache benefits:
- Sequential reads benefit from caching
- Don't have to re-read same block twice

Instructor Solution Note

The instructor counts 2 block reads (“blocks 2 and 3”), but with 512-byte blocks, bytes 100-899 fall entirely within the first two blocks of the file, and the first of those is already cached from part (c). Only one new block (the file's second block, Block 1 in 0-indexed terms) has to be read; a count of two would apply only if the first block were not still in memory.

Either way, the key insight is the same: keeping the previously read block in the cache saves a disk read on sequential access.

(e) Consider the process in part (c) that has opened the file and read the first 100 bytes. Assume that the process now seeks to byte 2^15 of the file and reads 1040 bytes. How many disk blocks are read from the disk? Justify your answer.

Instructor Solution

Byte 2^15 resides in block 2^15 / 2^9 = 2^6 = 64. Since block 64 has to be reached through the single indirect pointer, the FS first reads the single indirect block. Since the user is requesting 1040 bytes, the FS reads blocks 64, 65, and 66 by taking the addresses of these blocks from the single indirect block. So a total of 4 disk blocks are read during this operation.

Elaboration (by LLM):

This is where indirect pointers become necessary! Let me walk through the calculation.

Calculate Target Block

We want to read byte 2^15:

2^15 = 32,768 bytes

Which block contains byte 32,768?
Block number = 32,768 / 512 = 64
Offset in block = 32,768 % 512 = 0

So we start at Block 64, offset 0

Can Direct Pointers Reach Block 64?

I-node has:

  • Direct pointers: cover blocks 0-9 (only 10 blocks)
  • Single indirect: covers blocks 10-137 (128 blocks total)
  • Double indirect: covers blocks 138-16,521
  • Triple indirect: covers blocks 16,522+

Block 64 is in the range [10, 137], so it’s reachable via single indirect.

Block 64 in context of i-node pointers:
├─ Direct[0-9]: blocks 0-9
├─ Single Indirect: blocks 10-137  ← Block 64 is HERE
├─ Double Indirect: blocks 138-...
└─ Triple Indirect: blocks ...

Access Pattern for Block 64

To access Block 64:

Step 1: Read single indirect block
  I-node (in memory) → Single Indirect pointer
  Single Indirect points to: Block X (the index block)
  Read Block X from disk ← 1st disk read

Step 2: Find Block 64 in the index
  Block X contains 128 pointers
  Index into Block X: 64 - 10 = 54
    (subtract 10 because direct pointers cover 0-9)
  Pointer[54] in Block X = address of Block 64

Step 3: Read Block 64
  Address from Pointer[54] = Block 1000 (example)
  Read Block 1000 from disk ← 2nd disk read
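
The decision of which pointer level covers a given logical block number follows directly from the geometry in this problem (10 direct pointers, 128 addresses per index block). A hedged C sketch of that classification:

#include <stdio.h>

#define NDIRECT        10u
#define PTRS_PER_BLOCK 128u     /* 512-byte block / 4-byte address */

/* Classify a logical block number; for the single-indirect case, also
 * report which slot of the index block holds its address. */
static void classify(unsigned lbn)
{
    if (lbn < NDIRECT)
        printf("block %u: direct[%u]\n", lbn, lbn);
    else if (lbn < NDIRECT + PTRS_PER_BLOCK)
        printf("block %u: single indirect, slot %u\n", lbn, lbn - NDIRECT);
    else if (lbn < NDIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK * PTRS_PER_BLOCK)
        printf("block %u: double indirect\n", lbn);
    else
        printf("block %u: triple indirect\n", lbn);
}

int main(void)
{
    classify(64);    /* single indirect, slot 54 (the case in this part) */
    classify(200);   /* double indirect, as discussed below */
    return 0;
}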

How Many Bytes Span Blocks 64-66?

We want to read 1040 bytes starting from byte 2^15:

Byte range: [32,768, 32,768 + 1040) = [32,768, 33,808)

Block 64: bytes 32,768-33,279   (512 bytes)
Block 65: bytes 33,280-33,791   (512 bytes)
Block 66: bytes 33,792-34,303   (512 bytes)

Byte coverage:
From Block 64: bytes 32,768-33,279 (512 bytes) ← all of Block 64
From Block 65: bytes 33,280-33,791 (512 bytes) ← all of Block 65
From Block 66: bytes 33,792-33,807 (16 bytes)  ← partial Block 66

Total: 512 + 512 + 16 = 1,040 bytes ✅

Which Blocks Must Be Read?

Indirect block: Block X (to find pointers to 64, 65, 66)
Data Block 64: Read from disk
Data Block 65: Read from disk
Data Block 66: Read from disk

Total blocks read: 1 indirect + 3 data = 4 blocks

Read Operation Timeline

User: seek(fd, 2^15); read(fd, buffer, 1040)

OS does:
1. I-node in memory (from open)
2. File pointer = 32,768
3. Need 1,040 bytes
   Blocks needed: 64, 65, 66

4. Block 64 is indirect
   Look in i-node: SingleIndirect pointer
   Read the single indirect block ← 1st disk read

5. From single indirect block, get pointers to:
   Address of Block 64
   Address of Block 65
   Address of Block 66

6. Read Block 64 ← 2nd disk read
   Extract bytes 0-511 (512 bytes)

7. Read Block 65 ← 3rd disk read
   Extract bytes 0-511 (512 bytes)

8. Read Block 66 ← 4th disk read
   Extract bytes 0-15 (16 bytes)

9. Copy total 1,040 bytes to user buffer
10. Return

Total Disk Reads: 4 blocks

Breakdown:

  • 1 indirect block (the index block)
  • 3 data blocks (64, 65, 66)

Why Indirect Pointer Is Necessary

Without going through indirect:
  Can't reach Block 64 (only direct 0-9 available)
  Would have to store pointers for all 2 million blocks
  I-node would be huge!

With single indirect:
  Indirect block holds 128 pointers
  Covers blocks 10-137
  Block 64 is perfectly in this range
  Only pay 1 extra disk read to reach it

If we needed Block 200 instead:
  Block 200 > 137 (out of single indirect range)
  Would need double indirect
  Cost would be higher: 1 top-level index + 1 second-level index + 3 data = 5 blocks

Key Insight: Cost-Benefit of Indirection

The cost of indirect block access is worth it:

Files up to 5 KB:       Use only direct (0-9 blocks, 10×512B)
Files up to 70 KB:      Use single indirect (10-137 blocks)
Files up to 8 MB:       Use double indirect (138-16,521 blocks)
Files up to 1 GB:       Use triple indirect (16,522+ blocks)

For large file access, paying 1-3 indirect reads to get blocks
is much better than having no way to address them!

Compare to naive approach:

  • Store all ~2.1 million block pointers directly in the i-node
  • The i-node would be over 8 MB (2,113,674 × 4 bytes)
  • Far too large for a fixed-size i-node slot on disk
  • Completely impractical

Indirect pointers solve this elegantly.

(f) Consider the process in part (c) that has opened the file and read the first 100 bytes. Assume now that the process writes 512 bytes to the file starting at the current file position. How many disk blocks are read/written from/to the disk? Assume that the FS implements a write-through caching policy, meaning that a dirty block is immediately flushed out to the disk. Justify your answer.

Instructor Solution

The FS first fills in block 0, which is in memory and writes it to disk. The FS then reads in the 2nd block and writes 100 bytes to it and then writes it to the disk. So a total of 2 disk blocks are written and 1 block is read for a total of 3 disk blocks.

Elaboration (by LLM):

This scenario involves both reading and writing with write-through caching. Let me trace through the operation carefully.

Setup

Process state:

  • File is open
  • I-node in memory
  • Just read first 100 bytes (part c)
  • File pointer now at byte 100
  • Block 0 is in memory cache
  • Writing 512 bytes starting at byte 100
  • Write-through policy: ALL writes to disk immediately

Where Do We Write?

Writing 512 bytes starting at byte 100:
Byte range: [100, 612)

Block 0: bytes 0-511
  Needs: bytes 100-511 (412 bytes)
  Status: In memory, partially dirty

Block 1: bytes 512-1023
  Needs: bytes 512-611 (100 bytes)
  Status: NOT in memory

So we need to modify:
- Block 0 (412 bytes starting at offset 100)
- Block 1 (100 bytes starting at offset 0)

Write Operation Step-by-Step

Phase 1: Write to Block 0 (in memory)

User: write(fd, buffer, 512)

OS does:
1. File pointer at byte 100
2. First 412 bytes of write fit in Block 0
   Block 0 is already in memory
   Modify bytes 100-511 in Block 0 cache
   Mark Block 0 as DIRTY

3. Write-through policy: flush immediately
   Write Block 0 to disk ← 1st disk WRITE

Phase 2: Write to Block 1 (not in memory)

4. Remaining 100 bytes must go to Block 1
   Block 1 is NOT in memory

   Two options:
   A) Write whole block (512 bytes) even though only 100 are changing
      → Loses existing data!

   B) Read Block 1 first, modify 100 bytes, then write
      → Preserves existing data!

   Option B is correct: Read-Modify-Write

5. Read Block 1 from disk ← 1st disk READ
   (Must preserve bytes 100-511 that were already there)

6. Modify bytes 0-99 in Block 1 cache
   Mark Block 1 as DIRTY

7. Write-through policy: flush immediately
   Write Block 1 to disk ← 2nd disk WRITE

8. Return to user

Total I/O Operations

Disk READS:  1 (Block 1 only)
Disk WRITES: 2 (Block 0 + Block 1)
Total I/O operations: 3

Why Read Block 1?

With write-through policy, we can’t just write partial blocks blindly:

Bad approach:
  write(fd, buffer, 512)
  Only write 100 bytes to Block 1
  Disk now has: [100 new bytes][412 uninitialized garbage]
  ❌ Data corruption!
  ❌ Bytes 100-511 of Block 1 are lost!

Correct approach:
  1. Read Block 1 (gets existing bytes 100-511)
  2. Modify bytes 0-99 with new data
  3. Write whole Block 1
  ✅ Data integrity preserved
  ✅ All bytes are correct
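
The read-modify-write step can be sketched in a few lines of C using POSIX pread()/pwrite() against a raw device; this only illustrates the pattern, not how an in-kernel buffer cache is actually coded:

#include <unistd.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Overwrite `count` bytes at offset `off` inside block `blkno` of the device
 * open on `dev_fd`, preserving the rest of the block. Write-through: the
 * modified block goes straight back to disk before we return. */
int partial_block_write(int dev_fd, unsigned long blkno,
                        unsigned off, const void *data, unsigned count)
{
    unsigned char block[BLOCK_SIZE];
    off_t pos = (off_t)blkno * BLOCK_SIZE;

    if (off + count > BLOCK_SIZE)
        return -1;                                    /* spans a block boundary */
    if (pread(dev_fd, block, BLOCK_SIZE, pos) != BLOCK_SIZE)
        return -1;                                    /* 1 disk READ            */
    memcpy(block + off, data, count);                 /* merge the new bytes    */
    if (pwrite(dev_fd, block, BLOCK_SIZE, pos) != BLOCK_SIZE)
        return -1;                                    /* 1 disk WRITE           */
    return 0;
}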

Why Block 0 Doesn’t Need Reading

Block 0 is different because:
  - Already in memory from previous read (part c)
  - We're modifying bytes 100-511 (end of block)
  - Other bytes (0-99) are unchanged
  - We already have full block content in cache

Block 1 is different because:
  - NOT in memory
  - To modify bytes 0-99: must read first
  - Bytes 100-511 must be preserved from disk
  - Then modify and flush

Write-Through Policy Mechanics

Write-through means:

App: write(data)
  ↓
Kernel: Update memory cache
  ↓
Kernel: Immediately flush to disk (SYNCHRONOUS)
  ↓
Disk: Persists change
  ↓
App: write() returns (only after disk write completes)

Contrast with write-back policy:
  Update cache, return immediately
  Flush to disk later (ASYNCHRONOUS)

Write-through advantages:

  • Safer (less data loss risk on crash)
  • More predictable (flush timing)
  • Used for critical data (databases, journals)

Write-through disadvantages:

  • Slower (wait for disk I/O)
  • More overhead (every write blocks)

Timeline Visualization

T0: User calls write(fd, buffer, 512)

T1: Kernel modifies Block 0 in cache
    Marks as DIRTY

T2: Write-through flush: Block 0 → Disk
    I/O wait (~5ms)

T3: Kernel reads Block 1 from disk
    I/O wait (~5ms)

T4: Kernel modifies Block 1 in cache
    Merge: existing bytes 100-511 + new bytes 0-99
    Marks as DIRTY

T5: Write-through flush: Block 1 → Disk
    I/O wait (~5ms)

T6: User receives return (success)

Total elapsed time: ~15ms (3 × 5ms per disk operation)

Key Insights

  1. In-memory vs out-of-memory distinction:

    • Block 0 in memory → just modify and write
    • Block 1 not in memory → read-modify-write
    • Caching makes a huge difference!
  2. Write-through cost:

    • Every write blocks until disk I/O complete
    • Forces synchronous I/O semantics
    • Can be 1000x slower than write-back
    • But data is safer
  3. Partial block writes require reading:

    • Can’t blindly write 100 bytes to disk
    • Must read full 512-byte block first
    • Merge new and old data
    • Write complete block
    • This prevents silently corrupting the rest of the block on disk
  4. The 3-operation breakdown:

    • 1 read: Block 1 (to preserve its content)
    • 2 writes: Block 0 (was dirty), Block 1 (just modified)
    • This pattern is universal for write-through filesystems

This is a great example of the trade-off between safety (write-through prevents data loss) and performance (write-back is faster). Real filesystems often use write-back with periodic flush or hybrid approaches.