07 - Memory Management
Problem 1: Variable Partitioning in Memory Management
What’s variable partitioning in memory management? Briefly explain.
Student Solution
Answer by Justin:
Variable partitioning in memory management means that when a process arrives, you reserve a partition exactly as large as the process's memory needs. Because the resulting holes can be of any size, you can still suffer external fragmentation.
Elaboration (by LLM):
Variable partitioning is a memory allocation scheme where the OS dynamically allocates and deallocates memory partitions based on actual process requirements, rather than using fixed-size partitions.
How Variable Partitioning Works:
- Allocation: When a process arrives requesting N KB of memory, the OS finds a free hole large enough and allocates exactly N KB of it
- Deallocation: When a process terminates, the OS reclaims its partition, creating a new hole
- Fragmentation Problem: Over time, free memory becomes scattered in many small holes
Visual Example (memory-layout diagrams omitted here): initially the free space is split across several holes, none of which alone can hold an arriving P3 (5KB), even though the total free memory is large enough. Once P2 is deallocated, a hole of at least 5KB opens up and P3 fits, but the remaining free space is still fragmented.
Why External Fragmentation is Still a Problem:
Even though variable partitioning is efficient in terms of memory use per process, it leads to external fragmentation: the total free memory may be large enough to fit a process, but it’s fragmented across multiple holes, with no single hole large enough. This is the key disadvantage that motivated the development of paging systems.
Problem 2: Physical Addresses and Their Problems
In early architectures CPUs used to generate physical addresses to access RAM. What kind of problems did physical addresses create and how did memory management mechanisms solve these problems?
Student Solution
Answer by Justin:
Using physical addresses caused a few problems, one of them being that if the process wasn’t loaded at the expected address, the memory references would be incorrect. To fix this, compilers generated relocation records. With this, the OS loader adjusted memory references using the relocation records to ensure correctness.
Elaboration (by LLM):
Early computer systems required programmers and compilers to work directly with physical memory addresses, which created significant challenges.
Problems with Physical Addresses:
- Hardcoded Addresses: A compiled program contained hardcoded references like “load value from physical address 0x1000”. This assumed the program would always be loaded at the same location
- Inflexibility: If two processes needed to run simultaneously, they would conflict at the same physical address
- Memory Fragmentation: The OS couldn’t move processes around to reduce fragmentation because all memory references were hardcoded
- Security Risk: One process could reference another’s physical memory directly
Example Problem:
Program compiled with absolute addresses:
Address 0x1000: load global_var
Address 0x1004: store to 0x2000
If OS loads program at 0x5000 instead:
The references still point to 0x1000 and 0x2000 ❌ WRONG!
Solution: Relocation Records and Base Addresses:
The compiler generated relocation records listing all memory references that needed adjustment:
Compiler output:
Instructions: [...]
Relocation records: [0x1000 is relocatable, 0x2000 is relocatable]
OS Loader:
Load address = 0x5000
For each relocation record:
old_address += load_address
Update instruction at that location
Result: 0x1000 becomes 0x6000, 0x2000 becomes 0x7000 ✅
This approach allowed programs to be loaded at any physical address, but was tedious and error-prone. Modern systems use virtual addressing instead, where the hardware (MMU) handles translation automatically at runtime.
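Below is a minimal sketch of the loader-side fix-up described above. The array names (`image`, `reloc_offsets`) and the idea of storing relocation entries as word indices are illustrative assumptions, not a real object-file format; the values reproduce the 0x1000/0x2000 example.

```c
/* Hedged sketch: patch absolute addresses in a program image at load time. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Image as compiled, assuming a load address of 0.  Words 0 and 1 hold
     * absolute addresses (0x1000 and 0x2000) that need relocation.        */
    uint32_t image[]         = { 0x1000, 0x2000, 0xDEADBEEF };
    size_t   reloc_offsets[] = { 0, 1 };     /* which words need patching   */
    uint32_t load_address    = 0x5000;       /* where the OS actually loads */

    /* Loader pass: add the load address to every relocatable word. */
    for (size_t i = 0; i < sizeof reloc_offsets / sizeof reloc_offsets[0]; i++)
        image[reloc_offsets[i]] += load_address;

    printf("patched: 0x%X 0x%X\n", image[0], image[1]);  /* 0x6000 0x7000 */
    return 0;
}
```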
Problem 3: Global Variable Address in Multiple Processes
Assume you have a simple program that declares a global variable and prints its address on the screen. When you run two copies of this program simultaneously, you see that both programs print the same address on the screen. How is this possible? Briefly explain.
Student Solution
Answer by Justin:
This is possible because what is printed is the virtual address, and the virtual address points to the same location in memory, that being the location of the global variable.
Elaboration (by LLM):
This is a fundamental insight into how virtual addressing works and why two independent processes can have the same virtual address for different data.
Why Both Processes Print the Same Address:
Program source code:
static int global_var = 42;
printf("Address: %p\n", &global_var);
Process A: the compiler assigns global_var → virtual addr 0x1000, so it prints "Address: 0x1000".
Process B: the compiler assigns global_var → virtual addr 0x1000, so it also prints "Address: 0x1000".
MMU Translation (invisible to process):
Process A: Virtual 0x1000 → Physical 0x5000
Process B: Virtual 0x1000 → Physical 0x8000
Key Insight: Virtual Address Space is Per-Process
Each process has its own virtual address space of identical layout. Since the compiler generates the same virtual address for global_var in both copies of the program, and the printf statement prints the virtual address (not the physical address), both processes print the same number.
However, the physical addresses are completely different:
- Process A’s global_var is actually stored at some physical location (say, 0x5000)
- Process B’s global_var is stored at a completely different physical location (say, 0x8000)
Practical Implication:
This design allows:
- Multiple independent processes to run the same program
- Each has its own memory isolated from the others
- Processes don’t need to know about physical memory locations
- The OS/MMU transparently maps virtual → physical addresses
If processes could see physical addresses, they would conflict when trying to use the same memory region!
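A complete, runnable version of the program described above is sketched here. The exact address printed varies by system, and with position-independent executables plus address-space layout randomization the two copies may print different values; with ASLR disabled (or a non-PIE binary) both copies print the same virtual address even though their physical frames differ.

```c
/* Minimal sketch: print the virtual address of a global variable. */
#include <stdio.h>
#include <unistd.h>

static int global_var = 42;

int main(void) {
    printf("PID %d: &global_var = %p\n", (int)getpid(), (void *)&global_var);
    sleep(10);   /* keep the process alive so a second copy can run alongside */
    return 0;
}
```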
Problem 4: External Fragmentation in Memory Management
What’s external fragmentation in memory management? Is external fragmentation a problem in paging systems? Briefly explain.
Student Solution
Answer by Justin:
External fragmentation occurs when you use variable partitioning to load processes into memory. It is when no single free slot in memory is big enough to hold the incoming process contiguously. Even if the TOTAL free memory is big enough to fit a new process, there may be no single contiguous slot large enough!
Elaboration (by LLM):
External fragmentation is a critical problem in memory management systems using variable partitioning. It represents a gap between available memory and usable memory.
The Core Problem - Total ≠ Usable:
Total free memory may exceed the request, but it is scattered across several small holes (the original memory diagram is omitted here). A new 6KB process arrives. Can it fit? ❌ NO! No single hole is large enough, even though the holes together add up to more than 6KB.
How External Fragmentation Develops:
- Process arrivals and departures: Processes come and go, leaving gaps
- Variable-sized holes: Each hole size depends on which process was there previously
- Poor allocation choices: Allocating processes suboptimally creates many tiny holes
- Compounding effect: As system runs longer, more holes accumulate
Visualization of Worsening Fragmentation:
(Snapshots at Time 0 through Time 3 are omitted here.) As processes come and go, the holes multiply and shrink; by Time 3 the system has 17 KB free in total, but the largest single hole is only 7 KB!
Is External Fragmentation a Problem in Paging Systems?
No! Paging completely eliminates external fragmentation because:
- Memory is divided into fixed-size frames
- Processes are divided into fixed-size pages
- A page always fits into a frame, regardless of fragmentation
- Pages don’t need to be contiguous in physical memory
Trade-off Introduced by Paging:
While paging solves external fragmentation, it introduces internal fragmentation (wasted space within an allocated page if the process doesn’t use it fully). However, internal fragmentation is typically smaller than external fragmentation.
Problem 5: Internal Fragmentation
What’s internal fragmentation? In what memory management method does internal fragmentation occur? Briefly explain.
Student Solution
Answer by Justin:
Internal fragmentation is when a process does not use all of the memory allocated to it. This happens because when the OS allocates memory in fixed-size chunks, a process may not need the full chunk it has been given.
Elaboration (by LLM):
Internal fragmentation is the “wasted” memory within an allocated block that a process doesn’t actually use. It’s the opposite of external fragmentation in both cause and location.
Key Difference from External Fragmentation:
| Aspect | Internal Fragmentation | External Fragmentation |
|---|---|---|
| Location | Inside allocated memory | Between allocated blocks |
| Cause | Fixed allocation size | Variable-size holes |
| Is the wasted memory allocated? | Yes, allocated but unused | No, free but unusable |
| Example | 4KB page with 1KB process (3KB wasted) | 7KB total free split as 2KB+3KB+2KB |
When Internal Fragmentation Occurs:
Internal fragmentation happens in memory management systems that allocate fixed-size blocks, specifically:
- Paging: Pages have fixed size (e.g., 4KB). If a process needs 5KB, it gets 2 pages (8KB), wasting 3KB
- Fixed Partitioning: Partitions have fixed size. If partitions are 16KB and a process needs 15KB, 1KB is wasted
- Slab Allocation: Object caches allocate fixed-size slabs; if not perfectly filled, space is wasted
Concrete Example - Paging with 4KB Pages:
Process A needs 6 KB of memory
Page size = 4 KB (fixed)
Allocation: ⌈6 KB / 4 KB⌉ = 2 pages = 8 KB allocated
Usage: 6 KB actually used
Waste: 8 KB - 6 KB = 2 KB of internal fragmentation
Internal fragmentation ratio: 2 KB / 8 KB = 25%
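The round-up arithmetic above can be checked with a tiny sketch (the variable names are illustrative; values match the 6 KB / 4 KB example):

```c
/* Pages needed and internal fragmentation for a given request size. */
#include <stdio.h>

int main(void) {
    unsigned page_size = 4096, request = 6144;                    /* bytes    */
    unsigned pages     = (request + page_size - 1) / page_size;   /* round up */
    unsigned allocated = pages * page_size;
    unsigned waste     = allocated - request;

    printf("pages=%u allocated=%u waste=%u (%.0f%%)\n",
           pages, allocated, waste, 100.0 * waste / allocated);   /* 2 8192 2048 25% */
    return 0;
}
```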
Comparison of Both Fragmentation Types:
External fragmentation (variable partitioning): 3 KB free in total, but split into holes too small to use.
Internal fragmentation (paging): e.g., 1 KB wasted at the end of one process's last page and 2 KB at the end of another's, for 3 KB total waste (all of it allocated, just unused).
Why Paging Still Wins Overall:
Although paging introduces internal fragmentation, it’s generally preferable because:
- Internal fragmentation is bounded: max waste = page_size - 1 bytes per process
- External fragmentation is unbounded: can waste huge amounts as system fragments
- Internal fragmentation can be minimized with smaller page sizes (trade-off with page table overhead)
Problem 6: Variable Partitioning with Memory Allocation
Consider a system that uses variable partitioning for memory management. Assume that the system has 32KB total memory and the current snapshot of the memory is as follows (snapshot diagram omitted; from the solutions below, the free holes are, in order, 5KB, 10KB, and 4KB).
Assume that the following 2 processes arrive in the given order: P4(4KB), P5(1KB). For each of the following memory management techniques, show the final state of the memory after the processes are allocated memory.
(1) First fit
Student Solution
Answer by Justin:
First Fit (P4: 4KB, P5: 1KB):
Initial free holes (in order): 5KB, 10KB, 4KB.
First fit allocates P4 to the first hole that fits (the 5KB hole), leaving a 1KB remainder. Then P5 (1KB) fits in that 1KB remainder, filling it completely.
Elaboration (by LLM):
First Fit is the simplest memory allocation algorithm. The OS maintains a list of free holes and allocates to the first hole that is large enough.
Algorithm:
for each hole in order:
if hole.size >= process.size:
allocate process to this hole
break
Advantages and Disadvantages:
Pros:
- Fast: Only scan until first fit is found, no need to check all holes
- Simple to implement: Just iterate through hole list
- Good for quick allocations
Cons:
- May waste memory: First hole might be much larger than needed, creating tiny remaining holes
- Can lead to fragmentation: After many allocations/deallocations, creates many unusable small holes
Analysis of the Example:
P4 (4KB) scans holes in order: 5KB ✅ (first fit found!), allocates here
- Remaining in 5KB hole: 5 - 4 = 1KB
P5 (1KB) scans holes from start: 1KB ✅ (first fit found!), allocates here
- This 1KB hole is completely used
Why First Fit Left the 10KB Hole Untouched:
First Fit doesn’t need to look at the larger 10KB hole because it already found a suitable hole (the 5KB hole). This is efficient for allocation speed but may not be optimal for memory utilization. Notice that the 10KB hole remains completely free—this is common with First Fit.
(2) Best Fit
Student Solution
Answer by Justin:
Best Fit (P4: 4KB, P5: 1KB):
Initial free holes (in order): 5KB, 10KB, 4KB.
Best fit allocates P4 to the smallest hole that fits (the 4KB hole, used exactly). Then P5 (1KB) goes into the smallest remaining hole that fits (the 5KB hole), leaving a 4KB remainder.
Elaboration (by LLM):
Best Fit is a memory allocation strategy that tries to minimize wasted space by allocating to the smallest hole that still fits the process.
Algorithm:
best_hole = null
for each hole:
if hole.size >= process.size:
if (best_hole == null) or (hole.size < best_hole.size):
best_hole = hole
allocate process to best_hole
Advantages and Disadvantages:
Pros:
- Minimizes waste: Allocates to smallest suitable hole, leaving larger holes for future processes
- Better fragmentation: Tends to reduce external fragmentation compared to First Fit
- Improves allocation success: Larger holes available for larger incoming processes
Cons:
- Slower: Must scan all holes to find the best one (O(n) vs O(1) for First Fit)
- Still creates small holes: The remaining space in the “best” hole becomes a new tiny hole
- Overhead: The cost of scanning all holes may outweigh benefits
Detailed Analysis of the Example:
Available holes: [5KB, 10KB, 4KB]
P4 (4KB) needs best fit:
- 5KB hole: fits, size = 5 (candidate)
- 10KB hole: fits, size = 10 (not better than 5)
- 4KB hole: fits, size = 4 (BEST! smaller than 5)
Allocate P4 to 4KB hole → no waste!
P5 (1KB) needs best fit among remaining: [5KB, 10KB, 4KB]
- 5KB hole: fits, size = 5 (candidate)
- 10KB hole: fits, size = 10 (not better than 5)
- 4KB hole: already used
Allocate P5 to 5KB hole → leaves 4KB hole
Comparison to First Fit Result:
First Fit finished with the 10KB and 4KB holes free (the 5KB hole was fully used); Best Fit finished with the 10KB hole plus a 4KB remainder of the 5KB hole free. The totals match in this small example, but Best Fit filled the 4KB hole exactly, which in general tends to keep larger holes intact.
Best Fit is theoretically superior but has higher CPU overhead.
(3) Worst fit
Student Solution
Answer by Justin:
Worst Fit (P4: 4KB, P5: 1KB):
Initial free holes (in order): 5KB, 10KB, 4KB.
Worst fit allocates P4 to the largest hole (the 10KB hole), leaving a 6KB remainder. Then P5 (1KB) goes into the largest remaining hole (that 6KB remainder), leaving 5KB.
Elaboration (by LLM):
Worst Fit is a counterintuitive strategy that allocates to the largest available hole, with the idea that this leaves large holes for future allocations.
Algorithm:
worst_hole = null
for each hole:
if hole.size >= process.size:
if (worst_hole == null) or (hole.size > worst_hole.size):
worst_hole = hole
allocate process to worst_hole
Motivation Behind Worst Fit:
The intuition is: “Allocate to the largest hole so the remaining hole is still reasonably large for future processes.”
Advantages and Disadvantages:
Pros:
- Preserves large holes: After allocation, remaining space is still sizeable
- Allows larger future processes: Future large processes have more options
- Theory: Should reduce external fragmentation
Cons:
- Wastes large holes: Uses large holes even when smaller ones suffice
- Actually performs worse: Empirical studies show Worst Fit performs poorly in practice
- Fragments available space: Creates many medium-sized holes instead of keeping one large hole
- Slower than First Fit: Still O(n) to find maximum
Detailed Analysis of the Example:
Available holes: [5KB, 10KB, 4KB]
P4 (4KB) needs worst fit:
- 5KB hole: fits, size = 5 (candidate)
- 10KB hole: fits, size = 10 (WORST! largest)
- 4KB hole: fits, size = 4 (not worst)
Allocate P4 to 10KB hole → leaves 6KB hole
P5 (1KB) needs worst fit among remaining: [5KB, 6KB, 4KB]
- 5KB hole: fits, size = 5
- 6KB hole: fits, size = 6 (WORST! largest)
- 4KB hole: fits, size = 4
Allocate P5 to 6KB hole → leaves 5KB hole
Why Worst Fit Usually Fails:
Suppose many 2KB processes arrive next:
After P4, P5 (worst fit): holes are [5KB] [5KB] [4KB]
After 2 more 2KB processes (worst fit splits the two largest holes): [3KB] [3KB] [4KB]
Now there is no hole of 5KB or larger left for a future large process!
Comparison:
- Best Fit: Preserves large 10KB hole longer
- Worst Fit: Breaks up the 10KB hole unnecessarily
Empirical Results:
In practice, Best Fit performs best, then First Fit, and Worst Fit performs worst despite its theoretical motivation.
Problem 7: Memory Allocation Algorithms
Given free memory partitions of 100K, 500K, 200K, 300K and 600K (in order) how would each of the Firstfit, Best-fit, and Worst-fit algorithms place processes of 212K, 417K, 112K and 426K (in order)? Which algorithm makes the most efficient use of memory?
LLM Solution
First Fit: Places 212K in 500K, leaving 288K. Places 417K in 600K, leaving 183K. Places 112K in the 288K remainder of the 500K partition, leaving 176K. Cannot place 426K (must wait).
Best Fit: Places 212K in 300K, leaving 88K. Places 417K in 500K, leaving 83K. Places 112K in 200K, leaving 88K. Places 426K in 600K, leaving 174K.
Worst Fit: Places 212K in 600K, leaving 388K. Places 417K in 500K, leaving 83K. Places 112K in the 388K remainder of the 600K partition, leaving 276K. Cannot place 426K (must wait).
Best-fit makes the most efficient use of memory in this case since it is the only algorithm that places all four processes; first fit and worst fit both leave 426K waiting.
Elaboration:
This problem demonstrates the practical differences between allocation algorithms with realistic workloads.
Initial State:
Available partitions (in order): 100K, 500K, 200K, 300K, 600K
First Fit Trace: 212K → 500K (288K left); 417K → 600K (183K left); 112K → 288K remainder (176K left); 426K → no partition large enough, must wait.
Best Fit Trace: 212K → 300K (88K left); 417K → 500K (83K left); 112K → 200K (88K left); 426K → 600K (174K left).
Worst Fit Trace: 212K → 600K (388K left); 417K → 500K (83K left); 112K → 388K remainder (276K left); 426K → no partition large enough, must wait.
Key Insight - Algorithm Comparison:
| Algorithm | P1 (212K) | P2 (417K) | P3 (112K) | P4 (426K) | Processes Allocated |
|---|---|---|---|---|---|
| First Fit | ✅ (500K) | ✅ (600K) | ✅ (288K remainder) | ❌ | 3 |
| Best Fit | ✅ (300K) | ✅ (500K) | ✅ (200K) | ✅ (600K) | 4 |
| Worst Fit | ✅ (600K) | ✅ (500K) | ✅ (388K remainder) | ❌ | 3 |
Why Best Fit Wins:
Best Fit's strategy of choosing the smallest suitable partition preserves larger partitions:
- Best Fit placed 212K in 300K (not 500K) and 417K in 500K (not 600K), preserving the 600K partition for the 426K process
- First Fit and Worst Fit both consumed the 600K partition early, leaving nothing large enough for 426K
- As a result, Best Fit is the only algorithm that allocates all four processes
Outcome Metric: Best Fit achieved 4/4 allocations (100%) vs. First Fit and Worst Fit with 3/4 (75%)
General Takeaway:
Best Fit is empirically superior because:
- Minimizes waste in each allocation
- Preserves larger holes for larger future processes
- Results in fewer allocation failures
However, Best Fit requires scanning all partitions (slower), making it a CPU-efficiency vs. memory-efficiency trade-off.
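The traces above can be reproduced with a small simulation. This is a sketch under the usual assumption that the leftover space of a chosen partition becomes a new, smaller hole at the same position; the function and enum names are my own, not from the source.

```c
/* Simulate first/best/worst fit on this problem's partitions and processes. */
#include <stdio.h>

#define NPART 5
#define NPROC 4

enum policy { FIRST, BEST, WORST };

static void simulate(enum policy p, const char *name) {
    int hole[NPART] = {100, 500, 200, 300, 600};   /* KB, in order */
    int proc[NPROC] = {212, 417, 112, 426};

    printf("%s fit:\n", name);
    for (int i = 0; i < NPROC; i++) {
        int pick = -1;
        for (int j = 0; j < NPART; j++) {
            if (hole[j] < proc[i]) continue;                 /* does not fit */
            if (pick == -1) { pick = j; if (p == FIRST) break; }
            else if (p == BEST  && hole[j] < hole[pick]) pick = j;
            else if (p == WORST && hole[j] > hole[pick]) pick = j;
        }
        if (pick == -1)
            printf("  %dK: cannot be placed (waits)\n", proc[i]);
        else {
            printf("  %dK -> %dK hole, %dK left over\n",
                   proc[i], hole[pick], hole[pick] - proc[i]);
            hole[pick] -= proc[i];        /* remainder becomes a new hole */
        }
    }
}

int main(void) {
    simulate(FIRST, "First");
    simulate(BEST,  "Best");
    simulate(WORST, "Worst");
    return 0;
}
```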
Problem 8: Variable Partitioning with Base and Limit Registers
Briefly explain how memory management with variable partitioning and a simple MMU with base and limit registers works. Does this memory management scheme use physical or logical addresses?
Student Solution
Answer by Justin:
This memory management scheme uses logical addresses, which the MMU translates into physical addresses. How it works is that a logical address (an offset within the process's address space) is fed into the memory management unit (MMU), and the MMU checks whether the address is valid for the process. The MMU contains 2 registers, base and limit. The base holds the starting physical address of the current process's partition, and the limit holds the size of the partition. If the logical address is not within the limit, a protection fault (trap to the OS) is raised. If it is within bounds, the MMU adds the logical address to the base register to form the physical address and fetches the requested memory.
Elaboration (by LLM):
This is a transition point between early physical addressing and modern virtual addressing. The scheme uses base and limit registers—a simple but powerful MMU design that enables protection and relocation without full virtual memory.
How Base-Limit MMU Works:
The MMU translates a logical address to a physical address with a simple addition: physical address = base + logical address.
Concrete Example:
- CPU generates: logical address = 0x2000 (an offset within the process)
- MMU base register = 0x5000 (where the process is loaded in physical RAM)
- MMU limit register = 0x4000 (process size = 16KB)
Boundary Check:
Before accessing memory, the MMU verifies logical address < limit (here 0x2000 < 0x4000 ✅).
If the check fails: Protection violation (trap to OS)
Result: CPU accesses physical RAM location 0x5000 + 0x2000 = 0x7000
Key Features:
- Relocation: Process can be loaded at any physical address by just changing the Base register
- Protection: Limit register prevents process from accessing beyond its partition
- Simplicity: Only 2 hardware registers needed
Address Translation Flow:
CPU
  |
  | logical address (0x2000)
  v
┌──────────────────────────┐
│ MMU (Hardware)           │
│                          │
│ Base  = 0x5000           │
│ Limit = 0x4000           │
│                          │
│ Check:   addr < limit?   │
│ Compute: addr + base     │
└──────────────────────────┘
  |
  | physical address (0x7000)
  v
RAM
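A minimal sketch of the check-then-add logic shown in the diagram, using the same base, limit, and example addresses (the `translate` function name and the trap-as-printf behavior are illustrative):

```c
/* Base-limit translation: bounds check, then relocation by addition. */
#include <stdint.h>
#include <stdio.h>

static uint32_t base  = 0x5000;   /* where the partition starts in RAM */
static uint32_t limit = 0x4000;   /* partition size (16 KB)            */

/* Returns the physical address, or -1 after "trapping" on a violation. */
static int64_t translate(uint32_t logical) {
    if (logical >= limit) {                   /* bounds check first        */
        printf("0x%X: protection violation, trap to OS\n", logical);
        return -1;
    }
    return (int64_t)base + logical;           /* relocation: base + offset */
}

int main(void) {
    printf("0x2000 -> 0x%llX\n", (unsigned long long)translate(0x2000)); /* 0x7000 */
    translate(0x5000);                        /* beyond limit -> trap      */
    return 0;
}
```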
Advantages vs Disadvantages:
Pros:
- Simple hardware: Only 2 registers
- Fast translation: Single addition operation
- Protection: Prevents accessing other processes
- Relocation: Solves the physical address problem
Cons:
- Requires contiguous memory: Process must fit in one contiguous block
- External fragmentation: Still suffers from variable partitioning
- No paging: Cannot use pages for more flexible memory management
Comparison to Modern Virtual Addressing:
Base-Limit is essentially “simple virtual addressing”:
- Virtual address ≠ Physical address ✅
- Automatic translation at runtime ✅
- Protection between processes ✅
- But no paging, so external fragmentation remains ❌
Modern systems use page tables instead of base-limit because page tables:
- Allow non-contiguous allocation
- Eliminate external fragmentation
- Enable demand paging (pages in/out to disk)
Problem 9: Virtual vs Physical Addresses
What’s a virtual address? How does it differ from a physical address? Briefly explain.
Student Solution
Answer by Justin:
A virtual address is the address a program uses to access memory; it is generated by the CPU during program execution. It does not refer to a physical location in memory. Instead, the MMU translates it into a physical address, which is an actual location in RAM.
Elaboration (by LLM):
Virtual and physical addresses are fundamental concepts in modern memory management. Understanding their relationship is crucial to understanding how operating systems isolate and protect processes.
Key Definitions:
| Concept | Virtual Address | Physical Address |
|---|---|---|
| Definition | Address used by program/CPU | Actual location in RAM hardware |
| Who generates | CPU core (during instruction execution) | MMU during translation |
| Is same across processes? | NO - each process has own space | YES - all processes share RAM |
| Translation | Virtual → Physical via MMU/page tables | Hardware bus to RAM |
| Visibility to program | YES - used by code | NO - transparent to program |
Virtual Address Space (per Process):
Each process has identical virtual address spaces, but they map to different physical locations:
(Per-segment address listings omitted here.) Process A and Process B lay out their code, data, and stack segments at the same virtual addresses, yet each segment is backed by a different physical location.
Key Insight: The same virtual address (0x08000000 in the example below) maps to different physical frames:
Process A Virtual: 0x08000000 ──MMU──→ Physical: 0x3000
Process B Virtual: 0x08000000 ──MMU──→ Physical: 0xAB00
Why Separate Virtual and Physical Addresses?
- Relocation: Programs always see memory starting at 0x0, regardless of where OS loads them
- Protection: Process A can’t access Process B’s pages (MMU enforces this)
- Multitasking: Multiple processes can run; each thinks it has whole memory
- Paging: Pages can be scattered in RAM or swapped to disk; program doesn’t care
- Abstraction: Hardware details (RAM layout, fragmentation) hidden from programs
Example with C Code:
int x = 42; // Variable stored at virtual address (e.g., 0x1234)
int *p = &x; // p stores virtual address 0x1234
printf("%p", p); // Prints virtual address (e.g., 0x1234)What happens when this code runs:
- Expression
&xevaluates to:(what program sees) - When CPU accesses
*p, MMU translates:(physical) - CPU reads from physical RAM location:
, retrieves value - Translation is automatic; C program never directly sees
Security Without Virtual Addressing:
❌ Without virtual addressing (old systems): Program A and Program B both write to the same physical address, so one overwrites the other's data.
✅ With virtual addressing (modern systems): Program A and Program B may both write to the same virtual address, but the MMU maps it to different physical locations.
Result: Same virtual address maps to different physical locations ✅ Protected!
This explains why modern multi-process operating systems REQUIRE virtual addressing—it’s the only way to safely run multiple programs simultaneously.
Problem 10: Process Swapping
Briefly explain what process swapping is and what it was used for? In modern paging systems, is process swapping employed? Why or why not?
Student Solution
Answer by Justin:
Process swapping occurs when a high priority process needs to be loaded into memory to be executed but there is not enough space. In this case an entire lower priority process is removed from memory and put into a backing store, and the new process is put into memory to be executed. In modern paging systems, process swapping is NOT employed. This is because paging moves smaller memory units (pages) and not the entire process.
Elaboration (by LLM):
Process swapping is a memory management technique from the era of variable partitioning. It represents a drastic measure to make room for high-priority processes.
When Swapping Occurs:
Memory state:
[P1: 8KB] [P2: 12KB] [P3: 6KB] [Free: 2KB]
High-priority P4 (10KB) arrives and needs to run immediately.
Problem: No contiguous 10KB hole available.
Solution (Swapping):
1. Find lowest-priority process (e.g., P2)
2. Save P2's entire memory (12KB) to disk (backing store/swap disk)
3. Free P2's 12KB partition
4. Load P4 (10KB) into memory
5. Later, restore P2 from disk when more space available
Memory after swap:
[P1: 8KB] [P4: 10KB] [P3: 6KB] [Free: 2KB]
Disk: [P2: 12KB saved]
Characteristics of Swapping:
Granularity: entire process (all of its memory)
Frequency: infrequent (only when necessary)
Trigger: high-priority process arrival + insufficient contiguous space
Overhead: very high (I/O cost to disk)
Cost Analysis:
Swapping a 12KB process:
Disk access time:
Seek time: ~5-10 ms
Transfer 12KB: ~0.2 ms
Total: ~10 ms for one process
Context switch time: ~1-2 ms
Swap cost >> context switch cost by 5-10x!
Why Swapping is Expensive:
Disk I/O is the bottleneck in computer systems:
CPU speed: 1-3 GHz (1-3 ns per instruction)
Disk speed: 10-15 ms latency + transfer time
Ratio: 10 million × slower!
Swapping an entire process is catastrophic for performance.
Why Modern Systems Use Paging Instead:
Swapping (Variable Partitioning):
[P1: 8KB] [Hole: 5KB] [P2: 12KB] [Hole: 3KB]
New process (10KB): Must swap entire P2 → disk I/O cost!
Paging (Fixed Pages, e.g., 4KB):
[P1 page 1] [Hole: 4KB] [P2 page 1] [Hole: 4KB] [P2 page 2] [Hole: 4KB]
New process: Can allocate 3 pages (12KB) in non-contiguous holes
→ No need to swap anything!
Paging’s Advantages Over Swapping:
| Feature | Swapping | Paging |
|---|---|---|
| Granularity | Entire process | Individual pages (4KB) |
| Frequency | Infrequent | Very frequent (transparent) |
| Allocation | Requires contiguous space | Works with fragmented space |
| I/O cost | High per operation | Amortized low (many processes) |
| Use case | Rare (priority inversion) | Continuous (demand paging) |
Modern Paging Analogy:
Instead of swapping entire books to storage, paging swaps individual pages:
Have 10 books but shelf space for 8:
- Swapping: Remove entire book B1, store on shelf, later swap it back
Cost: Move whole book multiple times
- Paging: Keep most-used pages of all books on shelf,
swap least-used pages to storage, rotate as needed
Cost: Move small units frequently, but overall efficient
When Swapping Still Happens:
Modern OSs use swapping but at page granularity:
Linux: Memory pressure → pages swapped to swap space on disk
(not entire processes)
Windows: Similar page-based swapping in virtual memory manager
True “process swapping” (entire process) is obsolete in modern systems.
Problem 11: Segmentation for Memory Management
Briefly explain the motivation behind segmentation for memory management and how it works?
Student Solution
Answer by Justin:
The motivation behind segmentation in memory management was to reduce external fragmentation. Before segmentation, the entire process needed to be loaded into main memory as one contiguous block, which meant that large processes could not be loaded when there was not enough contiguous space. Segmentation fixes this by breaking a process into parts: Code, Data, Stack. These parts can be placed anywhere in memory and do not have to be contiguous with each other!
Elaboration (by LLM):
Segmentation was an intermediate memory management approach that addressed one of the key limitations of variable partitioning: external fragmentation.
The Core Problem Segmentation Solves:
Before segmentation, the entire process had to be allocated as a single contiguous block. When you have 3 KB, 2 KB, and 1 KB of holes scattered throughout memory but need 5 KB contiguous space, external fragmentation makes that allocation impossible.
Segmentation divides each process into logical segments (Code, Data, Stack, etc.). Each segment is fixed or grows at its own rate, and they can be placed in different memory locations, not necessarily adjacent to each other.
Key Advantages:
- Non-Contiguous Allocation: Segments can be placed in different memory locations, allowing use of scattered free space
- Logical Protection: Each segment can have different access rights (code=execute only, data=read/write)
- Shared Segments: Code segments of shared libraries can be mapped to multiple processes
- Reduces External Fragmentation: Though not eliminating it entirely
Why It Wasn’t the Final Solution:
While segmentation reduced fragmentation, processes still needed contiguous allocation for each segment. This meant:
- Small gaps still remained between segments
- Segment sizes are variable, making memory management complex
- Programmers must create segments explicitly
Comparison to Paging (What Replaced Segmentation):
Paging takes the concept further by dividing BOTH physical and virtual memory into fixed-size units (pages/frames), completely eliminating external fragmentation. Modern systems use paging because it’s simpler and more effective.
Problem 12: Paging and Hardware Support
All modern memory management schemes use paging. Briefly explain the main motivation behind paging and if paging requires hardware support.
Student Solution
Answer by Justin:
The main motivation behind paging was to solve external fragmentation. External fragmentation made it so that large processes could not be loaded into memory for execution if there was not enough contiguous space to hold them. Paging solves this by breaking physical memory into equal-sized slots called frames and breaking processes up into pages of the same size, so that pages always fit into (possibly non-contiguous) frames. Paging does require hardware support, as it requires an MMU.
Elaboration (by LLM):
Paging is the fundamental memory management technique in all modern operating systems. It completely solves the external fragmentation problem by using fixed-size allocation units.
Why External Fragmentation is Eliminated:
The key insight is that all pages and frames are the same size. If a 4 KB page needs to be placed, it can fit into ANY 4 KB frame—there’s no size mismatch. This guarantees complete memory utilization with no wasted gaps.
Without paging (variable-size blocks):
- Holes: 5 KB, 2 KB, 3 KB (scattered)
- Process needs 4 KB → Fits in 5 KB hole, but 1 KB wasted
- Process needs 6 KB → Can’t fit, even though 10 KB total free
With paging (fixed 4 KB blocks):
- Frames available: As many as needed
- Page needs 4 KB → Fits perfectly in exactly one frame
- Multiple pages → Each fits perfectly, no wasted space
Why Hardware Support (MMU) is Essential:
Paging REQUIRES hardware support for three critical reasons:
1. Translation Speed: Every memory access needs a virtual→physical translation. Software translation would be far too slow (microseconds where nanoseconds are needed).
2. Protection: Hardware must enforce that processes don't access each other's pages, and this must happen at CPU-cycle speed, not software speed.
3. Concurrency: With multiple processes rapidly context switching, hardware must handle translation with essentially zero overhead.
The MMU (Memory Management Unit) contains:
- Page Table Lookup: Translates virtual address to physical address
- TLB (Translation Lookaside Buffer): Cache of recent translations
- Permission Checking: Validates access rights on every access
What Makes Paging Superior:
Compared to segmentation, paging:
- Completely eliminates external fragmentation (vs. reducing it)
- Transparent to programmers (no explicit segment creation)
- Enables virtual memory (page swapping to disk)
- Simpler hardware implementation
Problem 13: Page Table Entry Contents
List 3 things that are stored in a Page Table Entry (PTE) and briefly explain what each is used for?
Student Solution
Answer by Justin:
- Modify bit: The modify bit signals whether the page has been modified (i.e., whether the page is dirty)
- Reference bit: The reference bit signals whether the page has been read or written
- Frame Number: Which frame this page is currently stored in
Elaboration (by LLM):
A Page Table Entry (PTE) is the metadata structure that controls how the MMU and OS manage that page. Understanding PTE contents is critical to understanding paging systems.
The Three Key Fields in This Answer:
Each PTE contains multiple fields. The three mentioned are among the most important:
1. Frame Number (Mandatory)
This field stores the physical frame address where this page resides. When a page is loaded into memory, the OS updates this field with the frame number. This is how the MMU translates virtual page numbers to physical frame numbers.
Example: If virtual page 5 is in physical frame 203, PTE[5] contains 203.
2. Dirty Bit (Modified Bit)
This 1-bit flag indicates whether the page has been written to since it was loaded from disk.
- Dirty = 0: Page hasn’t been modified → when evicted, just free the frame (fast)
- Dirty = 1: Page modified → must write back to disk before freeing frame (slow, ~10-100ms)
The OS uses this for page replacement optimization: prefer evicting clean pages to avoid expensive disk writes.
3. Reference Bit (Accessed Bit)
This 1-bit flag indicates whether the page has been accessed (read OR written) recently. The MMU sets this to 1 whenever the page is accessed. The OS periodically clears it to 0.
Usage: Page replacement algorithms like LRU (Least Recently Used) examine this bit. Pages with reference = 0 are candidates for eviction because they haven’t been used recently.
Other Important PTE Fields Not Mentioned:
While the solution lists three, complete PTEs typically include:
- Valid Bit: Is this page in memory (1) or on disk/swap (0)?
- Protection Bits: What operations are allowed (read, write, execute)?
- Cache Disable: Should this page be cached (important for I/O memory)?
Example: How These Fields Are Used
When a process tries to access memory address 0x5234 (page 5, offset 0x234):
- MMU looks up PTE[5]
- Checks Valid bit = 1 (page is in memory)
- If valid = 0 → Page Fault (page is on disk, must load)
- If valid = 1 → Extract Frame Number (e.g., 203)
- Set Reference bit = 1 (mark as accessed)
- Form physical address: (203 << 12) | 0x234
- Access memory location
If code later writes to this page:
8. CPU sets Dirty bit = 1 (mark as modified)
When page replacement is needed:
9. OS examines the Dirty bit → if 1, the page must be written to disk first
10. OS examines the Reference bit → if 0, this is a good candidate to evict
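A compact sketch of a PTE and the lookup sequence just described. The field widths, the 1024-entry toy table, and the 12-bit offset (4 KB pages) are assumptions chosen to match the 0x5234 example:

```c
/* Toy single-level page table with frame number, valid, dirty, ref bits. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                    /* 4 KB pages: 12 offset bits */

struct pte {
    uint32_t frame : 20;                 /* physical frame number      */
    uint32_t valid : 1;                  /* page resident in memory?   */
    uint32_t dirty : 1;                  /* written since loaded?      */
    uint32_t ref   : 1;                  /* accessed recently?         */
};

static struct pte page_table[1024];

static uint32_t translate(uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t off = vaddr & ((1u << PAGE_SHIFT) - 1);
    struct pte *e = &page_table[vpn];

    if (!e->valid) { printf("page fault on page %u\n", vpn); return 0; }
    e->ref = 1;                          /* MMU marks the page accessed */
    return ((uint32_t)e->frame << PAGE_SHIFT) | off;
}

int main(void) {
    page_table[5] = (struct pte){ .frame = 203, .valid = 1 };
    printf("0x5234 -> 0x%X\n", translate(0x5234));   /* (203<<12)|0x234 = 0xCB234 */
    return 0;
}
```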
Problem 14: Advantages and Disadvantages of Paging
List 1 advantage and 1 disadvantage of paging.
Student Solution
Answer by Justin:
One advantage of paging is that it eliminates external fragmentation. One disadvantage is that it is memory-reference intensive: when accessing memory you have to make 2 references rather than one (page table, then memory).
Elaboration (by LLM):
This answer correctly identifies the fundamental trade-off in paging: it solves fragmentation at the cost of additional memory references.
Advantage: Eliminating External Fragmentation
This is paging’s primary benefit. External fragmentation was the chronic problem with variable partitioning and segmentation—wasted space accumulating over time, preventing allocation of new processes.
With paging’s fixed-size units, this is completely eliminated:
- 4 KB page always fits in any 4 KB frame
- No holes, no wasted gaps between allocations
- System remains usable indefinitely without defragmentation
Disadvantage: Memory Reference Overhead (Mitigated by TLB)
The student correctly identifies that accessing memory now requires two references:
- First reference: Look up page table to get frame number (~100 ns)
- Second reference: Access actual memory at that frame (~100 ns)
This doubles access time compared to direct memory systems!
However, the Solution: TLB (Translation Lookaside Buffer)
Modern systems solve this with a TLB—a hardware cache of recent virtual→physical translations:
- TLB hit (90% of time): Single lookup + memory access = ~110 ns (only 10% slower)
- TLB miss (10% of time): Two lookups + memory access = ~200 ns (double speed)
Weighted average: 0.9 × 110 + 0.1 × 200 = 119 ns (~19% slowdown)
So while the trade-off is real, TLB hardware makes it negligible in practice.
Other Disadvantages of Paging:
1. Page Table Memory Overhead:
   - 32-bit system: ~4 MB per process just for page tables
   - Solution: Multi-level page tables save 95%+ of this
2. Internal Fragmentation:
   - Last page of a process may not be fully used
   - Bounded: maximum waste = (page_size - 1) bytes
   - Negligible (~0.2% of memory)
3. TLB Flush on Context Switch:
   - When switching processes, the TLB contents become invalid
   - Solution: Process ID tagging in TLB
The Big Picture:
| Aspect | Before Paging | With Paging | With Paging+TLB |
|---|---|---|---|
| External Fragmentation | 30-50% | 0% | 0% |
| Memory Overhead | Minimal | 4 MB per process | 4 MB per process |
| Access Time | 100 ns | 200 ns (2x) | 119 ns (1.2x) |
This comparison shows why paging dominates: the trade-offs are manageable, while the benefits are transformative.
Problem 15: Multi-level Page Tables
What’s the motivation behind multi-level page tables? Briefly explain.
Student Solution
Answer by Justin:
The motivation behind multi-level page tables is to save memory used by page tables in systems with large virtual address spaces. This is because single-level page tables must have one page table entry (PTE) for every page, even if the pages are not in use. To combat this, page tables are broken up into multiple levels, and only the parts of the page table needed for currently used addresses are kept in memory. This decreases memory use, but increases memory reference overhead.
Elaboration (by LLM):
Multi-level page tables are a crucial innovation that made paging practical for modern systems with large address spaces.
The Problem: Single-Level Page Table Overhead
For a 32-bit system with 4 KB pages: 2^32 bytes / 2^12 bytes per page = 2^20 ≈ 1 million pages.
Each PTE = 4 bytes, so each process needs: 2^20 × 4 bytes = 4 MB of page table.
The Waste: Even with only 100 processes, that's 400 MB just for page tables, and most processes don't use their entire virtual address space!
Typical process:
- Virtual address space allocated: 4 GB (all 32 bits)
- Actually used: ~100 MB (code, data, stack, heap)
- Wasted page table: ~4 MB (entries for unused 3.9 GB)
Solution: Multi-Level Page Tables
Instead of one flat table, create a hierarchy. Only allocate lower-level tables as needed.
Two-Level Example:
Single-level: One 1M-entry table (4 MB)
Two-level: One 1K-entry directory (4 KB)
+ 50 1K-entry tables (200 KB for used regions)
Total: ~204 KB (saved 3.8 MB!)
How Two-Level Page Tables Work:
- Page Directory: 1024 entries (4 KB)
- Each entry points to a page table (or NULL if unused)
- Page Tables: Multiple tables, allocated on-demand
- Each table is 1024 entries (4 KB)
- Only allocated for used regions of virtual space
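A minimal sketch of the two-level lookup this describes, assuming the 10-bit directory index / 10-bit table index / 12-bit offset split of a 32-bit address with 4 KB pages; the names `pgdir` and `pte_t` are illustrative:

```c
/* Two-level page table: tables are allocated only for used regions. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { uint32_t frame; int valid; } pte_t;

static pte_t *pgdir[1024];                 /* directory; NULL = region unused */

static uint32_t translate(uint32_t vaddr) {
    uint32_t dir = vaddr >> 22;            /* top 10 bits    */
    uint32_t tbl = (vaddr >> 12) & 0x3FF;  /* middle 10 bits */
    uint32_t off = vaddr & 0xFFF;          /* low 12 bits    */

    if (!pgdir[dir] || !pgdir[dir][tbl].valid) {
        printf("page fault at 0x%X\n", vaddr);
        return 0;
    }
    return (pgdir[dir][tbl].frame << 12) | off;
}

int main(void) {
    pgdir[0] = calloc(1024, sizeof(pte_t));           /* one table, on demand          */
    pgdir[0][3] = (pte_t){ .frame = 77, .valid = 1 }; /* map virtual page 3            */
    printf("0x3ABC -> 0x%X\n", translate(0x3ABC));    /* (77<<12)|0xABC = 0x4DABC      */
    translate(0x40000000);                            /* unused region -> "page fault" */
    return 0;
}
```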
Memory Savings:
| Address Space | Single-Level | Two-Level | Savings |
|---|---|---|---|
| 0-4 MB (code/data) | (covered by the mandatory 4 MB flat table) | 8 KB (directory + 1 table) | - |
| Used heap | (included in above) | 4 KB | - |
| Stack region | (included in above) | 4 KB | - |
| Unused (~3.9 GB) | ~3.9 MB of PTEs allocated anyway | 0 KB | 100% |
| Total per process | 4 MB | ~20 KB | 99.5% |
Trade-off: Memory References
Multi-level tables require more lookups:
- Single-level: 1 page table access + 1 memory access = 2 accesses
- Two-level: 1 directory + 1 table + 1 memory = 3 accesses
- Three-level: 1 directory + 2 table lookups + 1 memory = 4 accesses
However, TLB masks this overhead:
- TLB caches the final translation (virtual page → physical frame)
- With TLB hit (~95%), all intermediate lookups are skipped
- Net result: ~1.2× slowdown instead of 2-4×
Three-Level Page Tables (64-bit Systems)
Modern 64-bit systems use 3-4 level hierarchies:
Linux x86-64 (4 KB pages, 8-byte PTEs, 512 entries per level):
PML4 (512 entries)
 └─ PDPT (512 entries)
     └─ Page Directory (512 entries)
         └─ Page Table (512 entries)
This handles 48-bit virtual address spaces efficiently:
- One 4 KB directory per process
- Only allocates deeper tables as needed
- Scales elegantly to massive address spaces
Real-World Impact:
Multi-level page tables enabled:
- Virtual memory systems (swapping pages to disk)
- Large address spaces (64-bit computing)
- Sparse allocation (only allocate what’s used)
- Process isolation (complete virtual address separation)
Without this optimization, page table overhead would consume significant physical memory, defeating the purpose of virtual memory.
Problem 16: 48-bit Virtual Addresses with 32-bit Physical Addresses
A machine has 48-bit virtual addresses and 32-bit physical addresses with a page size of 8KB.
(1) How many entries are needed for the page table?
Student Solution
Answer by Justin:
1 KB = 1024 = 2^10 bytes, so the 8 KB page size = 2^13 bytes
48-bit address space → 2^48 bytes of virtual memory
Entries for page table → address space size / page size = 2^48 / 2^13 = 2^35 entries (≈ 34 billion)
Elaboration (by LLM):
This calculation reveals how virtual address space size determines page table complexity—a critical consideration for architecture design.
Understanding the Calculation:
The number of page table entries (PTEs) must equal the number of pages in the virtual address space.
Step 1: Convert Page Size to Bits
Given: Page size = 8 KB = 2^13 bytes → 13 offset bits
Step 2: Calculate Number of Pages
Virtual address space: 2^48 bytes, so number of pages = 2^48 / 2^13 = 2^35
This is a massive number: roughly 34 billion pages
Step 3: Page Table Entries
Each page needs exactly one PTE in the page table: 2^35 entries
Why This Matters:
With 2^35 entries, even compact PTEs produce an enormous flat table (for example, 8-byte PTEs give 2^38 bytes = 256 GB per process).
This is completely impractical! One process's page table alone would consume more memory than typical servers have.
The Solution: Multi-Level Page Tables
This is precisely why multi-level page tables are essential for 48-bit address spaces:
- Single-level (flat): hundreds of GB per process (impossible)
- Two-level: ~4 GB per process (still too large)
- Three-level (Intel x86-64): ~2 MB per process (practical!)
- Four-level (Linux): ~4 MB per process (with sparse allocation)
Virtual vs. Physical Address Spaces:
Important distinction this problem illustrates:
| Metric | Virtual | Physical |
|---|---|---|
| Address bits | 48 | 32 |
| Page size | 8 KB | 8 KB |
| Pages/Frames | 2^35 pages | 2^19 frames |
| Page table entries | 2^35 (one per virtual page) | not applicable |
| Frame table entries | not applicable | 2^19 (one per frame) |
The page table maps virtual pages to physical frames: the table has one entry per virtual page, while the number of frames is limited by the installed physical memory.
Practical Example:
For this 48-bit virtual / 32-bit physical system:
- Process can address: 256 TB of virtual memory
- System has: 4 GB of physical memory
- Page table entries needed: 34 billion
- Pages actually in memory at any time: ≤ 524,288 (the number of physical frames)
The page table is sparse—most entries are for pages currently on disk or never allocated.
(2) How many frames are there in the physical memory?
Student Solution
Answer by Justin:
- Frames need to be same size as pages → frame size = 8 KB =
bytes - 32-bit address space =
bytes - Number of frames = Address space size / size of frames =
Elaboration (by LLM):
This calculation demonstrates the relationship between physical memory capacity and the number of frames—a key constraint on system capacity.
Physical Memory Constraints:
Unlike virtual address space (which can be arbitrary), the number of frames is determined by actual physical RAM installed.
For this system:
- Physical memory: 2^32 bytes = 4 GB
- Frame size (must equal page size): 8 KB = 2^13 bytes
Number of Frames: 2^32 / 2^13 = 2^19 = 524,288
What This Means:
This system has exactly 524,288 physical frames available. This is the absolute upper bound on how many pages can be resident (in physical memory) at any moment.
Comparison: Virtual vs. Physical Limitations
| Aspect | Virtual (from Part 1) | Physical (from Part 2) |
|---|---|---|
| Address bits | 48 | 32 |
| Address space size | 2^48 bytes = 256 TB | 2^32 bytes = 4 GB |
| Pages/Frames | 2^35 pages | 2^19 = 524,288 frames |
| Page table size | enormous if flat (why multi-level tables exist) | ~4 MB frame table |
| Pages in memory | limited to 524,288 at any time | 524,288 frames available |
Physical Address Format:
With 32-bit physical addresses and a 13-bit offset, the frame number field is 32 - 13 = 19 bits.
Example physical address:
- Frame number 0x5A2 (19-bit field)
- Offset 0x0345 (13 bits)
- Physical address: (0x5A2 << 13) | 0x345 = 0xB44345
Key Insight: Virtual vs. Physical Page Table Entries
This problem distinguishes two concepts:
- Part (1): The virtual page table has 2^35 entries (one per virtual page)
- Part (2): The physical memory has 2^19 = 524,288 frames (the maximum number of resident pages)
A mismatch: 2^35 possible pages vs. only 2^19 frames.
Solution: Virtual memory + paging to disk
- All 2^35 pages are "possible"
- Only up to 2^19 pages are in physical RAM at a time
- OS swaps pages to disk as needed
- Page faults trigger disk I/O to load pages into free frames
Frame Table (OS Data Structure):
The OS maintains a frame table with one entry per physical frame (524,288 entries here):
Frame Table (524,288 entries, ~2-4 MB)
Entry 0: [Process A, Page 5, Dirty=1, Pinned=0]
Entry 1: [Process B, Page 23, Dirty=0, Pinned=0]
Entry 2: [Kernel, Page 0, Dirty=0, Pinned=1]
...
Entry 524287: [Free]
Each entry tracks:
- Which process owns the frame
- Which virtual page it contains
- Dirty flag (write-back needed?)
- Pinned flag (cannot evict?)
Why Frame Count Matters:
System capacity fundamentally limited by physical frames:
- 524,288 frames with 8 KB pages = 4 GB memory (expected!)
- Each process competes for frames
- When all frames full + page fault occurs → page replacement needed
- Limited frames is why TLB and cache efficiency matter
Real-World Comparison:
| System | Physical Memory | Frame Size | Frames | Virtual Space |
|---|---|---|---|---|
| 32-bit (4 KB pages) | 4 GB | 4 KB | 1M | 4 GB |
| 64-bit server | 256 GB | 4 KB | 64M | 256 TB |
| This problem | 4 GB | 8 KB | 524K | 256 TB |
Modern servers have millions of frames; small embedded systems might have thousands.
Problem 17: Shared Page Frames
What is the effect of allowing two entries in a page table to point to the same page frame in memory? Explain how this effect can be used to decrease the amount of time and memory needed to copy a large amount of memory from one place to another. What would the effect of updating some byte in one page be on the other page? How would you make sure that updating a byte in one page does not change the same byte in the other page?
Student Solution
Answer by Justin:
If two entries in a page table point to the same page frame in memory, both pages refer to the same physical memory. That means any read from either page returns the same data, and any write through one page is visible through the other. This decreases the amount of time and memory needed to copy a large amount of memory from one place to another because instead of actually moving the data you can just have the new page table entry point to the same physical frame. Consequently, updating a byte through one page also changes that byte as seen through the other page. To make sure updating a byte in one page does not change the same byte in the other page, use copy-on-write: mark both entries read-only, and when either side writes, trap to the OS, copy the frame, and point the writer's entry at the private copy before allowing the write.
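This behavior is easy to observe with fork(), which on POSIX systems gives parent and child page tables that initially reference the same frames copy-on-write. The sketch below is illustrative, not the exercise's required answer; the variable names are my own.

```c
/* fork() demo: the child's write triggers copy-on-write, so the parent
 * keeps seeing the old value at the same virtual address. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_value = 42;   /* same virtual address in parent and child */

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }

    if (pid == 0) {                 /* child */
        shared_value = 99;          /* write -> fault -> frame is copied */
        printf("child : &shared_value=%p value=%d\n",
               (void *)&shared_value, shared_value);
        exit(0);
    }

    wait(NULL);                     /* parent: child's write is not visible */
    printf("parent: &shared_value=%p value=%d\n",
           (void *)&shared_value, shared_value);
    return 0;
}
```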
Problem 18: Logical Address Space with 8 Pages
Consider a logical address space of 8 pages of 1024 bytes each, mapped onto a physical memory of 32 frames.
(a) How many bits are there in the logical address?
Student Solution
Answer by Justin:
8 pages, each 1 KB → 8 × 1024 = 2^3 × 2^10 = 2^13 bytes of logical address space → 13 bits in the logical address.
(b) How many bits are there in the physical address?
Student Solution
Answer by Justin:
Frame space = 32 frames, each frame 1 KB → 32 × 1024 = 2^5 × 2^10 = 2^15 bytes of physical memory → 15 bits in the physical address.
(c) If the OS requires all pages of process to be in memory for it to run, how many processes can the OS load to memory at the same time?
Student Solution
Answer by Justin:
With 32 frames available and each process needing all 8 of its pages resident, the OS can load 32 / 8 = 4 processes into memory at the same time.
Problem 19: 32-bit Computer with Single-level Paging
Consider a 32-bit computer that has 1GB of physical memory. This machine employs single-level paging with 8192 byte (8 KB) frames.
(a) Show the parts of a virtual address, i.e., how many bits are used for offset, how many bits for indexing the page table?
Student Solution
Answer by Justin:
- Since page size = 8192 = 2^13 bytes → we need 13 bits for the offset
- For indexing the page table, the virtual address is 32 bits and the offset is 13, so 32 - 13 = 19 bits
(b) How many physical frames does the machine have? How many virtual pages does the machine have? How many bytes does a process page table occupy in memory?
Student Solution
Answer by Justin:
- Physical frames → 1 GB = 2^30 bytes, each frame 2^13 bytes → 2^30 / 2^13 = 2^17 = 131,072 frames
- Virtual pages → 32-bit addresses = 2^32 bytes, each page 2^13 bytes → 2^32 / 2^13 = 2^19 = 524,288 pages
- Page table size → (# pages) × (size of PTE) = 2^19 × 4 bytes = 2 MB per process (assuming 4-byte PTEs)
size of PTE
(c) A user process generates the virtual address 0x345672F3. Compute the page number and offset from the virtual address and explain how the system establishes the corresponding physical location.
Student Solution
Answer by Justin:
Virtual Address → 0x345672F3
- Offset: lower 13 bits → 0x345672F3 & 0x1FFF = 0x12F3
- Page Number: upper 19 bits → 0x345672F3 >> 13 = 0x1A2B3
To get the corresponding physical location, the system indexes the page table with the page number, which provides the frame number. The frame number is then combined with the offset (frame number shifted left by 13 bits, ORed with the offset) to form the physical address. For example, if the page table maps page 0x1A2B3 to frame 0x12345, the physical address is (0x12345 << 13) | 0x12F3 = 0x2468B2F3.
(d) Assume that memory access takes 100 microseconds (100,000 nanoseconds). This machine uses a TLB and TLB access takes 100 nanoseconds. Assume a 90% TLB hit ratio; compute the effective memory access time.
Student Solution
Answer by Justin:
EMAT = a × (t_TLB + t_mem) + (1 - a) × (t_TLB + (n + 1) × t_mem)
Where:
- a = hit ratio = 0.90
- t_TLB = TLB access time = 100 nanoseconds
- t_mem = memory access time = 100,000 nanoseconds
- n = # levels = 1
EMAT = 0.90 × (100 + 100,000) + 0.10 × (100 + 200,000) = 90,090 + 20,010 = 110,100 ns ≈ 110.1 microseconds
Problem 20: 16-bit Architecture with Single-level Paging
Consider a 16-bit architecture that uses single-level paging. Assume the frame size is 256 bytes, and the total physical memory size is 16KB.
(a) What’s the total virtual memory size? How many frames does the physical memory have?
Student Solution
Answer by Justin:
256 bytes → 2^8 bytes per page/frame; the architecture is 16-bit.
Virtual memory size → 2^16 bytes = 64 KB per process.
Frame # → physical memory / frame size = 16 KB / 256 B = 2^14 / 2^8 = 2^6 = 64 frames.
Elaboration (by LLM):
This problem illustrates the fundamental components of a paging system on a small 16-bit architecture.
Virtual Memory Size:
With a 16-bit architecture, each process can address 2^16 = 65,536 bytes = 64 KB.
This is the upper bound of addressable memory per process. All virtual addresses are 16 bits.
Physical Memory Constraint:
Total physical RAM installed: 16 KB = 2^14 bytes
This is quite small—modern systems have GB or TB, but this problem models a small embedded or historical system.
Frame Calculation:
Page size = 256 bytes = 2^8 bytes
Number of physical frames: 2^14 / 2^8 = 2^6 = 64
The system has exactly 64 frame slots to hold pages from any process.
System Capacity:
| Metric | Value |
|---|---|
| Virtual address space | 64 KB per process |
| Physical memory | 16 KB total |
| Page size | 256 bytes |
| Frame size | 256 bytes |
| Total pages possible | 64 KB / 256 B = 256 pages |
| Total frames | 16 KB / 256 B = 64 frames |
Key Insight: Only 64 out of 256 possible pages can be in memory simultaneously. The rest must reside on disk.
Memory Layout Example:
Process A (address space: 0x0000 to 0xFFFF):
- Pages 0-9: Loaded in frames 5, 23, 12, 8, 3, 45, 18, 1, 52, 61
- Pages 10-255: On disk or unallocated
When Process A needs page 100 (currently on disk) and all 64 frames are full:
- OS uses page replacement algorithm (LRU, etc.)
- Evicts an existing page from memory
- Reads page 100 from disk into the freed frame
Process Page Table Structure:
Each process has its own page table with 256 entries:
Each entry is a PTE (typically 2-4 bytes for a 16-bit system).
Multi-process Example:
With 64 total frames and multiple processes:
- Process A: 20 pages in memory
- Process B: 30 pages in memory
- Process C: 10 pages in memory
- Process D: 4 pages in memory
- Total: 64 frames (fully utilized)
If Process E arrives wanting 10 pages, page replacement must happen (evicting pages from A-D).
(b) Draw the virtual address and show how many bits are used for offset and how many bits are used for page number.
Student Solution
Answer by Justin:
Since page size is 256 = 2^8 bytes, the offset needs 8 bits, leaving 16 - 8 = 8 bits for the page number.
Virtual address format: [ page number (8 bits) | offset (8 bits) ]
Elaboration (by LLM):
This problem demonstrates address decomposition—how a virtual address is split into page number and offset, a critical concept for understanding address translation.
Why Offset Bits Equal Log₂(Page Size):
The offset must be able to address every byte within a page.
Given: Page size = 256 bytes = 2^8 bytes
Need 8 offset bits so that every one of the 256 byte positions within a page can be addressed.
Why Remaining Bits Are Page Number:
Total address bits: 16 (for a 16-bit architecture)
Bits used for offset: 8
Bits left for page number: 16 - 8 = 8
Address Format:
The solution lists the offset first (bit positions 0-7, the low-order bits) and the page number second (bit positions 8-15, the high-order bits).
Written left-to-right with the high bits first: [ page number (bits 15-8) | offset (bits 7-0) ]
Example Address Decomposition:
Virtual address: 0xA5C2 (binary: 1010 0101 1100 0010)
- Offset (lower 8 bits): 0xC2 = 194 bytes
- Page number (upper 8 bits): 0xA5 = 165
This means: “In page 165, at byte offset 194 within that page”
How MMU Uses This Decomposition:
- Extract page number from upper 8 bits: 0xA5 = 165
- Look up page table entry: PT[165]
- Get frame number (e.g., 23)
- Extract offset from lower 8 bits: 0xC2 = 194
- Form physical address: (23 << 8) | 194 = 0x17C2
- Access RAM at location 0x17C2
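The steps above can be checked with a few lines of C; the frame number 23 is the assumed page-table result from the example:

```c
/* Verify the 0xA5C2 decomposition: 8-bit page number, 8-bit offset. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t va     = 0xA5C2;
    uint8_t  page   = va >> 8;          /* upper 8 bits: 0xA5 = 165 */
    uint8_t  offset = va & 0xFF;        /* lower 8 bits: 0xC2 = 194 */
    uint8_t  frame  = 23;               /* pretend PT[page] = 23    */

    printf("page=%u offset=%u phys=0x%X\n",
           page, offset, (frame << 8) | offset);   /* 165 194 0x17C2 */
    return 0;
}
```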
Comparison Across Different Page Sizes:
| Page Size | Offset Bits | Page # Bits (16-bit) |
|---|---|---|
| 256 B (2^8) | 8 | 8 |
| 128 B (2^7) | 7 | 9 |
| 512 B (2^9) | 9 | 7 |
For Larger Architectures:
32-bit system with 4 KB pages:
- Page size: 4 KB = 2^12 bytes
- Offset bits: 12
- Page number bits: 32 - 12 = 20
- Total pages: 2^20 ≈ 1 million pages
64-bit system with 4 KB pages:
- Page size: 4 KB = 2^12 bytes
- Offset bits: 12
- Page number bits: 64 - 12 = 52
- Total pages: 2^52 ≈ 4.5 quadrillion pages!
Key Principle:
This address decomposition is hardware-encoded:
- The MMU physically separates the address bits
- Lower N bits automatically go to offset hardware
- Upper bits automatically go to page table lookup
The split is determined by page size and cannot be changed without hardware redesign.
(c) How many entries does a process’s page table have? What’s the size of a process’s page table?
Student Solution
Answer by Justin:
A process's page table has (virtual address space) / (page size) = 2^16 / 2^8 = 2^8 = 256 entries. With 2-byte PTEs, the page table occupies 256 × 2 = 512 bytes.
Elaboration (by LLM):
This solution clarifies the relationship between virtual address space size and page table dimensionality. (Note: the wording is slightly compressed; what's meant is that the process's virtual address space is 2^16 bytes, which, divided by the 2^8-byte page size, gives 2^8 = 256 page table entries.)
Page Table Entries Calculation:
The number of page table entries equals the number of possible pages in the virtual address space.
Given:
- Virtual address space: 2^16 bytes (16-bit addressing)
- Page size: 256 bytes = 2^8 bytes
Number of pages: 2^16 / 2^8 = 2^8 = 256
Therefore, the page table must have 256 entries.
Page Table Size in Memory:
Assuming each PTE is 2 bytes (for this 16-bit system): 256 entries × 2 bytes = 512 bytes.
Key Insight:
Each process must have a page table capable of mapping all 256 possible pages in its virtual address space. This is true regardless of how many pages are actually in memory.
Page Table Layout:
Process A's Page Table (256 entries, 512 bytes)
Entry 0: Frame 5 (page 0 is in frame 5)
Entry 1: Frame 23 (page 1 is in frame 23)
Entry 2: Invalid (page 2 not in memory)
Entry 3: Frame 8 (page 3 is in frame 8)
...
Entry 255: Invalid (page 255 not in memory)
Total size: 512 bytes in physical RAM
Resident vs. Non-Resident Pages:
The page table has 256 entries, but processes use far fewer frames:
| Aspect | Value |
|---|---|
| Possible pages | 256 (all entries in page table) |
| Pages in memory | Limited by 64 total frames |
| Example process | 20 pages in memory, 236 on disk |
When page 150 is accessed but not in memory:
- Page fault occurs
- OS finds page 150 on disk
- Page fault handler evicts a page using page replacement algorithm
- Loads page 150 into freed frame
- Updates PTE[150] with new frame number
Memory Usage Across Multiple Processes:
With N concurrent processes, total page-table memory = N × 512 bytes.
Examples:
- 5 processes: 2.5 KB for all page tables
- 10 processes: 5 KB for all page tables
- 32 processes: 16 KB for all page tables
This is quite small relative to the 16 KB physical memory!
Page Table Entry Details:
For this 16-bit system with 64 frames, each PTE likely contains:
| Field | Bits | Purpose |
|---|---|---|
| Frame number | 6 | Identifies the frame (since 2^6 = 64 frames) |
| Valid bit | 1 | Is page in memory? |
| Dirty bit | 1 | Has page been modified? |
| Reference bit | 1 | Has page been accessed? |
| Protection bits | 2 | Read/write/execute |
| Other | 5 | Unused/reserved |
| Total | 16 | 16 bits = 2 bytes per PTE |
Comparison Across Architectures:
| System | Virtual Space | Page Size | Entries | Bytes/Entry | PT Size |
|---|---|---|---|---|---|
| This problem | 64 KB | 256 B | 256 | 2 | 512 B |
| 32-bit system | 4 GB | 4 KB | 1M | 4 | 4 MB |
| 64-bit system | 256 TB | 4 KB | 64G | 8 | 512 GB |
Notice how page table size grows dramatically with address space! This is why multi-level page tables are essential.
Where Page Tables Live:
Page tables reside in physical memory:
- This 512-byte table occupies 2 of the 64 frames
- Multiple processes share the 64 frames: page tables + process data
- In page replacement, page tables themselves can be paged to disk (though OS typically pins them in memory)
Trade-off: Table Size vs. Flexibility
- Larger page tables consume more memory
- But smaller pages mean more table entries (trade-off)
- This system chose 256-byte pages (compromise between overhead and internal fragmentation)
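As a sanity check on these numbers, here is a small Python sketch; the 2-byte PTE bit layout follows the table above and is an assumed layout, not a real architecture’s:

```python
# Sketch for Problem 20(c): entry count and page-table size for the
# 16-bit / 256-byte-page system, plus packing one assumed 2-byte PTE.

VADDR_BITS, PAGE_SIZE, PTE_BYTES = 16, 256, 2

offset_bits = (PAGE_SIZE - 1).bit_length()            # 8
entries = 2 ** (VADDR_BITS - offset_bits)              # 2^8 = 256 entries
table_bytes = entries * PTE_BYTES                      # 512 bytes
print(entries, table_bytes)                            # 256 512

def pack_pte(frame, valid, dirty, ref, prot):
    # Assumed layout: [frame:6][valid:1][dirty:1][ref:1][prot:2][unused:5]
    return (frame << 10) | (valid << 9) | (dirty << 8) | (ref << 7) | (prot << 5)

print(hex(pack_pte(frame=5, valid=1, dirty=0, ref=1, prot=0b10)))   # 0x16c0
```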
(d) Show a diagram that shows how address translation occurs. Draw TLB, page table and physical memory. Assume all pages of a process are in the memory for the process to run.
Student Solution
Answer by Justin:

(e) Assume TLB search takes 1ns and memory access takes 100ns. For a TLB hit rate of 90%, compute the effective memory access time.
Student Solution
Answer by Justin:
EMA = a(t + m) + (1 - a)(t + 2m)
Where:
- a = hit ratio = 0.90
- m = memory access time = 100 ns
- t = TLB access time = 1 ns
EMA = 0.90 × (1 + 100) + 0.10 × (1 + 100 + 100) = 90.9 + 20.1 = 111 ns
Elaboration (by LLM):
Effective Memory Access Time (EMA) calculation is fundamental to understanding paging system performance. This problem shows how TLB hit rate dramatically affects overall memory performance.
Understanding the Two Scenarios:
Memory accesses fall into two categories based on TLB status:
Scenario 1: TLB Hit (probability = a = 0.90)
When the virtual→physical translation is cached in the TLB:
- TLB lookup: 1 ns (fast cache hit)
- Memory access: 100 ns
- Total: (1 + 100) = 101 ns
Scenario 2: TLB Miss (probability = 1 - a = 0.10)
When the translation is NOT in TLB (must consult page table):
- TLB lookup: 1 ns (search and fail)
- Page table access: 100 ns (first memory reference to get frame number)
- Memory access: 100 ns (second memory reference for actual data)
- Total: (1 + 100 + 100) = 201 ns
Formula Breakdown:
EMA = a × T_hit + (1 - a) × T_miss
Where:
- T_hit = hit scenario time = 101 ns
- T_miss = miss scenario time = 201 ns
- a = probability of hit = 0.90
- (1 - a) = probability of miss = 0.10
Calculation:
EMA = 0.90 × 101 + 0.10 × 201 = 90.9 + 20.1 = 111 ns
Performance Analysis:
System performance with 90% TLB hit rate:
- Ideal (direct memory, no paging): 100 ns
- With paging: 111 ns
- Overhead: 11%
Despite doubling memory accesses on TLB misses, the system is only 11% slower because TLB hits dominate (90% of accesses).
Impact of Changing TLB Hit Rate:
Varying the hit rate shows its dramatic effect on EMA:
| Hit Ratio (a) | Hit Cost | Miss Cost | EMA | Overhead |
|---|---|---|---|---|
| 50% | 101 ns | 201 ns | 151 ns | 51% |
| 75% | 101 ns | 201 ns | 126 ns | 26% |
| 80% | 101 ns | 201 ns | 121 ns | 21% |
| 85% | 101 ns | 201 ns | 116 ns | 16% |
| 90% | 101 ns | 201 ns | 111 ns | 11% |
| 95% | 101 ns | 201 ns | 106 ns | 6% |
| 99% | 101 ns | 201 ns | 102 ns | 2% |
| 100% | 101 ns | - | 101 ns | 0% |
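The rows above follow from one formula; a short Python sketch (not course code) reproduces them:

```python
# Sketch: EMA with a TLB. Hit = TLB + memory; miss = TLB + page-table read + data read.
TLB, MEM = 1, 100   # ns

def ema(hit_ratio, tlb=TLB, mem=MEM):
    hit = tlb + mem          # 101 ns
    miss = tlb + 2 * mem     # 201 ns
    return hit_ratio * hit + (1 - hit_ratio) * miss

for a in (0.50, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99, 1.00):
    e = ema(a)
    # with a 100 ns baseline, the overhead in ns equals the overhead in percent
    print(f"{a:.0%}  EMA = {e:.0f} ns  overhead = {e - MEM:.0f}%")
```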
Sensitivity Analysis:
How much does each percentage point of hit rate matter?
Each 1% increase in TLB hit rate saves 1 ns of EMA:
- Improving from 80%→90%: saves 10 ns
- Improving from 90%→99%: saves 9 ns
This shows why TLB design is critical—small hit rate improvements yield significant system performance gains.
Why Real Systems Achieve High TLB Hit Rates:
Despite the threat of TLB misses, real systems consistently achieve 95%+ hit rates:
| Workload | Typical Hit Rate | Reason |
|---|---|---|
| Sequential code | 99%+ | Instructions march through memory sequentially |
| Standard applications | 95-98% | Locality of reference (programs reuse memory) |
| Memory-intensive | 85-95% | Still good due to working set principle |
| Pathological (worst) | <50% | Random access patterns (rare in practice) |
Working Set Principle:
At any moment, a program actively uses only ~10-50 virtual pages. With a modern TLB having 100-1024 entries, the working set fits entirely in TLB, achieving near-perfect hit rates.
Real-World Comparison:
| System | TLB Entries | Typical WS | Hit Rate |
|---|---|---|---|
| Historical (Intel 386) | 4-8 | 20 pages | ~50% |
| Pentium IV | 64-128 | 30 pages | ~95% |
| Modern server | 512-1024 | 50 pages | ~99% |
Implication for System Design:
This analysis justifies why systems invest heavily in TLB:
- Large TLBs (1000+ entries): Small hardware cost for huge performance gain
- TLB prefetching: Predict future translations and load proactively
- Shared TLBs: Multiple cores share single large TLB
An additional 512 TLB entries might cost ~1% silicon but improve performance by 5-10%—an excellent trade-off.
Historical Context:
Early paging systems (1970s-1980s) with small TLBs (4-8 entries) saw 30-40% slowdowns. Modern systems with large TLBs achieve near-native performance, making paging practical for general-purpose computing.
Problem 21: Avoiding TLB Flush on Context Switch
We discussed in class that typically OSs keep a separate page table for each process. Since the logical address space of all processes are the same, the TLB must be flushed during a context switch. Suggest a fix that would NOT require the TLB to be flushed during a context switch. What are the implications of your suggestion?
Student Solution
Answer by Justin:
A fix that makes it so the TLB does not have to be flushed during a context switch is using Context Identifiers! These let you tag TLB entries with the process they belong to. With this, translations from other processes can remain in the TLB but are simply ignored!
Elaboration (by LLM):
This solution identifies a critical performance problem in paging systems and its elegant hardware solution.
The Problem: TLB Invalidation on Context Switch
Without special handling, when the OS switches from Process A to Process B:
- Both processes have separate page tables (different virtual→physical mappings)
- Process A’s TLB entries are now invalid for Process B
- Example: Process A’s page 100 might be in frame 5, but Process B’s page 100 might be in frame 200
- If TLB still contains Process A’s translation and Process B accesses page 100, it gets wrong frame!
Why Flush Is Expensive:
- Clearing 100-1000 TLB entries: ~1000 CPU cycles
- Repopulating TLB: ~100 cycles per TLB miss
- Context switches happen 10,000+ times per second on busy systems
- Total overhead: 1-5% of CPU time wasted on TLB management
The Solution: Process ID (Context ID) Tagging
Instead of flushing, tag each TLB entry with the process ID:
TLB Entry Format (without ASID):
[Virtual Page] → [Physical Frame]
TLB Entry Format (with ASID/Context ID):
[ASID: 12 bits] [Virtual Page] → [Physical Frame]
How It Works:
- Storage: Each TLB entry also stores the process’s Address Space Identifier (ASID)
- On context switch: CPU register contains current ASID
- TLB lookup: Compare both virtual page AND ASID
- Match found → Translation valid, use it
- ASID mismatch → Entry ignored (belongs to different process)
Example:
Process A (ASID=1):
- Page 100 → Frame 5 (stored as [ASID=1, Page=100] → Frame 5)
Process B (ASID=2):
- Page 100 → Frame 200 (stored as [ASID=2, Page=100] → Frame 200)
Context Switch: CPU.ASID_register = 2
Process B accesses page 100:
- MMU checks TLB with [ASID=2, Page=100]
- Finds match [ASID=2, Page=100] → Frame 200
- Uses Frame 200 (correct!)
- Ignores [ASID=1, Page=100] entry (different ASID)
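A toy model of the ASID-tagged lookup (dictionary-based, purely illustrative) makes the mechanism concrete:

```python
# Toy model of an ASID-tagged TLB: entries are keyed by (ASID, virtual page),
# so a context switch only changes the current ASID register; no flush needed.

tlb = {}                      # (asid, vpn) -> frame
tlb[(1, 100)] = 5             # Process A: page 100 -> frame 5
tlb[(2, 100)] = 200           # Process B: page 100 -> frame 200

current_asid = 2              # context switch: just update the ASID register

def lookup(vpn):
    # None means a TLB miss: the MMU would then walk the page table
    return tlb.get((current_asid, vpn))

print(lookup(100))            # 200: Process B's mapping; A's entry is ignored
```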
Performance Improvement:
| Approach | TLB Flush Cost | Benefit |
|---|---|---|
| Without Context ID | 1000 cycles per switch | None |
| With Context ID | 0 cycles (no flush needed) | 1-5% system performance gain |
Implications of This Solution:
1. Hardware Cost:
- Slightly larger TLB entries (12-16 bits for ASID)
- Comparator must match both ASID and page number
- Negligible cost (~1% more silicon)
2. Software Cost:
- OS assigns unique ASID to each process
- ASID pool limited (e.g., 4096 on ARM, 256 on older systems)
- ASID reuse after process exits
3. Effectiveness:
- Excellent for multi-process systems: Eliminates expensive TLB flushes
- Reduces latency: Context switches faster
- Improves throughput: CPU spends more time on useful work
4. Trade-offs:
- Requires OS to manage ASID allocation
- ASID exhaustion requires careful process lifecycle management
- Not beneficial for single-process systems
Real-World Adoption:
- ARM: ASID feature (ASID tagging)
- MIPS: ASID support in TLB
- x86: historically no ASID field; newer CPUs provide PCIDs (Process-Context Identifiers) carried in the CR3 register (a similar approach)
- Modern systems: Virtually universal in multi-core CPUs
Why This Matters:
Context ID tagging is a perfect example of hardware-software co-design:
- Simple hardware change (tag each entry)
- Massive software benefit (eliminate expensive flushes)
- Enables efficient process-switching in modern OSs
Problem 22: TLB Hit Ratio and Effective Memory Access Time
The percentage of time that any page number is found in the TLB is called the hit ratio. Assume a TLB hit ratio of 90% and a single-level page table. If it takes 10 nanoseconds to lookup the TLB, and 100 nanoseconds to access memory, calculate the effective memory-access time.
Student Solution
Answer by Justin:
EMAT = a(t + m) + (1 - a)(t + 2m)
Where:
- a = hit ratio = 0.90
- m = memory access time = 100 ns
- t = TLB access time = 10 ns
EMAT = 0.90 × (10 + 100) + 0.10 × (10 + 100 + 100) = 99 + 21 = 120 ns
Elaboration (by LLM):
This problem repeats the EMAT calculation of Problem 20(e) with a slower TLB and walks through the cost of a TLB miss with a single-level page table.
Key Difference from Problem 20(e):
Problem 20 had:
- TLB access: 1 ns (faster cache)
- Memory access: 100 ns
- Miss cost: 1 + 100 + 100 = 201 ns
Problem 22 has:
- TLB access: 10 ns (slower cache)
- Memory access: 100 ns
- Miss cost: 10 + 100 + 100 = 210 ns
In both cases a miss requires two memory references (page table entry + data); only the TLB lookup time changes.
Why a Miss Costs Three Lookups:
With single-level page tables:
TLB HIT Path (1 reference):
1. TLB lookup: 10 ns (hit!) → get frame number
2. Memory access: 100 ns → get data
Total: 110 ns
TLB MISS Path (3 lookups: TLB + page table + data, i.e., 2 memory references):
1. TLB lookup: 10 ns (search, fail)
2. Page table access: 100 ns → read PTE to get frame number
3. Memory access: 100 ns → read actual data
4. Cache in TLB: (hardware automatic)
Total: 210 ns
The extra 10 ns for TLB lookup vs. the earlier 1 ns reflects different hardware implementations (larger, slower TLB vs. fast L1 cache).
Formula Breakdown:
EMAT = a × (t + m) + (1 - a) × (t + 2m)
- Hit case: (TLB + memory) × probability of hit = 0.90 × 110 ns
- Miss case: (TLB + page table + memory) × probability of miss = 0.10 × 210 ns
Note: the first 100 ns in the miss case is the page-table read; the second is the data read.
Calculation Details:
EMAT = 0.90 × 110 + 0.10 × 210 = 99 + 21 = 120 ns
Performance Analysis:
Compare to direct memory (no paging):
- Direct: 100 ns (baseline)
- With paging: 120 ns
- Overhead: 20%
This is higher than Problem 20(e)'s 11% overhead because the TLB itself is 10× slower (10 ns vs. 1 ns), adding 9 ns to every reference, hit or miss.
TLB Efficiency Analysis:
With the slower TLB (10 ns), the hit rate becomes even more important:
| Hit Ratio | EMAT | Overhead |
|---|---|---|
| 75% | 135 ns | 35% |
| 80% | 130 ns | 30% |
| 85% | 125 ns | 25% |
| 90% | 120 ns | 20% |
| 95% | 115 ns | 15% |
| 99% | 111 ns | 11% |
Key Insight:
With a slower TLB (10 ns), reaching 99% hit rate is essential to keep overhead reasonable. This explains why modern systems invest heavily in:
- Large TLBs (1000+ entries)
- TLB prefetching
- Fast TLB access (priority over page table performance)
Comparison: Multi-level vs. Single-level Page Tables
With 2-level page tables, each miss requires one more memory reference:
EMAT = 0.90 × 110 + 0.10 × (10 + 3 × 100) = 99 + 31 = 130 ns
The extra level adds 100 ns to every miss, which at a 10% miss rate raises the EMAT by 10 ns (130 ns vs. 120 ns), demonstrating the trade-off between page table memory efficiency and access time.
Real-World Implication:
This problem highlights why designers must balance:
- TLB size: Larger TLBs → higher hit rates but slower access
- Page table levels: More levels → save memory but slower on miss
- System performance: Must achieve 90%+ TLB hit rates to remain practical
Problem 23: 32-bit Virtual Addresses with 2-level Page Table
Suppose your machine has 32 bit virtual addresses and uses a 2-level page table. Virtual addresses are split into a 9 bit top-level page table field, an 11 bit second level page table field, and an offset. How large are the pages? How many pages are there in the virtual address space? If the machine has 1GB physical memory, how many frames does the memory have?
Student Solution
Answer by Justin:
Page size:
- 32 bits total = 9 bits + 11 bits + offset → offset = 12 bits
- Page size = $2^{12}$ bytes = 4 KB
Number of pages:
- Top-level bits + second-level bits = 9 + 11 = 20 bits
- Number of pages = $2^{20}$ = 1,048,576 pages
Number of frames:
- Physical memory = 1 GB = $2^{30}$ bytes
- Frame size = $2^{12}$ bytes
- Number of frames = $2^{30} / 2^{12} = 2^{18}$ = 262,144 frames
Elaboration (by LLM):
This problem demonstrates how address bits are distributed in 2-level page table systems—a key concept for understanding modern virtual memory architectures.
Address Decomposition Logic:
The 32-bit virtual address must be split into three components: | top-level index (9 bits) | second-level index (11 bits) | offset (12 bits) |
Why Offset = 12 Bits:
The offset is determined by page size, not architectural design:
- Given: Page size = 4 KB = $2^{12}$ bytes
- Need to address $2^{12}$ bytes within a page, which requires 12 bits
- Therefore: offset bits = 12
This is fixed. The remaining bits (32 - 12 = 20) are split between the two page table levels.
Why 9 + 11 Split:
The designers chose an asymmetric split (9 + 11) rather than symmetric (10 + 10):
Design Options:
- Symmetric: 10 + 10 (cleaner, more common)
- Asymmetric: 9 + 11 (this problem's choice)
Both are valid; split depends on system characteristics.
2-Level Page Table Structure:
Directory and Table Sizes:
Top-level (Page Directory):
- Entries: $2^9 = 512$ entries
- Size: 512 × 4 bytes = 2 KB (fits in one frame)
- Always resident in memory
Second-level (Page Tables):
- Per-table entries: $2^{11} = 2048$ entries
- Per-table size: 2048 × 4 bytes = 8 KB
- Multiple tables (up to 512, one per directory entry)
- Total possible: 512 × 8 KB = 4 MB
- Actually allocated: only for used regions
Total Virtual Pages: $2^9 × 2^{11} = 2^{20} = 1{,}048{,}576$ pages
Each process may need up to 1 million page table entries spread across the 2-level hierarchy.
Memory Savings Comparison:
| Approach | Size per Process |
|---|---|
| Single-level | $2^{20}$ entries × 4 bytes = 4 MB |
| 2-level | 4 KB (one frame for the directory) + allocated second-level tables (typically ~20 KB) ≈ 24 KB |
| Savings | ~99% of page-table memory saved! |
Address Translation Example:
Virtual address: 0x12345678
Binary: 0001 0010 0011 0100 0101 0110 0111 1000
Split:
- Top-level index (bits 31-23): 0 0010 0100 = 0x24 (entry 36 in the directory)
- Second-level index (bits 22-12): 011 0100 0101 = 0x345 (entry 837 in table)
- Offset (bits 11-0): 0110 0111 1000 = 0x678 (offset 1656 in page)
Translation Hardware (2-level):
Virtual Address: 0x12345678
↓
1. Extract top-level index: 0x24 (9 bits)
Access Page Directory[0x24] → Frame F1
↓
2. Extract second-level index: 0x345 (11 bits)
Access Page Table (in frame F1) at entry 0x345 → Frame F2
↓
3. Extract offset: 0x678 (12 bits)
Physical address = (F2 << 12) | 0x678
↓
4. Access physical memory at computed address
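The field extraction can be verified with simple shifts and masks; this short Python sketch uses the example address above:

```python
# Sketch: extract the 9/11/12-bit fields of the example virtual address.
va = 0x12345678
offset = va & 0xFFF                  # low 12 bits
second = (va >> 12) & 0x7FF          # next 11 bits (second-level index)
top    = (va >> 23) & 0x1FF          # top 9 bits (directory index)
print(hex(top), hex(second), hex(offset))   # 0x24 0x345 0x678
```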
Physical Memory Calculations:
Physical memory: 1 GB = $2^{30}$ bytes
With 4 KB pages, number of frames: $2^{30} / 2^{12} = 2^{18} = 262{,}144$ frames
Frame numbers are therefore 18 bits (different from the 20-bit page numbers!)
Virtual vs. Physical Address Spaces:
| Aspect | Virtual | Physical |
|---|---|---|
| Address bits | 32 | ~18 (varies by memory size) |
| Page numbers | 20 bits ($2^{20}$ pages) | - |
| Frame numbers | - | 18 bits ($2^{18}$ frames) |
| Address space | 4 GB | 1 GB |
Real-World Significance:
This 2-level scheme is very close to real x86-32:
- x86-32: 10 + 10 split (symmetric)
- Both provide massive memory savings for sparse address spaces
- Enables 32-bit computing with efficient page tables
Key Insight:
With 2-level page tables:
- Only allocate second-level tables as needed
- Most of the 4 GB virtual space has no second-level table (doesn’t consume memory)
- A process typically uses ~24 KB of page tables (vs. 4 MB if single-level)
- This made 32-bit virtual memory practical in the 1990s-2000s
Problem 24: 32-bit Computer with Single-level Paging and 8KB Frames
Consider a 32-bit computer that has 1GB of physical memory. This machine employs single-level paging with 8KB frames.
(a) Show the parts of a virtual address, i.e., how many bits are used for offset, how many bits for indexing the page table?
LLM Solution
Virtual Address Breakdown:
Since the frame/page size is 8 KB = $2^{13}$ bytes, the offset requires 13 bits.
For a 32-bit architecture, the page number uses the remaining 32 - 13 = 19 bits.
Virtual Address Format:
| Page number (19 bits) | Offset (13 bits) |
The 19-bit page number is used to index into the page table, which contains $2^{19} = 524{,}288$ entries.
(b) How many physical frames does the machine have? How many bytes does a process page table occupy in memory?
LLM Solution
Number of Physical Frames:
Physical memory = 1 GB = $2^{30}$ bytes
Frame size = 8 KB = $2^{13}$ bytes
Number of frames = $2^{30} / 2^{13} = 2^{17} = 131{,}072$ frames
Page Table Size:
The page table has $2^{19} = 524{,}288$ entries, one per virtual page.
Each Page Table Entry (PTE) is typically 4 bytes (contains frame number + flags).
Page table size = $2^{19} × 4$ bytes = 2 MB per process.
(c) A user process generates the virtual address 0x345612F3. Explain how the system establishes the corresponding physical location. Distinguish between software and hardware operations.
LLM Solution
Address Translation: 0x345612F3
First, convert to binary and split into components:
- Virtual address: 0x345612F3 = 0011 0100 0101 0110 0001 0010 1111 0011 (binary)
- Page number (upper 19 bits): 001 1010 0010 1011 0000 = 0x1A2B0
- Offset (lower 13 bits): 1 0010 1111 0011 = 0x12F3
Translation Process:
1. Hardware (MMU):
- Extract page number (0x1A2C1) and offset (0x12F3) from virtual address
- Check TLB for cached translation (if TLB hit, skip to step 4)
2. Software (OS, if TLB miss):
- Use page number as index into page table
- Read the Page Table Entry at offset page number × 4 = 0x1A2B0 × 4 = 0x68AC0 bytes into the page table
- Check validity bit (is page in memory?)
- If page fault, invoke page fault handler; otherwise, extract frame number
3. Hardware (MMU):
- Load frame number from PTE (e.g., frame 0xBEEF)
- Cache translation in TLB for future hits
4. Hardware (MMU):
- Form physical address = (frame number << 13) | offset
- Physical address = (0xBEEF << 13) | 0x12F3
- Access physical memory at computed address
(d) Assume that memory access takes 100 microseconds. This machine uses a TLB and TLB access takes 100 nanoseconds. Assume a 90% TLB hit ratio; compute the effective memory access time.
LLM Solution
Effective Memory Access Time (EMAT):
With a single-level page table:
EMAT = a(t + m) + (1 - a)(t + m + m)
Where:
- a = 0.90 (TLB hit ratio)
- t = 100 ns (TLB lookup time)
- m = 100 µs = 100,000 ns (memory access time); on a TLB miss the page-table lookup costs one extra memory access
Calculation:
EMAT = 0.90 × (100 + 100,000) + 0.10 × (100 + 100,000 + 100,000) = 90,090 + 20,010 = 110,100 ns ≈ 110.1 µs
Problem 25: 2-level Page Tables with 64-byte Pages
A certain computer provides its users with a virtual address space of $2^{16}$ bytes, uses 64-byte pages, and employs a 2-level page table.
(a) Show the parts of a virtual address, i.e., how many bits are used for offset, how many bits for indexing the outer page table, and how many bits for indexing the inner page table?
LLM Solution
Virtual Address Breakdown:
Page size = 64 bytes = $2^6$ bytes → offset = 6 bits
Virtual address space = $2^{16}$ bytes
Number of pages = $2^{16} / 2^6 = 2^{10} = 1024$ pages → 10 bits of page number
For 2-level page tables with even split:
- Outer (page directory): 5 bits
- Inner (page table): 5 bits
Virtual Address Format:
| Outer index (5 bits) | Inner index (5 bits) | Offset (6 bits) |
This gives 5 + 5 + 6 = 16 bits, matching the 16-bit virtual address.
(b) A user process generates the virtual address 0x12F3. Explain how the system establishes the corresponding physical location. Distinguish between software and hardware operations.
LLM Solution
Address Translation: 0x12F3
Convert to binary: 0x12F3 = 0001 0010 1111 0011
Parse into components:
- Outer page index (bits 15-11): 00010 = 2
- Inner page index (bits 10-6): 01011 = 11
- Offset (bits 5-0): 110011 = 51
Translation Process:
1. Software (OS): Look up the page table base address (stored in a privileged register, loaded by the OS at context-switch time)
2. Hardware (MMU):
- Index into outer page table using outer page index (2)
- Fetch page table descriptor from outer[2]
- Extract physical address of inner page table
3. Hardware (MMU):
- Index into inner page table using inner page index (11)
- Fetch PTE from inner_table[11]
- Check validity bit; if not present, trigger page fault
4. Hardware (MMU):
- Extract frame number from PTE
- Form physical address = (frame_number << 6) | offset
- Access physical memory at computed address
Key Difference: Two page table lookups required (outer → inner) before reaching data, increasing access time but reducing page table memory overhead.
Problem 26: 2-level Page Tables with 64-bit Addresses
Why would it be a bad idea to use a 2-level page table (page directory and page table) for a processor that supports 64-bit memory addresses (using a 4KB page size)?
Student Solution
Answer by Justin:
Using a 2-level page table with 64-bit addresses would be a bad idea because:
With a 4KB page size, the offset requires 12 bits, leaving 52 bits for the page number. Even with a 2-level page table (e.g., 26 bits for top level, 26 bits for second level), the page table would be enormously large. Each top-level page table would have
Elaboration (by LLM):
This problem illustrates why page table design must scale with address space size—a critical consideration in modern 64-bit systems.
The Scale Problem: 52-bit Page Numbers
For 64-bit addresses with 4 KB pages:
- Offset bits: 12
- Page number bits: 64 - 12 = 52
- Total virtual pages: $2^{52}$ ≈ 4.5 quadrillion pages
Why 2-Level Fails:
Even with a symmetric 26 + 26 split, the top-level directory alone has $2^{26}$ ≈ 67 million entries.
Each entry is 8 bytes, so: $2^{26} × 8$ bytes = 512 MB.
This is for JUST the top-level directory per process—completely impractical!
Real-World Solution: 4-Level Page Tables
Modern 64-bit systems use 4-level hierarchies with 9-bit splits per level, reducing top-level size to just 4 KB. This makes 64-bit virtual memory practical while maintaining efficient address translation.
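A quick back-of-the-envelope check in Python (assuming 8-byte entries, as in the earlier comparison table) shows why the 4-level, 9-bit-per-level split is practical while a 26-bit top level is not:

```python
# Sketch: top-level table size for 64-bit VAs with 4 KB pages,
# under a 2-level (26+26) split vs. an x86-64-style 4-level (9+9+9+9) split.
PTE_BYTES = 8

def top_level_bytes(top_bits, pte=PTE_BYTES):
    return (2 ** top_bits) * pte

print(top_level_bytes(26))   # 536870912 bytes = 512 MB per process
print(top_level_bytes(9))    # 4096 bytes = 4 KB per process
```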
Problem 27: Translation Look Aside Buffer
What’s Translation Look Aside Buffer (TLB)? What is it used for? Briefly explain.
Student Solution
Answer by Justin:
The Translation Look Aside Buffer (TLB) is used as a way to cache PTEs to store recent virtual-to-physical address translations. This makes it so that page lookup is faster if the page has already been cached. If it has not been cached then the CPU must then go to the page table to retrieve the mapping which takes more time. Once retrieved, the mapping is cached.
Elaboration (by LLM):
The TLB is one of the most critical performance components in modern CPUs. It’s a specialized, very fast hardware cache inside the MMU that stores recent virtual-to-physical address translations.
Why TLB is Essential:
Without a TLB, every memory access requires page table lookup (100+ ns). With a TLB hit (99% of time), address translation is nearly instantaneous (1-10 ns), making paging transparent to performance.
Example: Memory Access with TLB
TLB Hit (1 ns):
- Check TLB cache
- Get frame number immediately
- Form physical address
- Access RAM (100 ns) Total: ~101 ns
TLB Miss (200 ns):
- Check TLB—no match
- Access page table (100 ns)
- Get frame number
- Access RAM (100 ns) Total: ~200 ns
Why Hit Rate is So High:
Programs exhibit locality of reference—they repeatedly access nearby memory. With 100-1024 TLB entries, most working sets stay cached, achieving 95-99% hit rates.
Real-World Design:
Modern CPUs have two-level TLBs:
- L1 TLB: 64 entries, 1 ns (fast, small)
- L2 TLB: 512+ entries, 5 ns (larger, slightly slower)
This hierarchical design balances speed with capacity.
Problem 28: TLB Flushing on Context Switch
Why does an OS need to flush the TLB during a context switch? Can you propose a fix to this problem so that the TLB need not be flushed after a context switch?
Student Solution
Answer by Justin:
The OS needs to flush the TLB during a context switch because after the switch the TLB still contains entries from the previous process, which do not apply to the current process. To avoid flushing the TLB after a context switch, add a context identifier to each TLB entry. That way each entry is tagged with the process it belongs to, so after a context switch the new process matches only the TLB entries that pertain to it and ignores the others!
Elaboration (by LLM):
This solution elegantly avoids the enormous cost of TLB invalidation by using hardware-managed process tags. It represents a clever hardware-software co-design pattern.
The Problem: TLB Invalidation Cost
On context switch, without ASIDs, the entire TLB must be flushed (thousands of entries):
- Cost: 1,000–10,000 CPU cycles per switch
- With modern CPUs doing 100+ context switches/second: catastrophic performance hit
- Systems with many processes suffer severe degradation
The ASID Solution:
Each process gets a unique Address Space Identifier (4-12 bits, supporting 16-4,096 concurrent address spaces). TLB entries now store:
[ASID | virtual page number] → [physical frame number]
On context switch: just update the ASID register, not the TLB.
Example with Two Processes:
Process A (ASID=1) accesses virtual page 0x1000 → Frame 0x500
TLB Entry: | 0x1000 | 0x500 | ASID=1 |
Process B (ASID=2) accesses virtual page 0x1000 → Frame 0x400
TLB Entry: | 0x1000 | 0x400 | ASID=2 |
Both entries coexist. During translation, the MMU checks that both the virtual page number and the ASID match the current process before using an entry.
Real-World Implementations:
- ARM, MIPS: Full ASID support, 6-12 bits
- x86: Intel lacks ASIDs, uses PCIDs (Process Context IDs) in newer chips
- Cost: ~2-4% additional TLB entry size for ASID field
Impact:
Preserves 95-99% of TLB hits across context switches, eliminating a major performance bottleneck.
Problem 29: Memory Reference Logic with TLB and Page Faults
Consider an operating system that uses paging in order to provide virtual memory capability; the paging system employs both a TLB and a single-level page table. Assume that page tables are pinned in physical memory. Draw a flow chart that describes the logic of handling a memory reference. Your chart must include the possibility of TLB hits and misses, as well as page faults. Be sure to mark which activities are accomplished by hardware, and which are accomplished by the OS’s page fault exception handler.
LLM Solution
Memory Reference Logic Flow:
CPU Issues Virtual Address
↓
[HARDWARE: MMU]
Search TLB
├─ HIT: Translation found
│ ↓
│ Extract Frame Number
│ ↓
│ Form Physical Address
│ ↓
│ Access Physical Memory
│ ↓
│ Return Data → Process
│
└─ MISS: Not in TLB
↓
[HARDWARE: MMU]
Access Page Table
(Page tables pinned in memory)
↓
├─ Page Present (Valid bit = 1)
│ ↓
│ Extract Frame Number
│ ↓
│ Update TLB with translation
│ ↓
│ Form Physical Address
│ ↓
│ Access Physical Memory
│ ↓
│ Return Data → Process
│
└─ Page Not Present (Valid bit = 0)
↓
[SOFTWARE: Page Fault Handler]
Trap to OS
↓
Check if page is valid
(in valid address space?)
├─ Invalid: Protection violation
│ ↓
│ Terminate Process
│ (Segmentation fault)
│
└─ Valid: Load from disk
↓
Check for free frame
├─ Free frame available
│ ↓
│ Allocate frame
│
└─ No free frame
↓
[Page Replacement]
Select victim page
↓
Is victim dirty?
├─ Yes: Write to disk
└─ No: Just free frame
↓
Load page from disk
↓
Update PTE with
new frame number
↓
Update TLB with
new translation
↓
[HARDWARE: Restart]
Reexecute memory
reference instruction
↓
(Now TLB hit occurs
for this reference)
Key Points:
- TLB Lookup: Fastest path (nanoseconds)
- Page Table Lookup: Medium speed (an extra memory reference, on the order of 100 ns)
- Page Fault: Slowest path (milliseconds) - requires disk I/O
- Hardware Operations: Extraction, address formation, memory access
- Software Operations: Page replacement, validity checking, I/O handling
- TLB Update: Only after successful page retrieval or TLB miss resolution
Problem 30: Page Fault Rate and Effective Access Time
Supposed we have an OS that uses paged virtual memory and further suppose that the OS stores the page table in CPU registers. (Thus, in this problem, you may assume that looking into the page table has negligible time cost.) Assume that it takes 8 milliseconds to service a page fault if an empty page frame is available or if the victim page is not dirty, and 20 milliseconds if the victim page is dirty. Suppose further that physical memory takes 1 microsecond to access. If the page replacement algorithm selects a dirty victim page 60 percent of the time, then what is the maximum acceptable page fault rate for an effective access time of no more than 2 microseconds? Express the page fault rate “p” as the probability that a memory reference results in a page fault.
Student Solution
Answer by Justin:
Let p = the probability that a memory reference results in a page fault (the page fault rate).
Average page fault service time:
- 60% of faults involve dirty pages: 20 ms
- 40% of faults involve clean pages: 8 ms
- Average = 0.60(20) + 0.40(8) = 12 + 3.2 = 15.2 ms
Effective Access Time (EAT):
EAT = (1 - p) × 1 µs + p × 15,200 µs ≈ (1 + 15,199p) µs
We want EAT ≤ 2 microseconds:
1 + 15,199p ≤ 2  →  p ≤ 1/15,199 ≈ 6.58 × 10⁻⁵ ≈ 0.0066%
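The same algebra can be checked numerically with a couple of lines of Python (variable names are mine):

```python
# Sketch: solve 1 + 15,199 p <= 2 (in microseconds) for p, using the numbers
# above (60% dirty faults at 20 ms, 40% clean at 8 ms, 1 us memory access).
service_us = 0.60 * 20_000 + 0.40 * 8_000    # average fault service: 15,200 us
mem_us, target_us = 1.0, 2.0

p_max = (target_us - mem_us) / (service_us - mem_us)
print(p_max)   # ~6.58e-05, i.e. about 0.0066% (at most 1 fault per ~15,199 refs)
```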
Elaboration (by LLM):
This problem illustrates why page faults are catastrophic performance bottlenecks: servicing a fault from disk is roughly four orders of magnitude slower than a memory access.
The Scale Problem: Disk vs. RAM
- RAM access: ~1 microsecond (1 × 10⁻⁶ seconds)
- Page fault service: ~15.2 milliseconds (15.2 × 10⁻³ seconds)
- Ratio: ~15,000× slower!
One page fault costs as much time as roughly 15,000 ordinary RAM accesses.
Interpreting the Result: ~0.0066% Maximum Fault Rate
This means:
- At most 1 fault per 15,199 memory references
- A CPU issuing on the order of 10 billion references per second could tolerate roughly 658,000 faults per second
- Any higher fault rate pushes the effective access time past the 2 µs budget
Why This Threshold Matters:
If fault rate exceeds this:
- Effective memory speed drops below 2 μs
- The system starts thrashing—spending more time swapping pages than executing
- Real work halts; performance collapses
Real-World System Design:
Systems must ensure the working set (actively used pages) fits in physical RAM:
- 4 GB RAM, 4 KB pages: 1,048,576 page slots
- If working set > physical RAM: thrashing begins
- Modern SSDs (~1 ms latency) still incur 1,000× penalty
Key Insight:
The maximum acceptable fault rate directly constrains:
- Physical memory sizing
- Process scheduling (how many concurrent processes)
- Page replacement algorithm effectiveness
- System viability threshold
Problem 31: Logical Address Space with Eight Pages
Consider a logical (virtual) address space of eight pages of 1024 words each, mapped onto a physical memory of 32 frames.
(a) How many bits are there in the logical address space?
(b) How many bits are there in the physical address space?
(c) Assume that the OS requires all pages of a process to be in memory to execute. What’s the maximum number of processes that can be admitted? What’s the minimum number of processes that can be admitted?
Student Solution
Answer by Justin:
(a) Logical address bits:
- 8 pages → $2^3$ pages, so page number bits = 3
- 1024 words per page = $2^{10}$ words, so offset bits = 10
- Total logical address bits = 3 + 10 = 13 bits
(b) Physical address bits:
- 32 frames → $2^5$ frames, so frame number bits = 5
- 1024 words per page = $2^{10}$ words, so offset bits = 10
- Total physical address bits = 5 + 10 = 15 bits
(c) Processes that can be admitted:
- Each process requires 8 pages = 8 frames
- Total frames = 32
- Maximum processes = 32 / 8 = 4 processes
- Minimum processes = 1 process (at least one process must be in memory)
Elaboration (by LLM):
This problem connects logical address decomposition to memory capacity constraints. It reveals how page table design translates to process scheduling limits.
Address Decomposition Fundamentals:
When a process generates a logical address, the MMU splits it:
For this system: | page number (3 bits) | offset (10 bits) | = 13 bits
Physical Address Space:
Similarly, physical addresses are split: | frame number (5 bits) | offset (10 bits) | = 15 bits
Why Offset Bits Match:
Pages and frames are the same size (1024 words = $2^{10}$ words), so the offset field is 10 bits in both the logical and the physical address.
Process Capacity: The Bottleneck
This is critical:
- System total: 32 frames for all processes
- OS overhead: 4 frames (reserved)
- Available for user processes: 28 frames
- Each process needs: 8 frames
- Result: 3 user processes fit (28 / 8 = 3.5, rounded down), or 4 if the OS reserves no frames
Real-World Scaling:
| System | Pages/Process | Frames | Max Processes |
|---|---|---|---|
| Small | 8 (8 KB) | 32 | 4 |
| Modern | 1,000 (4 MB) | 262,144 | 260 |
| Large | 262,144 (1 GB) | 1,000,000 | 3-4 |
Key Insight:
The page table structure (bits needed to index) directly determines:
- Maximum addressable memory per process
- Total capacity of the system
- Feasible process count
This explains why 64-bit systems need multi-level page tables—you can’t waste 40 GB of RAM on a single process’s page table!
Problem 32: Three-level Page Table with Memory Access Time
Consider a system with an average memory access time of 100 nano-seconds, a three level page table (meta-directory, directory, and page table). For full credit, your answer must be a single number and not a formula.
(a) If the system had an average page fault rate of $10^{-4}$ (one fault per 10,000 memory accesses) and a page fault took 1 millisecond to handle, what would the effective memory access time be?
Instructor Solution
EAT = 4 × 100 ns    // you must make the 4 memory accesses (3 page-table levels + data) no matter what
    + 10⁻⁴ × 1 ms   // time to handle a page fault
EAT = 4 × 100 ns + 1 × 100 ns = 500 ns
(b) Now assume the system has no page faults, we are considering adding a TLB that will take 1 nanosecond to lookup an address translation. What hit rate in the TLB is required to reduce the effective access time to 160ns?
Instructor Solution
EAT = 1 ns + p × 100 ns    // TLB hit
    + (1 - p) × 400 ns     // TLB miss (3 page-table levels + data)
We want EAT to be 160 ns. So 160 ns = 401 ns - 300 ns × p, giving p = 241/300 ≈ 0.8033. So an 80.33% hit rate is necessary.
(c) Somebody suggests increasing the page size to improve the system’s performance. List 2 advantages and 2 disadvantages of making such a decision.
Advantages: (1) Smaller page tables, (2) Fewer page faults may result due to locality of reference. Disadvantages: (1) More internal fragmentation, (2) More data to move between memory and disk during page fault handling.
Problem 33: Paging System with 512-byte Pages
Consider a memory management system that uses paging. The page size is 512 bytes. Assume that the memory contains 16 pages labeled 0 through 15 and the OS is resident on the first 4 pages of the memory. Assume that there is a single process in the system using up 3 pages of memory. Specifically, the process is resident on physical blocks 5, 7, and 11.
(a) Show the current snapshot of the memory and the page table of the current process.
Student Solution
Answer by Justin:
Current memory snapshot:
- Pages 0-3: OS
- Page 4: Empty
- Page 5: Process 1 (Page 0)
- Page 6: Empty
- Page 7: Process 1 (Page 1)
- Pages 8-10: Empty
- Page 11: Process 1 (Page 2)
- Pages 12-15: Empty
Process 1 page table:
| Page | Frame |
|---|---|
| 0 | 5 |
| 1 | 7 |
| 2 | 11 |
Elaboration (by LLM):
This part demonstrates how page tables map virtual pages to physical frames in a concrete system.
Reading the Page Table:
The page table is Process 1’s translation map:
- Virtual page 0 → Physical frame 5 (offset 0 stays the same)
- Virtual page 1 → Physical frame 7
- Virtual page 2 → Physical frame 11
Memory Layout Visualization:
Frame:  0-3  4   5     6   7     8-10  11    12-15
        OS   --  P1:0  --  P1:1  --    P1:2  --
Key Point:
Process 1 uses only 3 pages (0-2), requiring 3 × 512 = 1536 bytes of its allocated logical address space. Physical frames 5, 7, 11 are non-contiguous, demonstrating paging’s key advantage: pages don’t need to be physically adjacent.
(b) Assume now that a new process arrives and requires 2000 bytes of memory. Allocate space for this process, show its page table, and depict the current snapshot of the memory.
Student Solution
Answer by Justin:
New process requires 2000 bytes = 4 pages (since 2000 / 512 = 3.9, need 4 pages)
Allocated frames for Process 2: 4, 6, 8, 9
New memory snapshot:
- Pages 0-3: OS
- Page 4: Process 2 (Page 0)
- Page 5: Process 1 (Page 0)
- Page 6: Process 2 (Page 1)
- Page 7: Process 1 (Page 1)
- Pages 8-9: Process 2 (Pages 2-3)
- Page 10: Empty
- Page 11: Process 1 (Page 2)
- Pages 12-15: Empty
Process 2 page table:
| Page | Frame |
|---|---|
| 0 | 4 |
| 1 | 6 |
| 2 | 8 |
| 3 | 9 |
Elaboration (by LLM):
Part (b) shows how the OS allocates non-contiguous frames to a new process and manages shared physical memory across multiple processes.
Process 2 Memory Requirement Calculation:
Process 2 needs 2000 bytes: ⌈2000 / 512⌉ = 4 pages, so 4 frames (4 × 512 = 2048 bytes) are allocated.
Frame Allocation Strategy:
Available frames (after OS): 4, 6, 8, 9, 10, 12-15
The OS allocates the first available free frames (with paging, they need not be contiguous):
- Frame 4 (was empty)
- Frame 6 (was empty)
- Frame 8 (was empty)
- Frame 9 (was empty)
Multi-Process Memory State:
Frame:  0-3  4     5     6     7     8     9     10   11    12-15
        OS   P2:0  P1:0  P2:1  P1:1  P2:2  P2:3  --   P1:2  --
Critical Insight:
Both processes coexist in the same physical memory with independent page tables. The OS maintains isolation through separate page tables—Process 1’s virtual page 0 and Process 2’s virtual page 0 map to different physical frames (5 and 4). This demonstrates virtual memory’s fundamental feature: process isolation with efficient memory sharing.
(c) What’s internal fragmentation as it applies to memory management. How many bytes are wasted for the newly arriving process in (b) due to internal fragmentation?
Student Solution
Answer by Justin:
Internal fragmentation occurs when a process does not use all of the memory allocated to it. In this case, Process 2 needs only 2000 bytes but is allocated 4 pages (4 × 512 = 2048 bytes). Therefore, 2048 - 2000 = 48 bytes are wasted due to internal fragmentation.
Elaboration (by LLM):
Internal fragmentation is the inherent cost of fixed-size paging. Unlike variable partitioning (which causes external fragmentation), paging trades one problem for another.
Why 48 Bytes Are Wasted:
Process 2's allocation structure: pages 0-2 are completely filled (3 × 512 = 1,536 bytes), and page 3 holds only the remaining 2,000 - 1,536 = 464 bytes, leaving 512 - 464 = 48 bytes unused.
The Wasted Space Equation:
For any process of size S with page size P:
Waste = ⌈S / P⌉ × P - S
In the range: 0 ≤ Waste < P (about P/2 on average)
For this problem: Waste = 4 × 512 - 2000 = 2048 - 2000 = 48 bytes
Real-World Impact Comparison:
| Page Size | Process Size | Waste | Waste % |
|---|---|---|---|
| 512 B | 2000 B | 48 B | 2.3% |
| 4 KB | 2000 B | 2096 B | 51.2% |
| 4 KB | 1 MB | ~0-4 KB | <1% |
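The waste figures in the table are plain ceiling-division arithmetic; a tiny Python sketch (function name is illustrative):

```python
# Sketch: internal fragmentation = allocated - requested, with ceiling division.
def internal_fragmentation(size_bytes, page_size):
    pages = -(-size_bytes // page_size)       # ceiling division
    return pages * page_size - size_bytes

print(internal_fragmentation(2000, 512))      # 48 bytes   (this problem)
print(internal_fragmentation(2000, 4096))     # 2096 bytes (4 KB pages)
```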
Key Insight:
- Small page size (512 B): Low fragmentation for small processes, but large page tables
- Large page size (4 KB+): High fragmentation for tiny processes, but efficient for typical workloads
- Modern systems: Use mixed approaches (small pages for precision, huge pages for large allocations)
Paging’s Trade-off:
Paging eliminates external fragmentation (no compaction needed) but introduces internal fragmentation. Most systems accept this trade-off because:
- No compaction overhead
- Pages can be anywhere in memory
- Waste is bounded and typically small for well-matched page/process sizes
Problem 34: Two-level Page Table with TLB and Page Faults
Consider an OS running on the following machine: (1) a two level page table (the pointer to the page directory is stored in a register) (2) a memory reference takes 100 nanoseconds
(a) If there is no TLB in the system, and 0.001% of attempts to access physical memory cause a page fault that takes 10msec to service, what is the effective memory access time?
Instructor Solution
Assume that page directories and tables are pinned (always resident) in memory. We need to access the page directory, the page table, and then the data reference. The data reference could cause a page fault. EAT = 300 ns + 10⁻⁵ × 10 ms = 300 ns + 100 ns = 400 ns.
(b) If we add a TLB and 75% of all page-table references are found there, and 0.001% of the remaining memory references to access physical memory cause a page fault that takes 10msec to service, what is the effective access time of user space memory.
Instructor Solution
Adding a TLB implies that 75% of the time we can skip the memory address translation using page tables. Also, if the item is in the TLB, we know it is in memory (not paged out). EAT = 0.75 × 100 ns + 0.25 × 400 ns = 175 ns.
Problem 35: Page Fault Handling
Briefly explain how a page fault occurs and the steps that the memory management system takes to handle a page fault.
Student Solution
Answer by Justin:
A page fault occurs when the CPU consults the page table for the page that corresponds to the logical address and finds that the page has not been loaded into main memory yet. When this happens, the MMU traps to the kernel and the OS invokes the page fault handler. The OS first validates the reference; if it is invalid, the process is terminated. If the page is valid, the OS assigns a free frame, or runs a page replacement algorithm if no frames are free. Once a frame is available, the OS loads the page and updates the PTE so that the page references the newly assigned frame number.
Elaboration (by LLM):
Page faults are central to virtual memory. They’re the mechanism by which the OS loads pages from disk on demand, and they’re incredibly expensive (millisecond-scale events).
When Does a Page Fault Occur?
A page fault is generated by the MMU when:
- Process accesses virtual address
- MMU looks up page in page table
- Valid bit = 0 (page not in physical memory)
- CPU raises exception/trap
MMU Translation Logic:
if (Page in memory) { ← Valid bit = 1
Translation continues
} else { ← Valid bit = 0
TRAP: Page Fault Handler
}
Page Fault Handler Steps (Software Exception Handler):
Step 1: Validation
- Check if virtual page is in valid address space
- If invalid: Process violated memory protection
- Action: Terminate process (segmentation fault)
- If valid: Proceed to Step 2
Step 2: Frame Allocation
Two cases:
Case A - Free Frame Available:
Allocated frame = find_free_frame()
Case B - No Free Frames (common):
1. Run page replacement algorithm
2. Select victim page to evict
3. If victim is dirty: Write to disk (I/O wait)
4. Mark victim's PTE as invalid
5. Use victim's frame for new page
Step 3: I/O Operation
- Load page from disk into allocated frame
- Disk read: ~10-20 milliseconds (stall)
- Page is copied into physical memory
Step 4: Page Table Update
- Update PTE:
- Valid bit = 1 ✅
- Frame number = newly allocated frame
- Access bits reset
- TLB is updated if applicable
Step 5: Resume Execution
- CPU restarts the faulting instruction
- MMU retranslates using updated page table
- Memory access proceeds normally
Cost Analysis:
| Operation | Time |
|---|---|
| Page table lookup | ~100 ns (one memory reference) |
| Free frame search | < 1 μs |
| Victim page write (dirty) | ~10 ms |
| Page load from disk | ~10 ms |
| Total | ~20 ms |
Compare: 20 milliseconds = 20,000,000 nanoseconds, i.e., about 200,000 normal (100 ns) memory accesses
Hardware vs. Software Responsibilities:
| Component | Role |
|---|---|
| MMU (Hardware) | Detect fault (page not in memory) |
| CPU (Hardware) | Raise exception/trap |
| Page Fault Handler (Software) | Everything else (validation, I/O, updates) |
Protection Mechanism:
The page table’s valid bit also enforces process isolation—processes can only access pages in their page table, and missing pages are detected immediately.
Problem 36: Virtual Memory References and Page Faults
Consider the following sequence of virtual memory references generated by a single process in a pure paging system: 10, 11, 104, 170, 73, 309, 185, 245, 246, 434, 458, 364
(a) Assuming a page size of 100 words, what is the reference string corresponding to these memory references?
Student Solution
Answer by Justin:
For a page size of 100 words, each reference r maps to page number ⌊r / 100⌋, giving the reference string 0, 0, 1, 1, 0, 3, 1, 2, 2, 4, 4, 3.
Elaboration:
For a page size of 100 words, each reference r maps to page number ⌊r / 100⌋ (integer division by 100).
Applying this to each reference:
- 10 → page 0
- 11 → page 0
- 104 → page 1
- 170 → page 1
- 73 → page 0
- 309 → page 3
- 185 → page 1
- 245 → page 2
- 246 → page 2
- 434 → page 4
- 458 → page 4
- 364 → page 3
Reference string: 0, 0, 1, 1, 0, 3, 1, 2, 2, 4, 4, 3
(b) Determine the number of page faults under each of the following page replacement strategies, assuming that two page frames are available to the process: Optimal, FIFO, LRU
Student Solution
Answer by Justin:

Problem 37: Belady’s Anomaly
What’s Belady’s anomaly? Briefly explain.
Student Solution
Answer by Justin:
Belady's anomaly is when supplying a system with more frames still results in a higher rate of page faults, despite there being more memory. For example, a reference string may produce fewer page faults with 3 frames than the same string does with 4 frames!
Elaboration (by LLM):
Belady’s anomaly is a counterintuitive phenomenon that violates our intuition: more resources should always improve performance. Yet with certain page replacement algorithms and reference patterns, the opposite occurs.
The Paradox:
Intuitively, adding memory frames should reduce page faults, or at worst leave them unchanged: faults(m + 1 frames) ≤ faults(m frames).
But with some algorithms (notably FIFO) there are reference strings for which faults(m + 1 frames) > faults(m frames).
Classic Example:
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
With 3 Frames (FIFO):
[1]
[1,2]
[1,2,3]
[2,3,4] ← Fault (evict 1)
[3,4,1] ← Fault (evict 2)
[4,1,2] ← Fault (evict 3)
[1,2,5] ← Fault (evict 4)
[1,2,5]
[1,2,5]
[2,5,3] ← Fault (evict 1)
[5,3,4] ← Fault (evict 2)
[5,3,4]        (no fault: 5 is already in memory)
Total: 9 faults
With 4 Frames (FIFO):
[1]          ← Fault
[1,2]        ← Fault
[1,2,3]      ← Fault
[1,2,3,4]    ← Fault
[1,2,3,4]
[1,2,3,4]
[2,3,4,5]    ← Fault (evict 1)
[3,4,5,1]    ← Fault (evict 2)
[4,5,1,2]    ← Fault (evict 3)
[5,1,2,3]    ← Fault (evict 4)
[1,2,3,4]    ← Fault (evict 5)
[2,3,4,5]    ← Fault (evict 1)
Total: 10 faults
More frames, yet more faults: 9 faults with 3 frames versus 10 faults with 4 frames for the same reference string. That is Belady's anomaly.
Why Does It Happen?
FIFO evicts the oldest page, regardless of future use. With more frames, the older pages stay longer before being evicted, and then when they ARE evicted, they’re accessed sooner in the future reference string.
Which Algorithms Are Vulnerable?
- FIFO: ✅ Susceptible to Belady’s anomaly
- LRU: ❌ Not susceptible (a stack algorithm: the pages resident with n frames are always a subset of those resident with n + 1 frames)
- Optimal: ❌ Not susceptible (by definition, it’s optimal)
The Fix:
Use algorithms like LRU or Optimal that can’t exhibit Belady’s anomaly. These algorithms understand page usage patterns and make better eviction decisions that improve monotonically with more frames.
Practical Impact:
This was discovered in 1969 and motivated research into better page replacement algorithms. Modern systems use variants of LRU (like clock algorithms) that avoid this pathological behavior.
Problem 38: Page Replacement Algorithms with 3 Frames
Given the reference string 1, 2, 2, 3, 1, 2, 1, 3, 4, 3, 5, 6, 4, 3 and a physical memory of 3 frames, show the state of the memory with Optimal, FIFO, LRU page replacement algorithms. Count the total number of page faults.
Student Solution
Answer by Justin:

Problem 39: Page Replacement Algorithms with Reference String
Given the reference string 1,2,4,3,2,4,3,5,4,3,5,6,1,3,5 and a physical memory of 3 frames, show the state of the memory after each reference with Optimal, FIFO, LRU page replacement algorithms. Count the total number of page faults.
Student Solution
Answer by Justin:

Problem 40: Page Reference String with 4 Memory Frames
Consider the following page reference string: 1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 6, 5, 3, 2, 2, 1. How many page faults would occur for Optimal, FIFO, LRU page replacement algorithms assuming 4 memory frames. Remember that all frames are initially empty, so your first unique pages will all cost one fault each. Also show the state of the memory after each page reference and indicate the victim page when a page fault occurs.
LLM Solution
Reference String: 1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 6, 5, 3, 2, 2, 1
FIFO Page Replacement:
| Ref | Memory | Fault | Victim |
|---|---|---|---|
| 1 | [1] | F | - |
| 2 | [1,2] | F | - |
| 3 | [1,2,3] | F | - |
| 4 | [1,2,3,4] | F | - |
| 2 | [1,2,3,4] | - | - |
| 1 | [1,2,3,4] | - | - |
| 5 | [2,3,4,5] | F | 1 |
| 6 | [3,4,5,6] | F | 2 |
| 2 | [4,5,6,2] | F | 3 |
| 1 | [5,6,2,1] | F | 4 |
| 2 | [5,6,2,1] | - | - |
| 3 | [6,2,1,3] | F | 5 |
| 7 | [2,1,3,7] | F | 6 |
| 6 | [1,3,7,6] | F | 2 |
| 3 | [1,3,7,6] | - | - |
| 2 | [3,7,6,2] | F | 1 |
| 6 | [3,7,6,2] | - | - |
| 5 | [7,6,2,5] | F | 3 |
| 3 | [6,2,5,3] | F | 7 |
| 2 | [6,2,5,3] | - | - |
| 2 | [6,2,5,3] | - | - |
| 1 | [2,5,3,1] | F | 6 |
FIFO Page Faults: 15
LRU Page Replacement:
(The memory column lists pages from least to most recently used, so the victim is always the leftmost page.)
| Ref | Memory | Fault | Victim |
|---|---|---|---|
| 1 | [1] | F | - |
| 2 | [1,2] | F | - |
| 3 | [1,2,3] | F | - |
| 4 | [1,2,3,4] | F | - |
| 2 | [1,3,4,2] | - | - |
| 1 | [3,4,2,1] | - | - |
| 5 | [4,2,1,5] | F | 3 |
| 6 | [2,1,5,6] | F | 4 |
| 2 | [1,5,6,2] | - | - |
| 1 | [5,6,2,1] | - | - |
| 2 | [5,6,1,2] | - | - |
| 3 | [6,1,2,3] | F | 5 |
| 7 | [1,2,3,7] | F | 6 |
| 6 | [2,3,7,6] | F | 1 |
| 3 | [2,7,6,3] | - | - |
| 2 | [7,6,3,2] | - | - |
| 6 | [7,3,2,6] | - | - |
| 5 | [3,2,6,5] | F | 7 |
| 3 | [2,6,5,3] | - | - |
| 2 | [6,5,3,2] | - | - |
| 2 | [6,5,3,2] | - | - |
| 1 | [5,3,2,1] | F | 6 |
LRU Page Faults: 11
Optimal Page Replacement:
Uses look-ahead to replace the page whose next use lies furthest in the future (evicting 4, then 5, then 1, then pages with no further use).
Optimal Page Faults: 9 (the theoretical minimum for this string)
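Because these traces are easy to get wrong by hand, here is a minimal page-replacement simulator in Python (a sketch, not course-provided code; the function and variable names are mine). Run on this problem's reference string with 4 frames it reports 15, 11, and 9 faults for FIFO, LRU, and Optimal, and the same function can be used to check Problems 36(b) and 41-43.

```python
# Minimal page-replacement simulator: counts faults for FIFO, LRU, and Optimal.

def simulate(refs, n_frames, policy):
    mem, faults = [], 0
    for i, page in enumerate(refs):
        if page in mem:
            if policy == "LRU":                  # refresh recency on a hit
                mem.remove(page)
                mem.append(page)
            continue
        faults += 1
        if len(mem) == n_frames:                 # memory full: pick a victim
            if policy in ("FIFO", "LRU"):
                mem.pop(0)                       # oldest arrival / least recently used
            else:                                # OPT: page used furthest in the future
                future = refs[i + 1:]
                victim = max(mem, key=lambda p: future.index(p)
                             if p in future else len(future) + 1)
                mem.remove(victim)
        mem.append(page)
    return faults

refs = [1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 6, 5, 3, 2, 2, 1]
for policy in ("FIFO", "LRU", "OPT"):
    print(policy, simulate(refs, 4, policy))     # FIFO 15, LRU 11, OPT 9
```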
Problem 41: Page Fault Analysis with Optimal, FIFO, LRU
Consider a reference string 1, 2, 3, 4, 2, 5, 7, 2, 3, 2, 1, 7, 8. How many page faults would occur for Optimal, FIFO, LRU page replacement algorithms assuming 4 memory frames? Remember all frames are initially empty, so your first unique pages will all cost one fault each.
LLM Solution
Reference String: 1, 2, 3, 4, 2, 5, 7, 2, 3, 2, 1, 7, 8
FIFO Analysis:
| Ref | Memory | Fault |
|---|---|---|
| 1 | [1] | F |
| 2 | [1,2] | F |
| 3 | [1,2,3] | F |
| 4 | [1,2,3,4] | F |
| 2 | [1,2,3,4] | - |
| 5 | [2,3,4,5] | F |
| 7 | [3,4,5,7] | F |
| 2 | [4,5,7,2] | F |
| 3 | [5,7,2,3] | F |
| 2 | [5,7,2,3] | - |
| 1 | [7,2,3,1] | F |
| 7 | [7,2,3,1] | - |
| 8 | [2,3,1,8] | F |
FIFO Page Faults: 10
LRU Analysis:
(The memory column lists pages from least to most recently used.)
| Ref | Memory | Fault |
|---|---|---|
| 1 | [1] | F |
| 2 | [1,2] | F |
| 3 | [1,2,3] | F |
| 4 | [1,2,3,4] | F |
| 2 | [1,3,4,2] | - |
| 5 | [3,4,2,5] | F |
| 7 | [4,2,5,7] | F |
| 2 | [4,5,7,2] | - |
| 3 | [5,7,2,3] | F |
| 2 | [5,7,3,2] | - |
| 1 | [7,3,2,1] | F |
| 7 | [3,2,1,7] | - |
| 8 | [2,1,7,8] | F |
LRU Page Faults: 9
Optimal Page Faults: 7 (cold-start faults on 1, 2, 3, 4, then faults only on 5, 7, and 8)
Problem 42: Page Reference String Analysis
Consider the following page reference string: 1, 2, 3, 4, 2, 1, 2, 5, 7, 6, 3, 2, 1, 2, 3, 4. How many page faults would occur for Optimal, FIFO, LRU page replacement algorithms assuming 4 memory frames? Remember all frames are initially empty, so your first unique pages will all cost one fault each.
LLM Solution
Reference String: 1, 2, 3, 4, 2, 1, 2, 5, 7, 6, 3, 2, 1, 2, 3, 4
FIFO Analysis:
First 4 unique pages load without replacement: 1, 2, 3, 4 (4 faults)
Continuing:
- Page 2: hit
- Page 1: hit
- Page 2: hit
- Page 5: replace 1 → [2,3,4,5] (1 fault)
- Page 7: replace 2 → [3,4,5,7] (1 fault)
- Page 6: replace 3 → [4,5,7,6] (1 fault)
- Page 3: replace 4 → [5,7,6,3] (1 fault)
- Page 2: replace 5 → [7,6,3,2] (1 fault)
- Page 1: replace 7 → [6,3,2,1] (1 fault)
- Page 2: hit
- Page 3: hit
- Page 4: replace 6 → [3,2,1,4] (1 fault)
FIFO Page Faults: 11
LRU Analysis:
First 4 unique pages: 1, 2, 3, 4 (4 faults)
- Page 2: hit
- Page 1: hit
- Page 2: hit
- Page 5: replace least recent (3) → [1,2,4,5] (1 fault)
- Page 7: replace least recent (4) → [1,2,5,7] (1 fault)
- Page 6: replace least recent (1) → [2,5,7,6] (1 fault)
- Page 3: replace least recent (2) → [5,7,6,3] (1 fault)
- Page 2: replace least recent (5) → [7,6,3,2] (1 fault)
- Page 1: replace least recent (7) → [6,3,2,1] (1 fault)
- Page 2: hit
- Page 3: hit
- Page 4: replace least recent (6) → [3,2,1,4] (1 fault)
LRU Page Faults: 11
Optimal Page Faults: 8 (replaces the page needed furthest in the future)
Problem 43: Page Faults with 3 Memory Frames
Consider the following page reference string: 1, 3, 2, 5, 1, 3, 4, 1, 3, 2, 5. How many page faults would occur for Optimal, FIFO, LRU page replacement algorithms assuming 3 memory frames? Remember all frames are initially empty, so your first unique pages will all cost one fault each.
LLM Solution
Reference String: 1, 3, 2, 5, 1, 3, 4, 1, 3, 2, 5
FIFO Analysis:
| Ref | Memory | Fault | Victim |
|---|---|---|---|
| 1 | [1] | F | - |
| 3 | [1,3] | F | - |
| 2 | [1,3,2] | F | - |
| 5 | [3,2,5] | F | 1 |
| 1 | [2,5,1] | F | 3 |
| 3 | [5,1,3] | F | 2 |
| 4 | [1,3,4] | F | 5 |
| 1 | [1,3,4] | - | - |
| 3 | [1,3,4] | - | - |
| 2 | [3,4,2] | F | 1 |
| 5 | [4,2,5] | F | 3 |
FIFO Page Faults: 9
LRU Analysis:
| Ref | Memory | Fault | Victim |
|---|---|---|---|
| 1 | [1] | F | - |
| 3 | [1,3] | F | - |
| 2 | [1,3,2] | F | - |
| 5 | [3,2,5] | F | 1 (LRU) |
| 1 | [2,5,1] | F | 3 (LRU) |
| 3 | [5,1,3] | F | 2 (LRU) |
| 4 | [1,3,4] | F | 5 (LRU) |
| 1 | [1,3,4] | - | - |
| 3 | [1,3,4] | - | - |
| 2 | [1,3,2] | F | 4 (LRU) |
| 5 | [3,2,5] | F | 1 (LRU) |
LRU Page Faults: 9
Optimal Analysis:
Optimal replaces the page not referenced for longest time in future.
Looking ahead: 1, 3, 2, 5, 1, 3, 4, 1, 3, 2, 5
| Ref | Memory | Fault | Victim (not needed soonest) |
|---|---|---|---|
| 1 | [1] | F | - |
| 3 | [1,3] | F | - |
| 2 | [1,3,2] | F | - |
| 5 | [1,3,5] | F | 2 (next ref at pos 10) |
| 1 | [1,3,5] | - | - |
| 3 | [1,3,5] | - | - |
| 4 | [1,3,4] | F | 5 (its next use, at reference 11, is furthest away) |
| 1 | [1,3,4] | - | - |
| 3 | [1,3,4] | - | - |
| 2 | [1,3,2] | F | 4 (never referenced again) |
| 5 | [1,3,5] | F | 2 (never referenced again) |
Optimal Page Faults: 7
Problem 44: Working Set Strategy
Consider the reference string 1 2 3 4 5 2 3 4 3 2 4 4 2 4 4 4. Assuming the working set strategy, determine the minimum window size such that the string generates at most 5 page faults. Show which pages are in the working set (and, thus, in memory) at each reference.
LLM Solution
Working Set Strategy: Pages referenced in the most recent k references form the working set. Faults occur when a page is referenced but not in the current working set.
Reference String: 1 2 3 4 5 2 3 4 3 2 4 4 2 4 4 4
Testing Window Size δ = 6:
| Ref # | Page | Window (last 6 refs) | Working Set | Fault? |
|---|---|---|---|---|
| 1 | 1 | [1] | {1} | F |
| 2 | 2 | [1,2] | {1,2} | F |
| 3 | 3 | [1,2,3] | {1,2,3} | F |
| 4 | 4 | [1,2,3,4] | {1,2,3,4} | F |
| 5 | 5 | [1,2,3,4,5] | {1,2,3,4,5} | F |
| 6 | 2 | [1,2,3,4,5,2] | {1,2,3,4,5} | - |
| 7 | 3 | [2,3,4,5,2,3] | {2,3,4,5} | - |
| 8 | 4 | [3,4,5,2,3,4] | {2,3,4,5} | - |
| 9 | 3 | [4,5,2,3,4,3] | {2,3,4,5} | - |
| 10 | 2 | [5,2,3,4,3,2] | {2,3,4,5} | - |
| 11 | 4 | [2,3,4,3,2,4] | {2,3,4} | - |
| 12 | 4 | [3,4,3,2,4,4] | {2,3,4} | - |
| 13 | 2 | [4,3,2,4,4,2] | {2,3,4} | - |
| 14 | 4 | [3,2,4,4,2,4] | {2,3,4} | - |
| 15 | 4 | [2,4,4,2,4,4] | {2,4} | - |
| 16 | 4 | [4,4,2,4,4,4] | {2,4} | - |
Total Faults with δ = 6: 5 ✅
However, δ = 6 is not the minimum: applying the same rule with δ = 5 or δ = 4 still produces exactly 5 faults (only the first reference to each of pages 1-5 faults), while δ = 3 causes a sixth fault at reference 6 (page 2 is not among the previous three references {3, 4, 5}).
Minimum Window Size: δ = 4
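These counts can be verified with a short Python sketch that uses the same convention as the table (a reference faults if the page is not among the previous δ references):

```python
# Sketch: count working-set faults for a given window size delta.
def ws_faults(refs, delta):
    faults = 0
    for t, page in enumerate(refs):
        window = refs[max(0, t - delta):t]     # the previous delta references
        if page not in window:
            faults += 1
    return faults

refs = [1, 2, 3, 4, 5, 2, 3, 4, 3, 2, 4, 4, 2, 4, 4, 4]
for delta in (3, 4, 5, 6):
    print(delta, ws_faults(refs, delta))       # 3 -> 9 faults; 4, 5, 6 -> 5 faults
```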
Problem 45: LRU Page Replacement Implementation
Why is it difficult to perfectly implement LRU page replacement algorithm? What bit(s) in the PTE do OSs use to approximate LRU?
Student Solution
Answer by Justin:
It is difficult to implement the LRU page replacement algorithm exactly because every memory reference (not just misses) would have to update recency information (e.g., a timestamp or a position in an ordered list), and on every page fault the OS would have to search that structure to find the least recently used page; this per-reference bookkeeping is far too expensive. OSs therefore approximate LRU using the reference bit (and the dirty bit) that the hardware sets in each PTE.
Elaboration (by LLM):
Perfect LRU is theoretically optimal but practically infeasible due to the overhead of maintaining exact recency information. This reveals a fundamental systems trade-off: optimal algorithms are often too expensive to implement.
The Perfect LRU Problem:
To implement true LRU, you must:
- Track exact access time for every page in memory
- Update this timestamp on every memory access—both reads and writes
- Search all frames to find minimum timestamp on every fault
- Maintain ordering in a data structure (linked list, heap, etc.)
The Overhead Cost:
For each memory access:
Normal access: Load data from cache/RAM (~1-100 ns)
Perfect LRU: + Update timestamp in PTE (~5-10 ns)
+ Update data structure (~10-100 ns)
+ Acquire locks for thread safety (~10-50 ns)
Overhead: 10-50% of memory access time—unacceptable!
The PTE Bits Approach:
Modern systems use a simpler approximation: a single reference (R) bit in each PTE:
- Value = 1: Page was referenced recently
- Value = 0: Page hasn't been referenced since the OS last cleared the bit
How It Works:
1. MMU sets reference bit = 1 on every access (hardware)
2. OS periodically clears all bits (every ~100 ms, software)
3. On page fault: evict page with reference bit = 0
Example Timeline:
Time 0: Clear all reference bits
[Page 1: 0] [Page 2: 0] [Page 3: 0] [Page 4: 0]
Page 1 accessed:
[Page 1: 1] [Page 2: 0] [Page 3: 0] [Page 4: 0]
Page 3, Page 1 accessed:
[Page 1: 1] [Page 2: 0] [Page 3: 1] [Page 4: 0]
Time 100 ms: Clear all bits again
[Page 1: 0] [Page 2: 0] [Page 3: 0] [Page 4: 0]
On fault: Evict any page with ref=0
Additional Bits Used:
Modern PTEs often include:
| Bit | Name | Purpose |
|---|---|---|
| 1 | Reference (R) | Used recently? (approximates LRU) |
| 1 | Dirty/Modified (D) | Page modified? (prioritize clean eviction) |
| 1 | Valid (V) | Page in memory? |
Why This Works:
The reference bit provides a coarse approximation:
- Pages accessed in the last ~100 ms: ref=1
- Pages not accessed in the last ~100 ms: ref=0
This divides pages into two classes, not a perfect ordering, but:
- Cost: 1 bit per page
- Hardware support: Already built-in
- Effectiveness: 80-90% as good as perfect LRU
Real-World Implementations:
| OS | Algorithm | Bits Used |
|---|---|---|
| Linux | Clock/Second Chance | 1-2 bits (R, D) |
| Windows | Working Set | Multiple bits |
| macOS | Approximate LRU | Aging counter |
Key Insight:
Perfect algorithms are often replaced by good-enough approximations with manageable overhead. This is a common pattern in systems design: trade a small amount of optimality for massive practical gains.
Problem 46: LRU Clock Algorithm
Briefly explain how LRU clock or second chance algorithm works.
LLM Solution
LRU Clock Algorithm (Second Chance Algorithm):
The clock algorithm uses a circular buffer of page frames with a reference bit associated with each page. It works as follows:
1. Page Replacement: When a page fault occurs and all frames are full, the algorithm points to the oldest page in memory (using a clock pointer).
2. Reference Bit Check:
- If the reference bit is 0: The page hasn’t been referenced recently → evict it immediately
- If the reference bit is 1: The page has been used recently → give it a “second chance”
3. Second Chance Process:
- When a page with reference bit 1 is encountered, set its reference bit to 0
- Advance the clock pointer to the next page
- Continue searching for a page with reference bit 0
4. Clock Advancement: The pointer moves circularly through all frames in the order they were loaded.
Example:
Suppose we have 4 frames and need to replace a page:
Frame 0: Page 2 (ref=1) ← Clock pointer
Frame 1: Page 5 (ref=0)
Frame 2: Page 3 (ref=1)
Frame 3: Page 7 (ref=0)
- Check Frame 0: ref=1 → Set ref=0, advance pointer
- Check Frame 1: ref=0 → Evict Page 5 (insert new page here)
- Load new page into Frame 1
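A short Python sketch of this sweep, assuming each frame holds a [page, ref_bit] pair and reusing the frame contents from the example (a teaching sketch, not production code):
```python
# Second-chance (clock) eviction: each frame holds a [page, ref_bit] pair.
def clock_evict(frames, hand):
    while True:
        page, ref = frames[hand]
        if ref == 0:
            return hand              # found a victim: evict the page in this frame
        frames[hand][1] = 0          # ref=1: clear the bit (second chance)
        hand = (hand + 1) % len(frames)

# Frame contents from the example; the clock pointer starts at Frame 0.
frames = [[2, 1], [5, 0], [3, 1], [7, 0]]
victim = clock_evict(frames, hand=0)
print(victim, frames[victim][0])     # -> 1 5  (Frame 1, Page 5 is evicted)
```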
Advantages:
- Low overhead compared to true LRU
- Uses only one bit per page
- Approximates LRU behavior reasonably well
Disadvantages:
- Not as accurate as true LRU
- In the worst case, a full sweep of all frames (clearing every reference bit) may be needed before a victim is found
Problem 47: LRU Enhanced Second Chance Algorithm
Briefly explain how LRU enhanced second chance algorithm works.
LLM Solution
Enhanced Second Chance Algorithm (Using Reference and Dirty Bits):
This algorithm improves upon the basic clock/second chance algorithm by considering both the reference bit and the dirty (modified) bit.
How it Works:
The algorithm classifies pages into 4 categories based on (reference bit, dirty bit):
- (0, 0): Not referenced, not dirty → Best candidate for eviction (no I/O needed)
- (0, 1): Not referenced, but dirty → Good candidate (requires write to disk)
- (1, 0): Referenced, not dirty → Fair candidate (will be evicted on next clock pass)
- (1, 1): Referenced and dirty → Worst candidate (needs write AND may be used soon)
Algorithm Steps:
- First Pass: Scan for a page in class (0, 0) and evict it immediately
- Second Pass (if needed): Scan for a page in class (0, 1) and evict it (this requires writing the page back to disk)
- Third Pass (if needed): Reset the reference bits of the (1, 0) and (1, 1) pages to 0, then repeat passes 1 and 2
- The first page found in class (0, 0) or (0, 1) on the repeated passes is evicted
Clock Pointer Movement:
- On each check, the clock pointer advances to the next frame
- When a page with ref=1 is passed over, its reference bit is cleared so it becomes a candidate on a later pass
- A page is evicted only when it falls into the class currently being sought (ref=0)
Advantages:
- Reduces disk I/O by preferring clean pages
- Still uses minimal overhead (2 bits per page)
- Better approximates LRU than basic second chance
- Balances page recency with modification status
Example:
Frame 0: Page A (ref=0, dirty=1)
Frame 1: Page B (ref=1, dirty=0) ← Clock pointer
Frame 2: Page C (ref=0, dirty=0)
Frame 3: Page D (ref=1, dirty=1)
First pass (looking for class (0, 0)), starting at the clock pointer: B (1,0) → skip, C (0,0) → EVICT (clean and unreferenced, so no disk write is needed)
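A minimal Python sketch of this class-based search, assuming frames hold [page, ref, dirty] triples and reusing the frame contents from the example; the pass structure is simplified (look for (0,0), then (0,1), and clear reference bits only if neither class is found):
```python
# Enhanced second chance: prefer class (ref=0, dirty=0), then (ref=0, dirty=1).
def enhanced_clock_evict(frames, hand):
    n = len(frames)
    while True:
        for target in [(0, 0), (0, 1)]:      # best class first, then next best
            for i in range(n):
                idx = (hand + i) % n
                page, ref, dirty = frames[idx]
                if (ref, dirty) == target:
                    return idx               # evict the page in this frame
        for frame in frames:                 # no candidate found:
            frame[1] = 0                     # clear all reference bits and retry

# Frame contents from the example; the clock pointer starts at Frame 1 (Page B).
frames = [['A', 0, 1], ['B', 1, 0], ['C', 0, 0], ['D', 1, 1]]
victim = enhanced_clock_evict(frames, hand=1)
print(victim, frames[victim][0])             # -> 2 C  (clean and unreferenced)
```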
Problem 48: Working Set of a Process
What’s meant by the working set of a process? Given the reference string 1, 2, 3, 2, 2, 1, 3, 2, 2, 1 and a reference window size of 5, what’s the process’s working set at the end of the last page reference?
LLM Solution
Definition of Working Set:
The working set of a process is the set of distinct pages referenced during the most recent δ memory references, where δ is the window size. This represents the pages a process is “actively using” within a given time window. Pages in the working set should be kept in physical memory to minimize page faults.
Notation: WS(t, δ) = working set at time t with window size δ
Window Size: δ = number of most recent references to consider
Reference String: 1, 2, 3, 2, 2, 1, 3, 2, 2, 1 with δ = 5
Computing Working Set at Each Step:
| Step | Ref | Last 5 Refs | Working Set | Size |
|---|---|---|---|---|
| 1 | 1 | [1] | {1} | 1 |
| 2 | 2 | [1,2] | {1,2} | 2 |
| 3 | 3 | [1,2,3] | {1,2,3} | 3 |
| 4 | 2 | [1,2,3,2] | {1,2,3} | 3 |
| 5 | 2 | [1,2,3,2,2] | {1,2,3} | 3 |
| 6 | 1 | [2,3,2,2,1] | {1,2,3} | 3 |
| 7 | 3 | [3,2,2,1,3] | {1,2,3} | 3 |
| 8 | 2 | [2,2,1,3,2] | {1,2,3} | 3 |
| 9 | 2 | [2,1,3,2,2] | {1,2,3} | 3 |
| 10 | 1 | [1,3,2,2,1] | {1,2,3} | 3 |
Working Set at End of Last Reference (Step 10):
WS(10, 5) = {1, 2, 3} with size 3
This means the process should keep pages 1, 2, and 3 in physical memory at this point.
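The table above can be reproduced with a short Python sketch (illustrative only):
```python
# Working set at step t: the distinct pages among the last delta references.
def working_set(refs, t, delta):
    return set(refs[max(0, t - delta):t])

refs = [1, 2, 3, 2, 2, 1, 3, 2, 2, 1]
print(working_set(refs, t=10, delta=5))  # -> {1, 2, 3}
```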
Key Insights:
- Working set size varies as execution progresses
- A larger working set means more memory needed to avoid faults
- Working set window (δ) can be adjusted based on system resources
- If available frames ≥ working set size, no page faults occur
Problem 49: Thrashing in Virtual Memory
What’s thrashing and how does it occur? Briefly explain.
LLM Solution
Definition of Thrashing:
Thrashing is a condition where the system spends most of its time servicing page faults rather than executing user processes. The system experiences severe performance degradation due to excessive paging activity.
How Thrashing Occurs:
1. Insufficient Physical Memory:
   - Too many processes in the system
   - Each process has a large working set
   - Total working-set sizes exceed available physical memory
2. Page Fault Cascade:
   - A process references a page not in memory → page fault
   - The OS evicts a page to make room
   - The evicted page is needed again shortly → another page fault
   - This repeats continuously
3. Disk Thrashing:
   - Continuous paging to/from disk
   - Disk I/O becomes the bottleneck
   - CPU utilization drops (the CPU is mostly waiting for the disk)
Example Scenario:
System has 4 MB of RAM
4 processes, each with 2 MB working set = 8 MB total demand
Only 4 MB available → pages constantly swapped in and out
Timeline:
1. Process A executes
2. Needs page not in memory → Page fault
3. Evicts page from Process B
4. Process B resumes
5. Needs evicted page → Page fault
6. Evicts page from Process C
...
CPU utilization: 5% | Disk I/O: 95% (mostly paging)
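One way to make the scenario's arithmetic concrete is a hypothetical admission check in Python: compare total working-set demand with available RAM and suspend processes until the remainder fits. The sizes below are the made-up values from the scenario, not measurements.
```python
# Hypothetical admission check using the made-up numbers from the scenario above.
ram_mb = 4
working_sets_mb = {"A": 2, "B": 2, "C": 2, "D": 2}  # per-process working-set sizes

demand_mb = sum(working_sets_mb.values())
if demand_mb > ram_mb:
    print(f"Demand {demand_mb} MB > {ram_mb} MB RAM: thrashing likely;"
          " suspend processes until the remaining working sets fit.")
```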
Symptoms:
- Very low CPU utilization
- Very high disk I/O activity
- System appears “frozen” or very slow
- Long response times for user commands
Solutions:
- Increase physical memory: more frames available
- Reduce process load: run fewer concurrent processes
- Implement better page replacement: minimize unnecessary evictions
- Use working set strategy: suspend processes with large working sets
- Adjust page size: balance between fragmentation and I/O overhead
Prevention:
Monitoring the degree of multiprogramming:
- If CPU utilization drops below 20% due to I/O waits → thrashing is likely
- Reduce number of processes in memory
- Maintain balance between CPU and I/O operations