01 - Introduction
1. Short Answer Questions
(a) List 2 benefits of having an OS?
Student Solution
Answer by Justin:
You do not have to interact directly with the hardware, the OS does that for you. It’s also very secure against outsiders
Elaboration (by LLM):
The two benefits mentioned capture fundamental purposes of an operating system:
- Abstraction/Hardware Abstraction Layer: Without an OS, every programmer would need to understand and directly program the hardware (CPU, memory, disk, network devices). This would be extraordinarily complex and error-prone. The OS provides a simplified interface (system calls) that abstracts away hardware details. For example, instead of programming disk controllers directly, you call open(), read(), write()—the OS handles the specifics of your particular disk model (a short C sketch follows this list).
- Security & Protection: The OS enforces isolation between user programs so that a malicious (or buggy) program cannot crash the entire system or access another user’s data. The OS uses hardware privilege levels (user mode vs kernel mode) and virtual memory (page tables) to ensure that one process cannot directly access another process’s memory or perform privileged operations. This protects both system integrity and user privacy.
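To make the abstraction benefit concrete, here is a minimal C sketch (the file path is purely illustrative): the program never touches the disk controller, it only issues the three system calls named above.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[128];
    int fd = open("/etc/hostname", O_RDONLY);    // illustrative path
    if (fd < 0) { perror("open"); return 1; }
    ssize_t n = read(fd, buf, sizeof(buf) - 1);  // the OS drives the disk/caches
    if (n > 0) {
        buf[n] = '\0';
        printf("read %zd bytes: %s", n, buf);
    }
    close(fd);
    return 0;
}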
Additional Benefits Worth Mentioning:
- Resource Management: The OS multiplexes limited hardware resources (CPU, memory, disk) among multiple programs so they appear to have exclusive access
- Convenience: Higher-level services like file systems, networking stacks, and device drivers are provided by the OS
(b) What are the major components of an operating system? Briefly explain their responsibilities.
Student Solution
Answer by Justin:
The major components of an operating system are the process manager, memory manager, and I/O manager.
The process manager manages the creation, scheduling, and termination of processes.
The memory manager manages the allocation and deallocation of memory.
The I/O manager manages input devices such as the keyboard, mouse, disks, etc.
Elaboration (by LLM):
These three components work together to manage all aspects of system resources:
1. Process Manager (CPU Management)
Responsibilities:
- Process Creation: Allocates process control blocks (PCBs), virtual address spaces, and initializes file descriptors
- Scheduling: Decides which process runs on the CPU at any given time—implements scheduling algorithms (FCFS, Round-Robin, Priority); a toy round-robin sketch follows this list
- Process Termination: Reclaims all resources when a process exits (memory, open files, child processes)
- Context Switching: Rapidly switches between processes so all appear to run concurrently
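To make the scheduling bullet above concrete, here is a toy round-robin simulation in C (hypothetical process table and a fixed 10 ms quantum; real schedulers such as Linux's CFS are far more sophisticated):
#include <stdio.h>

// Toy process table: each "process" just needs some CPU time to finish.
typedef struct { const char *name; int remaining_ms; } Proc;

int main(void) {
    Proc procs[] = { {"A", 30}, {"B", 10}, {"C", 20} };
    const int n = 3, quantum_ms = 10;
    int done = 0, clock_ms = 0;

    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (procs[i].remaining_ms <= 0) continue;          // already finished
            int slice = procs[i].remaining_ms < quantum_ms
                          ? procs[i].remaining_ms : quantum_ms;
            printf("t=%3d ms: run %s for %d ms\n", clock_ms, procs[i].name, slice);
            clock_ms += slice;                                 // "context switch" after the slice
            procs[i].remaining_ms -= slice;
            if (procs[i].remaining_ms == 0) done++;
        }
    }
    printf("all processes finished at t=%d ms\n", clock_ms);
    return 0;
}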
2. Memory Manager (RAM Management)
Responsibilities:
- Allocation: Assigns memory to processes when they’re created or request more
- Protection: Isolates process memory using virtual memory and page tables so one process cannot access another’s memory
- Deallocation: Reclaims memory when processes terminate
- Paging/Swapping: Moves data between RAM and disk to handle programs larger than RAM
- Caching: Keeps frequently-accessed data in fast memory hierarchies
3. I/O Manager (Device Management)
Responsibilities:
- Device Abstraction: Provides uniform interfaces for diverse devices (disks, keyboards, network cards)
- Device Drivers: Manages hardware-specific drivers that communicate with physical devices
- Buffering: Buffers I/O data to smooth differences in speed between fast CPU and slow devices
- Interrupt Handling: Responds to device interrupts (disk operation complete, packet arrived)
- Scheduling: Determines order of I/O requests (disk head scheduling, network packet ordering)
Interaction Example:
When a user types on the keyboard:
- I/O Manager: Keyboard interrupt occurs → I/O manager’s interrupt handler reads keyboard buffer
- Process Manager: I/O manager wakes up the text editor process (if sleeping)
- Process Manager: Scheduler assigns CPU time to text editor
- Memory Manager: Virtual memory ensures text editor’s memory pages are present
- The text editor reads the character and displays it
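A user-level counterpart of these steps, as a minimal sketch: the blocking read() below is exactly where the process manager puts the process to sleep and where the keyboard-interrupt path later wakes it.
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char c;
    printf("press a key and hit Enter: ");
    fflush(stdout);
    // read() blocks: the process manager marks this process as sleeping and runs
    // something else. When the keyboard interrupt arrives, the I/O manager's
    // handler queues the character and wakes this process; the memory manager
    // ensures the page holding 'c' is resident before the copy.
    ssize_t n = read(STDIN_FILENO, &c, 1);
    if (n == 1)
        printf("got '%c'\n", c);
    return 0;
}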
(c) Briefly explain what’s multi-tasking and what is the main motivation for it? Is Linux a multi-tasking OS? Why or why not?
Student Solution
Answer by Justin:
Multitasking is allowing the CPU to continue executing a process while another process waits for I/O. This makes it so that the CPU is not idle while a process waits for I/O, optimizing throughput! Linux is a multitasking OS
Elaboration (by LLM):
Core Concept:
Multitasking is the OS’s ability to execute multiple processes (or threads) such that while one is blocked waiting for I/O, another can run on the CPU. Without multitasking, the CPU would sit idle whenever a process performs I/O (disk read, network request, keyboard input), wasting precious computing resources.
Motivation - CPU Utilization:
Consider a typical process timeline:
Process A: [ compute ] [I/O wait ......] [ compute ] [I/O wait ......]
└─ ~1 ms ─┘ └── 10-100 ms ──┘ └─ 1 ms ──┘ └── 10-100 ms ──┘
Without multitasking, the CPU is idle during I/O waits, resulting in low utilization.
With multitasking, while Process A waits for I/O, the OS switches to Process B:
Process A: [compute] [I/O wait ........] [compute] [I/O wait ........]
Process B: [compute] [I/O wait ........] [compute] [I/O wait ........]
Process C: [compute] [I/O wait ........] [compute]
CPU: [A works] [B works] [C works] [A works] [B works] [C works]
Now the CPU is nearly always executing something, dramatically improving utilization.
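A minimal sketch of this idea using two processes (a pipe read stands in for slow I/O; the loop bound is arbitrary): while the child is blocked, the scheduler keeps the CPU busy running the parent.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) != 0) return 1;

    if (fork() == 0) {                 // child: "I/O-bound" process
        char byte;
        read(fds[0], &byte, 1);        // blocks -> OS parks it, CPU stays free
        printf("child: I/O finally completed\n");
        _exit(0);
    }

    // parent: "CPU-bound" process keeps the CPU busy while the child waits
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 200000000UL; i++)
        sum += i;
    printf("parent: finished computing (sum=%lu)\n", sum);

    write(fds[1], "x", 1);             // the "I/O" completes: wake the child
    wait(NULL);
    return 0;
}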
Why Linux is Multitasking:
Linux implements full multitasking support:
- Context Switching: Can interrupt any process at any time via timer interrupts
- Scheduler: Uses sophisticated scheduling algorithms to fairly distribute CPU time
- I/O Abstraction: Provides blocking I/O calls that trigger context switches (process sleeps, OS runs another)
- Multiple Processes: Can run hundreds of processes simultaneously
Evidence that Linux is Multitasking:
ps aux # Shows 100+ processes running
htop # Visual CPU utilization across multiple processes
Both show a single CPU core being shared among many processes—this multitasking capability is fundamental to modern Linux.
(d) Briefly explain when you will use a timesharing OS? Is Windows a timesharing OS?
Student Solution
Answer by Justin:
You want to use a time-sharing OS when you have multiple users using one centralized system but want to give the illusion that they all have their own independent machine. An example of this would be employees at a company each using remote desktop connections that are connected to a centralized device that all employees are using at the same time. Windows is an example of a time-sharing OS, as it supports multitasking and allows multiple users to share system resources.
Elaboration (by LLM):
When to Use Timesharing OS:
A timesharing OS is appropriate when:
- Multiple Independent Users: Several users need to use the same computer simultaneously without interfering with each other
- Interactive Access Required: Users expect responsive systems where their keystrokes/commands are handled quickly
- Resource Sharing: The cost of the hardware makes sharing more economical than individual machines
- Central Management: System administrators need unified control over security, software, and data
Classic Example: Mainframe Era (1960s-1980s)
Before personal computers:
- Single expensive mainframe served dozens of users
- Each user had a dumb terminal (keyboard + screen)
- OS rapidly switched CPU between user sessions
- Each user perceived exclusive machine access
Modern Examples:
- Remote Desktop/Cloud Services (as mentioned in the answer):
  - Multiple remote users → single centralized server
  - Each sees their own desktop
  - Server multitasks between them
- Web Servers:
  - Single server handles many concurrent clients
  - Each “feels” they have exclusive connection
  - Behind the scenes: OS context-switches between serving them
- University Lab Computers:
  - One high-spec machine shared by many students
  - Each student logs in remotely or locally
  - All run their code simultaneously
Is Windows a Timesharing OS?
Yes, with nuance:
✅ Windows supports key timesharing features:
- Multiple users can log in simultaneously (different sessions)
- Multitasking: many processes run concurrently
- Context switching at millisecond timescales
- Virtual memory isolation between users
- Interactive response (handles user input rapidly)
⚠️ However: Modern Windows is primarily designed as a single-user, multitasking OS (one interactive user at a time, but many background processes). Traditional timesharing OSs like Unix/Linux are designed for many simultaneous interactive users.
Comparison:
| Feature | Timesharing (Unix/Linux) | Windows | Modern Mobile OS |
|---|---|---|---|
| Multiple interactive users | Yes | Limited (one desktop session) | No |
| Multitasking | Yes | Yes | Yes |
| Context switching | Yes | Yes | Yes |
| Response time optimization | Critical | Important | Critical |
| Resource isolation | Strong (multi-user) | Moderate | Strong |
(e) Define the essential properties of the following 2 types of operating systems: batch, timesharing.
Student Solution
Answer by Justin:
Batch: Execute jobs in batches without human interaction during execution
Timesharing: Timesharing Operating Systems allow users to access system resources at the same time through rapid context switching
Elaboration (by LLM):
Batch Operating Systems:
Definition: A batch system collects multiple jobs, processes them in groups, and executes them sequentially with no human interaction during execution.
Essential Properties:
- No Interactivity: Jobs cannot accept user input while running. All input is provided upfront (punch cards, input files)
- Sequential Execution: One job completes fully before the next begins
- Throughput Optimization: Goals are to maximize CPU utilization and job completion rate, not response time
- Simple Scheduling: Typically FCFS (First-Come-First-Served)—no complex algorithms needed
- Minimal Overhead: No need for rapid context switching or multi-user isolation
Historical Context: 1950s-1960s mainframes (IBM System/360)
Example:
Submit 50 jobs at 8 AM
System processes all 50 overnight
Results printed out at 8 AM next day
Timesharing Operating Systems:
Definition: A timesharing system allows multiple users to access the same computer simultaneously, each perceiving exclusive access through rapid context switching.
Essential Properties:
- Interactive: Users expect immediate response to commands (sub-second response time)
- Concurrent Execution: Multiple user processes appear to run simultaneously
- Context Switching: OS rapidly switches CPU between processes (every 10-100 ms)
- Response Time Optimization: Critical metric is how fast user sees results, not total throughput
- Sophisticated Scheduling: Fair allocation of CPU time, priority management
- Multi-User Isolation: Strong separation between user sessions and data
- Virtual Memory: Each user has independent virtual address space
Modern Examples: Unix/Linux, older mainframe systems with multiple terminals
Example:
User A: types "ls" → results appear in 0.1 seconds
User B: runs computation → sees progress updates as CPU switches between users
User C: downloads file → runs in background without blocking User A/B
Key Differences:
| Property | Batch | Timesharing |
|---|---|---|
| User Interaction | None (offline) | Continuous (online) |
| Number of Users | Single (job sequence) | Multiple (simultaneous) |
| Response Time | Not important | Critical (<1 second expected) |
| CPU Scheduling | Simple (FCFS) | Complex (fairness, priority) |
| Typical Workload | Data processing, reports | Interactive commands, development |
| Historical Era | 1950s-1960s | 1970s onward |
(f) Briefly define the following terms: (1) multi-tasking (2) timesharing. Does every timesharing OS have to be multi-tasking?
Student Solution
Answer by Justin:
(1) Multi-tasking: allow the CPU to execute processes while another process is waiting on I/O to optimize throughput
(2) Timesharing: allow multiple users to use one centralized device as if they are the only ones using it
Does every time-sharing OS have to be multitasking?
Yes! To be a timesharing OS, you need to execute multiple processes for multiple users at once, and to do that, you need to be a multitasking system so the CPU won’t idle any time a process waits on I/O and the CPU can switch between processes rapidly.
Elaboration (by LLM):
Multitasking vs Timesharing - Different Dimensions:
These are two independent concepts that describe different aspects of an OS:
Multitasking = CPU Concurrency (One dimension)
- Axis: How many processes execute concurrently?
- Focus: Optimizes CPU utilization by switching between processes
- Motivation: Avoid CPU idleness during I/O waits
- Scale: Typically optimizes throughput for a single user or batch workload
Timesharing = User Concurrency (Different dimension)
- Axis: How many simultaneous interactive users?
- Focus: Give each user the illusion of exclusive machine access
- Motivation: Multiple users sharing expensive hardware
- Scale: Optimizes response time for all users
Relationship Between Multitasking and Timesharing:
The student’s answer is correct: every timesharing OS must be multitasking, but not every multitasking OS must be timesharing.
Why Timesharing Requires Multitasking:
For timesharing to work, the OS must rapidly switch between user processes. This requires context switching—the core mechanism of multitasking. Here’s why:
Timesharing without Multitasking = Disaster:
User A: Types command "find / -name '*.txt'" (takes 60 seconds)
User B: Waits... waits... waits... (locked out for 60 seconds)
User C: Waits... waits... waits... (locked out for 60 seconds)
Result: Terrible experience, defeats purpose of timesharing
With multitasking (context switching):
User A: [run 10ms] [I/O wait...] [run 10ms] [I/O wait...]
User B: [ ] [run 10ms] [I/O wait...] [run 10ms] [I/O wait...]
User C: [ ] [ ] [run 10ms] [I/O wait...] [run 10ms]
CPU: [A] [B] [C] [A] [B] [C] [A] [B] [C]
Each user feels responsive even though they’re sharing the CPU.
Conversely: Multitasking Without Timesharing is Possible:
A batch system could be multitasking (context-switching between jobs to optimize throughput) but NOT timesharing (no interactive users). Example: A data center running 1000 background compute jobs:
- Multitasking: ✅ OS context-switches between jobs
- Timesharing: ❌ No interactive users, so no timesharing
The Relationship at a Glance:
- Multitasking + Timesharing: modern general-purpose OSs (Linux, Windows, macOS) combine context switching between processes with multiple interactive users/sessions
- Multitasking without Timesharing: a data center running 1000 background batch jobs; the OS context-switches between jobs, but there are no interactive users
- Timesharing without Multitasking: not possible, since timesharing requires rapid switching between user processes, which is exactly the multitasking mechanism
(g) If your workload consists only of batch jobs, i.e., jobs that work in the background requiring no user interaction, would you use a timesharing OS? Why or why not? Briefly explain.
LLM Solution
No, you would not use a timesharing OS for a batch-only workload. Batch systems execute jobs sequentially without human interaction, so there is no need for rapid context switching between multiple users. A simpler batch processing OS would be more efficient and have less overhead.
Elaboration:
Timesharing operating systems are fundamentally designed to serve multiple users simultaneously, which introduces substantial complexity and overhead. Every user needs the illusion of exclusive machine access, which requires rapid context switching, sophisticated scheduling algorithms, and continuous management of multiple user sessions. However, if your workload consists only of batch jobs that run in the background without any human interaction, all of this machinery becomes pure waste. Batch processing systems can be much simpler—they just process jobs one after another, without needing to pause to handle user input or maintain interactive sessions.
Design Trade-offs:
- Timesharing OS: Optimized for multiple users sharing system resources simultaneously, requiring rapid context switching and sophisticated scheduling
- Batch System: Process jobs sequentially in the background with no human interaction
Why Timesharing is Wasteful for Batch Workloads:
For a batch-only workload, the overhead of timesharing features becomes pure waste:
- Context switching overhead:
- Maintaining user sessions and terminal I/O handling
- Interrupt handling for interactive user inputs
- Complex scheduling algorithms
Benefits of Dedicated Batch System:
- Simpler scheduling algorithms (FCFS—first-come-first-served)
- Minimal context switching (possibly none)
- Optimized for throughput rather than response time
Example: A data warehouse running nightly ETL jobs would use batch processing, not timesharing. The jobs run sequentially, each consuming full CPU resources without interruption.
(h) Consider a system where I/O devices can only be programmed (controlled) with polling. Would a multi-tasking OS be able to increase CPU utilization in this system? Justify your answer.
Student Solution
Answer by Justin:
A multi-tasking OS would not be able to increase CPU utilization in this system, mainly because the CPU would be busy polling the I/O devices and waiting for a response. In this case, it would not be able to continue another process while another waits for I/O.
Elaboration (by LLM):
This is a correct and insightful answer that highlights a fundamental limitation of polling-based I/O control.
The Core Problem:
The key to multitasking’s effectiveness is interrupt-driven I/O, which allows the CPU to context-switch to another process while waiting for I/O to complete. With polling, this advantage is eliminated:
Interrupt-Driven I/O (Normal Situation):
Process A: [compute] [issues disk read] → sleeps
(CPU: 0ms overhead)
CPU: → switches to Process B [compute for 10ms]
→ switches to Process C [compute for 10ms]
→ disk interrupt fires → wakes Process A
CPU: → switches back to Process A [continue]
Result: High CPU utilization, all processes make progress
Polling-Based I/O (This System):
Process A: [compute] [issues disk read] → MUST poll
CPU: [continuously checking: "is it done?"] [is it done?] [is it done?]
(100% of CPU cycles wasted on polling)
Process B: Can never run because CPU is busy polling
Process C: Can never run because CPU is busy polling
Result: Low CPU utilization, other processes starve
Why Polling Wastes CPU Cycles:
In a polling system, once a process issues an I/O request, it must actively check the device status repeatedly in a tight loop:
// Polling loop - wastes CPU cycles
while (!device_ready()) {
// Spin, spin, spin...
}
read_data();
The CPU never goes idle and never context-switches. Even with multitasking, only one process can run (the one polling), making multitasking useless.
When Multitasking Would Help (Interrupt-Driven):
With interrupts, the I/O device signals completion asynchronously:
// Interrupt-driven - CPU can do other work
device_read_async(); // returns immediately
// CPU context-switches to another process
// ... when device finishes, interrupt fires
// CPU resumes original process
Real-World Analogy:
- Interrupt-driven: You submit a form at the DMV and sit down. When your number is called, you respond. Meanwhile, others are served.
- Polling: You submit a form and repeatedly jump up every second asking “Is it ready?” Everyone else has to wait because you’re monopolizing the staff’s attention.
Conclusion:
Without interrupt support, polling-based I/O completely negates the benefits of multitasking because one process’s I/O activity monopolizes the CPU, preventing context-switching to other processes.
(i) Explain why it is necessary to re-implement your Linux program under Windows (for most nontrivial code).
LLM Solution
Linux and Windows have different system call interfaces, memory layouts, and binary formats (ELF vs PE), making direct program porting impossible without recompilation and often code changes.
Elaboration:
While Linux and Windows are both modern operating systems, they were built from the ground up with completely different architectural decisions. The system calls that applications use to interact with the kernel are entirely different between the two platforms. Not only do individual system calls have different names and behaviors, but even basic operations like creating processes or managing memory follow fundamentally incompatible patterns. On top of that, the binary formats that executable files use are completely different, which means a Windows executable cannot even be loaded on Linux, let alone run. This combination of API incompatibility, different system call interfaces, and incompatible binary formats means you cannot simply recompile Linux code for Windows—you need to substantially rewrite the parts that interact with the operating system.
Although both are modern general-purpose operating systems, Linux and Windows differ significantly in their underlying architectures and APIs. The fundamental incompatibility stems from multiple factors:
System Call & API Differences:
| Operation | Linux | Windows |
|---|---|---|
| File I/O | open(), read(), write() | CreateFile(), ReadFile() |
| Process Creation | fork() + exec() | CreateProcess() |
| Memory Management | mmap() | VirtualAlloc() |
| Threading | POSIX threads (pthread) | Windows threads (CreateThread()) |
Binary Format Incompatibility:
- Linux: ELF (Executable and Linkable Format)
- Windows: PE (Portable Executable)
These formats are structurally incompatible, so a Linux binary cannot even be loaded by Windows; the program must at minimum be recompiled, and because of the API differences above, recompilation alone is insufficient.
Library & API Variations:
Even standard functions like those in libc have platform-specific behavior. System-level code requiring I/O or process management must be completely rewritten.
Example: A Linux program using fork() to spawn child processes:
// Linux version
if (fork() == 0) {
    // child process replaces itself with the new program
    execl(program, program, (char *)NULL);
}
This cannot run on Windows without reimplementing it using CreateProcess(), which has completely different semantics.
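For comparison, a hedged sketch of the same "spawn a program and wait for it" logic with the Win32 API (the program path is just illustrative); note there is no fork()/exec() split:
#include <stdio.h>
#include <windows.h>

int main(void) {
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));

    // CreateProcess combines what fork()+exec() do separately on Linux:
    // it creates a new process and loads a new program into it in one call.
    if (!CreateProcessA("C:\\Windows\\System32\\notepad.exe",
                        NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }
    WaitForSingleObject(pi.hProcess, INFINITE);  // roughly analogous to wait()
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}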
(j) Why is it necessary to have two modes of operation, e.g., user and kernel mode, in modern OSs? Briefly explain the difference between the user mode and kernel mode.
Student Solution
Answer by Justin:
This is important because we need to be able to protect the system from malicious users! We do not want to give users the ability to run any system call anytime they want! That is why we have user mode and kernel mode, where, while in user mode, only certain permitted system calls can be made. In kernel mode, all system calls are accessible and can be called, which makes it so that the user must first securely trap to the kernel to access these system calls!
Elaboration (by LLM):
This answer captures the security motivation perfectly. Let’s expand on the technical and practical aspects:
Fundamental Problem Without Privilege Separation:
Imagine a single-mode system where all programs run at the same privilege level:
Application 1: [can directly access hardware] [can modify kernel memory] [can shut down system]
Application 2: [can directly access hardware] [can spy on other apps] [can corrupt data]
User Data: [completely vulnerable] [no isolation] [no protection]
Result: One malicious or buggy program crashes the entire system or steals data.
Solution: Privilege Levels
Modern CPUs provide hardware-enforced privilege levels:
User Mode (Ring 3) - Restricted Privileges:
-
What’s Allowed:
- Execute application code (read, write, compute)
- Call system calls (via syscall/int 0x80)
- Access only allocated memory pages (enforced by MMU)
- I/O operations only through OS (no direct hardware access)
-
What’s Forbidden:
- Execute privileged instructions (modify page tables, disable interrupts, halt CPU)
- Access other processes’ memory
- Direct hardware I/O
- Disable CPU interrupts
Kernel Mode (Ring 0) - Full Privileges:
- What’s Allowed: Everything—full hardware access
- Modify page tables and memory mappings
- Disable/enable interrupts
- Access all memory (user and kernel)
- Directly program hardware devices
- Switch between modes
- Configure privilege levels
Security Boundary Enforcement:
If a user-mode program tries to execute a privileged instruction:
user_mode_code:
mov cr3, eax ; Privileged instruction!
; CPU immediately raises a general-protection fault
; Kernel's exception handler: terminate process (crash detection)
The CPU hardware enforces this—no software bypass possible.
Protected Resources:
| Resource | User Mode | Kernel Mode |
|---|---|---|
| Own memory | ✅ Read/Write | ✅ Read/Write |
| Other process memory | ❌ Blocked by MMU | ✅ Full access |
| Page tables | ❌ Cannot modify | ✅ Full control |
| I/O devices | ❌ Direct access forbidden | ✅ Direct control |
| Interrupt enable/disable | ❌ Cannot modify | ✅ Full control |
| System clock | ✅ Read via syscall | ✅ Direct read |
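A tiny demonstration of the "blocked by MMU" rows: dereferencing a kernel-half address from user mode is stopped by the hardware before any data is read (the address below is illustrative; on Linux the process receives SIGSEGV).
#include <stdio.h>

int main(void) {
    // Illustrative kernel-space address on x86-64 (actual layout varies, e.g. with KASLR)
    volatile long *kernel_addr = (long *)0xffffffff81000000UL;
    printf("attempting to read kernel memory from user mode...\n");
    long value = *kernel_addr;   // privilege check in the page tables -> SIGSEGV
    printf("read %ld (never reached)\n", value);
    return 0;
}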
The Control Flow:
User Mode:
┌─ myprogram.c: open("/etc/passwd")
│ Hits system call boundary
│ Executes: syscall instruction (user-executable, but enters the kernel only at its fixed entry point)
│ CPU: Switches to Kernel Mode, saves state
▼
Kernel Mode:
┌─ kernel: sys_open() handler
│ Performs privileged operations:
│ - Checks permissions (only kernel can read security data)
│ - Opens file
│ - Returns file descriptor
│ Executes: iret/sysret (privileged)
│ CPU: Switches back to User Mode, restores state
▼
User Mode:
└─ myprogram.c: continue with returned fd
Why Both Modes Are Necessary:
- Performance: Kernel mode allows direct hardware access without overhead
- Security: User mode prevents malicious/buggy apps from accessing privileged operations
- Isolation: One app cannot crash or compromise others
- System Stability: Critical operations protected from user interference
Real-World Analogy:
User mode = visitor in a hospital (can walk around public areas but not the pharmacy or surgery rooms)
Kernel mode = doctor/nurse (can access anything needed to perform critical functions)
(k) How does the OS change from the user mode to the kernel mode and vice versa?
Student Solution
Answer by Justin:
The OS changes from user mode to kernel mode when it gets a certain system interrupt (int 0x80/syscall). Once made, a bit is flipped that signifies that the system is now in kernel mode, allowing the execution of system calls. Then, the trap handler saves the application state and calls the system call function on behalf of the user. Once completed, the trap handler restores the application state and flips the bit, returning to user mode using iret/sysret!
Elaboration (by LLM):
This is an excellent answer that describes the transition at a high level. Let’s detail the specific mechanisms:
User Mode → Kernel Mode Transition:
The transition happens via special instructions (syscall, int 0x80) that user code is allowed to execute, but whose effect is fixed by the hardware and the kernel: they enter kernel mode only at a kernel-controlled entry point:
Modern x86-64 Systems: syscall Instruction
; User mode code
mov rax, 1 ; syscall number: sys_write
mov rdi, 1 ; arg1: file descriptor (stdout)
mov rsi, rbx ; arg2: buffer pointer
mov rdx, rcx ; arg3: count
syscall ; ← Privileged transition instruction
; CPU atomically:
; 1. Saves user RIP in RCX (return address)
; 2. Saves user RFLAGS in R11
; 3. Switches to Ring 0 (kernel mode)
; 4. Jumps to kernel entry point (the kernel's syscall_entry)
Legacy x86 Systems: int 0x80 Instruction
; User mode code
mov eax, 4 ; syscall number: sys_write
mov ebx, 1 ; arg1: fd
mov ecx, buffer ; arg2: buf
mov edx, count ; arg3: count
int 0x80 ; ← Software interrupt instruction
; CPU atomically:
; 1. Looks up IDT (Interrupt Descriptor Table) entry 0x80
; 2. Switches to Ring 0
; 3. Jumps to trap handler (kernel's system call entry point)
Inside the Kernel: Trap Handler
// Simplified sketch based on arch/x86/entry/entry_64.S (Linux kernel)
.globl entry_SYSCALL_64
entry_SYSCALL_64:
// Hardware already saved: user RIP, user RFLAGS
// Hardware already switched to kernel stack (from TSS)
push rbx ; Save user's rbx
push rcx ; Save user's rcx
push rdx ; Save user's rdx
push r8
push r9
// ... save all registers
// Dispatch syscall
mov r10, rax ; Syscall number in rax
cmp r10, NR_SYSCALLS
jge bad_syscall
call qword [syscall_table + r10*8] ; Call sys_write() etc
// rax now contains return value
pop r9
pop r8
// ... restore all registers
pop rdx
pop rcx
pop rbx
sysret ; Return to user mode
// Hardware atomically:
// 1. Restores user RIP from RCX
// 2. Restores user RFLAGS from R11
// 3. Switches back to Ring 3 (user mode)
Key Features of This Design:
| Feature | Purpose |
|---|---|
| Atomic transition | Cannot be interrupted mid-switch |
| TSS (Task State Segment) | Specifies kernel stack to use for this process |
| State saving | Hardware saves minimal state (IP, flags), software saves registers |
| Exclusive instructions | syscall/int work from any mode but only kernel can use sysret/iret |
Complete Example: write() System Call
┌─ User Code
│ mov rax, 1 ; syscall number
│ mov rdi, 1 ; fd = stdout
│ mov rsi, buffer ; *buf
│ mov rdx, 10 ; count
│ syscall ← Transition point
│
├─ CPU Switching:
│ 1. Save user RIP, RFLAGS
│ 2. Load kernel stack pointer from TSS
│ 3. Set privilege mode to Ring 0
│ 4. Jump to syscall_entry
│
├─ Kernel Handler
│ syscall_entry:
│ push registers ; Save user state
│ cmp rax, NR_SYSCALLS
│ call sys_write ; Execute system call
│ pop registers ; Restore user state
│ sysret ← Return instruction
│
├─ CPU Switching Back:
│ 1. Restore user RIP from RCX
│ 2. Restore user RFLAGS from R11
│ 3. Set privilege mode to Ring 3
│ 4. Resume user code
│
└─ User Code Resumes
mov rax, count ; rax has bytes written
cmp rax, -1
je error_handler
Why Not Use Regular call/ret?
As the student’s answer implies, regular function calls cannot change privilege mode. syscall/int are the only instructions that:
- Transition privilege levels
- Load kernel stack
- Save/restore control flow atomically
(l) Explain why it is not possible to use “call/ret” pair of instructions to call and return from an OS system call function in modern general-purpose OSs?
LLM Solution
Call/ret cannot be used because they don’t switch CPU privilege levels. System calls require changing from user mode to kernel mode, which requires special instructions (syscall/sysret or int/iret) that modify the CPU’s privilege bits.
Elaboration:
The reason call and ret instructions cannot be used for system calls comes down to a fundamental requirement of modern operating systems: the need to switch privilege levels from unprivileged user mode to privileged kernel mode. Regular function calls are designed for code running at the same privilege level—they just manipulate the return address and stack. They have no mechanism to elevate the privilege level of the CPU. System calls, by contrast, must cross a security boundary, which requires special hardware support. The x86 CPU provides specific privileged instructions that are designed exactly for this purpose—they are the only instructions that can change the CPU’s privilege mode.
Instruction Capabilities:
call/ret instructions only manage control flow:
- Save/restore instruction pointer (EIP)
- Manipulate stack pointer (ESP)
- Cannot modify privilege bits or CPU mode
x86 Privilege Model:
The x86 CPU operates in privilege rings (ring 0 = kernel, ring 3 = user); the current privilege level is held in CPU state (the low two bits of the CS segment register), not in anything user code can set directly.
Regular call/ret are non-privileged instructions—they cannot modify these rings.
Requirements for Mode Switching:
To transition from user mode (ring 3) to kernel mode (ring 0), the CPU must:
- Save the current privilege level and return address
- Set the privilege bit to kernel mode (ring 0)
- Switch to the kernel stack (from TSS—Task State Segment)
- Jump to the kernel entry point
Only Privileged Instructions Can Achieve This:
- Modern x86:
syscall/sysret - Legacy x86:
int/iret
Why call Won’t Work:
In user mode, call kernel_function would jump toward kernel code but remain in ring 3. The kernel would immediately fault when executing privileged instructions (in practice, even the instruction fetch faults, because kernel pages are not accessible at user privilege).
Example:
; User mode
call kernel_function ; Jumps to kernel code but stays in ring 3
; Kernel code tries to execute a privileged instruction → general-protection fault
Instead, syscall performs the entire transition atomically in one instruction.
(m) Which of the following instructions should be privileged, i.e., not allowed in user mode? (1) Read the system clock, (2) turn off interrupts, (3) switch from user to kernel mode. Briefly justify your answer.
Student Solution
Answer by Justin:
Reading the system clock should be allowed in user mode! There aren’t any harmful side effects from reading the system clock. But allowing users to switch from user to kernel mode whenever they want would be very harmful, as they would be able to execute any system call they like, and that’s why they have to use specific libraries to access system calls. Also, turning off interrupts SHOULD be privileged. Turning off interrupts would be very harmful for the system, as interrupts are vital to many of the operations/properties of the system.
Elaboration (by LLM):
This answer is fundamentally correct and demonstrates excellent security reasoning. Let’s analyze each instruction in depth:
1. Read the System Clock - ALLOW in User Mode ✅
Why It’s Safe:
- No side effects: Reading the clock only retrieves a value; it doesn’t modify hardware or kernel state
- Not exclusive: Multiple processes can read the clock simultaneously (read-only)
- Useful for applications: Timers, timestamps, performance measurement all require reading the clock
How It Works:
Modern CPUs provide a user-readable timer register for exactly this reason:
#include <stdint.h>
#include <x86intrin.h>

// User mode code - allowed
uint64_t read_cycle_counter(void) {
    return __rdtsc(); // Read Time Stamp Counter (x86) via compiler intrinsic
}
The x86 rdtsc instruction is not privileged and can run in user mode (it returns a cycle count, which user code can convert to time). Some CPUs also allow rdpmc (Read Performance Monitoring Counter) in user mode for profiling.
2. Turn Off Interrupts - PRIVILEGED ❌
Why It Must Be Privileged:
Interrupts are the OS’s mechanism to regain control from user programs. If user code could disable interrupts:
// Hypothetical (malicious code)
disable_interrupts(); // If this were allowed...
while(1) {} // Infinite loop
// OS can never interrupt!
Consequences:
- Entire system would hang/freeze
- Other processes never get CPU time
- Keyboard input never processed
- Mouse events never handled
- Disk I/O never completes
- System becomes unresponsive
Why Interrupts Are Critical:
| Interrupt | Event | Consequence if Disabled |
|---|---|---|
| Timer interrupt | Context switch timeout | No process switching - single task hangs system |
| Keyboard interrupt | User presses key | No keyboard input possible |
| Disk I/O interrupt | Disk read completes | I/O operations hang indefinitely |
| Network interrupt | Network packet arrives | No network communication |
| Exception interrupt | Illegal instruction | Process cannot be terminated |
The cli Instruction (Clear Interrupts):
; Privileged instruction - only works in Ring 0
cli ; Clear interrupt flag (disable interrupts)
; If executed in user mode → general-protection fault
3. Switch from User to Kernel Mode - PRIVILEGED ❌
Why It Must Be Privileged:
As the student correctly states, if user code could directly switch modes:
// Hypothetical (if mode switching were allowed)
switch_to_kernel_mode();
disable_interrupts();
erase_all_files();
format_hard_drive();
// Total system compromise
The Attack Vector:
Without this restriction, privilege escalation becomes trivial:
Exploit 1: Malicious app
1. switch_to_kernel_mode();
2. Modify page tables to access other processes' memory
3. Read/modify/delete data from any process
Exploit 2: Malware
1. switch_to_kernel_mode();
2. Disable security features
3. Install backdoor in kernel
4. Establish permanent system compromise
Why Control Is Necessary:
The kernel must mediate and validate all mode switches:
// Kernel's controlled entry point
syscall:
// Hardware has already validated:
// - We're transitioning from Ring 3 to Ring 0
// - This is an approved transition mechanism
// - User state is safely saved
// Now kernel can validate what the user wants:
syscall_number = get_syscall_number();
if (syscall_number >= 0 && syscall_number < NR_SYSCALLS) {
perform_syscall(); // Allowed
} else {
return -ENOSYS; // Denied
}
Without this control, any security checks become meaningless.
Summary Table:
| Instruction | Allow in User Mode? | Reason |
|---|---|---|
| Read system clock | ✅ Yes | Read-only, no side effects, useful for apps |
| Disable interrupts | ❌ No | Would allow infinite loops, system hangs |
| Switch to kernel mode | ❌ No | Would bypass all security and privilege checks |
The Principle:
Privilege separation is based on capability:
- Safe operations (read-only, no system state modification) → allowed
- Dangerous operations (system state changes, privilege escalation) → privileged only
(n) What’s the difference between a trap and an interrupt?
Student Solution
Answer by Justin:
A trap involves receiving the specific interrupt (0x80, syscall) to switch to kernel mode, while an interrupt in general can be anything from the Interrupt Vector Table (IVT) that all don’t necessarily involve trapping to the kernel. Many involve exceptions, I/O, and more!
Elaboration (by LLM):
The student’s answer captures the distinction but uses overlapping terminology. Let’s clarify the precise definitions:
Interrupt Vector Table (IVT) / Interrupt Descriptor Table (IDT):
The CPU has a table (IDT on x86) of interrupt handlers indexed by vector number:
// linux/arch/x86/kernel/idt.c
struct idt_data idt_table[] = {
INTG(0, divide_error), // Vector 0
INTG(1, debug), // Vector 1
INTG(2, nmi), // Vector 2
INTG(3, int3), // Vector 3 - breakpoint trap
...
INTG(0x80, ia32_syscall), // Vector 128 - System call
INTR(32, irq0), // Vector 32 - Timer interrupt
INTR(33, irq1), // Vector 33 - Keyboard interrupt
...
};All of these are technically “exceptions” in CPU terminology, but software distinguishes them:
Terminology Clarification:
1. Interrupt (Hardware-Triggered) 🔔
Caused by external hardware events:
External Event Occurs
↓
Hardware signals CPU via interrupt line
↓
CPU stops current instruction
↓
CPU looks up IDT[interrupt_vector]
↓
Jumps to handler
Examples:
- Timer interrupt (every 10ms on modern systems)
- Keyboard interrupt (key pressed)
- Disk I/O completion interrupt
- Network packet arrival interrupt
- Power failure interrupt
Implementation:
// Hardware interrupt handler
irq_handler_keyboard:
save_user_state();
read_keyboard_buffer();
wake_terminal_process();
restore_user_state();
iret; // Return to user code
Key Characteristic: Asynchronous - can arrive at any time relative to the running program (the CPU checks for pending interrupts between instructions)
2. Trap / Exception (Software or Hardware-Triggered) ⚠️
Caused by instruction execution or unusual conditions:
Instruction Executes
↓
Abnormal condition detected (by CPU or software)
↓
CPU raises exception
↓
CPU looks up IDT[exception_vector]
↓
Jumps to handler
Examples:
- System call (syscall/int 0x80) - intentional trap into the kernel
- Divide by zero exception
- Page fault (memory access to unmapped page)
- Illegal instruction exception
- Breakpoint trap (debugging)
- General protection fault
Implementation:
// Trap/exception handler
syscall_entry:
save_user_state();
dispatch_syscall(); // Process system call
restore_user_state();
iret; // Return to user code
page_fault_handler:
save_user_state();
allocate_page(); // Handle missing memory
restore_user_state();
iret; // Retry faulting instruction
Key Characteristic: Synchronous - occurs as a direct result of instruction execution
Visual Comparison:
| Aspect | Interrupt | Trap/Exception |
|---|---|---|
| Trigger | Hardware event (external) | Instruction execution or CPU condition |
| Timing | Asynchronous (any time) | Synchronous (tied to instruction) |
| Predictability | Unpredictable | Predictable (code controls) |
| Return Address | Point after interrupted instruction | Point at or after faulting instruction |
| Examples | Timer, keyboard, disk, network | Syscall, page fault, divide-by-zero |
| Purpose | Notify CPU of external event | Handle error or request kernel service |
The Trap Subtype (System Call):
A trap is specifically used for system calls in modern operating systems:
// User code
int retval = write(1, buffer, size);
// Gets compiled to:
mov rax, 1 ; syscall number
mov rdi, 1 ; fd
mov rsi, buffer ; buf
mov rdx, size ; count
syscall ; ← Intentional trap to request kernel service
// Kernel receives control, validates request, executes
// Syscall = structured trap for controlled kernel entry
Why Distinguish Interrupt vs Trap?
Understanding the difference is crucial for:
- Interrupt handlers: Must be reentrant (handle nested interrupts) and very fast
  irq_handler { // MUST be fast - no sleep, no lock contention
                // Can be interrupted by a higher-priority IRQ
  }
- Trap handlers: Can be slower, blocking, sleepable
  syscall_handler { // Can sleep, acquire locks, allocate memory
                    // Not interrupted (runs to completion for the same task)
  }
Summary:
The student’s answer correctly notes that system calls (traps) are a subset of the broader interrupt/exception mechanism, but the key distinction is synchronicity: interrupts are asynchronous hardware events, while traps are synchronous instruction-triggered events (including intentional syscalls).
(o) Describe 2 methods by which arguments are passed from a user process to the kernel in system calls. How does Linux pass arguments to the kernel during a system call?
Student Solution
Answer by Justin:
One way arguments are passed from a user process to the kernel in system calls is using registers such as ebx, ecx, edx, etc. so that the values can be accessed while in the kernel. Another way is by passing them to the kernel stack. The method Linux uses is registers, but it only has 6 registers it can use for this. If there are over 6 parameters, Linux puts the first 6 in registers and the rest on the stack.
Elaboration (by LLM):
This answer is correct and complete. Let’s detail the mechanisms and the rationale:
Method 1: Register-Based Argument Passing 📝
Arguments are placed in CPU registers before invoking the syscall instruction:
x86-64 (Modern 64-bit Linux):
// User-mode C code
ssize_t written = write(1, buffer, 100);
// Compiler generates assembly:
mov rax, 1 ; Syscall #1 = sys_write
mov rdi, 1 ; arg1 (fd) in RDI
mov rsi, buffer ; arg2 (buf) in RSI
mov rdx, 100 ; arg3 (count) in RDX
syscall ; Invoke syscall
// Control passes to kernel with args in registers
// Kernel handler reads from RDI, RSI, RDX directly
x86-32 (32-bit Legacy Linux):
mov eax, 4 ; Syscall #4 = sys_write
mov ebx, 1 ; arg1 (fd)
mov ecx, buffer ; arg2 (buf)
mov edx, 100 ; arg3 (count)
int 0x80 ; Invoke syscall
Advantages of Register-Based:
- ✅ Very fast: No memory access needed
- ✅ Few instructions: Direct register load
- ✅ Cache-friendly: No stack operations
- ✅ Atomic: All args available to kernel simultaneously
Method 2: Stack-Based Argument Passing 📚
In the general x86-64 calling convention, arguments beyond the six register slots are passed in memory (on the stack). The Linux syscall ABI itself allows at most six register arguments (calls that need more data pass a pointer to a structure), but the stack-based method looks like this:
Illustrative scenario: a call with 8 arguments
// Function with 8 parameters
long syscall_with_many_args(
int arg1, int arg2, int arg3, int arg4, // Registers
int arg5, int arg6, // Registers
int arg7, int arg8 // Stack
);
Stack Layout:
High Addresses
┌─────────────────┐
│ arg8 (8th) │ ← rsp + 8
├─────────────────┤
│ arg7 (7th) │ ← rsp + 0
├─────────────────┤
│ Return Address │
└─────────────────┘
Low Addresses
Register Allocation:
// Kernel side syscall entry
kernel_handler:
arg1 = rdi; // First 6 args from registers
arg2 = rsi;
arg3 = rdx;
arg4 = rcx;
arg5 = r8;
arg6 = r9;
arg7 = *(user_sp + 0); // Remaining from stack
arg8 = *(user_sp + 8);
Why This Hybrid Approach?
The register-only approach has a hardware limit: most CPUs only have ~15 general-purpose registers, and not all can be used for arguments (some needed for instruction pointer, stack pointer, etc).
Performance Analysis:
| Method | Latency | Cache | # Args |
|---|---|---|---|
| Registers | ~1-2 cycles | L1 cache hit | 1-6 |
| Stack | ~4-10 cycles | L1-L3 cache miss possible | 7+ |
For the common case (≤6 arguments), register passing is 4-5x faster than stack access.
Real-World Syscall Examples:
// 2 arguments - registers only
kill(pid_t pid, int sig)
// mov rax, 62 ; sys_kill
// mov rdi, pid ; arg1
// mov rsi, sig ; arg2
// syscall
// 6 arguments - all registers
mmap(addr, len, prot, flags, fd, off)
// mov rax, 9 ; sys_mmap
// mov rdi, addr ; arg1 (registers)
// mov rsi, len ; arg2
// mov rdx, prot ; arg3
// mov r10, flags ; arg4 (note: r10, not rcx!)
// mov r8, fd ; arg5
// mov r9, off ; arg6
// syscall
// More than 6 arguments: not supported by the Linux syscall ABI.
// Calls that need more data (e.g., sched_setattr, clone3) instead pass
// a pointer to a user-space struct in one of the six registers.
Why Not Just Use Stack?
Reason 1: Performance
- Stack access requires memory operation (~100+ cycles vs ~1 cycle for register)
- Cache pollution from frequent stack accesses
- Cross-CPU cache coherency issues
Reason 2: Security
- Stack is in user-writable memory
- Kernel needs to validate stack pointer before reading args from stack
- Register values are captured by the CPU at the instant of the syscall instruction, so user code cannot change them while the kernel is reading them
Reason 3: ABI Convention
- All modern ABIs (x86-64 SysV ABI) standardize register usage
- Allows compiler and kernel to have implicit agreement
- No need to encode argument count in syscall
Modern x86-64 System Call Calling Convention:
| Register | Purpose |
|---|---|
| RAX | Syscall number (in) / Return value (out) |
| RDI | Argument 1 (fd, address, etc.) |
| RSI | Argument 2 (buffer, length, etc.) |
| RDX | Argument 3 (count, prot, etc.) |
| R10 | Argument 4 (flags) - Note: not RCX! |
| R8 | Argument 5 |
| R9 | Argument 6 |
| Stack | Arguments 7+ (very rare) |
Note: R10 is used instead of RCX because RCX is clobbered by syscall instruction
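From C, this register-based convention can be exercised directly with glibc's syscall() wrapper, which loads the syscall number into RAX and the arguments into RDI/RSI/RDX before executing the syscall instruction; a minimal sketch:
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    const char *msg = "hello from a raw syscall\n";
    // Equivalent to write(1, msg, len), but going through the generic wrapper
    long ret = syscall(SYS_write, 1, msg, strlen(msg));
    return ret < 0 ? 1 : 0;
}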
(p) List the names of 5 system calls in Linux.
Student Solution
Answer by Justin:
sys_read
sys_write
sys_open
sys_close
sys_exit
Elaboration (by LLM):
This is a correct, fundamental set of system calls. Let’s understand what each does and why they’re essential:
Essential I/O System Calls:
1. sys_read - Read from file descriptor
// User code
ssize_t bytes_read = read(fd, buffer, count);
// Kernel implements:
ssize_t sys_read(unsigned int fd, char *buf, size_t count) {
// 1. Validate file descriptor (is it open?)
// 2. Check permissions (can this process read?)
// 3. Perform I/O from disk/device
// 4. Copy data to user buffer via copy_to_user()
// 5. Return number of bytes read
}
Use Cases: Reading from files, pipes, sockets, devices
2. sys_write - Write to file descriptor
// User code
ssize_t bytes_written = write(fd, buffer, count);
// Kernel implements:
ssize_t sys_write(unsigned int fd, const char *buf, size_t count) {
// 1. Validate file descriptor
// 2. Check write permissions
// 3. Copy data from user buffer via copy_from_user()
// 4. Perform I/O to disk/device
// 5. Return number of bytes written
}
Use Cases: Writing to files, terminal output (printf), network sockets
3. sys_open - Open a file
// User code
int fd = open("/etc/passwd", O_RDONLY);
// Kernel implements:
int sys_open(const char *filename, int flags, umode_t mode) {
// 1. Validate filename (safe from user space)
// 2. Check file permissions (can this process access it?)
// 3. Perform security checks (SELinux, etc.)
// 4. Create file descriptor entry in process table
// 5. Return file descriptor (3, 4, 5, ... small integer)
}
Use Cases: Opening files for reading/writing, creating new files
4. sys_close - Close a file descriptor
// User code
int result = close(fd);
// Kernel implements:
int sys_close(unsigned int fd) {
// 1. Validate file descriptor
// 2. Flush any pending writes (if buffered)
// 3. Release inode reference
// 4. Release file descriptor entry
// 5. Return 0 on success
}
Use Cases: Releasing file resources, cleanup
5. sys_exit - Terminate process
// User code
exit(0); // or return 0 from main()
// Kernel implements:
void sys_exit(int exit_code) {
// 1. Set process exit status
// 2. Close all open file descriptors
// 3. Release allocated memory (virtual memory areas)
// 4. Send termination signal to parent process
// 5. Set process state to ZOMBIE (awaiting reaping)
// 6. Schedule process for removal
// 7. Never returns to user code
}
Use Cases: Program termination, cleanup
Why These 5 Are Fundamental:
Together, these form the minimal I/O interface:
┌─────────────────────────────────────────────┐
│ Application Program │
├─────────────────────────────────────────────┤
│ open() - Acquire resource │
│ read() - Get data │
│ write() - Send data │
│ close() - Release resource │
│ exit() - Terminate execution │
├─────────────────────────────────────────────┤
│ Kernel / File System │
└─────────────────────────────────────────────┘
Real-World Example: Copying a File
int main() {
// open() - get input file handle
int input_fd = open("source.txt", O_RDONLY);
// open() - create output file handle
int output_fd = open("dest.txt", O_CREAT | O_WRONLY);
char buffer[4096];
ssize_t bytes_read;
while(1) {
// read() - get data from input
bytes_read = read(input_fd, buffer, 4096);
if (bytes_read == 0) break; // EOF
if (bytes_read < 0) exit(1); // Error - exit()
// write() - send data to output
ssize_t bytes_written = write(output_fd, buffer, bytes_read);
if (bytes_written < 0) exit(1); // Error - exit()
}
// close() - release resources
close(input_fd);
close(output_fd);
// exit() - terminate program
exit(0); // Success
}
Complete System Call List (Partial - Linux has 400+):
| Category | Examples |
|---|---|
| File I/O | open, close, read, write, lseek, fsync |
| Process | fork, exec, exit, wait, kill, getpid |
| Memory | mmap, munmap, brk, sbrk |
| File System | mkdir, rmdir, unlink, rename, chmod |
| Networking | socket, connect, listen, accept, send, recv |
| Signals | signal, sigaction, sigprocmask |
| IPC | pipe, msgget, semget, shmget |
| Time | gettimeofday, clock_gettime |
How to List All System Calls:
# View the syscall table in the kernel source tree
less arch/x86/entry/syscalls/syscall_64.tbl
# Count syscalls (entries start with their syscall number)
grep -c '^[0-9]' arch/x86/entry/syscalls/syscall_64.tbl
# Recent Linux kernels have 400+ syscalls
man 2 syscalls # Man page listing all syscalls
(q) Briefly explain why it is necessary to switch to a kernel stack allocated for the process during the execution of a system call?
Student Solution
Answer by Justin:
The kernel must use a “trusted stack” stored in the kernel address space in order to complete the execution of privileged system calls.
Elaboration (by LLM):
This answer is concise and correct. Let’s understand the deep reasons why this is architecturally necessary:
The Problem: User Stack is Untrusted
When a user process makes a system call, the kernel cannot trust the user-mode stack pointer (RSP):
Scenario 1: Malicious User Code
// Malicious user code
void *fake_stack = malloc(1000); // Allocate attacker's memory
rsp = fake_stack; // Switch stack pointer
asm("syscall"); // Invoke system call
// If the kernel simply kept running on this user-controlled stack:
// - Attacker can manipulate return addresses
// - Can inject code
// - Can leak kernel memory
// - Total security compromise!
Scenario 2: Accidental Stack Corruption
Even without malice, the user stack might be:
- Unmapped (segmentation fault waiting to happen)
- Protected (read-only memory page)
- Nearly full (stack overflow imminent)
- In user-visible memory (privacy concerns for kernel data)
Solution: Kernel Stack per Process
Every process has TWO stacks:
Process Structure (in kernel memory):
┌─────────────────────────────┐
│ Task Structure (PCB) │
│ - PID, state, priority │
│ - Memory mappings (page tbl)│
│ - Open file descriptors │
│ - **kernel_stack pointer** ←┤ (points to kernel stack)
│ - User stack pointer saved │
└─────────────────────────────┘
↓
┌────────────────┐
│ Kernel Stack │ (8 KB typically)
│ (in kernel │ Protected: kernel-only access
│ address │ Trusted: allocated by kernel
│ space) │ Safe: isolated from user memory
└────────────────┘
Process Virtual Address Space:
┌──────────────────────────┐
│ Kernel Address Space │ ← Only accessible in kernel mode
│ (above 0xFFFF800000...) │
│ - Kernel stack (THIS) │ ✅ SAFE
│ - Kernel data │
└──────────────────────────┘
↑↑↑ Security boundary ↑↑↑
┌──────────────────────────┐
│ User Address Space │ ← Accessible in user mode
│ - Code, data, BSS │
│ - Heap │
│ - User Stack (untrusted)│ ❌ UNTRUSTED
└──────────────────────────┘
What Happens During System Call:
┌─────────────────────────────────────────────┐
│ User Mode Execution │
│ RSP points to user stack (in user space) │
│ RBP points to user stack frame │
└─────────────────────────────────────────────┘
│
│ syscall instruction
↓
┌─────────────────────────────────────────────┐
│ Kernel Mode Transition (Hardware) │
│ 1. Save user RSP to kernel PCB │
│ 2. Load kernel stack pointer from PCB │
│ RSP ← kernel_stack_base + 8KB │
│ 3. Save user state on kernel stack │
│ 4. Jump to syscall handler │
└─────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────┐
│ Kernel Handler Execution │
│ - RSP points to kernel stack (trusted) │
│ - Push registers safely │
│ - Call functions with kernel stack frames │
│ - Access kernel data structures │
│ - Validate user arguments from user space │
│ - Copy data safely with copy_to_user() │
└─────────────────────────────────────────────┘
│
│ sysret/iret instruction
↓
┌─────────────────────────────────────────────┐
│ Return to User Mode (Hardware) │
│ 1. Restore user RSP from kernel PCB │
│ 2. Restore all user registers │
│ 3. Switch back to user mode (ring 3) │
│ 4. Jump to return address │
└─────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────┐
│ User Mode Execution Resumes │
│ RSP back to user stack │
└─────────────────────────────────────────────┘
Why This Architecture Is Necessary:
1. Security (Stack Integrity)
// Kernel code in syscall handler
int arg1 = *((int*)rsp + 8); // Read from stack
// If this were user stack:
// - User could have modified it while kernel reads
// - Classic TOCTOU race condition (Time-of-Check-Time-of-Use)
// With kernel stack:
// - Only kernel code can access it
// - Atomic with respect to user code
2. Isolation (Memory Protection)
// What happens if user stack overflows into data?
int user_array[100];
void recursive_call() {
recursive_call(); // Infinite recursion
// Stack grows and corrupts memory
}
// Kernel stack is separate:
// - User stack overflow doesn't affect kernel
// - Kernel operations continue safely
// - Process can be terminated without endangering the OS
3. Privilege Escalation Prevention
// Attacker attempts privilege escalation
void *kernel_code = malloc_fake_kernel_code();
rsp = kernel_code; // Set malicious stack
asm("syscall");
// Fails because:
// - CPU hardware switches to kernel stack (from TSS)
// - RSP is overwritten with kernel_stack_base
// - Malicious pointer ignored
// - Attacker gains nothing
4. Kernel Data Protection
Kernel must safely access data:
// Kernel receiving syscall arguments
struct myargs {
void *user_buffer;
size_t size;
};
// Kernel validates and copies:
kernel_buffer = malloc(args.size);
copy_from_user(kernel_buffer, args.user_buffer, args.size);
// This validation only works if kernel has its own stack
// to safely track what's being validated
Kernel Stack Size Limitations:
Kernel stacks are typically small (4-8 KB on x86-64):
Why small?
- Every process gets a kernel stack
- 1000 processes = 1000 * 8 KB = 8 MB just for stacks
- Must be contiguous in physical memory
- Trade-off between safety and resource usage
Implications:
- Recursive syscalls limited
- Kernel code must be careful with stack allocation
- No large local arrays in syscall handlers
Simplified Sketch of the Kernel Mechanism (Linux x86-64):
// Pseudocode, not verbatim kernel code (loosely modeled on the process-creation path)
static void setup_new_process(void) {
// When new process created, kernel allocates:
task_struct = kmalloc(sizeof(struct task_struct));
// Two stacks allocated:
task_struct->user_stack_base = user_space_base; // In user VA space
task_struct->kernel_stack = kmalloc(8192); // In kernel memory
// Saved in Task State Segment (TSS) for CPU access:
tss.rsp0 = task_struct->kernel_stack + 8192; // Point to top of kernel stack
}
// During syscall, CPU automatically uses:
// RSP ← TSS.rsp0 (kernel stack pointer)
Summary:
The kernel stack is essential for security and correctness because:
- Trust boundary: Kernel cannot trust user-supplied stack pointer
- Isolation: Separates user and kernel memory access
- Atomicity: Prevents TOCTOU race conditions
- Protection: Prevents user code from corrupting kernel data
- Transparency: User code cannot interfere with kernel execution
Without a dedicated kernel stack, the entire security model of modern operating systems would collapse.
(r) List 1 advantage and 1 disadvantage of monolithic kernels.
Student Solution
Answer by Justin:
One advantage is that the cost of module interactions is low. A disadvantage is that they are large and hard to maintain/change.
Elaboration (by LLM):
This answer captures the fundamental trade-off of monolithic kernel design. Let’s explore the architecture and implications:
What is a Monolithic Kernel?
A monolithic kernel is one where all OS services run in kernel mode within a single address space:
Monolithic Kernel Architecture:
User Space
┌──────────────────────────────────┐
│ Applications (user mode, ring 3) │
└──────────────────────────────────┘
Privilege Boundary
┌──────────────────────────────────┐
│ Kernel Space (ring 0) │
│ ┌────────────────────────────────┤
│ │ - Process management │
│ │ - Memory management │
│ │ - File system │
│ │ - Networking │
│ │ - Device drivers │
│ │ - IPC mechanisms │
│ │ ALL in one address space │
│ │ ALL with kernel privileges │
│ └────────────────────────────────┤
└──────────────────────────────────┘
Examples: Linux and traditional Unix; Windows NT and macOS's XNU are usually classed as hybrid kernels but still run most of these services in kernel mode
Advantage: Low Module Interaction Cost ✅
Why It’s Fast:
When different kernel subsystems need to communicate:
// File system needs to allocate memory
page = memory_manager_allocate_page();
// This is just a function call in the same address space
// Cost: ~2-5 cycles (L1 cache hit)
Compare to Microkernel:
// Microkernel architecture (file system in separate process)
message = create_message(ALLOCATE_PAGE, size);
send_message_to_memory_server(message);
wait_for_response();
page = extract_page_from_response(message);
// Cost: 1000s of cycles (context switches, IPC overhead)
Performance Data:
| Operation | Monolithic | Microkernel | Overhead |
|---|---|---|---|
| Memory allocation syscall | 100-200 cycles | 2000-5000 cycles | 10-50x |
| File read | 500-1000 cycles | 5000-10000 cycles | 5-20x |
| IPC message | 100-200 cycles | 500-1000 cycles | 5-10x |
Why Monolithic is Faster:
- No mode transitions needed between kernel subsystems
- Direct function calls instead of message passing
- Shared data structures in kernel memory
- Cache-friendly - all kernel code in same memory space
- No context switching overhead between kernel components
Disadvantage: Large and Hard to Maintain ❌
Size Problem:
Modern Linux kernel:
# Linux kernel statistics
Lines of code: ~25+ million (2023)
Modules: 200+ major subsystems
Drivers: 1000+ device drivers
Binary size: 20-50 MB on disk
# Windows NT kernel
Lines of code: ~40 million
Binary size: 100+ MB
# macOS kernel (XNU, partially monolithic)
Lines of code: ~15 million
Maintainability Issues:
// Problem 1: Tangled Dependencies
int memory_manager_function() {
// Needs to call file system
cache_flush();
// Needs to call networking
socket_send_data();
// Needs to call device drivers
interrupt_handler();
// Circular dependencies become impossible to track
}
// Problem 2: One Bug Crashes Everything
char *buffer = kmalloc(size, GFP_KERNEL);
strcpy(buffer, source); // Buffer overflow in device driver
// Entire kernel memory corrupted
// System crash, not just one device disabled
// Problem 3: Testing Difficulty
// Cannot test file system in isolation
// Must boot entire kernel with all 200+ subsystems
// Each test takes 30+ seconds
// Makes debugging extremely slow
Architectural Complexity:
The larger the kernel, the harder to understand subsystem interactions:
Module Dependency Graph:
VFS ←→ Block layer ←→ Device drivers ←→ Memory manager
↓ ↓ ↓ ↓
Inode Buffer Interrupt Page table
cache cache handler walker
↓ ↓ ↓ ↓
Proc FS ← ext4 ← SATA driver ←→ MMU
↓
Networking stack (TCP/IP)
which calls back into VFS
This web of dependencies makes it nearly impossible to modify one subsystem without affecting others.
Real-World Maintenance Cost:
Monolithic kernel bug example:
- Bug report: Memory corruption in ext4 filesystem
- Investigation: Bug is actually in page replacement algorithm
- Impact: Need to test all drivers that use page replacement
- Time: Weeks of debugging, multiple regressions
Microkernel bug example:
- Bug report: Memory corruption in ext4 server
- Investigation: Bug is in ext4 only
- Impact: Only ext4 server affected
- Time: Hours to fix, easy to regression test
Security Implications:
Monolithic kernel disadvantage: Security surface is huge
// Any vulnerable kernel subsystem can compromise everything
vulnerable_driver() {
*(int*)0xDEADBEEF = 0; // Arbitrary write
}
// Attacker can:
// 1. Exploit driver vulnerability
// 2. Write to kernel memory
// 3. Modify page tables
// 4. Compromise entire system
// 5. Cannot be isolated
Summary:
| Aspect | Monolithic | Benefit/Cost |
|---|---|---|
| Performance | Very fast | ✅ Advantage |
| Function calls | In-process | ✅ Advantage (2-5 cycles) |
| IPC cost | N/A - direct calls | ✅ Advantage |
| Code size | 20-50 million LoC | ❌ Disadvantage |
| Maintainability | Difficult | ❌ Disadvantage |
| Testing | Slow (full kernel) | ❌ Disadvantage |
| Robustness | One bug crashes all | ❌ Disadvantage |
| Security | Large attack surface | ❌ Disadvantage |
Modern Trend:
Linux is moving toward hybrid approach:
- Core kernel functions in monolithic core (fast)
- Device drivers as loadable modules (easier to update; see the module sketch below)
- Some subsystems can be disabled (reduce attack surface)
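To illustrate the loadable-module point, here is a minimal sketch of a Linux kernel module. module_init/module_exit, pr_info, and MODULE_LICENSE are real kernel facilities; the module itself is a toy example:
#include <linux/init.h>
#include <linux/module.h>

// Minimal loadable module: built separately from the kernel image,
// inserted at runtime with insmod and removed with rmmod.
static int __init hello_init(void)
{
    pr_info("example driver module loaded\n");
    return 0;
}

static void __exit hello_exit(void)
{
    pr_info("example driver module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
Note that once loaded, the module still runs in ring 0, so this eases maintenance without adding microkernel-style isolation.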
(s) List 1 advantage and 1 disadvantage of microkernels.
Student Solution
Answer by Justin:
One advantage is that it is easier to maintain/change, but one disadvantage is that it has a small kernel, which leads to poor performance
Elaboration (by LLM):
This answer captures the inverse trade-off of microkernel design compared to monolithic kernels. Let’s explore:
What is a Microkernel?
A microkernel provides only the absolute minimum in kernel space, moving services to user-mode servers:
Microkernel Architecture:
User Space
┌──────────────────────────────────┐
│ File Server Memory Server │
│ Device Server Network Server │
│ IPC Server (user mode, ring 3) │
│ Applications │
└──────────────────────────────────┘
IPC (Messages)
┌──────────────────────────────────┐
│ Microkernel (ring 0) │
│ ┌────────────────────────────────┤
│ │ - Thread management │
│ │ - IPC (message passing) │
│ │ - Basic interrupt handling │
│ │ - Memory protection (MMU) │
│ │ ~10,000-50,000 lines of code │
│ └────────────────────────────────┤
└──────────────────────────────────┘
Examples: QNX, MINIX, seL4, Mach (Apple’s XNU uses hybrid)
Advantage: Easier to Maintain and Change ✅
Why It’s Easier:
The kernel is drastically smaller, making it understandable:
// Microkernel approach: File system as separate server
// File system bug: Only file_server process affected
struct microkernel_vfs {
// 5000 lines of code for VFS
// Can be compiled, tested, deployed independently
// Bug doesn't crash kernel
// Can be updated without rebooting
}
// Monolithic approach: File system in kernel
// File system bug: Entire kernel corrupted
// Requires reboot to deploy fix
Code Size Comparison:
| Component | Monolithic (Linux) | Microkernel |
|---|---|---|
| Core kernel | 5+ million LoC | 20-50 K LoC |
| File system | In kernel | User server |
| Network stack | In kernel | User server |
| Device drivers | In kernel | User servers |
| Total OS | Monolithic | Distributed |
Advantages of Small Kernel:
1. Verification & Formal Proof
Microkernel: Possible to formally verify
seL4 microkernel: formally proven correct (roughly 20 person-years of proof effort)
2. Security Isolation
Microkernel: One driver crash → only that driver dies
Monolithic: One driver crash → entire kernel dies
3. Modularity
Microkernel: Change file system → recompile/redeploy file server
Monolithic: Change file system → recompile/reboot entire kernel
4. Testability
Microkernel: Test file server in isolation
Monolithic: Must boot entire kernel for any test
5. Reliability
Microkernel: Restart failed service without rebooting
Monolithic: Service failure = system failure
Real-World Example: QNX (Microkernel OS)
# QNX system architecture
Process A ←(IPC)→ File Server ←(IPC)→ Disk Driver
↓
Microkernel
↑
Process B ←(IPC)→ Network Server ←(IPC)→ Network Driver
Disadvantage: Poor Performance ❌
The IPC Overhead Problem:
Every kernel service requires inter-process communication (message passing):
// User code wants to read file
ssize_t n = read(fd, buffer, 1024);
// What actually happens:
1. User process creates message struct
2. Copies user buffer pointer into message
3. Calls microkernel (syscall) to send IPC message
4. Microkernel context-switches to file server
5. File server reads message from IPC queue
6. File server performs disk I/O
7. File server sends response message
8. Microkernel context-switches back to user
9. User process unpacks response message
10. User process copies data to buffer
// Total cost: 1000s of cycles vs 100-200 cycles in monolithic
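As a rough user-space analogy (not how a real microkernel implements its IPC primitives), step 3 amounts to packing the request into a message and handing it to another process. The sketch below uses POSIX message queues; the queue name /mem_server and the request format are made up for illustration:
#include <fcntl.h>      // O_WRONLY
#include <mqueue.h>     // mq_open, mq_send, mq_close (link with -lrt)
#include <stddef.h>

struct request { int op; size_t size; };   // illustrative message format

int ask_memory_server(size_t size)
{
    mqd_t q = mq_open("/mem_server", O_WRONLY);   // server's request queue
    if (q == (mqd_t)-1)
        return -1;

    struct request req = { .op = 1 /* ALLOCATE_PAGE */, .size = size };
    int rc = mq_send(q, (const char *)&req, sizeof(req), 0);

    mq_close(q);
    return rc;   // the reply would arrive on a separate queue (omitted here)
}
Every such round trip pays for message construction, queueing, and at least two context switches, which is exactly the overhead tallied above.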
Performance Measurements:
# System call cost (lower is better)
Monolithic Linux: 100-200 cycles
Microkernel (QNX): 1000-5000 cycles (5-50x slower!)
# File read latency
Monolithic Linux: 500-1000 cycles
Microkernel (QNX): 3000-10000 cycles
# Context switch cost
Both: ~1000 cycles (but microkernel needs MORE switches)
Why Microkernels Are Slower:
| Cost Component | Monolithic | Microkernel | Difference |
|---|---|---|---|
| Syscall entry | 10 cycles | 10 cycles | Same |
| Function call to filesystem | 2-5 cycles | N/A | N/A |
| IPC send | N/A | 100-200 cycles | +100-200 |
| Context switch | N/A | 1000 cycles | +1000 |
| IPC receive | N/A | 100-200 cycles | +100-200 |
| Kernel processing | 50-100 cycles | 50-100 cycles | Same |
| Total | 100-200 cycles | 1200-1500 cycles | 10-15x slower |
Real-World Performance Impact:
// Reading 1MB file with 1024-byte reads
iterations = 1000;
monolithic_time = 1000 * 200 = 200,000 cycles ≈ 0.1 ms (at 2 GHz)
microkernel_time = 1000 * 1500 = 1,500,000 cycles ≈ 0.75 ms
// Microkernel is 7.5x slower for this workload!
When Does Microkernel Performance Matter?
Matters (High-Performance Systems):
- ❌ Database servers (needs fast I/O)
- ❌ Real-time systems with timing requirements
- ❌ High-frequency trading (sub-millisecond latency)
- ❌ Video encoding/processing
Doesn’t Matter (Tolerant Systems):
- ✅ Embedded systems with simple workloads
- ✅ Phones (user-facing latency is human-scale: 10+ ms)
- ✅ IoT devices (not real-time)
- ✅ High-reliability systems (correctness > performance)
Can Microkernel Performance Be Improved?
Yes, with optimizations:
// Optimization 1: Shared Memory Buffers (Avoid Copying)
// Instead of copying data via IPC message:
shared_buffer = allocate_shared_memory(file_server);
file_server->read_into_buffer(fd, shared_buffer);
// Cost: ~200 cycles (much better)
// Optimization 2: Kernel-Assisted IPC
// Microkernel accelerates common IPC patterns
// Cost: ~400-600 cycles (better but still slower)
// Optimization 3: Caching
// File server caches recently-read data
// Cost: ~50 cycles (cache hit, no IPC needed)
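A user-space sketch of the shared-buffer idea (Optimization 1) using POSIX shared memory; shm_open/mmap/ftruncate are standard POSIX calls, and the object name /fs_shared_buf is purely illustrative:
#include <fcntl.h>      // O_CREAT, O_RDWR
#include <sys/mman.h>   // shm_open, mmap
#include <unistd.h>     // ftruncate, close

// Both the client and the file server map the same object; the server
// writes file data into it and the client reads it without an extra copy.
void *map_shared_buffer(size_t size)
{
    int fd = shm_open("/fs_shared_buf", O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, size) < 0) { close(fd); return NULL; }

    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      // the mapping remains valid after close
    return buf == MAP_FAILED ? NULL : buf;
}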
Modern Hybrid Approaches:
Most modern OSes use hybrid architectures:
Linux: Mostly monolithic + loadable drivers (compromise)
Windows: Monolithic core + driver isolation
macOS (XNU): Hybrid (Mach microkernel + BSD monolithic)
QNX: Pure microkernel (reliability > performance)
Summary:
| Aspect | Microkernel | Benefit/Cost |
|---|---|---|
| Kernel size | 20-50K LoC | ✅ Advantage |
| Maintainability | Very easy | ✅ Advantage |
| Modularity | Excellent | ✅ Advantage |
| Security | Excellent isolation | ✅ Advantage |
| Robustness | One crash ≠ system crash | ✅ Advantage |
| Performance | 5-50x slower | ❌ Disadvantage |
| IPC overhead | 1000s of cycles | ❌ Disadvantage |
| Cache efficiency | Poor (context switching) | ❌ Disadvantage |
| Developer complexity | Higher (distributed) | ❌ Disadvantage |
The Trade-off:
If you value: Correctness, Security, Maintainability → Microkernel
If you value: Performance, Throughput, Speed → Monolithic
Modern trend: Monolithic kernels with privilege separation (e.g., Linux with seccomp, Landlock) provide many microkernel benefits without the performance penalty.
2. Hypothetical Linux System Call
Consider the hypothetical Linux system call int add(int a, int b, int *c), which computes the sum of a and b in the kernel and copies the result into the buffer pointed to by c. The system call returns 0 if there is no error, -1 in the case of an error. Further assume that Linux runs on a 32-bit x86 CPU.
(a) Describe the set of actions that occur from the time the user invokes this system call until the system call returns. Specifically, describe in detail how the user program traps to the kernel, how the parameters a, b and c are passed to the kernel, and the set of actions that the Linux kernel takes to execute the system call and return to the user code.
LLM Solution
(1) The user program loads the syscall number into eax, a into ebx, b into ecx, and the pointer c into edx. (2) The trap instruction (int 0x80 on 32-bit x86) switches the CPU to kernel mode. (3) The kernel trap handler saves the user state (registers, return address). (4) The kernel executes add: it computes the sum and uses copy_to_user() to write the result to address c. (5) The kernel places the return value, 0 or -1, in eax. (6) iret (or sysexit) returns to user mode, restoring the saved state.
Elaboration:
The journey from a user program calling a system call to the kernel executing it and returning involves a carefully choreographed sequence of steps. The CPU must transition from user mode to kernel mode, the user’s execution context must be preserved so it can be restored later, the kernel must extract the arguments that were passed, execute the requested operation, and then safely return to user mode. Each step has a specific purpose—some are required for correctness (preserving state), some are required for security (validating pointers), and some are required for functionality (actually performing the computation). Let’s walk through this process in detail.
Phase 1: User Mode Setup
The calling program sets up parameters in registers:
eax ← syscall number (for add)
ebx ← a
ecx ← b
edx ← pointer to c
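A user-space wrapper for this hypothetical call might look as follows; SYS_add is an invented syscall number (the real kernel has no add syscall), and the register constraints mirror the 32-bit int 0x80 convention:
#define SYS_add 385   // hypothetical syscall number, chosen only for illustration

static inline int my_add(int a, int b, int *c)
{
    int ret;
    // eax = syscall number, ebx/ecx/edx = arguments, result comes back in eax
    asm volatile("int $0x80"
                 : "=a"(ret)
                 : "a"(SYS_add), "b"(a), "c"(b), "d"(c)
                 : "memory");
    return ret;
}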
Phase 2: Trap to Kernel
The int 0x80 instruction (or sysenter on newer 32-bit CPUs; syscall on x86-64) transitions to kernel mode:
- CPU saves instruction pointer (RIP/EIP)
- Privilege bit switched to ring 0 (kernel mode)
- Stack switched to kernel stack (via TSS—Task State Segment)
Phase 3: Exception Handler & State Preservation
The kernel’s trap handler (entry point from IDT—Interrupt Descriptor Table):
- Saves entire user state on kernel stack:
- All general-purpose registers (EAX, EBX, ECX, …)
- Stack pointer (ESP)
- CPU flags (EFLAGS)
- Return address
Phase 4: Syscall Dispatch
The kernel:
- Reads the syscall number from EAX
- Looks up the handler in the syscall table (see the dispatch sketch below)
- Invokes the handler: sys_add(a, b, c)
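Conceptually, the lookup is just an index into an array of handler pointers. This is a simplified sketch, not actual Linux source; __NR_add is a hypothetical table entry:
typedef long (*syscall_fn)(long, long, long);

static syscall_fn syscall_table[] = {
    /* ... existing entries ... */
    [__NR_add] = (syscall_fn)sys_add,       // hypothetical entry for add
};

long dispatch(long nr, long a1, long a2, long a3)
{
    if (nr < 0 || nr >= ARRAY_SIZE(syscall_table) || !syscall_table[nr])
        return -ENOSYS;                     // unknown syscall number
    return syscall_table[nr](a1, a2, a3);   // e.g. sys_add(a, b, c)
}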
Phase 5: Kernel Execution
Inside sys_add():
sum = a + b // Kernel space computation
copy_to_user(c, &sum, sizeof(int)) // Safe copy to user space
The copy_to_user() function:
- Validates that the pointer c lies within the user address space
- Performs boundary checking on the destination range
- Returns an error if c is invalid, in which case the syscall returns -1; otherwise the copy succeeds and 0 is returned (see the handler sketch below)
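If this hypothetical syscall were added to Linux, the handler might be written roughly as below. SYSCALL_DEFINE3, __user, and copy_to_user() are real kernel facilities; the syscall itself is invented:
SYSCALL_DEFINE3(add, int, a, int, b, int __user *, c)
{
    int sum = a + b;

    // copy_to_user() returns the number of bytes it could NOT copy;
    // a nonzero result means the user pointer failed validation or faulted.
    if (copy_to_user(c, &sum, sizeof(sum)))
        return -EFAULT;   // libc reports this to the caller as -1 with errno set

    return 0;
}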
Phase 6: Return Value Setup
- EAX ← 0 (success) or -1 (error)
Phase 7: Return to User Mode
The iret (or sysexit/sysret) instruction:
- Restores user state from kernel stack
- Switches back to ring 3 (user mode)
- Returns control to user code at the next instruction after the trap instruction
Execution Example:
int result;
add(5, 3, &result); // User code
Flow:
- User mode loads: EAX = add syscall number, EBX = 5, ECX = 3, EDX = &result
- int 0x80 → kernel mode transition
- Kernel computes: sum = 5 + 3 = 8
- Kernel executes: copy_to_user(&result, &sum, sizeof(int))
- Returns EAX = 0, transitions back to user mode
- User variable result now equals 8
(b) Under what circumstances would the system call return an error, i.e., returns -1.
LLM Solution
The syscall returns -1 when the pointer c is invalid: it points outside the user’s address space (above TASK_SIZE), into kernel memory, or to a non-allocated page. The kernel protects itself using: (1) boundary checks in copy_to_user() that validate the address range; (2) privilege-level protection (kernel memory is ring-0-only); (3) page table validation to detect unmapped memory; (4) fault handlers that catch invalid accesses.
Elaboration:
The kernel must be extremely careful when a user program passes a pointer into the kernel. The kernel cannot blindly trust that the pointer points to valid memory—a malicious program could pass an invalid pointer to trick the kernel into accessing memory it shouldn’t, or a buggy program could accidentally pass a garbage pointer. To prevent this, the kernel uses multiple overlapping layers of protection. The first line of defense is to check that the pointer is within the valid user address space. If that passes, the kernel checks whether the actual memory pages are allocated. Even if both of those checks pass, the kernel uses special copy functions that can handle page faults gracefully. Let’s examine each of these protection mechanisms.
1. Address Space Boundary Validation
copy_to_user() first validates that destination address is within valid user space:
A typical value on 32-bit Linux is TASK_SIZE = 0xC0000000: the lower 3 GB of the virtual address space belongs to the process, the upper 1 GB to the kernel.
If this check fails, copy_to_user() reports an error and the system call returns -1 (EFAULT).
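In code form, the range check amounts to something like the following (a conceptual sketch; real kernels wrap this logic in access_ok()):
// Reject any destination that does not fit entirely below TASK_SIZE.
if ((unsigned long)c > TASK_SIZE - sizeof(int))
    return -EFAULT;     // surfaces to the user program as a -1 return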
2. Privilege-Level Hardware Protection
Even though syscall runs in kernel mode (ring 0):
- Kernel cannot blindly dereference user pointers
- CPU hardware prevents ring-3 code from accessing kernel memory
- Must use safe APIs like copy_to_user() to bridge address spaces
3. Page Table Validation
In addition to copy_to_user(), the kernel can use mechanisms like get_user_pages() to:
- Walk the process’s page tables
- Verify memory pages are allocated and present
- If page not in page table (unmapped), returns error
4. Fault Handler Catchall
If validation somehow passes but page fault occurs during copy:
- MMU (Memory Management Unit) generates page fault exception
- Kernel exception handler intercepts it
- Returns an error (-EFAULT, which the user sees as -1) to user code, preventing a kernel crash
5. Bounds Overflow Check
Validates that pointer arithmetic does not overflow, e.g. that c + sizeof(int) does not wrap around the address space.
Prevents off-by-one errors in boundary validation.
Error Scenarios:
| Condition | Detected By | Return Value |
|---|---|---|
| c = 0xFFFFFFFF (kernel space) | Boundary check | -1 |
| c = 0xD0000000 (beyond TASK_SIZE) | Address range check | -1 |
| c in unmapped page | Page table walk / fault handler | -1 |
| c + sizeof(int) overflows | Overflow check | -1 |
| Page swapped to disk | Fault handler pages it back in | 0 (copy succeeds) |