1 Chapter 4 Interrupts and Exceptions


Chapter 4

Interrupts and Exceptions

Chapter 4 Interrupts and Exceptions 1


Introduction
- An interrupt is an event that alters the sequence of instructions executed by a processor
  - It corresponds to electrical signals generated by hardware circuits both inside and outside the CPU
- Interrupts: asynchronous interrupts
  - Generated by hardware devices (e.g., internal timers and I/O devices) at arbitrary times
- Exceptions: synchronous interrupts
  - Produced by the CPU control unit only after an instruction finishes executing
  - E.g., divide-by-zero, page faults

Role of Interrupt/Exception Signals
- When an interrupt/exception signal occurs, the CPU
  - Saves the current program counter (eip and cs) on the Kernel Mode stack
  - Places the address of the interrupt handler (IH) into the program counter
- The code executed in the IH is not a process
  - It is a kernel control path that runs on behalf of the process that was running when the signal occurred


Interrupt/Exception Handler Requirements
- As short as possible
  - Defer as much processing as possible
  - E.g., a block of data arrives on a network line
  - Top half vs. bottom half
- Nested interrupt handling
  - Should be allowed as much as possible to keep I/O devices busy
  - Interrupt handlers in Linux need not be reentrant
    - While an IH is executing, the corresponding interrupt line is masked out on all processors
    - The same IH is never invoked concurrently to service a nested interrupt
- Maskable interrupts
  - Some critical regions must run with interrupts disabled
  - Such regions should be limited as much as possible


Interrupts and Exceptions


Interrupts Definition
- Maskable interrupts
  - All IRQs issued by I/O devices
  - Can be in one of two states: masked or unmasked
- Nonmaskable interrupts
  - Critical events such as hardware failures
  - Always recognized by the CPU


Exceptions Definition
- Processor-detected exceptions: generated when the CPU detects an anomalous condition while executing an instruction
  - Faults: the saved eip is the address of the instruction that caused the fault, so the same instruction is re-executed after the IH returns
    - Usage: e.g., the page fault handler
  - Traps: the saved eip is the address of the instruction after the one that caused the trap
    - Main usage: debugging (e.g., reaching a breakpoint)
  - Aborts: a serious error for which the CPU may be unable to determine the exact instruction that caused it; the affected process is terminated
- Programmed exceptions: occur at the request of the programmer
  - Triggered by the int, int3, into, and bound instructions
  - Handled by the control unit as traps
  - Often called software interrupts
  - Usage: to implement system calls and to notify a debugger of a specific event

Interrupt or Exception Vector
- Each interrupt or exception is identified by a number from 0 to 255
  - This number is called its vector
- The vectors of nonmaskable interrupts and exceptions are fixed
- The vectors of maskable interrupts can be altered by programming the Interrupt Controller


IRQs
- Each hardware device controller capable of issuing interrupts has an output line called IRQ
- All existing IRQ lines are connected to the input pins of the Interrupt Controller
- The Interrupt Controller (IC) repeatedly performs the following
  - Monitors the IRQ lines, checking for raised signals
  - If a raised signal is detected on an IRQ line
    1. Converts the signal into the corresponding vector
    2. Stores the vector in an IC I/O port, for the CPU to read
    3. Sends a signal to the CPU's INTR pin (i.e., issues an interrupt)
    4. Waits until the CPU acknowledges the interrupt by writing into one of the Programmable Interrupt Controller (PIC) I/O ports
    5. Clears the INTR line
  - Goes back to the monitoring step


I/O Interrupt Handling
[Figure. Hardware side: Device 1 and Device 2 share line IRQn into the PIC, which raises INT to the CPU. Software side (interrupt handler): IDT[32+n] -> IRQn_interrupt() -> do_IRQ(n) -> interrupt service routine 1, interrupt service routine 2.]


IRQ Lines
- The first IRQ line is IRQ0
  - The number of available IRQ lines is limited to 15 for now
  - Intel default vector for IRQn = n + 32
  - The mapping between IRQs and vectors can be modified by issuing suitable I/O instructions to the IC ports
- The PIC can be told to stop issuing interrupts that refer to a given IRQ line
  - Disabled interrupts are not lost, only delayed
- Selectively enabling/disabling IRQs is not the same as globally masking/unmasking interrupts
  - When the IF flag of the eflags register is clear, maskable interrupts are temporarily ignored by the CPU

Homework Practice
- How do you find out the IRQ assignment on your Linux PC?
  - Ans: read /proc/interrupts


Exceptions
- The 80x86 issues about 20 different exceptions
- Each exception type is associated with a dedicated exception handler
- For some exceptions, the CPU also generates a hardware error code and pushes it on the Kernel Mode stack before jumping to the exception handler
- An exception handler usually sends a Unix signal to the process
- Vectors 20-31 are reserved by Intel


Interrupt Vectors
- IRQ vector assignment
  - Vector assignment range: 32-238
  - 128 is reserved for the system call exception

Vector range          Use
0-19    (0x00-0x13)   Nonmaskable interrupts and exceptions
20-31   (0x14-0x1f)   Intel-reserved
32-127  (0x20-0x7f)   External interrupts (IRQs)
128     (0x80)        System call exception
129-238 (0x81-0xee)   External interrupts (IRQs)
239     (0xef)        Local APIC timer interrupt
240-250 (0xf0-0xfa)   Reserved by Linux for future use
251-255 (0xfb-0xff)   Interprocessor interrupts
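The vector ranges in the table above, together with the default mapping IRQn = n + 32, can be sketched as a small lookup. This is only an illustration: the enum and the function names `classify_vector` and `irq_to_vector` are made up for this example and are not kernel identifiers.

```c
/* Hypothetical names; the ranges are copied from the vector table above. */
enum vector_use {
    VEC_EXCEPTION_OR_NMI,   /* 0-19    */
    VEC_INTEL_RESERVED,     /* 20-31   */
    VEC_EXTERNAL_IRQ,       /* 32-127 and 129-238 */
    VEC_SYSCALL,            /* 128     */
    VEC_APIC_TIMER,         /* 239     */
    VEC_LINUX_RESERVED,     /* 240-250 */
    VEC_IPI,                /* 251-255 */
    VEC_INVALID             /* outside 0-255 */
};

static enum vector_use classify_vector(int v)
{
    if (v < 0 || v > 255) return VEC_INVALID;
    if (v <= 19)  return VEC_EXCEPTION_OR_NMI;
    if (v <= 31)  return VEC_INTEL_RESERVED;
    if (v == 128) return VEC_SYSCALL;
    if (v <= 238) return VEC_EXTERNAL_IRQ;
    if (v == 239) return VEC_APIC_TIMER;
    if (v <= 250) return VEC_LINUX_RESERVED;
    return VEC_IPI;
}

/* Intel default mapping from an IRQ line number to its vector. */
static int irq_to_vector(int irq)
{
    return irq + 32;
}
```

For example, the keyboard on IRQ1 maps to vector 33, which falls in the external-interrupt range.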


#   Exception                     Handler                         Signal
0   Divide error                  divide_error()                  SIGFPE
1   Debug                         debug()                         SIGTRAP
2   NMI                           nmi()                           None
3   Breakpoint                    int3()                          SIGTRAP
4   Overflow                      overflow()                      SIGSEGV
5   Bounds check                  bounds()                        SIGSEGV
6   Invalid opcode                invalid_op()                    SIGILL
7   Device not available          device_not_available()          SIGSEGV
8   Double fault                  double_fault()                  SIGSEGV
9   Coprocessor segment overrun   coprocessor_segment_overrun()   SIGFPE
10  Invalid TSS                   invalid_tss()                   SIGSEGV
11  Segment not present           segment_not_present()           SIGBUS
12  Stack exception               stack_segment()                 SIGBUS
13  General protection            general_protection()            SIGSEGV
14  Page Fault                    page_fault()                    SIGSEGV
15  Intel reserved                None                            None
16  Floating-point error          coprocessor_error()             SIGFPE
17  Alignment check               alignment_check()               SIGBUS
18  Machine check                 machine_check()                 None
19  SIMD floating point           simd_coprocessor_error()        SIGFPE

Review Slide
- Interrupts? Exceptions?
- Interrupt handler? Requirements?
- Maskable vs. nonmaskable interrupts?
- Processor-detected exceptions?
  - Faults, traps, aborts
- Programmed exceptions?
  - Software interrupts?
- Interrupt vector? Range? Vector assignment?
- Interrupt controller processing steps?

Review Slide
- Intel default vector for IRQn?
- Disabled interrupts? Masked interrupts?
- Number of exceptions defined for Intel?
- Homework #3: User-mode vs. kernel-mode stack
  - Required for new EOS students
  - Optional for others. Not graded.
  - 忠毅: please present your report next week


Interrupt Descriptor Table


Interrupt Descriptor Table
- The IDT associates each interrupt (exception) vector with one interrupt handler
  - The IDT must be properly initialized before the kernel enables interrupts
- Each entry in the IDT is an 8-byte descriptor
  - A maximum of 256 x 8 = 2048 bytes are required to store the IDT
  - The idtr register stores the base address of the IDT
  - The P bit of a descriptor indicates whether the referenced segment is currently in memory
- Three types of descriptors in the IDT (distinguished by the Type field, bits 40-43)
  - Task Gate (Linux does not use it)
  - Interrupt Gate: before jumping to the proper segment, the CPU clears the IF flag, disabling maskable interrupts
  - Trap Gate: before jumping to the proper segment, the CPU does not modify the IF flag


Gate Descriptor Formats (each descriptor is 64 bits)

Task Gate Descriptor
Bits 63-48   RESERVED
Bit  47      P
Bits 46-45   DPL
Bits 44-40   0 0 1 0 1
Bits 39-32   RESERVED
Bits 31-16   TSS SEGMENT SELECTOR
Bits 15-0    RESERVED

Interrupt Gate Descriptor
Bits 63-48   OFFSET (16-31)
Bit  47      P
Bits 46-45   DPL
Bits 44-40   0 1 1 1 0
Bits 39-37   0 0 0
Bits 36-32   RESERVED
Bits 31-16   SEGMENT SELECTOR
Bits 15-0    OFFSET (0-15)

Trap Gate Descriptor
Bits 63-48   OFFSET (16-31)
Bit  47      P
Bits 46-45   DPL
Bits 44-40   0 1 1 1 1
Bits 39-37   0 0 0
Bits 36-32   RESERVED
Bits 31-16   SEGMENT SELECTOR
Bits 15-0    OFFSET (0-15)

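To make the bit layout concrete, here is a minimal user-space sketch that packs the fields of an interrupt or trap gate into a 64-bit value. The names `make_gate`, `gate_dpl`, and `IDT_LIMIT` are invented for this illustration (the kernel uses its own macros), and the selector value 0x60 used in the usage note is only an example value.

```c
#include <stdint.h>

/* Type field in bits 40-43 (bit 44 stays 0), per the layouts above. */
enum gate_type {
    GATE_TASK      = 0x5,   /* 0 0 1 0 1 */
    GATE_INTERRUPT = 0xE,   /* 0 1 1 1 0 */
    GATE_TRAP      = 0xF    /* 0 1 1 1 1 */
};

/* 256 entries of 8 bytes each; the limit loaded into idtr is size - 1. */
#define IDT_LIMIT (256 * 8 - 1)

/* Hypothetical helper: build an interrupt/trap gate with P = 1. */
static uint64_t make_gate(uint32_t offset, uint16_t selector,
                          enum gate_type type, unsigned dpl)
{
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xFFFFu);            /* bits 0-15:  offset 0-15  */
    d |= (uint64_t)selector << 16;                /* bits 16-31: selector     */
    d |= (uint64_t)((unsigned)type & 0xFu) << 40; /* bits 40-43: gate type    */
    d |= (uint64_t)(dpl & 0x3u) << 45;            /* bits 45-46: DPL          */
    d |= (uint64_t)1 << 47;                       /* bit 47:     P (present)  */
    d |= (uint64_t)(offset >> 16) << 48;          /* bits 48-63: offset 16-31 */
    return d;
}

/* Extract the DPL field back out of a descriptor. */
static unsigned gate_dpl(uint64_t desc)
{
    return (unsigned)((desc >> 45) & 0x3u);
}
```

With offset 0x12345678, selector 0x60, and DPL 0, an interrupt gate packs to 0x12348E0000605678; the byte 0x8E in bits 40-47 is P = 1, DPL = 0, type 0xE.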
HW Handling of Interrupts (Exceptions)
- Between instructions, the control unit of the CPU checks whether an interrupt or exception has occurred. If so, it:
  1. Determines the vector i (0 <= i <= 255) associated with the interrupt (exception)
  2. Reads the i-th entry of the IDT
  3. Obtains the IH address (entry's segment selector -> gdtr -> GDT -> segment base address)
  4. Checks the privilege level by comparing the CPL in cs with the DPL of the IH's segment
  5. Uses the right stack (after checking the privilege level)
  6. If a fault has occurred, loads cs and eip with the address of the instruction that caused the fault
  7. Saves the contents of eflags, cs, and eip on the stack
  8. Loads cs and eip with the address of the IH routine


Interrupt Handler Return Path
1. Load the cs, eip, and eflags registers with the values saved on the stack
   - If a hardware error code was pushed on the stack on top of eip, it must be popped before taking the return path
2. Check whether the CPL of the handler's cs equals the CPL in the restored cs. If so, the return is complete.
3. Otherwise, load ss and esp from the stack and return to the stack associated with the old privilege level
4. When returning to a User Mode process, take care that it is not left using segment selectors it is not allowed to use

Nested Execution of IHs
- Linux does not allow process switching during an interrupt handler routine
  - But an interrupt handler may be interrupted by another one
  - The current process does not change during nested IHs
- The only exception that may be raised in Kernel Mode is the Page Fault exception
  - All other exceptions should be raised only in User Mode
  - Otherwise (raised in Kernel Mode), they cause a kernel panic
- The page fault exception handler may suspend the current process (until the requested page is in memory)
  - A context switch is possible inside this handler
- Interrupts raised by I/O devices do not refer to data structures specific to the current process

Nested Execution of IHs (continued)
- Interrupt handlers must not cause page faults
  - No exception handler may preempt an interrupt handler
  - No context switch takes place inside an interrupt handler
- Reasons for allowing nested execution of IHs
  - To improve the throughput of the PIC and device controllers
    - Until the CPU acks an interrupt, both the PIC and the device controller are blocked
  - To implement an interrupt model without priority levels
    - An interrupt handler can be preempted by another one


IDT Initialization
- The base address of the IDT must be loaded into idtr before the kernel enables interrupts

    lidt idt_descr              # (arch/i386/kernel/head.S)
    idt_descr:
        .word IDT_ENTRIES*8-1   # idt contains 256 entries
        .long idt_table

- The int instruction allows a User Mode process to issue an interrupt signal with any vector from 0 to 255
  - To block illegal int instructions from a User Mode process, set the DPL of the gate descriptor to 0
  - When a User Mode process issues such an int, its CPL (3) > DPL (0), so a "general protection" exception is raised
- In a few cases, a User Mode process must be able to issue a programmed exception
  - Set the DPL of the gate descriptor to 3
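The privilege check described above can be sketched as a single comparison. This is a simplified model, not CPU or kernel code: the function name `deliver_soft_int` is hypothetical, and it models only software-issued int instructions (the gate DPL is not checked for hardware interrupts).

```c
/* Vector of the "general protection" exception (see the exception table). */
enum { GENERAL_PROTECTION_VECTOR = 13 };

/*
 * Hypothetical model of an `int $vector` instruction issued at privilege
 * level `cpl` through a gate whose DPL is `gate_dpl`.
 * Returns the vector that is actually delivered.
 */
static int deliver_soft_int(int vector, unsigned cpl, unsigned gate_dpl)
{
    if (cpl > gate_dpl)
        return GENERAL_PROTECTION_VECTOR;  /* caller is less privileged */
    return vector;
}
```

Under this model, a User Mode process (CPL 3) reaches vector 128 only because that gate has DPL 3, while an attempt to issue `int $14` through a DPL 0 gate ends in a general protection exception.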


Interrupt, Trap, System Gates
- The Intel IDT provides three types of descriptors
  - Task, Interrupt, and Trap gate descriptors
- Linux's classification
  - Interrupt gate (DPL = 0)
    - Cannot be accessed by a User Mode process
    - All Linux interrupt handlers use this one
  - System gate (DPL = 3)
    - An Intel trap gate that can be accessed by a User Mode process
    - Vectors 3 (int3), 4 (into), 5 (bound), 128 (int $0x80)
  - Trap gate (DPL = 0)
    - An Intel trap gate that cannot be accessed by a User Mode process
    - Most Linux exception handlers use this one

IDT Operations
- set_intr_gate(n, addr)
  - Inserts an interrupt gate in the n-th IDT entry
  - Segment selector = kernel code segment selector
  - Offset = addr, DPL = 0
- set_system_gate(n, addr)
  - Inserts a trap gate in the n-th IDT entry
  - Segment selector = kernel code segment selector
  - Offset = addr, DPL = 3
- set_trap_gate(n, addr)
  - Inserts a trap gate in the n-th IDT entry
  - Segment selector = kernel code segment selector
  - Offset = addr, DPL = 0
- Code trace: trap_init()

IDT Preliminary Initialization
- The IDT is first initialized and used by the BIOS
  - Once Linux takes over (protected mode), the IDT is initialized again by Linux
  - idt_table: 256 entries
- During kernel initialization
  - setup_idt() fills all entries in idt_table with ignore_int()
    - arch/i386/kernel/head.S
  - ignore_int()
    - Saves registers on the stack -> printk() -> restores registers from the stack
    - Executes iret to resume
- Second initialization: the kernel replaces some entries with real interrupt handlers
  - trap_init()


Review Slide
- IDT? Number of entries in the IDT? Size of each entry?
- Base address of the IDT?
- Types of descriptors in the IDT?
- The only kernel exception?
- How to block an illegal interrupt from a User Mode process?
- How to let a User Mode process issue a programmed exception?
- Linux interrupt descriptor classification?
  - Interrupt gate, System gate, Trap gate?
- set_intr_gate(), set_system_gate(), set_trap_gate()?


Exception Handling



Introduction
- Most exceptions issued by the CPU are interpreted by Linux as error conditions
  - A signal is sent to the current process
  - If no signal handler is set for that signal, the current process is aborted
  - Special case: the page fault exception
- Exception handler processing steps:
  - Save registers on the Kernel Mode stack
  - Call a high-level C function to handle the exception
  - Exit from the handler by jumping to ret_from_exception()
- Code trace: page_fault exception
  - arch/i386/kernel/entry.S
  - arch/i386/kernel/traps.c

Exception Handler Registration
void __init trap_init(void)
{
    set_trap_gate(0,&divide_error);
    set_intr_gate(1,&debug);
    set_intr_gate(2,&nmi);
    set_system_gate(3,&int3);    /* int3-5 can be called from all */
    set_system_gate(4,&overflow);
    set_system_gate(5,&bounds);
    set_trap_gate(6,&invalid_op);
    set_trap_gate(7,&device_not_available);
    set_task_gate(8,GDT_ENTRY_DOUBLEFAULT_TSS);
    set_trap_gate(9,&coprocessor_segment_overrun);
    set_trap_gate(10,&invalid_TSS);
    set_trap_gate(11,&segment_not_present);
    set_trap_gate(12,&stack_segment);
    set_trap_gate(13,&general_protection);
    set_intr_gate(14,&page_fault);
    set_trap_gate(15,&spurious_interrupt_bug);
    set_trap_gate(16,&coprocessor_error);
    set_trap_gate(17,&alignment_check);
    set_trap_gate(19,&simd_coprocessor_error);

    set_system_gate(SYSCALL_VECTOR,&system_call);

    set_call_gate(&default_ldt[0],lcall7);
    set_call_gate(&default_ldt[4],lcall27);

    cpu_init();
    trap_init_hook();
}


Entering/Leaving Exception Handler
- A high-level C handler often stores the error code and the vector in the task_struct and sends a suitable signal to the current process

    current->tss.error_code = error_code;
    current->tss.trap_no = vector;
    force_sig(sig_num, current);

  - Code trace: do_general_protection()
- The current process takes care of the signal right after the exception handler terminates
  - The signal is processed by the process's signal handler
  - If no handler is available, the kernel handles it and kills the process
- When the C exception handler returns, execution continues at

    addl $8, %esp
    jmp ret_from_exception

Interrupt Handling



Introduction
- No signal is sent to the process for interrupts
  - A signal is sent to the process for exceptions
- The interrupt handler for a device is part of the device's driver
- Interrupt types:
  - I/O interrupts: to handle I/O devices
  - Timer interrupts: Chapter 6
    - Self-reading material
  - Interprocessor interrupts: to interrupt another CPU in a multiprocessor system

I/O Interrupt Handling
- An I/O IH should be capable of servicing several devices at the same time
  - Several devices may share the same IRQ
  - Refer to Table 4.3 on a following slide
- IRQ sharing
  - One interrupt handler executes several ISRs
  - Each ISR is related to a single device sharing this IRQ line
  - Every ISR on the line is executed when an interrupt occurs, because the handler cannot know in advance which device raised it
- IRQ dynamic allocation
  - An IRQ line is associated with a device only when the device is accessed
  - E.g., the floppy disk device
  - The same IRQ vector may be used by several devices, but not at the same time




Sample: IRQ Assignment to I/O Devices
IRQ  INT  Device
0    32   Timer
1    33   Keyboard
2    34   PIC cascading
3    35   2nd serial port
4    36   1st serial port
6    38   Floppy disk
8    40   System clock
10   42   Network interface
11   43   USB, sound card
12   44   PS/2 mouse
13   45   Math coprocessor
14   46   EIDE disk controller, 1st chain
15   47   EIDE disk controller, 2nd chain


Interrupt Handler Structure
- Linux divides the actions in an IH into three classes
  - Critical, Noncritical, Noncritical deferrable
- Critical
  - E.g., acking an interrupt to the PIC so that it can accept another interrupt on the same IRQ line
  - Executed in the IH, with maskable interrupts disabled
- Noncritical
  - E.g., updating data structures accessed only by the processor
  - Should be finished quickly
  - Executed in the IH, with maskable interrupts enabled
- Noncritical deferrable
  - E.g., copying buffer contents into the address space of some process
  - Can be delayed for a long time
  - Executed outside the IH, in the bottom-half section

Interrupt Vectors
- Some devices must be statically connected to specific IRQ lines
  - Internal timer -> IRQ0
  - Slave 8259A PIC -> IRQ2
  - External math coprocessor -> IRQ13
- Three ways to select a line for IRQ-configurable devices
  - By setting hardware jumpers
  - By a utility program shipped with the device
  - By a hardware protocol executed at system startup


Interrupt Handler Implementation


I/O Interrupt Handler Tasks
1. Save the IRQ value and the register contents on the Kernel Mode stack
2. Send an ack to the PIC that is servicing the IRQ line, allowing it to issue further interrupts
3. Execute the ISRs associated with all devices sharing this IRQ
4. Terminate by jumping to ret_from_intr()

typedef struct irq_desc {
    unsigned int status;           /* IRQ line status, next slide */
    hw_irq_controller *handler;
    struct irqaction *action;      /* IRQ action ISR list */
    unsigned int depth;            /* nested irq disables */
    unsigned int irq_count;        /* For detecting broken interrupts */
    unsigned int irqs_unhandled;
    spinlock_t lock;
} ____cacheline_aligned irq_desc_t;

extern irq_desc_t irq_desc [NR_IRQS];    // global variable

typedef struct hw_interrupt_type hw_irq_controller;

struct hw_interrupt_type {
    const char * typename;
    unsigned int (*startup) (unsigned int irq);
    void (*shutdown) (unsigned int irq);
    void (*enable) (unsigned int irq);
    void (*disable) (unsigned int irq);
    void (*ack) (unsigned int irq);
    void (*end) (unsigned int irq);
    void (*set_affinity) (unsigned int irq, cpumask_t dest);
};

IRQ Descriptors
[Figure: the irq_desc array, with entries labeled 0 ... i ... 224; each irq_desc_t entry points to a hw_interrupt_type and to a linked list of irqaction structures.]


IRQ Status Listing
/*
 * IRQ line status.
 */
#define IRQ_INPROGRESS  1    /* IRQ handler active - do not enter! */
#define IRQ_DISABLED    2    /* IRQ disabled - do not enter! */
#define IRQ_PENDING     4    /* IRQ pending - replay on enable */
#define IRQ_REPLAY      8    /* IRQ has been replayed but not acked yet */
#define IRQ_AUTODETECT  16   /* IRQ is being autodetected */
#define IRQ_WAITING     32   /* IRQ not yet seen - for autodetection */
#define IRQ_LEVEL       64   /* IRQ level triggered */
#define IRQ_MASKED      128  /* IRQ masked - shouldn't be seen again */
#define IRQ_PER_CPU     256  /* IRQ is per CPU */
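Because each status value is a distinct bit, several conditions can be held in one word and tested with masks. The sketch below mirrors the status update that appears in the do_IRQ() listing on a later slide; the helper names `irq_status_on_entry` and `irq_may_run_handlers` are invented here, and the second helper is only an assumption read off the "do not enter!" comments above, not a copy of kernel code.

```c
/* Values copied from the status listing above. */
#define IRQ_INPROGRESS  1
#define IRQ_DISABLED    2
#define IRQ_PENDING     4
#define IRQ_REPLAY      8
#define IRQ_WAITING     32
#define IRQ_LEVEL       64

/* Clear REPLAY and WAITING, then mark the IRQ as pending. */
static unsigned int irq_status_on_entry(unsigned int status)
{
    status &= ~(unsigned int)(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING;
    return status;
}

/* Assumption: handlers are entered only if the line is neither
 * disabled nor already being serviced. */
static int irq_may_run_handlers(unsigned int status)
{
    return !(status & (IRQ_DISABLED | IRQ_INPROGRESS));
}
```

Other bits, such as IRQ_LEVEL, pass through the update unchanged, which is the point of using a bit mask rather than a single state value.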


    .data
ENTRY(interrupt)
    .text

vector=0
ENTRY(irq_entries_start)
.rept NR_IRQS
    ALIGN
1:  pushl $vector-256
    jmp common_interrupt
.data
    .long 1b
.text
vector=vector+1
.endr

    ALIGN
common_interrupt:
    SAVE_ALL
    call do_IRQ
    jmp ret_from_intr

#define BUILD_INTERRUPT(name, nr)   \
ENTRY(name)                         \
    pushl $nr-256;                  \
    SAVE_ALL                        \
    call smp_/**/name;              \
    jmp ret_from_intr;

/* The include is where all of the SMP etc. interrupts come from */
#include "entry_arch.h"

ENTRY(divide_error)
    pushl $0                    # no error code
    pushl $do_divide_error
    ALIGN
error_code:
    pushl %ds
    pushl %eax
    xorl %eax, %eax
    pushl %edx
    decl %eax                   # eax = -1
    pushl %ecx
    pushl %ebx
    cld
    movl %es, %ecx
    movl ORIG_EAX(%esp), %esi   # get the error code
    movl ES(%esp), %edi         # get the function address
    movl %eax, ORIG_EAX(%esp)
    movl %ecx, ES(%esp)
    movl %esp, %edx
    pushl %esi                  # push the error code
    pushl %edx                  # push the pt_regs pointer
    movl $(__USER_DS), %edx
    movl %edx, %ds
    movl %edx, %es
    call *%edi
    addl $8, %esp
    jmp ret_from_exception


irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned = {
    [0 ... NR_IRQS-1] = {
        .handler = &no_irq_type,
        .lock = SPIN_LOCK_UNLOCKED }
};

asmlinkage void __init start_kernel(void)
{ …
    sort_main_extable();
    trap_init();
    rcu_init();
    init_IRQ();
… }

void __init init_IRQ(void)
{
    pre_intr_init_hook();

    for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (i >= NR_IRQS)
            break;
        if (vector != SYSCALL_VECTOR)
            set_intr_gate(vector, interrupt[i]);
    }
    intr_init_hook();
    setup_timer(); …
}

void __init pre_intr_init_hook(void)
{
    init_ISA_irqs();
}

void __init init_ISA_irqs (void)
{
    init_8259A(0);
    for (i = 0; i < NR_IRQS; i++) {
        irq_desc[i].status = IRQ_DISABLED;
        irq_desc[i].action = 0;
        irq_desc[i].depth = 1;

        if (i < 16) {
            irq_desc[i].handler = &i8259A_irq_type;
        } else {
            irq_desc[i].handler = &no_irq_type;
        }
    }
}

static struct hw_interrupt_type i8259A_irq_type = {
    "XT-PIC",
    startup_8259A_irq,
    shutdown_8259A_irq,
    enable_8259A_irq,
    disable_8259A_irq,
    mask_and_ack_8259A,
    end_8259A_irq,
    NULL
};


asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff;    /* high bits used in ret_from_ code */
    irq_desc_t *desc = irq_desc + irq;

    irq_enter();
    kstat_this_cpu.irqs[irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);
    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING;             /* we _want_ to handle it */

    for (;;) {
        irqreturn_t action_ret;
        spin_unlock(&desc->lock);
        …
        action_ret = handle_IRQ_event(irq, &regs, action);
        …
        spin_lock(&desc->lock);
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;

out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);
    irq_exit();
    return 1;
}

asmlinkage int handle_IRQ_event(unsigned int irq,
        struct pt_regs *regs, struct irqaction *action)
{
    int status = 1;    /* Force the "do bottom halves" bit */
    int retval = 0;

    if (!(action->flags & SA_INTERRUPT))
        local_irq_enable();    // RA

    do {
        status |= action->flags;
        retval |= action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);
    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);
    local_irq_disable();       // RA
    return retval;
}


Registering Interrupt Service Routine
- Drivers can register an IH and enable a given interrupt line via

    int request_irq(unsigned int irq,
            irqreturn_t (*handler)(int, void *, struct pt_regs *),
            unsigned long irqflags, const char * devname, void *dev_id);

  - irq: the interrupt line number to allocate
    - For legacy PC devices, this value is hard-coded
    - For most other devices, it is probed or determined dynamically
  - handler: pointer to the actual ISR
  - irqflags: discussed in the next slide
  - devname: an ASCII text name for the device, such as "keyboard"
  - dev_id: used as a unique cookie when this line is shared
    - A common practice is to pass the driver's device structure

irqflags Options
- irqflags may be either 0 or a bit mask of one or more of the following flags
- SA_INTERRUPT
  - The given IH is a fast IH: it runs with all interrupts disabled on the local processor
  - By default (without this flag), all interrupts are enabled except the interrupt lines of any running handlers
- SA_SAMPLE_RANDOM
  - Interrupts generated by this device should contribute to the kernel random pool
  - Used for devices with non-deterministic interrupt intervals
- SA_SHIRQ
  - The interrupt line can be shared among multiple ISRs


request_irq Usage
- To request an interrupt line and install a handler

    if (request_irq(irqn, my_interrupt, SA_SHIRQ, "my-device", dev)) {
        printk(KERN_ERR "my_device: cannot register IRQ %d\n", irqn);
        return -EIO;
    }

  - This call may block, so it must not be called from interrupt context or from other situations where code cannot block
  - A return value of 0 means the handler was installed successfully
- To free an interrupt line, call

    void free_irq(unsigned int irq, void *dev_id);

  - If the line is not shared, it removes the handler and disables the line
  - Otherwise, the line is disabled only when the last handler is removed
  - dev_id is used to uniquely identify an interrupt handler
  - This call must be made from process context

int request_irq(unsigned int irq,
        irqreturn_t (*handler)(int, void *, struct pt_regs *),
        unsigned long irqflags, const char * devname,
        void *dev_id)
{
    int retval;
    struct irqaction * action;

    if (irq >= NR_IRQS) return -EINVAL;
    if (!handler) return -EINVAL;

    action = (struct irqaction *)
        kmalloc(sizeof(struct irqaction), GFP_ATOMIC);
    if (!action)
        return -ENOMEM;

    action->handler = handler;
    action->flags = irqflags;
    action->mask = 0;
    action->name = devname;
    action->next = NULL;
    action->dev_id = dev_id;

    retval = setup_irq(irq, action);
    if (retval) kfree(action);
    return retval;
}

int setup_irq(unsigned int irq, struct irqaction * new)
{
    irq_desc_t *desc = irq_desc + irq;

    if (desc->handler == &no_irq_type)
        return -ENOSYS;
    spin_lock_irqsave(&desc->lock,flags);
    p = &desc->action;
    if ((old = *p) != NULL) {
        if (!(old->flags & new->flags & SA_SHIRQ)) {
            spin_unlock_irqrestore(&desc->lock,flags);
            return -EBUSY;
        }

        do { p = &old->next; old = *p;
        } while (old);
        shared = 1;
    }

    *p = new;

    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING | IRQ_INPROGRESS);
        desc->handler->startup(irq);
    }
    spin_unlock_irqrestore(&desc->lock,flags);

    register_irq_proc(irq);
    return 0;
}
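The sharing rule in setup_irq() and the ISR loop in handle_IRQ_event() can be tried out in user space with a stripped-down model. Everything below is a sketch under stated assumptions: the `model_` names, the simplified two-argument ISR signature, and the `MODEL_SA_SHIRQ` value are invented for illustration; locking, /proc registration, and controller startup are left out.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_SA_SHIRQ 0x1UL   /* illustrative flag value, not the kernel's */

typedef int (*model_isr_t)(int irq, void *dev_id);

struct model_action {
    model_isr_t handler;
    unsigned long flags;
    void *dev_id;
    struct model_action *next;
};

struct model_irq_line {
    struct model_action *action;   /* head of the ISR list for this line */
};

/* Append new_act to the line; refuse unless every party agreed to share. */
static int model_setup_irq(struct model_irq_line *line,
                           struct model_action *new_act)
{
    struct model_action **p = &line->action;
    struct model_action *old = *p;

    if (old != NULL) {
        if (!(old->flags & new_act->flags & MODEL_SA_SHIRQ))
            return -EBUSY;
        do {                       /* walk to the end of the list */
            p = &old->next;
            old = *p;
        } while (old);
    }
    new_act->next = NULL;
    *p = new_act;
    return 0;
}

/* Run every ISR registered on the line, as the shared-IRQ model requires. */
static int model_handle_irq_event(int irq, struct model_action *action)
{
    int retval = 0;

    while (action) {
        retval |= action->handler(irq, action->dev_id);
        action = action->next;
    }
    return retval;
}

/* Example ISR: counts invocations through its dev_id cookie. */
static int model_count_isr(int irq, void *dev_id)
{
    (void)irq;
    ++*(int *)dev_id;
    return 1;
}
```

Registering two actions that both carry the sharing flag succeeds, a third without the flag is rejected with -EBUSY, and one simulated interrupt then runs both registered ISRs. The dev_id cookie is what lets each ISR find its own device state.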


Processing Steps in Detail
1. A device issues an interrupt by sending an electric signal to the interrupt controller
2. If the interrupt line is enabled (it can be disabled), the IC sends the interrupt to the processor
3. If interrupts are not disabled in the processor, it immediately stops what it is executing
4. It disables the interrupt system // RA: where does this take place?
5. It jumps to a predefined memory location, selected by the vector, and executes the code there (entry code)
6. The entry code saves the IRQ number and the current register values on the stack and calls do_IRQ()
7. do_IRQ() acks receipt of the interrupt and disables interrupt delivery on this IRQ line
8. do_IRQ() calls handle_IRQ_event() to execute the registered ISRs
9. do_IRQ() returns to the entry code
10. The entry code jumps to ret_from_intr()



ret_from_intr()
- It is written in assembly code
- It first checks whether a reschedule is pending (need_resched)
- If need_resched is set and the kernel is returning to user space, schedule() is called
- If need_resched is set and the kernel is returning to kernel space, schedule() is called only if preempt_count == 0
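The decision described above reduces to a small predicate. The sketch below is a plain C model of that rule, not the assembly in entry.S; the function name `should_call_schedule` and its parameters are made up for this illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the rescheduling decision on the interrupt
 * return path:
 *   - nothing happens unless a reschedule is pending;
 *   - returning to user space always allows schedule();
 *   - returning to kernel space allows it only when preempt_count is 0.
 */
static bool should_call_schedule(bool need_resched,
                                 bool returning_to_user,
                                 int preempt_count)
{
    if (!need_resched)
        return false;
    if (returning_to_user)
        return true;
    return preempt_count == 0;
}
```

A nonzero preempt_count therefore blocks kernel preemption on the return path but never delays a reschedule on the way back to user space.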


Review Slide
- Which exception does not generate a signal to the process?
- Exception handler initialization? Processing steps?
- Types of interrupts?
  - I/O, timer, interprocessor?
- IRQ sharing? IRQ dynamic allocation?
- Linux classification of actions in an IH?
  - Critical, Noncritical, Noncritical deferrable
- Three ways to select an IRQ line for a configurable device?
  - Hardware jumpers, utility program, hardware protocol
- Interrupt handler initialization? Processing steps?

Review Slide
- How to register an ISR?
- request_irq() usage? Parameters?
  - irq line, routine, flags, devname, dev_id?
- Flags usage?
  - SA_INTERRUPT, SA_SAMPLE_RANDOM, SA_SHIRQ
- free_irq() usage?
- RA: Study the usage of SA_SAMPLE_RANDOM
  - How it affects the random-number generator
- Homework #4: IDT Table Initialization
  - Required for everyone
  - Mail your report to the TA before the deadline

8259A PIC



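8259A PIC History
- The PIC used in the IBM PC and its compatibles is the Intel 8259A chip
  - One 8259A chip can connect to at most 8 interrupt sources, but two or more 8259A chips can be cascaded, up to 8 of them
    - So up to 64 interrupt sources can be connected
- The early IBM PC/XT had only one 8259A, but designers quickly realized that this was not enough, so in the IBM PC/AT the number of 8259A chips was increased to two
  - One of them is called the Master, the other the Slave
  - The Slave is cascaded onto the Master
- Today most PCs have two 8259A chips and can receive at most 15 interrupts
- Individual interrupt sources can be masked through the 8259A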


8259A Architecture
- An 8259A chip has the following internal registers
  - Interrupt Mask Register (IMR)
    - Filters out masked interrupts
  - Interrupt Request Register (IRR)
    - Temporarily holds interrupts that have not yet been processed further
  - In Service Register (ISR)
    - While an interrupt is being handled by the CPU, it is recorded in the ISR


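More on 8259A PIC
- The 8259A also has a unit called the Priority Resolver
  - When several interrupts occur at the same time, the Priority Resolver passes the one with the highest priority to the CPU first
- Pentium and later CPUs integrate the PIC function into an Advanced Programmable Interrupt Controller (APIC)
  - For backward compatibility, however, even machines with an APIC still have an 8259A
  - On current motherboards, the 8259A is provided by the southbridge chip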


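Interrupt Control on SMP
 When Intel considered how to build SMP on IA-32, the original 8259A interrupt controller proved inadequate
 On SMP, one must consider how an interrupt signal from an external device is delivered to a suitable CPU, as well as the IPI (Inter-Processor Interrupt) problem
 Since the Pentium, Intel has integrated an APIC into the CPU; on SMP systems the motherboard also has at least one global APIC (some boards have several IO-APICs to distribute interrupt signals better)
 It receives interrupt signals from peripherals and distributes them to the CPUs; this global APIC is called the IO-APIC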


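8259A Processing Flow (1/2)
1. When an interrupt request arrives on one of the lines IR0 ~ IR7, the IMR first checks whether that IR is masked; if so, the request is discarded, otherwise it is placed in the IRR
2. Until the request can be processed further, it stays in the IRR. When the time comes, the Priority Resolver picks the highest-priority interrupt among those in the IRR and passes it to the CPU. The lower the IR number, the higher the priority (IR0, the timer, has the highest priority)
3. The 8259A sends an INTR (Interrupt Request) signal to the CPU to notify it that an interrupt has arrived. On receiving this signal, the CPU suspends execution of the next instruction and sends an INTA (Interrupt Acknowledge) signal to the 8259A
4. On receiving this signal, the 8259A immediately sets the bit for this interrupt in the ISR and resets the corresponding bit in the IRR, indicating that the interrupt is being serviced by the CPU rather than waiting for it
5. The CPU then sends a second INTA signal to the 8259A, asking for the interrupt vector of this request, a number from 0 to 255
6. The 8259A computes the vector number by adding the interrupt request number to the configured base vector (initialized through the initialization command word ICW2) and places it on the data bus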


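8259A Processing Flow (2/2)
 After obtaining the interrupt vector from the data bus, the CPU looks up the corresponding interrupt service routine (ISR routine) in the IDT
 If the 8259A's End of Interrupt (EOI) notification is set to manual mode, the ISR routine should send an EOI to the 8259A when it has finished
 When the 8259A receives the EOI, the bit for this request in its In-Service Register is reset
 If EOI notification is set to automatic mode, the 8259A resets the bit for this request in its In-Service Register upon receiving the second INTA signal
 If new interrupt requests arrive in the meantime and are placed in the IRR, any request whose priority is higher than that of every interrupt in the In-Service Register is handled immediately by the procedure above; otherwise the requests stay in the IRR until the higher-priority interrupts have been serviced, that is, until their In-Service Register bits are reset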
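
The flow on these two slides can be traced with a small user-space model of one 8259A. It is a simplified sketch: the two INTA cycles are collapsed into one call, masked requests are dropped as the slides describe, and struct pic8259 and the pic_* functions are hypothetical names, not kernel or hardware interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of one 8259A; field names follow the slides. */
struct pic8259 {
    uint8_t imr;         /* Interrupt Mask Register: 1 = line masked */
    uint8_t irr;         /* Interrupt Request Register: pending requests */
    uint8_t isr;         /* In-Service Register: requests being serviced */
    uint8_t vector_base; /* base vector, as programmed through ICW2 */
};

/* Index of the lowest set bit (highest priority), or -1 if none is set. */
static int highest_priority(uint8_t bits)
{
    for (int ir = 0; ir < 8; ir++)
        if (bits & (1u << ir))
            return ir;
    return -1;
}

/* Steps 1-2: a request on line ir is dropped if masked, else latched. */
static void pic_raise(struct pic8259 *pic, int ir)
{
    assert(ir >= 0 && ir < 8);
    if (pic->imr & (1u << ir))
        return;
    pic->irr |= (uint8_t)(1u << ir);
}

/*
 * Steps 3-6: returns the vector delivered to the CPU, or -1 if nothing
 * can be delivered because an equal or higher priority interrupt is
 * still in service (or nothing is pending).
 */
static int pic_acknowledge(struct pic8259 *pic)
{
    int ir = highest_priority(pic->irr);
    int in_service = highest_priority(pic->isr);

    if (ir < 0)
        return -1;
    if (in_service >= 0 && in_service <= ir)
        return -1;
    pic->isr |= (uint8_t)(1u << ir);   /* now being serviced */
    pic->irr &= (uint8_t)~(1u << ir);  /* no longer waiting */
    return pic->vector_base + ir;      /* base vector + request number */
}

/* Manual EOI: clear the in-service bit of the given line. */
static void pic_eoi(struct pic8259 *pic, int ir)
{
    assert(ir >= 0 && ir < 8);
    pic->isr &= (uint8_t)~(1u << ir);
}
```

With vector_base set to 0x20, a request on IR3 is delivered as vector 0x23; a later request on IR4 waits in the IRR until pic_eoi() clears IR3, while a request on IR0 is delivered at once.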

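IRQ2 / IRQ9 Redirection
 Why is IRQ2 redirected to IRQ9? The reason is compatibility
 The IBM PC/AT added a second 8259A in cascade, which makes it possible to handle 7 more IRQs. The original 8259A is called the Master PIC and the new one the Slave PIC
 Because the CPU has only one interrupt line, the Slave PIC has to be cascaded onto the Master PIC, where it occupies IRQ2; as a result, devices that used IRQ2 on the IBM PC/XT can no longer use it
 To solve this problem, the designers picked IRQ9 on the Slave PIC and required software designers to redirect the original IRQ2 to IRQ9; that is, the IRQ9 ISR routine must call the IRQ2 ISR routine
 This way, a device formerly attached to IRQ2 is now attached to IRQ9, and software only needs to add an IRQ9 ISR to remain compatible with existing systems. At the time, the added IRQ9 ISR was provided by the BIOS, which guaranteed compatibility at the most basic level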


I/O Port & Address
 Each 8259A chip has 2 I/O ports through which it is controlled
 Master 8259A: 0x20, 0x21
 Slave 8259A: 0xA0, 0xA1
 Two kinds of commands can be written to the 8259A
 Initialization Command Word (ICW): initializes the 8259A chip
 Operation Command Word (OCW): issues commands to the 8259A in order to control it

/*
 * arch/i386/mach-generic/io_ports.h
 * Machine specific IO port address definition for generic.
 */

/* i8259A PIC registers */
#define PIC_MASTER_CMD 0x20
#define PIC_MASTER_IMR 0x21
#define PIC_MASTER_ISR PIC_MASTER_CMD
#define PIC_MASTER_POLL PIC_MASTER_ISR
#define PIC_MASTER_OCW3 PIC_MASTER_ISR
#define PIC_SLAVE_CMD 0xa0
#define PIC_SLAVE_IMR 0xa1

/* i8259A PIC related value */
#define PIC_CASCADE_IR 2
#define MASTER_ICW4_DEFAULT 0x01
#define SLAVE_ICW4_DEFAULT 0x01
#define PIC_ICW4_AEOI 2


Linux 8259A Interrupt Handler
/* linux-2.6.14.1\arch\i386\kernel\i8259.c */

static struct hw_interrupt_type i8259A_irq_type = {
    .typename = "XT-PIC",
    .startup = startup_8259A_irq,
    .shutdown = shutdown_8259A_irq,
    .enable = enable_8259A_irq,
    .disable = disable_8259A_irq,
    .ack = mask_and_ack_8259A,
    .end = end_8259A_irq,
};

/* This contains the irq mask for both 8259A irq controllers */
unsigned int cached_irq_mask = 0xffff;


startup_8259A_irq and shutdown_8259A_irq
(arch/i386/kernel/i8259.c)
unsigned int startup_8259A_irq(unsigned int irq)
{
    enable_8259A_irq(irq);
    return 0;
}

#define shutdown_8259A_irq disable_8259A_irq


enable_8259A_irq
(arch/i386/kernel/i8259.c)
void enable_8259A_irq(unsigned int irq)
{
    unsigned int mask = ~(1 << irq);
    // low 16 bits of mask are 11101111 11111111b if irq = 12
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    cached_irq_mask &= mask;
    // 00110011 00111000b (original cached_irq_mask)
    // 11101111 11111111b (mask)
    // 00100011 00111000b (new cached_irq_mask)
    if (irq & 8) // true when irq >= 8 (irq is in 0..15)
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}


disable_8259A_irq
(arch/i386/kernel/i8259.c)
void disable_8259A_irq(unsigned int irq)
{
    unsigned int mask = 1 << irq;
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    cached_irq_mask |= mask;
    if (irq & 8)
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}


include/asm-i386/mach-default/io_ports.h
/* i8259A PIC registers */
#define PIC_MASTER_CMD 0x20
#define PIC_MASTER_IMR 0x21
#define PIC_MASTER_ISR PIC_MASTER_CMD
#define PIC_MASTER_POLL PIC_MASTER_ISR
#define PIC_MASTER_OCW3 PIC_MASTER_ISR
#define PIC_SLAVE_CMD 0xa0
#define PIC_SLAVE_IMR 0xa1


include/asm-i386/i8259.h
extern unsigned int cached_irq_mask;

#define __byte(x,y) (((unsigned char *) &(y))[x])
#define cached_master_mask (__byte(0, cached_irq_mask))
#define cached_slave_mask (__byte(1, cached_irq_mask))
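
The cached mask keeps both IMRs in one 16-bit value: bits 0-7 belong to the master and bits 8-15 to the slave. The __byte() macro picks the bytes by address, which relies on the little-endian layout of i386. The sketch below is a user-space model that uses shifts instead, so it does not depend on byte order; master_mask(), slave_mask(), unmask_irq() and mask_irq() are hypothetical names, and the functions return the byte that the kernel would write with outb() instead of touching any port.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit mask covers both PICs: bits 0-7 master, bits 8-15 slave. */
static uint16_t cached_irq_mask = 0xffff; /* all lines masked at start */

/* Shift-based equivalents of cached_master_mask / cached_slave_mask. */
static uint8_t master_mask(void) { return (uint8_t)(cached_irq_mask & 0xff); }
static uint8_t slave_mask(void)  { return (uint8_t)(cached_irq_mask >> 8); }

/* Clear the bit for irq; returns the IMR byte that would be written. */
static uint8_t unmask_irq(unsigned int irq)
{
    assert(irq < 16);
    cached_irq_mask &= (uint16_t)~(1u << irq);
    return (irq & 8) ? slave_mask() : master_mask();
}

/* Set the bit for irq; returns the IMR byte that would be written. */
static uint8_t mask_irq(unsigned int irq)
{
    assert(irq < 16);
    cached_irq_mask |= (uint16_t)(1u << irq);
    return (irq & 8) ? slave_mask() : master_mask();
}
```

Starting from the mask 00110011 00111000b used on the enable_8259A_irq slide, unmask_irq(12) leaves 00100011 00111000b and returns the slave byte 00100011b.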

/* Not all IRQs can be routed through the IO-APIC, eg. on certain (older)
 * boards the timer interrupt is not really connected to any IO-APIC pin,
 * it's fed to the master 8259A's IR0 line only.
 *
 * Any '1' bit in this mask means the IRQ is routed through the IO-APIC.
 * this 'mixed mode' IRQ handling costs nothing because it's only used
 * at IRQ setup time. */

void disable_8259A_irq(unsigned int irq)
{
    unsigned int mask = 1 << irq;
    unsigned long flags;

    // make sure operations on the master & slave 8259A are mutually exclusive
    // for SMP system?
    spin_lock_irqsave(&i8259A_lock, flags);

    // set the corresponding bit to 1 to disable this IRQ line
    cached_irq_mask |= mask;

    // test whether irq >= 8
    if (irq & 8)
        // store slave IRQ mask
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        // store master IRQ mask
        outb(cached_master_mask, PIC_MASTER_IMR);

    spin_unlock_irqrestore(&i8259A_lock, flags);
}


// send an EOI to the PIC to signal that interrupt service has ended
static void mask_and_ack_8259A(unsigned int irq)
{
    unsigned int irqmask = 1 << irq;
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    // check whether the given IRQ line is already masked: if the 8259A
    // signalled the CPU although the corresponding IMR bit is set to 1,
    // this is a spurious interrupt
    if (cached_irq_mask & irqmask)
        goto spurious_8259A_irq;
    cached_irq_mask |= irqmask;

handle_real_irq:
    if (irq & 8) { // slave
        inb(PIC_SLAVE_IMR); /* DUMMY - (do we need this?) */
        // mask this IRQ line
        outb(cached_slave_mask, PIC_SLAVE_IMR);
        // write 0x60+(irq&7): 'Specific EOI' for slave line (irq&7)
        outb(0x60+(irq&7), PIC_SLAVE_CMD); /* 'Specific EOI' to slave */
        // then write 0x60+PIC_CASCADE_IR: 'Specific EOI' for master IRQ2
        outb(0x60+PIC_CASCADE_IR, PIC_MASTER_CMD); /* 'Specific EOI' to master-IRQ2 */
    } else { // master
        inb(PIC_MASTER_IMR); /* DUMMY - (do we need this?) */
        outb(cached_master_mask, PIC_MASTER_IMR);
        outb(0x60+irq,PIC_MASTER_CMD); /* 'Specific EOI' to master */
    }
    spin_unlock_irqrestore(&i8259A_lock, flags);
    return;

spurious_8259A_irq:
    /* this is the slow path - should happen rarely. */
    if (i8259A_irq_real(irq))
        /*
         * oops, the IRQ _is_ in service according to the
         * 8259A - not spurious, go handle it.
         */
        goto handle_real_irq;

    {
        static int spurious_irq_mask;
        /*
         * At this point we can be sure the IRQ is spurious,
         * lets ACK and report it. [once per IRQ]
         */
        if (!(spurious_irq_mask & irqmask)) { // has this spurious IRQ been reported already?
            printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
            spurious_irq_mask |= irqmask;
        }

        atomic_inc(&irq_err_count); // increment irq_err_count
        /*
         * Theoretically we do not have to handle this IRQ,
         * but in Linux this does not cause problems and is
         * simpler for us.
         */
        // in Linux, handling a spurious IRQ like a real one causes no problems
        goto handle_real_irq;
    }
}

/*
 * This function assumes to be called rarely. Switching between
 * 8259A registers is slow.
 * This has to be protected by the irq controller spinlock
 * before being called.
 */
static inline int i8259A_irq_real(unsigned int irq)
{
    int value;
    int irqmask = 1<<irq;

    if (irq < 8) { // master
        // reads return the IRR by default, so write OCW3 = 0x0B to switch to the ISR
        outb(0x0B,PIC_MASTER_CMD); /* ISR register */
        // is this interrupt really being serviced by the CPU?
        value = inb(PIC_MASTER_CMD) & irqmask;
        outb(0x0A,PIC_MASTER_CMD); /* back to the IRR register */
        return value;
    }
    // slave
    outb(0x0B,PIC_SLAVE_CMD); /* ISR register */
    value = inb(PIC_SLAVE_CMD) & (irqmask >> 8);
    outb(0x0A,PIC_SLAVE_CMD); /* back to the IRR register */
    return value;
}
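
The 'Specific EOI' writes issued by mask_and_ack_8259A() can be worked out without touching hardware. The sketch below only computes which bytes would go to which command port; struct port_write, SPECIFIC_EOI and specific_eoi_writes() are hypothetical names, while the port numbers and 0x60 come from the listings above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC_MASTER_CMD 0x20
#define PIC_SLAVE_CMD  0xa0
#define PIC_CASCADE_IR 2
#define SPECIFIC_EOI   0x60 /* 'Specific EOI' command, line number in low bits */

/* One port write: the value that outb() would send and its target port. */
struct port_write {
    uint16_t port;
    uint8_t  value;
};

/*
 * Fill out[] with the EOI writes for irq and return how many there are:
 * one for a master line, two for a slave line (the slave first, then the
 * master's cascade input IRQ2).
 */
static size_t specific_eoi_writes(unsigned int irq, struct port_write out[2])
{
    assert(irq < 16);
    if (irq & 8) {
        out[0] = (struct port_write){ PIC_SLAVE_CMD,
                                      (uint8_t)(SPECIFIC_EOI + (irq & 7)) };
        out[1] = (struct port_write){ PIC_MASTER_CMD,
                                      SPECIFIC_EOI + PIC_CASCADE_IR };
        return 2;
    }
    out[0] = (struct port_write){ PIC_MASTER_CMD,
                                  (uint8_t)(SPECIFIC_EOI + irq) };
    return 1;
}
```

For the mouse on IRQ12 this gives 0x64 to port 0xa0 followed by 0x62 to port 0x20; the second write is needed because the slave's request reached the CPU through the master's IR2 input.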

static void end_8259A_irq (unsigned int irq)
{
    // re-enable the line only if the IRQ is neither disabled nor in progress
    if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) &&
        irq_desc[irq].action)
        enable_8259A_irq(irq);
}


Interrupt Control Interface


Control Interfaces
 Purpose: to allow disabling the interrupt system for the current CPU, or masking out an interrupt line for the entire machine
 Disable/enable interrupts locally for the current processor:
 local_irq_disable();
 local_irq_enable();
 local_irq_save(flags); // save the interrupt state, then disable
 local_irq_restore(flags); // restore the saved interrupt state


Control Interfaces (2)
 Disable only a specific interrupt line for the entire system
 disable_irq(unsigned int irq): waits until any currently executing handler completes
 disable_irq_nosync(unsigned int irq): does not wait
 enable_irq(unsigned int irq): if disable_irq() is called twice, only the 2nd enable_irq() actually enables the interrupt line
 synchronize_irq(unsigned int irq): waits for a specific IH to exit, if it is executing, before returning
 Status checking
 irqs_disabled(): returns nonzero if the interrupt system on the local CPU is disabled, 0 otherwise
 in_interrupt(): returns nonzero if the kernel is in interrupt context (including in an IH or BH), zero if it is in process context
 in_irq(): returns nonzero if the kernel is executing an interrupt handler


disable_irq_nosync (1/2)
<LINUX SRC>/kernel/irq/manage.c
void disable_irq_nosync(unsigned int irq)
{
    // get the IRQ descriptor we are going to disable
    irq_desc_t *desc = irq_desc + irq;
    unsigned long flags;
    // acquire lock
    spin_lock_irqsave(&desc->lock, flags);


disable_irq_nosync (2/2)
    // disable the IRQ only on the first (outermost) call
    if (!desc->depth++) {
        desc->status |= IRQ_DISABLED;
        desc->handler->disable(irq);
    }
    // release lock
    spin_unlock_irqrestore(&desc->lock, flags);
}


disable_irq
<LINUX SRC>/kernel/irq/manage.c
void disable_irq(unsigned int irq) {
    // get the IRQ descriptor we are going to disable
    irq_desc_t *desc = irq_desc + irq;
    disable_irq_nosync(irq);
    // wait for a currently running IRQ handler to finish
    if (desc->action)
        synchronize_irq(irq);
}


synchronize_irq
<LINUX SRC>/kernel/irq/manage.c
#ifdef CONFIG_SMP
void synchronize_irq(unsigned int irq) {
    struct irq_desc *desc = irq_desc + irq;
    // spin until no CPU is running a handler for this IRQ
    while (desc->status & IRQ_INPROGRESS)
        cpu_relax();
}
#else
# define synchronize_irq(irq) barrier()
#endif


enable_irq (1/4)
<LINUX SRC>/kernel/irq/manage.c
void enable_irq(unsigned int irq)
{
    // get the IRQ descriptor we are going to enable
    irq_desc_t *desc = irq_desc + irq;
    unsigned long flags;
    // acquire lock
    spin_lock_irqsave(&desc->lock, flags);


enable_irq (2/4)
    switch (desc->depth) {
    // unbalanced call: the IRQ is already enabled when its depth is 0
    case 0:
        WARN_ON(1);
        break;


enable_irq (3/4)
    case 1: {
        // clear IRQ_DISABLED bit in desc->status
        unsigned int status = desc->status & ~IRQ_DISABLED;
        desc->status = status;
        // replay an interrupt that arrived while the line was disabled
        if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
            desc->status = status | IRQ_REPLAY;
            hw_resend_irq(desc->handler,irq);
        }
        desc->handler->enable(irq);
        /* fall through */
    }


enable_irq (4/4)
    default:
        desc->depth--;
    }

    // release lock
    spin_unlock_irqrestore(&desc->lock, flags);
}
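
The depth counter is what makes disable_irq()/enable_irq() calls nest. The following user-space sketch models only that counter; struct line_state, line_disable() and line_enable() are hypothetical stand-ins for the depth and IRQ_DISABLED handling shown above, with locking and hardware access left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the depth/status fields of irq_desc_t. */
struct line_state {
    unsigned int depth; /* number of outstanding disable calls */
    bool disabled;      /* mirrors the IRQ_DISABLED status bit */
};

/* Only the first disable (depth 0 -> 1) actually masks the line. */
static void line_disable(struct line_state *l)
{
    assert(l != NULL);
    if (l->depth++ == 0)
        l->disabled = true;
}

/*
 * Only the enable that brings depth from 1 to 0 unmasks the line.
 * Returns false for an unbalanced enable (the kernel warns in that case).
 */
static bool line_enable(struct line_state *l)
{
    assert(l != NULL);
    switch (l->depth) {
    case 0:
        return false;
    case 1:
        l->disabled = false;
        /* fall through */
    default:
        l->depth--;
    }
    return true;
}
```

After two line_disable() calls, the first line_enable() leaves the line disabled and only the second one re-enables it, which matches the rule stated on the Control Interfaces (2) slide.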

hw_resend_irq
#ifdef CONFIG_X86_IO_APIC
static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i)
{
    if (IO_APIC_IRQ(i))
        // write io_apic_vector into APIC
        send_IPI_self(IO_APIC_VECTOR(i));
}
#else
static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i) {}
#endif


setup_irq (1/2)
int setup_irq(unsigned int irq, struct irqaction * new)
{
    struct irq_desc *desc = irq_desc + irq;
    struct irqaction *old, **p;
    int shared = 0;
    ...
    p = &desc->action;
    if ((old = *p) != NULL) {
        ...
        shared = 1;
    }


setup_irq (2/2)
    *p = new;
    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED |
                          IRQ_AUTODETECT |
                          IRQ_WAITING |
                          IRQ_INPROGRESS);
        if (desc->handler->startup)
            desc->handler->startup(irq);
        else
            desc->handler->enable(irq);
    }
    ...
    return 0;
}


Mask/Unmask IRQs
 local_irq_disable()
 #define local_irq_disable() __asm__ __volatile__("cli": : :"memory")
 local_irq_enable()
 #define local_irq_enable() __asm__ __volatile__("sti": : :"memory")
 RA: __volatile__


Review Slide
 IH return value?
 IRQ_NONE, IRQ_HANDLED
 When an IRQ line is shared, how does an IH ack the requesting device?
 Interrupt context?
 Sleep? Stack?
 I/O IH processing steps?
 local_irq_disable(), local_irq_enable()?
 disable_irq()? disable_irq_nosync()? irqs_disabled()? in_interrupt()?


Writing Interrupt Service Routine


Introduction
 A typical declaration of an ISR
static irqreturn_t intr_handler(int irq, void *dev_id, struct pt_regs *regs)
 irq: the IRQ line it is servicing
 dev_id: a generic pointer to the same dev_id given to request_irq()
 regs: processor registers prior to servicing the interrupt
 Return value
 IRQ_NONE: the ISR detects an interrupt for which its device was not the originator
 IRQ_HANDLED: otherwise
 At a minimum, most ISRs need to acknowledge to the device that they received the interrupt
 When a line is shared by multiple ISRs, the kernel invokes each registered handler sequentially
 A HW device should have a status register that its ISR can check
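
How handlers on a shared line cooperate can be sketched in user space. In this model each ISR checks its own device's status register and returns IRQ_NONE when its device did not raise the interrupt. struct fake_device, struct action, fake_isr() and dispatch_shared_irq() are hypothetical names, and the struct pt_regs argument of the 2.6 handler prototype is left out to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* Hypothetical device: bit 0 of status means "interrupt pending". */
struct fake_device {
    unsigned int status;
    unsigned int serviced; /* how many interrupts the ISR has handled */
};

/* One entry in the chain of handlers registered on a shared line. */
struct action {
    irqreturn_t (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct action *next;
};

/* ISR: check the device's status register before claiming the interrupt. */
static irqreturn_t fake_isr(int irq, void *dev_id)
{
    struct fake_device *dev = dev_id;

    (void)irq;
    if (!(dev->status & 1u))
        return IRQ_NONE;    /* our device was not the originator */
    dev->status &= ~1u;     /* acknowledge the device */
    dev->serviced++;
    return IRQ_HANDLED;
}

/* Invoke every handler on the line in turn and OR the results together. */
static int dispatch_shared_irq(int irq, struct action *chain)
{
    int retval = 0;

    for (struct action *a = chain; a != NULL; a = a->next) {
        assert(a->handler != NULL);
        retval |= a->handler(irq, a->dev_id);
    }
    return retval;
}
```

The dev_id pointer is what lets one handler function serve several devices: the dispatcher passes back exactly the pointer that was registered with the handler.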

Example: RTC Interrupt Service Routine
 When the RTC driver loads, rtc_init() is invoked to initialize the driver
static int __init rtc_init(void)
{ …
    if (request_irq(rtc_irq, rtc_interrupt, SA_INTERRUPT, "rtc",
            (void *)&rtc_port)) {
        printk(KERN_ERR "rtc: cannot register IRQ %d\n", rtc_irq);
        return -EIO;
    }…
}
 rtc_interrupt runs with interrupts disabled on the local CPU (SA_INTERRUPT)
 rtc_irq = IRQ8 on PC

irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    // Can be an alarm interrupt, update complete interrupt, or a periodic interrupt.
    // We store the status in the low byte and the number of interrupts received since
    // the last read in the remainder of rtc_irq_data.

    spin_lock (&rtc_lock);
    rtc_irq_data += 0x100;
    rtc_irq_data &= ~0xff;

    if (is_hpet_enabled()) {
        rtc_irq_data |= (unsigned long)irq & 0xF0;
    } else {
        rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0);
    }

    if (rtc_status & RTC_TIMER_ON)
        mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
    spin_unlock (&rtc_lock);

    spin_lock(&rtc_task_lock);
    if (rtc_callback) rtc_callback->func(rtc_callback->private_data);
    spin_unlock(&rtc_task_lock);
    wake_up_interruptible(&rtc_wait);
    kill_fasync (&rtc_async_queue, SIGIO, POLL_IN);

    return IRQ_HANDLED;
}
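
The packing of rtc_irq_data (status in the low byte, interrupt count in the remaining bits) can be tried out in user space. This is a sketch of the arithmetic only; record_rtc_interrupt(), interrupt_count() and last_status() are hypothetical helpers, and the flags are passed in as an argument instead of being read from the CMOS.

```c
/* Low byte: latest status flags; remaining bits: interrupt count. */
static unsigned long rtc_irq_data;

/* Record one interrupt whose status flags are given in flags. */
static void record_rtc_interrupt(unsigned char flags)
{
    rtc_irq_data += 0x100;          /* count one more interrupt */
    rtc_irq_data &= ~0xffUL;        /* drop the previous status byte */
    rtc_irq_data |= (flags & 0xF0); /* keep only the interrupt flag bits */
}

/* Number of interrupts recorded since rtc_irq_data was last cleared. */
static unsigned long interrupt_count(void)
{
    return rtc_irq_data >> 8;
}

/* Status flags of the most recent interrupt. */
static unsigned char last_status(void)
{
    return (unsigned char)(rtc_irq_data & 0xff);
}
```

Adding 0x100 increments the count without disturbing the low byte, so a reader gets both the number of interrupts it missed and the latest status from a single word.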



Interrupt Context
 When executing an interrupt handler or bottom
half, kernel is in interrupt context
 Interrupt context cannot sleep
 Process context can
 This limits the functions which one can call from an
interrupt handler
 Interrupt context does not receive its own stack
 It shares the kernel stack of the process it interrupts
 If no process is running, it uses idle task’s stack
 Code trace: keyboard ISR (IRQ1)
 Code trace: mouse ISR (IRQ12)

Mouse & Keyboard Interrupt Handler
魏淳航


/proc/interrupts


I8042
 PS/2 mouse and keyboard controller
 This microcontroller is hidden within the motherboard’s chipset, which integrates many microcontrollers in a single package.


 Four 8-bit registers
 Status (read), control (write), input (write), output (read) register
 They use I/O ports 0x60 and 0x64
[Figure: 8042 chip registers: SR (0x64), CR (0x64), IR (0x60), OR (0x60)]


I8042 Architecture


Initial Steps
1. Init i8042 driver
2. Set interface : Serio
3. Init mouse driver
4. Connect mouse to interface
5. Call request irq
6. Start mouse

drivers\input\serio\i8042.c
int __init i8042_init(void){
    …//
    …//initial controller

    i8042_aux_values.irq = I8042_AUX_IRQ;//12
    i8042_kbd_values.irq = I8042_KBD_IRQ;//1
    if (!i8042_noaux && !i8042_check_aux(&i8042_aux_values)) {
        //check if aux is available
        if (!i8042_nomux && !i8042_check_mux(&i8042_aux_values)){
            //check if mux is available
            for (i = 0; i < 4; i++) {
                i8042_init_mux_values(i8042_mux_values + i, i8042_mux_port + i, i);
                i8042_port_register(i8042_mux_values + i, i8042_mux_port + i);
            }
        }else{
            i8042_port_register(&i8042_aux_values, &i8042_aux_port);
        }
    }

    i8042_port_register(&i8042_kbd_values, &i8042_kbd_port);
}


Structure of SERIO
struct serio {
    void *private;
    void *driver;
    char *name;
    char *phys;
    unsigned short idbus;
    unsigned short idvendor;
    unsigned short idproduct;
    unsigned short idversion;
    unsigned long type;
    unsigned long event;
    int (*write)(struct serio *, unsigned char);
    int (*open)(struct serio *);
    void (*close)(struct serio *);
    struct serio_dev *dev;
    struct list_head node;
};

static struct i8042_values i8042_aux_values = {
    .irqen = I8042_CTR_AUXINT,//0x02
    .disable = I8042_CTR_AUXDIS,//0x20
    .name = "AUX",
    .mux = -1,
};

static struct serio i8042_aux_port =
{
    .type = SERIO_8042,
    .write = i8042_aux_write,
    .open = i8042_open,
    .close = i8042_close,
    .driver = &i8042_aux_values,
    .name = "i8042 Aux Port",
    .phys = I8042_AUX_PHYS_DESC,
}; //others are NULL

static int __init i8042_port_register(struct i8042_values *values, struct serio *port)
{
    values->exists = 1;

    i8042_ctr &= ~values->disable;

    //enable mouse or keyboard
    if (i8042_command(&i8042_ctr, I8042_CMD_CTL_WCTR)) {
        printk(KERN_WARNING "i8042.c: Can't write CTR while registering.\n");
        values->exists = 0;
        return -1;
    }

    printk(KERN_INFO "serio: i8042 %s port at %#lx,%#lx irq %d\n",
        values->name,
        (unsigned long) I8042_DATA_REG,
        (unsigned long) I8042_COMMAND_REG,
        values->irq);

    serio_register_port(port);

    return 0;
}


Add to serio_list
void __serio_register_port(struct serio *serio)
{
    list_add_tail(&serio->node, &serio_list);
    serio_find_dev(serio);
}

static void serio_find_dev(struct serio *serio)
{
    struct serio_dev *dev;

    list_for_each_entry(dev, &serio_dev_list, node) {
        if (serio->dev)
            break;
        if (dev->connect)
            dev->connect(serio, dev);
    }
}



\drivers\input\mouse\Psmouse-base.c
static struct serio_dev psmouse_dev = {
    .interrupt = psmouse_interrupt,
    .connect = psmouse_connect,
    .reconnect = psmouse_reconnect,
    .disconnect = psmouse_disconnect,
    .cleanup = psmouse_cleanup,
};

int __init psmouse_init(void)
{
    psmouse_parse_proto();
    serio_register_device(&psmouse_dev);
    return 0;
}

void serio_register_device(struct serio_dev *dev)
{
    struct serio *serio;
    down(&serio_sem);
    list_add_tail(&dev->node, &serio_dev_list);
    list_for_each_entry(serio, &serio_list, node)
        if (!serio->dev && dev->connect)
            dev->connect(serio, dev);
    up(&serio_sem);
}


psmouse_connect()
static void psmouse_connect(struct serio *serio, struct serio_dev *dev)
{
    ...
    if (serio->type!=SERIO_8042) //check if serio type is SERIO_8042
        return;
    if (serio_open(serio, dev)) { //request irq
        kfree(psmouse);
        serio->private = NULL;
        return;
    }
    //handshake: get ack from mouse and device ID (0x00)
    if (psmouse_probe(psmouse) < 0) {
        serio_close(serio);
        kfree(psmouse);
        serio->private = NULL;
        return;
    }
    psmouse->protocol_handler = psmouse_process_byte;//mouse event handler
    psmouse_activate(psmouse); // reset counter of mouse and enables it
}


serio_open( )-request irq
int serio_open(struct serio *serio, struct serio_dev *dev)
{
    serio->dev = dev;
    if (serio->open && serio->open(serio)) {
        serio->dev = NULL;
        return -1;
    }
    return 0;
}

static struct serio i8042_aux_port =
{
    .type = SERIO_8042,
    .write = i8042_aux_write,
    .open = i8042_open,
    .close = i8042_close,
    .driver = &i8042_aux_values,
    .name = "i8042 Aux Port",
    .phys = I8042_AUX_PHYS_DESC,
}; //others are NULL

static int i8042_open(struct serio *port){
    struct i8042_values *values = port->driver;
    if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ, "i8042", i8042_request_irq_cookie)) {
        goto irq_fail;
    }
    ...
}


Mouse Interrupt Handler
1. i8042_interrupt(): get data and flags from the 8042
2. psmouse_interrupt()
3. psmouse_process_byte(): handle the packets


I8042_interrupt
static irqreturn_t i8042_interrupt(int irq, void *dev_id, struct pt_regs *regs){
    unsigned int dfl;

    spin_lock_irqsave(&i8042_lock, flags);
    str = i8042_read_status();
    // if the 8042 output buffer has data, read it and save it to "data"
    if (str & I8042_STR_OBF)
        data = i8042_read_data();
    spin_unlock_irqrestore(&i8042_lock, flags);

    // set flags from the 8042 status
    dfl = ((str & I8042_STR_PARITY) ? SERIO_PARITY : 0) |
          ((str & I8042_STR_TIMEOUT) ? SERIO_TIMEOUT : 0);

    …(next page)


I8042_interrupt
    // check the status register: if the data is of AUX type,
    // call the mouse interrupt
    if (i8042_aux_values.exists && (str & I8042_STR_AUXDATA)) {
        serio_interrupt(&i8042_aux_port, data, dfl, regs);
        goto irq_ret;
    }

    if (!i8042_kbd_values.exists)
        goto irq_ret;
    // else: call the keyboard interrupt
    serio_interrupt(&i8042_kbd_port, data, dfl, regs);

irq_ret:
    ret = 1;


I8042_interrupt
irqreturn_t serio_interrupt(struct serio *serio,
        unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    if (serio->dev && serio->dev->interrupt)
        ret = serio->dev->interrupt(serio, data, flags, regs);

    return ret;
}

static struct serio_dev psmouse_dev = {
    .interrupt = psmouse_interrupt,
    .connect = psmouse_connect,
    .reconnect = psmouse_reconnect,
    .disconnect = psmouse_disconnect,
    .cleanup = psmouse_cleanup,
};


Mouse Data Packets
The standard PS/2 mouse sends movement (and button) information to the host using the following 3-byte packet (4)

Byte 2 (3) is the amount of movement that has occurred since the last movement data packet was sent to the host.


psmouse_interrupt
static irqreturn_t psmouse_interrupt(struct serio *serio,
        unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    //check flags
    //check mouse state

    if (psmouse->state == PSMOUSE_ACTIVATED &&
        psmouse->pktcnt && time_after(jiffies, psmouse->last + HZ/2)) {
        printk(KERN_WARNING "psmouse.c: %s at %s lost synchronization, throwing %d bytes away.\n",
            psmouse->name, psmouse->phys, psmouse->pktcnt);
        psmouse->pktcnt = 0;
    }
    psmouse->last = jiffies;
    psmouse->packet[psmouse->pktcnt++] = data;
    rc = psmouse->protocol_handler(psmouse, regs);

    return IRQ_HANDLED;
}


psmouse_process_byte()
static psmouse_ret_t psmouse_process_byte(struct psmouse *psmouse, struct pt_regs *regs)
{
    struct input_dev *dev = &psmouse->dev;
    unsigned char *packet = psmouse->packet;

    // the packet is not complete yet: wait for more bytes
    if (psmouse->pktcnt < 3 + (psmouse->type >= PSMOUSE_GENPS))
        return PSMOUSE_GOOD_DATA;


psmouse_process_byte()
    input_report_key(dev, BTN_LEFT, packet[0] & 1);
    input_report_key(dev, BTN_MIDDLE, (packet[0] >> 2) & 1);
    input_report_key(dev, BTN_RIGHT, (packet[0] >> 1) & 1);

    input_report_rel(dev, REL_X, packet[1] ?
        (int) packet[1] - (int) ((packet[0] << 4) & 0x100) : 0);
    input_report_rel(dev, REL_Y, packet[2] ?
        (int) ((packet[0] << 3) & 0x100) - (int) packet[2] : 0);

    return PSMOUSE_FULL_PACKET;
}
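
The bit manipulation in psmouse_process_byte() can be checked in user space. The sketch below decodes one standard 3-byte packet with the same expressions; struct mouse_event and decode_ps2_packet() are hypothetical names, and the input layer calls are replaced by a returned struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of one standard 3-byte PS/2 mouse packet. */
struct mouse_event {
    bool left, right, middle;
    int dx; /* relative X movement */
    int dy; /* relative Y movement, sign flipped as in the driver code */
};

static struct mouse_event decode_ps2_packet(const uint8_t packet[3])
{
    struct mouse_event ev;

    assert(packet != NULL);
    ev.left   = packet[0] & 1;
    ev.right  = (packet[0] >> 1) & 1;
    ev.middle = (packet[0] >> 2) & 1;
    /*
     * Bits 4 and 5 of byte 0 are the X and Y sign bits. Shifting a sign
     * bit up to bit 8 (value 0x100) and subtracting sign-extends the
     * 9-bit movement value.
     */
    ev.dx = packet[1] ? (int)packet[1] - (int)((packet[0] << 4) & 0x100) : 0;
    ev.dy = packet[2] ? (int)((packet[0] << 3) & 0x100) - (int)packet[2] : 0;
    return ev;
}
```

For example, byte 0 = 0x18 with byte 1 = 0xFE has the X sign bit set, so dx is 254 - 256 = -2.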

static inline void input_report_key(struct input_dev *dev,
        unsigned int code, int value)
{
    input_event(dev, EV_KEY, code, !!value);
}

static inline void input_report_rel(struct input_dev *dev,
        unsigned int code, int value)
{
    input_event(dev, EV_REL, code, value);
}
// input_event(): choose a handler from dev->h_list to handle the event


Linux kernel - 2.6.14
1. Init i8042 driver
2. Set interface : Serio
3. Init keyboard driver
4. Connect keyboard to interface
5. Call request irq
6. Start keyboard interrupt


\drivers\input\keyboard\Atkbd.c
static struct serio_dev atkbd_dev = {
    .interrupt = atkbd_interrupt,
    .connect = atkbd_connect,
    .reconnect = atkbd_reconnect,
    .disconnect = atkbd_disconnect,
    .cleanup = atkbd_cleanup,
};

int __init atkbd_init(void)
{
    serio_register_device(&atkbd_dev);
    return 0;
}

void serio_register_device(struct serio_dev *dev)
{
    struct serio *serio;
    down(&serio_sem);
    list_add_tail(&dev->node, &serio_dev_list);
    list_for_each_entry(serio, &serio_list, node)
        if (!serio->dev && dev->connect)
            dev->connect(serio, dev);
    up(&serio_sem);
}


Start Keyboard Interrupt
static irqreturn_t atkbd_interrupt(struct serio *serio,
        unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    //check flags
    //check keyboard state

    unsigned int code = data;
    ……
    value = atkbd->release ? 0 :
        (1 + (!atkbd_softrepeat && test_bit(atkbd->keycode[code], atkbd->dev.key)));
    ……
    atkbd_report_key(&atkbd->dev, regs, atkbd->keycode[code], value);
}


static void atkbd_report_key(struct input_dev *dev, struct pt_regs *regs, int code, int value)
{
    …..
    input_event(dev, EV_KEY, code, value);
    ……
}


Bottom Half and Deferring Work


Why Bottom Half?
 IHs (top halves) have the following properties (requirements)
 An IH (top half) needs to run as quickly as possible
 An IH runs with some (or all) interrupt levels disabled
 IHs are often time-critical and they deal with HW
 IHs do not run in process context and cannot block
 No hard and fast rules exist about what work to perform where
 Research work needed
 Bottom halves defer work until later
 “Later” is often simply “not now”
 Often, bottom halves run immediately after the interrupt returns
 They run with all interrupts enabled


A World of Bottom Halves
 Multiple mechanisms are available for implementing a bottom half
 softirq, tasklet, work queues
 softirq: (available since 2.3)
 A set of 32 statically defined bottom halves that can run simultaneously on any processor
 Even 2 of the same type can run concurrently
 Used when performance is critical
 Must be registered statically at compile-time
 tasklet: (available since 2.3)
 Are built on top of softirqs
 Two different tasklets can run simultaneously on different processors
 But 2 of the same type cannot run simultaneously
 Used most of the time for its ease and flexibility
 Code can dynamically register tasklets
 work queues: (available since 2.5)
 Queueing work to be performed later in process context


Softirqs
 Softirqs are rarely used
 Tasklets are used most of the time
 Statically allocated at compile-time
 Related code: kernel/softirq.c
struct softirq_action
{
    void (*action)(struct softirq_action *); // function to run
    void *data; // data to pass to function
};
static struct softirq_action softirq_vec[32];
 In 2.6.7 kernel, only 6 softirqs are used [code trace]
enum
{
    HI_SOFTIRQ=0, TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
    SCSI_SOFTIRQ, TASKLET_SOFTIRQ
};


The Softirq Handler
 The prototype of a softirq handler:
 void softirq_handler(struct softirq_action *)
 Example:
 my_softirq = &softirq_vec[0];
 my_softirq->action(my_softirq);
 Passing the whole structure means that future changes to softirq_action do not require changing every softirq handler
 A softirq never preempts another softirq
 It can only be preempted by an interrupt handler
 Another softirq (even of the same type) can run simultaneously on another processor


Executing Softirqs
 A softirq must be raised before it is executed
 At a suitable later time, pending softirqs run
 Pending softirqs are checked for and executed in the following places:
 After processing a HW interrupt
 By the ksoftirqd kernel thread
 By code that explicitly checks for and executes pending softirqs (e.g. the networking subsystem)
 They all call do_softirq() to execute softirqs
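
The raise-then-execute cycle can be modelled in user space with a pending bitmask and the softirq_vec[] array. This is a single-CPU sketch with no locking or interrupt masking; the function names mirror the kernel's open_softirq(), raise_softirq() and do_softirq(), but the bodies are simplified stand-ins, and count_handler() and the global pending word are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 32

struct softirq_action {
    void (*action)(struct softirq_action *); /* function to run */
    void *data;                              /* data to pass to function */
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static uint32_t pending; /* one bit per softirq; one CPU in this model */

/* Install the handler for softirq number nr. */
static void open_softirq(unsigned int nr,
                         void (*action)(struct softirq_action *), void *data)
{
    assert(nr < NR_SOFTIRQS);
    softirq_vec[nr].action = action;
    softirq_vec[nr].data = data;
}

/* Mark a softirq as pending; it runs at the next do_softirq(). */
static void raise_softirq(unsigned int nr)
{
    assert(nr < NR_SOFTIRQS);
    pending |= UINT32_C(1) << nr;
}

/* Run every pending softirq once, lowest index first. */
static void do_softirq(void)
{
    uint32_t todo = pending;
    struct softirq_action *h = softirq_vec;

    pending = 0;
    for (; todo != 0; todo >>= 1, h++)
        if ((todo & 1) && h->action)
            h->action(h); /* the handler receives the whole structure */
}

/* Example handler: counts its invocations through h->data. */
static void count_handler(struct softirq_action *h)
{
    ++*(int *)h->data;
}
```

Raising the same softirq twice before do_softirq() runs it only once, because pending is a bitmask and not a counter.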

Saving Registers for Exception Handler
IRQn_interrupt:
    pushl $n-256
    jmp common_interrupt

common_interrupt:
    SAVE_ALL
    call do_IRQ
    jmp ret_from_intr

SAVE_ALL:
    cld
    push %es
    push %ds
    pushl %eax
    pushl %ebp
    pushl %edi
    pushl %esi
    pushl %edx
    pushl %ecx
    pushl %ebx
    movl $__KERNEL_DS, %edx
    movl %edx, %ds
    movl %edx, %es

struct pt_regs {
    long ebx;
    long ecx;
    long edx;
    long esi;
    long edi;
    long ebp;
    long eax;
    int xds;
    int xes;
    long orig_eax;
    long eip;
    int xcs;
    long eflags;
    long esp;
    int xss;
};

[Figure: kernel stack after SAVE_ALL, from ESP upward: ebx, ecx, edx, esi, edi, ebp, eax, xds, xes, orig_eax, eip, xcs, eflags, esp, xss]


asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction * action;
    unsigned int status;

    irq_enter();
    kstat_this_cpu.irqs[irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);
    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING; /* we _want_ to handle it */
    for (;;) {
        irqreturn_t action_ret;
        spin_unlock(&desc->lock);
        …
        action_ret = handle_IRQ_event(irq, &regs, action);
        spin_lock(&desc->lock);
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;
out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);
    irq_exit();
    return 1;
}

asmlinkage int handle_IRQ_event(unsigned int irq,
        struct pt_regs *regs, struct irqaction *action)
{
    int status = 1; /* Force the "do bottom halves" bit */
    int retval = 0;

    if (!(action->flags & SA_INTERRUPT))
        local_irq_enable();

    do {
        status |= action->flags;
        retval |= action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);
    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);
    local_irq_disable();
    return retval;
}
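
The link between pushl $n-256 in the entry stub and regs.orig_eax & 0xff in do_IRQ() is plain two's-complement arithmetic: the stub stores a negative number (non-negative values in orig_eax are used for system call numbers), and masking with 0xff recovers n. A small sketch with the hypothetical helpers encode_orig_eax() and decode_irq():

```c
#include <assert.h>

/* Value pushed by the IRQn_interrupt stub: the IRQ number minus 256. */
static long encode_orig_eax(int irq)
{
    assert(irq >= 0 && irq < 256);
    return (long)irq - 256;
}

/* What do_IRQ() does to get the IRQ number back. */
static int decode_irq(long orig_eax)
{
    return (int)(orig_eax & 0xff);
}
```

For IRQ 14 the stub pushes -242, whose low byte is 0x0E, so decode_irq(-242) gives 14 again.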

#define irq_exit() \
do { \
    preempt_count() -= IRQ_EXIT_OFFSET; \
    if (!in_interrupt() && softirq_pending(smp_processor_id())) \
        do_softirq(); \
    preempt_enable_no_resched(); \
} while (0)

static inline int netif_rx_ni(struct sk_buff *skb)
{
    int err = netif_rx(skb);
    if (softirq_pending(smp_processor_id()))
        do_softirq();
    return err;
}

static int ksoftirqd(void * __bind_cpu)
{
    current->flags |= PF_NOFREEZE;
    set_current_state(TASK_INTERRUPTIBLE);
    …. do_softirq();
    }
    __set_current_state(TASK_RUNNING);
    return 0;
    …
}

asmlinkage void do_softirq(void)
{
    unsigned long flags;
    struct thread_info *curctx;
    union irq_ctx *irqctx;
    u32 *isp;

    if (in_interrupt()) return;
    local_irq_save(flags);
    if (local_softirq_pending()) {
        curctx = current_thread_info();
        irqctx = softirq_ctx[smp_processor_id()];
        irqctx->tinfo.task = curctx->task;
        irqctx->tinfo.previous_esp = current_stack_pointer();

        /* build the stack frame on the softirq stack */
        isp = (u32*) ((char*)irqctx + sizeof(*irqctx));
        asm volatile(
            " xchgl %%ebx,%%esp \n"
            " call __do_softirq \n"
            " movl %%ebx,%%esp \n"
            : "=b"(isp)
            : "0"(isp)
            : "memory", "cc", "edx", "ecx", "eax"
        );
    }
    local_irq_restore(flags);
}


do_softirq()
游家慶



do_softirq()
 Finish the jobs deferred to bottom halves in ISRs
1. Get the pending list from the current CPU's irq_stat[cpu].member
2. Invoke __do_softirq() if there are pending jobs
3. Restore local IRQs and leave do_softirq()


__do_softirq() (1/2)
 Finish the jobs deferred to bottom halves in ISRs
1. Get the pending list from the current CPU's irq_stat[cpu].member
2. Disable bottom halves
3. Clear irq_stat[cpu].member
4. Enable IRQs
5. Carry out pending jobs until all jobs are done

__do_softirq() (2/2)
6. Disable IRQs
7. Get the pending list from the current CPU's irq_stat[cpu].member
(steps 3 to 7 may be repeated as necessary, up to 10 times, the value of MAX_SOFTIRQ_RESTART)
8. Defer the remaining pending jobs if the kernel thread should stop; otherwise invoke another do_softirq()
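
The loop in steps 1 to 8 can be sketched in user space as follows. This is a simplified model, not the kernel code: the sim_ names, the 32-entry table, and the deferred_to_thread flag (a stand-in for handing leftover work to a kernel thread) are assumptions made for illustration, and interrupt masking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SOFTIRQS 32
#define MAX_SOFTIRQ_RESTART 10

typedef void (*softirq_fn)(void);

static unsigned int pending_mask;        /* stand-in for the per-CPU pending word */
static softirq_fn softirq_table[NR_SOFTIRQS];
static int deferred_to_thread;           /* set when work is left after the last pass */

static void sim_open_softirq(int nr, softirq_fn fn)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_table[nr] = fn;
}

static void sim_raise_softirq(int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    pending_mask |= 1u << nr;
}

static void sim_do_softirq(void)
{
    int restart = MAX_SOFTIRQ_RESTART;
    unsigned int pending = pending_mask;          /* step 1: read pending list */

    while (pending != 0 && restart-- > 0) {
        pending_mask = 0;                         /* step 3: clear it */
        /* step 5: run handlers, lowest index first */
        for (int nr = 0; pending != 0; nr++, pending >>= 1) {
            if ((pending & 1u) && softirq_table[nr] != NULL)
                softirq_table[nr]();
        }
        pending = pending_mask;                   /* step 7: re-read pending list */
    }
    if (pending != 0)
        deferred_to_thread = 1;                   /* step 8: defer the rest */
}

/* Two demo handlers: one plain, one that re-raises itself every time. */
static int runs_a, runs_b;
static void handler_a(void) { runs_a++; }
static void handler_b(void) { runs_b++; sim_raise_softirq(1); }
```

With handler_a on index 0 and handler_b on index 1 both raised once, a single sim_do_softirq() call runs handler_a once and handler_b ten times, then gives up and sets deferred_to_thread. The restart limit is what keeps a self-raising softirq from monopolizing the CPU.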



When to invoke do_softirq()?
 The local_bh_enable() macro re-enables softirqs
 do_IRQ() finishes handling an I/O interrupt
 smp_apic_timer_interrupt() finishes handling a local timer interrupt
 One of the special ksoftirqd_CPUn kernel threads is awoken
 A packet is received on a network card

Using Softirqs
 Currently, only the networking and SCSI subsystems use softirqs directly
 Kernel timers and tasklets are built on top of softirqs
 Index assignment:
 Before using a softirq, you must declare its index at compile time via the enum on slide 64
 Softirqs with lower numerical priority execute first
 Register handler:
 Softirq handler is registered at run-time via open_softirq()
void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
{
softirq_vec[nr].data = data;
softirq_vec[nr].action = action;
}



Using Softirq (2/2)
 Softirqs run with interrupts enabled and cannot sleep
 When a handler runs, softirqs on the current processor are disabled
 Another CPU can execute softirqs
 Need proper locking in softirqs
 As a result, most softirq handlers resort to per-processor data
 Raising softirq
 Call: raise_softirq(NET_TX_SOFTIRQ), for example
 Softirqs are often raised from within interrupt handlers
 When done processing interrupts, the kernel invokes do_softirq()


Review Slide
 Why bottom halves?
 BH available mechanism?
 softirqs, tasklets, work queues
 2.6.7, # of used softirqs?
 When and where are pending softirqs checked and executed?
 do_softirq()? open_softirq()? raise_softirq()?
 HW#5: Study the usage of preempt_count()
 Deadline: 03/27 (mail your report to TA)
 No class on 03/27
 In-class presentation on 04/10 by 林凱立
 Sample solution
Tasklets Usage



Tasklet Implementation
 Tasklets are implemented on top of softirqs
 HI_SOFTIRQ, TASKLET_SOFTIRQ
 The former runs prior to the latter

struct tasklet_struct
{
struct tasklet_struct *next; // next tasklet in the list
unsigned long state; // state of the tasklet
atomic_t count; // reference counter: 0 == enabled, !0 = disabled
void (*func)(unsigned long); // handler function
unsigned long data; // args to handler function
};

enum
{
TASKLET_STATE_SCHED, /* Tasklet is scheduled for execution */
TASKLET_STATE_RUN /* Tasklet is running (SMP only) */
};

#define DECLARE_TASKLET(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }

#define DECLARE_TASKLET_DISABLED(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }


Scheduling Tasklets
 Scheduled tasklets (or raised softirqs) are stored in 2 per-processor structures
 tasklet_vec (regular tasklets)
 tasklet_hi_vec (high-priority tasklets)

 Tasklets are scheduled via tasklet_schedule() and tasklet_hi_schedule()

static inline void tasklet_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
        __tasklet_schedule(t);
}

void fastcall __tasklet_schedule(struct tasklet_struct *t)
{
    unsigned long flags;

    local_irq_save(flags);
    t->next = __get_cpu_var(tasklet_vec).list;
    __get_cpu_var(tasklet_vec).list = t;
    raise_softirq_irqoff(TASKLET_SOFTIRQ);
    local_irq_restore(flags);
}


Execute Tasklets

void __init softirq_init(void)
{
    open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}

static void tasklet_action(struct softirq_action *a)
{
    struct tasklet_struct *list;

    local_irq_disable();
    list = __get_cpu_var(tasklet_vec).list;
    __get_cpu_var(tasklet_vec).list = NULL;
    local_irq_enable();

    while (list) {
        struct tasklet_struct *t = list;

        list = list->next;
        if (tasklet_trylock(t)) {
            if (!atomic_read(&t->count)) {
                if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                    BUG();
                t->func(t->data);
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }

        local_irq_disable();
        t->next = __get_cpu_var(tasklet_vec).list;
        __get_cpu_var(tasklet_vec).list = t;
        __raise_softirq_irqoff(TASKLET_SOFTIRQ);
        local_irq_enable();
    }
}


softirq & tasklets Concurrency
 Two instances of the same tasklet never run concurrently

#ifdef CONFIG_SMP
static inline int tasklet_trylock(struct tasklet_struct *t)
{
    return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
}
#else
#define tasklet_trylock(t) 1
#endif


Using Tasklet
 A tasklet can be declared statically or dynamically
 DECLARE_TASKLET(name, func, data)
 DECLARE_TASKLET_DISABLED(name, func, data)
 Writing tasklet handler
 void tasklet_handler(unsigned long data) for example
 A tasklet handler cannot sleep
 It runs with all interrupts enabled
 Two of the same tasklets never run concurrently
 If the same tasklet is scheduled again before it actually runs, it still runs only once
 Disable / Kill a tasklet
 tasklet_disable()
 tasklet_disable_nosync()
 tasklet_kill()
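
The "scheduled twice, runs once" rule comes from the TASKLET_STATE_SCHED bit guarding the enqueue. Below is a single-threaded user-space model of that guard. The sim_ names are hypothetical, and plain integer operations stand in for the kernel's atomic test_and_set_bit() and test_and_clear_bit(), so this sketch is not safe for real concurrency.

```c
#include <assert.h>
#include <stddef.h>

enum { SIM_STATE_SCHED = 0 };            /* bit number, as in the tasklet state enum */

struct sim_tasklet {
    struct sim_tasklet *next;
    unsigned long state;
    void (*func)(unsigned long);
    unsigned long data;
};

static struct sim_tasklet *sim_list;     /* stand-in for the per-CPU tasklet_vec list */

/* Non-atomic stand-ins: each returns the previous value of the bit. */
static int sim_test_and_set_bit(int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    int old = (*addr & mask) != 0;
    *addr |= mask;
    return old;
}

static int sim_test_and_clear_bit(int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    int old = (*addr & mask) != 0;
    *addr &= ~mask;
    return old;
}

static void sim_tasklet_schedule(struct sim_tasklet *t)
{
    assert(t != NULL);
    /* Enqueue only if the tasklet was not already marked as scheduled. */
    if (!sim_test_and_set_bit(SIM_STATE_SCHED, &t->state)) {
        t->next = sim_list;
        sim_list = t;
    }
}

static void sim_tasklet_action(void)
{
    struct sim_tasklet *list = sim_list;

    sim_list = NULL;
    while (list != NULL) {
        struct sim_tasklet *t = list;

        list = list->next;
        if (sim_test_and_clear_bit(SIM_STATE_SCHED, &t->state))
            t->func(t->data);
    }
}

/* Demo tasklet whose handler adds its data argument to a counter. */
static unsigned long sim_calls;
static void sim_handler(unsigned long data) { sim_calls += data; }
static struct sim_tasklet sim_demo = { NULL, 0, sim_handler, 1 };
```

Calling sim_tasklet_schedule(&sim_demo) twice and then sim_tasklet_action() once invokes the handler a single time. Once the handler has run, the bit is clear and the tasklet can be scheduled again.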



ksoftirqd
 Most commonly, the kernel processes softirqs on return from handling an interrupt
 In interrupt context
 However, softirqs may be raised at very high rates
 Sometimes, they reactivate themselves
 This may lead to starvation of user programs
 Kernel solution
 When softirqs grow excessively, the kernel wakes up a family of kernel threads
 They run at the lowest possible priority
 One thread per processor, named ksoftirqd/n
 static int ksoftirqd(void * __bind_cpu) [code]


Work Queues



Introduction
 Work queues defer work into a kernel thread
 Runs in process context
 Schedulable and can sleep
 These threads are called worker threads
 Default worker threads are called events/n
 n is the processor number
 Unless there is a need to create its own thread, most drivers defer work to the default worker thread

struct workqueue_struct {
struct cpu_workqueue_struct cpu_wq[NR_CPUS];
const char *name;
struct list_head list; /* Empty if single thread */
};



More Data Structure
struct cpu_workqueue_struct {
    spinlock_t lock;

    long remove_sequence; /* Least-recently added (next to run) */
    long insert_sequence; /* Next to add */

    struct list_head worklist;
    wait_queue_head_t more_work;
    wait_queue_head_t work_done;

    struct workqueue_struct *wq;
    task_t *thread;

    int run_depth; /* Detect run_workqueue() recursion depth */
} ____cacheline_aligned;


#define create_workqueue(name) __create_workqueue((name), 0)

struct workqueue_struct *__create_workqueue(const char *name,
                                            int singlethread)
{
    int cpu, destroy = 0;
    struct workqueue_struct *wq;
    struct task_struct *p;

    wq = kmalloc(sizeof(*wq), GFP_KERNEL);
    if (!wq) return NULL;
    memset(wq, 0, sizeof(*wq));

    wq->name = name;
    lock_cpu_hotplug();
    if (singlethread) {
        …
    } else {
        spin_lock(&workqueue_lock);
        list_add(&wq->list, &workqueues);
        spin_unlock(&workqueue_lock);
        for_each_online_cpu(cpu) {
            p = create_workqueue_thread(wq, cpu);
            ….
        }

static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq, int cpu)
{
    struct cpu_workqueue_struct *cwq = wq->cpu_wq + cpu;
    struct task_struct *p;

    spin_lock_init(&cwq->lock);
    cwq->wq = wq;
    cwq->thread = NULL;
    cwq->insert_sequence = 0;
    cwq->remove_sequence = 0;
    INIT_LIST_HEAD(&cwq->worklist);
    init_waitqueue_head(&cwq->more_work);
    init_waitqueue_head(&cwq->work_done);

    if (is_single_threaded(wq))
        p = kthread_create(worker_thread, cwq, "%s", wq->name);
    else
        p = kthread_create(worker_thread, cwq, "%s/%d", wq->name, cpu);
    if (IS_ERR(p))
        return NULL;
    cwq->thread = p;
    return p;
}


static int worker_thread(void *__cwq)
{
    struct cpu_workqueue_struct *cwq = __cwq;
    DECLARE_WAITQUEUE(wait, current);
    struct k_sigaction sa;
    sigset_t blocked;

    current->flags |= PF_NOFREEZE;

    set_user_nice(current, -10);

    /* Block and flush all signals */
    sigfillset(&blocked);
    sigprocmask(SIG_BLOCK, &blocked, NULL);
    flush_signals(current);

    /* SIG_IGN makes children autoreap: see do_notify_parent(). */
    sa.sa.sa_handler = SIG_IGN;
    sa.sa.sa_flags = 0;
    siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
    do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);

    set_current_state(TASK_INTERRUPTIBLE);
    while (!kthread_should_stop()) {
        add_wait_queue(&cwq->more_work, &wait);
        if (list_empty(&cwq->worklist))
            schedule();
        else
            __set_current_state(TASK_RUNNING);
        remove_wait_queue(&cwq->more_work, &wait);

        if (!list_empty(&cwq->worklist))
            run_workqueue(cwq);
        set_current_state(TASK_INTERRUPTIBLE);
    }
    __set_current_state(TASK_RUNNING);
    return 0;
}


Wait Queues
 Wait queues have several uses in the kernel
 especially for interrupt handling, process synchronization, and timing
 A process wishing to wait for a specific event places itself in the proper wait queue and relinquishes control
 Each wait queue is identified by a wait queue head (wait_queue_head_t)
 Wait queues are modified by interrupt handlers and major kernel functions
 Protected by a spinlock
 Each element is of type wait_queue_t
 Each entry represents a sleeping process
 Exclusive processes: selectively woken up
 Nonexclusive processes: always woken up

Data Structures
struct __wait_queue_head {
spinlock_t lock;
struct list_head task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

struct __wait_queue {
unsigned int flags;
#define WQ_FLAG_EXCLUSIVE 0x01
struct task_struct * task;
wait_queue_func_t func;
struct list_head task_list;
};



worker_thread()
 set_current_state(TASK_INTERRUPTIBLE);
 mark it sleeping
 add_wait_queue(&cwq->more_work, &wait);
 adds this thread into a wait queue
 if (list_empty(&cwq->worklist)) schedule()
 do a context switch and sleep
 else __set_current_state(TASK_RUNNING);
 Thread does not go to sleep
 remove_wait_queue(&cwq->more_work, &wait);
 dequeue itself from the wait queue
 if (!list_empty(&cwq->worklist)) run_workqueue(cwq);
 perform deferred work



Work Item
struct work_struct {
unsigned long pending; // is this work pending?
struct list_head entry; // link list of all work
void (*func)(void *); // handler function
void *data; // argument to handler
void *wq_data; // used internally
struct timer_list timer; // timer used by delay work queues
};



run_workqueue()
 while (!list_empty(&cwq->worklist)) {
 Check out if worklist is empty, if not
 struct work_struct *work = list_entry(cwq->worklist.next, struct work_struct, entry);
 Obtain one work item
 void (*f) (void *) = work->func;
 Obtain handler function
 void *data = work->data;
 Obtain argument to this handler function
 list_del_init(cwq->worklist.next);
 Remove the work item
 f(data);
 Execute handler function
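
The dequeue-then-call loop above can be modeled in user space with a plain FIFO. This is a rough sketch only: the sim_ types and names are hypothetical, a singly linked list replaces the kernel's list_head, there is no locking, and clearing the pending flag just before the handler runs is an assumption about the real ordering.

```c
#include <assert.h>
#include <stddef.h>

struct sim_work {
    struct sim_work *next;
    int pending;                 /* nonzero while queued */
    void (*func)(void *);
    void *data;
};

struct sim_queue {
    struct sim_work *head, *tail;
};

/* Queue a work item once; returns 0 if it is already pending. */
static int sim_queue_work(struct sim_queue *q, struct sim_work *w)
{
    assert(q != NULL && w != NULL);
    if (w->pending)
        return 0;
    w->pending = 1;
    w->next = NULL;
    if (q->tail != NULL)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
    return 1;
}

/* Drain the queue: take one item, detach it, then call its handler. */
static void sim_run_workqueue(struct sim_queue *q)
{
    assert(q != NULL);
    while (q->head != NULL) {
        struct sim_work *w = q->head;
        void (*f)(void *) = w->func;
        void *data = w->data;

        q->head = w->next;
        if (q->head == NULL)
            q->tail = NULL;
        w->pending = 0;          /* assumption: cleared before the handler runs */
        f(data);
    }
}

/* Demo state: two work items that record their argument in arrival order. */
static int sim_log[4], sim_log_len;
static void sim_record(void *data)
{
    if (sim_log_len < 4)
        sim_log[sim_log_len++] = *(int *)data;
}
static int sim_arg_a = 1, sim_arg_b = 2;
static struct sim_queue sim_q;
static struct sim_work sim_wa = { NULL, 0, sim_record, &sim_arg_a };
static struct sim_work sim_wb = { NULL, 0, sim_record, &sim_arg_b };
```

Queueing sim_wa, sim_wb, and sim_wa again accepts the first two and rejects the third, and sim_run_workqueue(&sim_q) then records 1 followed by 2. Detaching the item before calling the handler lets a handler requeue its own work item.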
Using Work Queues
 Create work to defer
 DECLARE_WORK(xyz, void (*abc)(void *), void *def);
 It statically creates a work_struct structure named xyz, with handler abc and data def
 Write work queue handler
 void work_handler(void *data) for example
 It runs in process context
 Schedule work
 On default event queue: schedule_work(&work);
 schedule_delayed_work(&work, delay);


static struct workqueue_struct *keventd_wq;

int fastcall schedule_work(struct work_struct *work)
{
    return queue_work(keventd_wq, work);
}

int fastcall schedule_delayed_work(struct work_struct *work, unsigned long delay)
{
    return queue_delayed_work(keventd_wq, work, delay);
}

int fastcall queue_delayed_work(struct workqueue_struct *wq, struct work_struct *work, unsigned long delay)
{
    int ret = 0;
    struct timer_list *timer = &work->timer;
    if (!test_and_set_bit(0, &work->pending)) {
        work->wq_data = wq;
        timer->expires = jiffies + delay;
        timer->data = (unsigned long)work;
        timer->function = delayed_work_timer_fn;
        add_timer(timer);
        ret = 1;
    }
    return ret;
}

/* We queue the work to the CPU it was submitted, but there is no guarantee that it will be processed by that CPU. */
int fastcall queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
    int ret = 0, cpu = get_cpu();
    if (!test_and_set_bit(0, &work->pending)) {
        if (unlikely(is_single_threaded(wq)))
            cpu = 0;
        BUG_ON(!list_empty(&work->entry));
        __queue_work(wq->cpu_wq + cpu, work);
        ret = 1;
    }
    put_cpu(); return ret;
}

void init_workqueues(void)
{
    hotcpu_notifier(workqueue_cpu_callback, 0);
    keventd_wq = create_workqueue("events");
    BUG_ON(!keventd_wq);
}


int default_wake_function(wait_queue_t *curr, unsigned mode, int sync, void *key)
{
    task_t *p = curr->task;
    return try_to_wake_up(p, mode, sync);
}

#define wake_up(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, NULL)

void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, void *key)
{
    unsigned long flags;

    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, 0, key);
    spin_unlock_irqrestore(&q->lock, flags);
}

static void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync, void *key)
{
    struct list_head *tmp, *next;
    list_for_each_safe(tmp, next, &q->task_list) {
        wait_queue_t *curr;
        unsigned flags;
        curr = list_entry(tmp, wait_queue_t, task_list);
        flags = curr->flags;
        if (curr->func(curr, mode, sync, key) && (flags & WQ_FLAG_EXCLUSIVE) &&
            !--nr_exclusive)
            break;
    }
}

#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)

#define container_of(ptr, type, member) ({\
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
    (type *)( (char *)__mptr - offsetof(type,member) );})

RA: try_to_wake_up() [TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, 1 or 0 or nr]
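
list_entry() relies on container_of() to step back from an embedded list node to the structure that contains it. Here is a portable user-space sketch of the same pointer arithmetic. It drops the typeof type check of the kernel macro so that it compiles as standard C, and the sim_ names and field layout are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Portable variant: subtract the member's offset from the member's address. */
#define sim_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sim_list_head {
    struct sim_list_head *next, *prev;
};

/* Hypothetical wait queue entry with the list node embedded last. */
struct sim_waiter {
    unsigned int flags;
    int task_id;
    struct sim_list_head task_list;
};

/* Recover the enclosing sim_waiter from a pointer to its task_list node. */
static struct sim_waiter *sim_waiter_of(struct sim_list_head *node)
{
    assert(node != NULL);
    return sim_container_of(node, struct sim_waiter, task_list);
}

static struct sim_waiter sim_w = { 0x01, 42, { NULL, NULL } };
```

sim_waiter_of(&sim_w.task_list) returns &sim_w, so walking a list of nodes gives access to every field of the entries that embed them. This is why a single list implementation can serve any structure that contains a list node.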



Summary
 Choices for bottom halves
 softirqs, tasklets, work queues
 Softirqs provide least serialization
 Only used when scalability is a concern
 Tasklets are used if code is not finely threaded
 Work queues process work items in process context
 Easiest to use


Disabling Bottom Halves
 local_bh_disable()
 To disable all bottom halves (softirqs and tasklets)
 local_bh_enable()
 To enable bottom halves
 If nested, only the last call enables


local_bh_disable()
 local_bh_disable() disables all bottom halves except work queues on the local CPU
 Disables local bottom halves by incrementing preempt_count
 local_bh_enable() enables local bottom halves
 by decrementing preempt_count
 and checking for any pending softirq

 #define local_bh_disable() \
    do { preempt_count() += SOFTIRQ_OFFSET; \
         barrier(); } while (0)


local_bh_enable()
 local_bh_enable() enables local bottom halves
 by decreasing preempt_count,
 and optionally run any pending bottom halves

 void local_bh_enable(void)
{
__local_bh_enable();
if (unlikely(!in_interrupt() && local_softirq_pending()))
invoke_softirq();
}



Usage of preempt_count
 The preemption markers preempt_disable and preempt_enable operate on an int, preempt_count, stored in each thread_info
 bits 8-15 are softirq count
 max # of softirqs: 256
 OFFSET
 SOFTIRQ_OFFSET : 0x00000100
 SOFTIRQ_MASK : 0x0000ff00
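
The softirq field of preempt_count acts as a nesting counter. A minimal user-space sketch of how local_bh_disable() and local_bh_enable() move it is shown below. The two constants are taken from the slide; the sim_ names are hypothetical, and a single global stands in for the per-thread counter.

```c
#include <assert.h>

#define SIM_SOFTIRQ_OFFSET 0x00000100u   /* one unit in bits 8-15 */
#define SIM_SOFTIRQ_MASK   0x0000ff00u   /* bits 8-15 */

static unsigned int sim_preempt_count;   /* stand-in for thread_info's preempt_count */

static void sim_local_bh_disable(void)
{
    sim_preempt_count += SIM_SOFTIRQ_OFFSET;
}

static void sim_local_bh_enable(void)
{
    /* An enable without a matching disable would underflow the field. */
    assert((sim_preempt_count & SIM_SOFTIRQ_MASK) != 0);
    sim_preempt_count -= SIM_SOFTIRQ_OFFSET;
}

/* Softirq portion of the counter, still shifted into bits 8-15. */
static unsigned int sim_softirq_count(void)
{
    return sim_preempt_count & SIM_SOFTIRQ_MASK;
}

static int sim_bh_disabled(void)
{
    return sim_softirq_count() != 0;
}
```

After two sim_local_bh_disable() calls the field reads 0x200. One sim_local_bh_enable() leaves bottom halves disabled, and only the second returns the count to zero, which matches the rule that only the last nested call enables.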



irq_exit()
 #define irq_exit() do { \
    preempt_count() -= IRQ_EXIT_OFFSET; \
    if (!in_interrupt() && softirq_pending(smp_processor_id())) \
        do_softirq(); \
} while (0)

 in_interrupt() examines preempt_count to check whether the CPU is in interrupt (hardirq or softirq) context
 local_bh_disable is mostly used in drivers

asmlinkage void __do_softirq(void)
{
    pending = local_softirq_pending();
    local_bh_disable();

    /* handle softirq MAX_SOFTIRQ_RESTART times */

    __local_bh_enable();
}

Review Slide
 tasklet IRQ?
 DECLARE_TASKLET? DECLARE_TASKLET_DISABLED?
 tasklet_action()?
 ksoftirqd()?
 Work queue usage?
 workqueue_struct? cpu_workqueue_struct? work_struct?
 worker_thread()? run_workqueue()?
 schedule_work()?
 MP1: Provide timer & keyboard ISRs for eos_x86 operating system


Return from Interrupts and Exceptions
朱宗賢


Introduction
 The following things must be handled before
terminating an interrupt or exception handler
 # of kernel control paths being concurrently
executed
 If there is just one, CPU switches back to user mode
 Pending process switch requests
 If TIF_NEED_RESCHED is set, call schedule()
 Pending signals
 If a signal is sent to current process, it must be
handled



Related Terminating Functions
 ret_from_exception()
 Terminates all exceptions except 0x80 ones
 ret_from_intr()
 Terminates interrupt handlers
 ret_from_sys_call()
 Terminates system calls (0x80 programmed exception)
 ret_from_fork()
 Terminates fork(), vfork(), or clone() system calls


Return from Interrupts and Exceptions
[Figure: flowchart of the return path. Entry points: ret_from_exception, ret_from_intr, ret_from_sys_call, and ret_from_fork (via schedule_tail()). Decision nodes: nested kernel control paths?, system call tracing? (tracesys_exit: syscall_trace()), need reschedule? (reschedule: schedule()), virtual v86 mode?, pending signals? (signal_return: do_signal(); v86_signal_return: save_v86_state()). Exit: restore_all, restore hardware context.]

Returning from Interrupt
 The return path from an interrupt is much more complicated than the entry path
 It is a good place to do other tasks that are unrelated to the interrupt but need to be done fairly frequently
 These include checking for pending signals and checking whether a reschedule is needed

General Implementation Issue
 Number of kernel control paths being concurrently executed
 Pending process switch requests
 Pending signals


Exiting from Interrupt Handling



Return from System Call
 Disable interrupts first, so that the tests that follow are guaranteed to be atomic
 Check the pending work-to-be-done flags in the thread information
 syscall trace active
 resumption notification requested
 signal pending
 rescheduling necessary


Returning from Exceptions and Interrupts
 We have to determine whether the CPU was already running in kernel mode before the interrupt or not
 Kernel mode / user mode / vm86 mode
 If so, we are dealing with a nested interrupt and want to terminate the processing of it as quickly as possible


// entry.S
# system call handler stub
ENTRY(system_call)

syscall_call:
    call *sys_call_table(,%eax,4)
    movl %eax,EAX(%esp)             # store the return value
syscall_exit:
    cli                             # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    movl TI_flags(%ebp), %ecx
    testw $_TIF_ALLWORK_MASK, %cx   # current->work
    jne syscall_exit_work
restore_all:
    RESTORE_ALL

// entry.S
ret_from_exception:
    preempt_stop
ret_from_intr:
    GET_THREAD_INFO(%ebp)
    movl EFLAGS(%esp), %eax         # mix EFLAGS and CS
    movb CS(%esp), %al
    testl $(VM_MASK | 3), %eax
    jz resume_kernel                # returning to
ENTRY(resume_userspace)
    cli                             # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx      # is there any work to be done on
                                    # int/exception return?
    jne work_pending
    jmp restore_all
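
The "mix EFLAGS and CS" test at ret_from_intr packs the saved EFLAGS and the low byte of the saved CS into one register, so a single testl covers both vm86 mode and the privilege level. Below is a user-space sketch of the same bit logic. The sim_ names are hypothetical; VM_MASK is taken to be the EFLAGS VM flag (bit 17), and the selector values mentioned afterwards (0x60 and 0x73) are assumed to be typical i386 Linux kernel and user code selectors.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_VM_MASK 0x00020000u   /* EFLAGS.VM, bit 17 */

enum sim_return_path { SIM_RESUME_KERNEL, SIM_RESUME_USERSPACE };

/* Models: movl EFLAGS(%esp),%eax ; movb CS(%esp),%al ;
 *         testl $(VM_MASK | 3),%eax ; jz resume_kernel */
static enum sim_return_path sim_return_path(uint32_t eflags, uint32_t cs)
{
    uint32_t mix;

    assert(cs <= 0xffffu);                       /* segment selectors are 16 bits wide */
    mix = (eflags & ~0xffu) | (cs & 0xffu);      /* low byte replaced by the CS low byte */

    /* Bits 0-1 of CS are the privilege level; bit 17 of EFLAGS is vm86 mode. */
    return (mix & (SIM_VM_MASK | 3u)) == 0 ? SIM_RESUME_KERNEL
                                           : SIM_RESUME_USERSPACE;
}
```

With saved EFLAGS 0x202, a CS of 0x60 (privilege level 0) selects the resume_kernel path, while a CS of 0x73 (privilege level 3) falls through to resume_userspace. Setting the VM bit also falls through, whatever the CS value.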



Deal with Pending Signal
 Check the VM_MASK bit in the flags register (Kernel / VM86 mode)
 Call do_notify_resume()
 There is an extra complication if a signal was found to be pending while the processor was running in virtual 8086 mode before the interrupt
 It copies saved values from the stack to the vm86_info field of the thread structure


Reschedule Current Process
 If there is any switch request, the kernel must perform process scheduling; otherwise, control is returned to the current process
 If the current process cannot continue after the interrupt, then work_resched will be invoked


Return from Fork
 The ret_from_fork function is executed by the child process right after its creation through a fork(), vfork(), or clone() system call
 schedule_tail(): It is relevant only in the SMP case. It tries to find a suitable CPU on which to run the process just switched out.


// entry.S
work_resched:
    call schedule
    cli
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx
    jz restore_all
    testb $_TIF_NEED_RESCHED, %cl
    jnz work_resched
work_notifysig:
    # deal with pending signals and
    # notify-resume requests
    testl $VM_MASK, EFLAGS(%esp)
    movl %esp, %eax
    jne work_notifysig_v86
    # returning to kernel-space or
    # vm86-space
    xorl %edx, %edx
    call do_notify_resume
    jmp restore_all

// entry.S
ENTRY(ret_from_fork)
    pushl %eax
    call schedule_tail
    GET_THREAD_INFO(%ebp)
    popl %eax
    jmp syscall_exit