CS162

Operating Systems and


Systems Programming
Lecture 8

Introduction to I/O,
Sockets, Networking

February 18th, 2020
Prof. John Kubiatowicz
http://cs162.eecs.Berkeley.edu

Acknowledgments: Lecture slides are from the Operating Systems course
taught by John Kubiatowicz at Berkeley, with a few minor updates/changes.
When slides are obtained from other sources, a reference is noted on the
bottom of that slide, and a full list of references is provided on the last
slide.






Recall: UNIX System Structure

Applications
User Mode
Standard Libs

Kernel Mode

Hardware

2/18/2020 Kubiatowicz CS162 ©UCB Fall 2020 2


Recall: A Kind of Narrow Waist

Word Processing
Compilers Web Browsers
Email
Databases Web Servers
Application / Service

Portable OS Library OS
User
System Call
System Interface
Portable OS Kernel
Software Platform support, Device Drivers

Hardware x86 PowerPC ARM


PCI

Ethernet (10/100/1000) 802.11 a/b/g/n SCSI IDE Graphics



Recall: web server

Client → Web Server: Request
Web Server → Client: Reply (retrieved by web server)


Recall: web server

[Figure: path of a request and its reply through the Server (user), Kernel, and Hardware (network interface, disk interface) layers]
 1. network socket read (syscall, wait)
 2. copy arriving packet (DMA), interrupt
 3. kernel copy (into the request buffer), RTU
 4. parse request
 5. file read (syscall, wait)
 6. disk request
 7. disk data (DMA), interrupt
 8. kernel copy (into the reply buffer), RTU
 9. format reply
10. network socket write (syscall)
11. kernel copy from user buffer to network buffer
12. format outgoing packet and DMA

POSIX I/O: Everything is a “File”


• Identical interface for:
  – Devices (terminals, printers, etc.)
  – Regular files on disk
  – Networking (sockets)
  – Local interprocess communication (pipes, sockets)
• Based on open(), read(), write(), and close()
• Allows simple composition of programs
  » find | grep | wc …
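
Because files, pipes, and sockets all sit behind the same descriptor interface, one helper can serve all of them. The sketch below is illustrative only; the name count_bytes is hypothetical and not part of the lecture.

```c
#include <unistd.h>

/* Hypothetical helper: count the bytes that can be read from any
 * descriptor (regular file, pipe, socket, terminal) until end-of-file.
 * Returns the byte count, or -1 if read() fails. */
long count_bytes(int fd) {
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}
```

The same call works whether fd came from open(), pipe(), or accept(); the caller never needs to know which.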


POSIX I/O Design Patterns


• Open before use
  – Access control check, setup happens here
• Byte-oriented
  – Least common denominator
  – OS responsible for hiding the fact that real devices may not work this way (e.g. hard drive stores data in blocks)
• Explicit close


POSIX I/O: Kernel Buffering


• Reads are buffered
  – Part of making everything byte-oriented
  – Process is blocked while waiting for device
  – Let other processes run while gathering result
• Writes are buffered
  – Complete in background (more later on)
  – Return to user when data is “handed off” to kernel


Putting it together: web server


[Figure: the same web server request path (steps 1-12), with the kernel buffer reads highlighted]

Putting it together: web server


[Figure: the same web server request path (steps 1-12), with the kernel buffer write highlighted]

I/O & Storage Layers


Application / Service

High Level I/O streams

Low Level I/O handles

Syscall registers

File System descriptors

I/O Driver Commands and Data Transfers

Disks, Flash, Controllers, DMA



The File System Abstraction
• High-level idea
  – Files live in hierarchical namespace of filenames
• File
  – Named collection of data in a file system
  – POSIX File data: sequence of bytes
    » Text, binary, linearized objects, …
  – File Metadata: information about the file
    » Size, Modification Time, Owner, Security info
    » Basis for access control
• Directory
  – “Folder” containing files & Directories
  – Hierarchical (graphical) naming
    » Path through the directory graph
    » Uniquely identifies a file or directory
      • /home/ff/cs162/public_html/fa18/index.html
  – Links and Volumes (later)


C High-Level File API – Streams


• Operate on “streams” - a sequence of bytes, whether text or data, with a position

#include <stdio.h>
FILE *fopen( const char *filename, const char *mode );
int fclose( FILE *fp );

Mode Text   Mode Binary   Description
r           rb            Open existing file for reading
w           wb            Open for writing; truncated if exists, created if does not exist
a           ab            Open for appending; created if does not exist
r+          rb+           Open existing file for reading & writing
w+          wb+           Open for reading & writing; truncated to zero if exists, created otherwise
a+          ab+           Open for reading & writing; created if does not exist; read from beginning, write as append

Don’t forget to flush
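
The difference between the "w" and "a" modes is easiest to see by writing the same file twice. This is a minimal sketch; the helper names put_text and get_text are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to `path` using the given fopen mode.
 * Returns 0 on success, -1 on failure. */
int put_text(const char *path, const char *mode, const char *text) {
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int rc = (fputs(text, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF)      /* fclose flushes the stream's buffer */
        rc = -1;
    return rc;
}

/* Hypothetical helper: read the first line of `path` into buf.
 * Returns 0 on success, -1 on failure or empty file. */
int get_text(const char *path, char *buf, int n) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *ok = fgets(buf, n, fp);
    fclose(fp);
    return ok != NULL ? 0 : -1;
}
```

Calling put_text with "w", then "a", then "w" again leaves first the new text, then the concatenation, then only the last text in the file.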

Connecting Processes, Filesystem, and Users


• Process has a ‘current working directory’
• Absolute Paths
  – /home/ff/cs162
• Relative paths
  – index.html, ./index.html - current WD
  – ../index.html - parent of current WD
  – ~, ~cs162 - home directory


C API Standard Streams – stdio.h


• Three predefined streams are opened implicitly when a program is executed
  – FILE *stdin – normal source of input, can be redirected
  – FILE *stdout – normal destination for output, can be redirected
  – FILE *stderr – diagnostics and errors, can be redirected
• STDIN / STDOUT enable composition in Unix
• All can be redirected (for instance, using “pipe” symbol: ‘|’)
  – cat hello.txt | grep “World!”
    » Cat’s stdout goes to grep’s stdin!


C high level File API – stream ops


#include <stdio.h>
// character oriented
int fputc(int c, FILE *fp);                 // rtn c or EOF on err
int fputs(const char *s, FILE *fp);         // rtn >= 0 or EOF

int fgetc( FILE * fp );
char *fgets( char *buf, int n, FILE *fp );

// block oriented
size_t fread(void *ptr, size_t size_of_elements,
             size_t number_of_elements, FILE *a_file);
size_t fwrite(const void *ptr, size_t size_of_elements,
             size_t number_of_elements, FILE *a_file);

// formatted
int fprintf(FILE *restrict stream, const char *restrict format, ...);
int fscanf(FILE *restrict stream, const char *restrict format, ...);

C Streams: char by char I/O


#include <stdio.h>

int main(void) {
  FILE* input = fopen("input.txt", "r");
  FILE* output = fopen("output.txt", "w");
  int c;

  c = fgetc(input);
  while (c != EOF) {
    fputc(c, output);   // character first, then stream
    c = fgetc(input);
  }
  fclose(input);
  fclose(output);
}

What if we wanted block by block I/O?


#include <stdio.h>
// block oriented
size_t fread(void *ptr, size_t size_of_elements,
             size_t number_of_elements, FILE *a_file);
size_t fwrite(const void *ptr, size_t size_of_elements,
             size_t number_of_elements, FILE *a_file);

stdio Block-by-Block I/O


#include <stdio.h>
#define BUFFER_SIZE 1024
int main(void) {
  FILE* input = fopen("input.txt", "r");
  FILE* output = fopen("output.txt", "w");
  char buffer[BUFFER_SIZE];
  size_t length;
  // element size first, element count second, so the return value is a byte count
  length = fread(buffer, sizeof(char), BUFFER_SIZE, input);
  while (length > 0) {
    fwrite(buffer, sizeof(char), length, output);
    length = fread(buffer, sizeof(char), BUFFER_SIZE, input);
  }
  fclose(input);
  fclose(output);
}

Aside: Systems Programming


• Systems programmers are paranoid
• We should really be writing things like:
    FILE* input = fopen("input.txt", "r");
    if (input == NULL) {
      // Prints our string and error msg.
      perror("Failed to open input file");
    }
• Be thorough about checking return values
  – Want failures to be systematically caught and dealt with
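
Applied to the block-by-block copy, "paranoid" means checking every call that can fail, including fwrite and the final fclose. A sketch under that assumption follows; the function name copy_file is hypothetical.

```c
#include <stdio.h>

/* Copy src to dst with every return value checked.
 * Returns 0 on success, -1 on any failure (after printing a diagnostic). */
int copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");
    if (in == NULL) { perror("fopen(src)"); return -1; }

    FILE *out = fopen(dst, "wb");
    if (out == NULL) { perror("fopen(dst)"); fclose(in); return -1; }

    char buf[1024];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {   /* short write is an error */
            perror("fwrite");
            rc = -1;
            break;
        }
    }
    if (ferror(in)) { perror("fread"); rc = -1; }

    fclose(in);
    /* Buffered data is flushed here, so fclose can be where a write fails. */
    if (fclose(out) == EOF) { perror("fclose(dst)"); rc = -1; }
    return rc;
}
```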


C Stream API: Positioning


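int fseek(FILE *stream, long int offset, int whence);
long int ftell (FILE *stream);
void rewind (FILE *stream);

whence selects the reference point for offset:
  SEEK_SET – offset from the start of the file
  SEEK_CUR – offset from the current position
  SEEK_END – offset from the end of the file

• Preserves high level abstraction of a uniform stream of objects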
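
A short sketch of the positioning calls: seek relative to the end of the file, read one byte, and let ftell report where the stream ended up. The helper name last_byte is hypothetical.

```c
#include <stdio.h>

/* Return the last byte of an open, seekable stream, or EOF if the
 * stream is empty or cannot be positioned. Leaves the position at
 * end-of-file. */
int last_byte(FILE *fp) {
    if (fseek(fp, -1L, SEEK_END) != 0)   /* one byte before the end */
        return EOF;
    return fgetc(fp);                     /* reading advances the position */
}
```

After last_byte returns, ftell(fp) equals the file length, and rewind(fp) moves the position back to offset 0.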


What’s below the surface ??

Application / Service
High Level I/O streams

Low Level I/O handles


Syscall registers

File System descriptors


I/O Driver commands and Data Transfers
disks, flash, controllers, DMA



C Low level I/O
• Operations on File Descriptors – an OS object representing the state of a file
  – User has a “handle” on the descriptor

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

int open (const char *filename, int flags [, mode_t mode]);
int creat (const char *filename, mode_t mode);
int close (int filedes);

flags: bit vector of
  • Access modes (Rd, Wr, …)
  • Open Flags (Create, …)
  • Operating modes (Appends, …)
mode: bit vector of Permission Bits
  • User|Group|Other X R|W|X

http://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html
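
To make the two bit vectors concrete, the sketch below combines an access mode with open-time flags and passes permission bits as the third argument. The helper name write_new_file and the choice of 0644 are assumptions made for illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) `path` and write `len` bytes to it.
 * flags = access mode (O_WRONLY) | open-time flags (O_CREAT | O_TRUNC);
 * mode 0644 = read/write for the user, read-only for group and other.
 * Returns 0 on success, -1 on failure. */
int write_new_file(const char *path, const void *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = (write(fd, data, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Adding O_EXCL to the flags would instead make open() fail when the file already exists.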

C Low Level: standard descriptors

#include <unistd.h>
#include <stdio.h>

STDIN_FILENO - macro has value 0
STDOUT_FILENO - macro has value 1
STDERR_FILENO - macro has value 2

int fileno (FILE *stream);
FILE * fdopen (int filedes, const char *opentype);

• Crossing levels: File descriptors vs. streams
• Don’t mix them!
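
If the two levels must be combined on one file, the stream's user-level buffer has to be flushed before touching the descriptor, or output can appear out of order. A minimal sketch, with the hypothetical helper name write_both_levels:

```c
#include <stdio.h>
#include <unistd.h>

/* Write one line through the stream, then one line through the raw
 * descriptor underneath it. The fflush() keeps the two in program
 * order; without it the stream's data could reach the file last.
 * Returns 0 on success, -1 on failure. */
int write_both_levels(FILE *fp) {
    int fd = fileno(fp);                  /* stream -> descriptor */
    if (fd < 0)
        return -1;
    if (fputs("via stream\n", fp) == EOF)
        return -1;
    if (fflush(fp) == EOF)                /* push user buffer to the kernel */
        return -1;
    if (write(fd, "via fd\n", 7) != 7)
        return -1;
    return 0;
}
```

fdopen() goes the other way, wrapping an existing descriptor in a new stream.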


C Low Level Operations

ssize_t read (int filedes, void *buffer, size_t maxsize);
  - returns bytes read, 0 => EOF, -1 => error
ssize_t write (int filedes, const void *buffer, size_t size);
  - returns bytes written
off_t lseek (int filedes, off_t offset, int whence);
int fsync (int fildes);   – wait for i/o to finish
void sync (void);         – wait for ALL to finish

• When write returns, data is on its way to disk and can be read, but it may not actually be permanent!
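
When permanence matters (for example, a log record), the write can be followed by fsync before success is reported. A sketch under that assumption; the helper name append_durably is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Append a record and report success only after fsync() returns,
 * i.e. after the kernel has pushed the data to the storage device.
 * Returns 0 on success, -1 on failure. */
int append_durably(const char *path, const void *rec, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (write(fd, rec, len) != (ssize_t)len)   /* handed off to the kernel */
        rc = -1;
    if (rc == 0 && fsync(fd) < 0)              /* wait for the device */
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

The cost is latency: each call now waits for the device instead of returning as soon as the kernel has buffered the data.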


A little example: lowio.c

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main() {
  char buf[1000];
  // the mode argument is only used when O_CREAT is given; it is ignored here
  int fd = open("lowio.c", O_RDONLY, S_IRUSR | S_IWUSR);
  ssize_t rd = read(fd, buf, sizeof(buf));
  int err = close(fd);
  ssize_t wr = write(STDOUT_FILENO, buf, rd);
}

And lots more !


• TTYs versus files
• Memory mapped files
• File Locking
• Asynchronous I/O
• Generic I/O Control Operations
• Duplicating descriptors
    int dup2 (int old, int new);
    int dup (int old);
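
Duplicating descriptors is how a shell implements redirection such as "> file". The sketch below points descriptor 1 at a file and later restores it; the helper names redirect_stdout and restore_stdout are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Make descriptor 1 (stdout) refer to `path`.
 * Returns a saved copy of the old stdout, or -1 on failure. */
int redirect_stdout(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int saved = dup(STDOUT_FILENO);          /* remember the old stdout */
    if (saved < 0) {
        close(fd);
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0) {       /* descriptor 1 now names the file */
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                               /* descriptor 1 keeps the file open */
    return saved;
}

/* Put the saved descriptor back in slot 1. Returns 0 or -1. */
int restore_stdout(int saved) {
    int rc = dup2(saved, STDOUT_FILENO) < 0 ? -1 : 0;
    close(saved);
    return rc;
}
```

Any code that writes to STDOUT_FILENO between the two calls lands in the file without knowing it was redirected. Stream output through printf would additionally need an fflush(stdout) before each switch.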


Another: lowio-std.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define BUFSIZE 1024

int main(int argc, char *argv[])
{
  char buf[BUFSIZE];
  ssize_t writelen = write(STDOUT_FILENO, "I am a process.\n", 16);

  ssize_t readlen = read(STDIN_FILENO, buf, BUFSIZE);

  // named len rather than strlen, to avoid shadowing strlen() from <string.h>
  ssize_t len = snprintf(buf, BUFSIZE, "Got %zd chars\n", readlen);
  writelen = len < BUFSIZE ? len : BUFSIZE;
  write(STDOUT_FILENO, buf, writelen);

  exit(0);
}

Low-Level I/O: Example


#include <fcntl.h>
#include <unistd.h>

#define BUFFER_SIZE 1024

int main(void) {
  int input_fd = open("input.txt", O_RDONLY);
  // no O_CREAT: output.txt must already exist
  int output_fd = open("output.txt", O_WRONLY);
  char buffer[BUFFER_SIZE];
  ssize_t length;
  length = read(input_fd, buffer, BUFFER_SIZE);
  while (length > 0) {
    write(output_fd, buffer, length);
    length = read(input_fd, buffer, BUFFER_SIZE);
  }
  close(input_fd);
  close(output_fd);
}

Streams vs. File Descriptors


• Streams are buffered in user memory:
    printf("Beginning of line ");
    sleep(10); // sleep for 10 seconds
    printf("and end of line\n");
  ⇒ Prints out everything at once
• Operations on file descriptors are visible immediately:
    write(STDOUT_FILENO, "Beginning of line ", 18);
    sleep(10);
    write(STDOUT_FILENO, "and end of line\n", 16);
  ⇒ Outputs "Beginning of line" 10 seconds earlier


Summary: Key Unix I/O Design Concepts


• Uniformity – everything is a file
  – file operations, device I/O, and interprocess communication through open, read/write, close
  – Allows simple composition of programs
    » find | grep | wc …
• Open before use
  – Provides opportunity for access control and arbitration
  – Sets up the underlying machinery, i.e., data structures
• Byte-oriented
  – Even if blocks are transferred, addressing is in bytes
• Kernel buffered reads
  – Streaming and block devices look the same; a read blocks, yielding the processor to other tasks
• Kernel buffered writes
  – Completion of out-going transfer decoupled from the application, allowing it to continue
• Explicit close


What’s below the surface ??


Application / Service

High Level I/O streams


Low Level I/O handles
Syscall registers
File System descriptors
I/O Driver Commands and Data Transfers
Disks, Flash, Controllers, DMA



Recall: SYSCALL

• Low level lib parameters are set up in registers and the syscall instruction is issued
  – A type of synchronous exception that enters well-defined entry points into the kernel

What’s below the surface ??

Application / Service
High Level I/O – streams
Low Level I/O – handles
Syscall – registers
File System – descriptors
I/O Driver – Commands and Data Transfers
Disks, Flash, Controllers, DMA

File descriptor number
  - an int
File Descriptors
  • a struct with all the info about the files

Internal OS File Descriptor


• Internal Data Structure describing everything about the file
  – Where it resides
  – Its status
  – How to access it

• Pointer:
    struct file *file

File System: from syscall to driver


In fs/read_write.c

ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
  ssize_t ret;
  if (!(file->f_mode & FMODE_READ)) return -EBADF;
  if (!file->f_op || (!file->f_op->read && !file->f_op->aio_read))
    return -EINVAL;
  if (unlikely(!access_ok(VERIFY_WRITE, buf, count))) return -EFAULT;
  ret = rw_verify_area(READ, file, pos, count);
  if (ret >= 0) {
    count = ret;
    if (file->f_op->read)
      ret = file->f_op->read(file, buf, count, pos);
    else
      ret = do_sync_read(file, buf, count, pos);
    if (ret > 0) {
      fsnotify_access(file->f_path.dentry);
      add_rchar(current, ret);
    }
    inc_syscr(current);
  }
  return ret;
}

• Read up to “count” bytes from “file” starting from “pos” into “buf”.
• Return error or number of bytes read.

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  if (!(file->f_mode & FMODE_READ)) return -EBADF;
• Make sure we are allowed to read this file

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  if (!file->f_op || (!file->f_op->read && !file->f_op->aio_read))
    return -EINVAL;
• Check if file has read methods

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  if (unlikely(!access_ok(VERIFY_WRITE, buf, count))) return -EFAULT;
• Check whether we can write to buf (e.g., buf is in the user space range)
• unlikely(): hint to branch prediction that this condition is unlikely

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  ret = rw_verify_area(READ, file, pos, count);
• Check whether we read from a valid range in the file.

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  if (file->f_op->read)
    ret = file->f_op->read(file, buf, count, pos);
  else
    ret = do_sync_read(file, buf, count, pos);
• If the driver provides a read function (f_op->read), use it; otherwise use do_sync_read()

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  fsnotify_access(file->f_path.dentry);
• Notify the parent of this file that the file was read (see http://www.fieldses.org/~bfields/kernel/vfs.txt)

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  add_rchar(current, ret);
• Update the number of bytes read by “current” task (for scheduling purposes)

File System: from syscall to driver


In vfs_read() (fs/read_write.c):
  inc_syscr(current);
• Update the number of read syscalls by “current” task (for scheduling purposes)

Lower Level Driver


• Associated with particular hardware device
• Registers / Unregisters itself with the kernel
• Handler functions for each of the file operations
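
The "handler functions" idea can be modeled in user space as a table of function pointers that a device-independent layer dispatches through, much as vfs_read() calls file->f_op->read. Everything below (toy_file, toy_file_ops, zero_read, toy_vfs_read) is a simplified, hypothetical model; the real Linux struct file_operations has many more members and different signatures.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct toy_file;

/* Table of handlers that a "driver" fills in. */
struct toy_file_ops {
    ssize_t (*read)(struct toy_file *f, char *buf, size_t count);
};

/* Per-open-file state: which handlers to use, and the position. */
struct toy_file {
    const struct toy_file_ops *f_op;
    size_t pos;
};

/* "Driver" for a device that produces an endless stream of zero bytes. */
static ssize_t zero_read(struct toy_file *f, char *buf, size_t count) {
    memset(buf, 0, count);
    f->pos += count;
    return (ssize_t)count;
}

static const struct toy_file_ops zero_ops = { .read = zero_read };

/* Device-independent layer: check that a handler exists, then dispatch. */
ssize_t toy_vfs_read(struct toy_file *f, char *buf, size_t count) {
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -1;
    return f->f_op->read(f, buf, count);
}
```

A second device only needs its own table; toy_vfs_read stays unchanged, which is the point of the standard internal interface.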


Device Drivers
• Device Driver: Device-specific code in the kernel that interacts directly with the device hardware
  – Supports a standard, internal interface
  – Same kernel I/O system can interact easily with different device drivers
  – Special device-specific configuration supported with the ioctl() system call
• Device Drivers typically divided into two pieces:
  – Top half: accessed in call path from system calls
    » implements a set of standard, cross-device calls like open(), close(), read(), write(), ioctl(), strategy()
    » This is the kernel’s interface to the device driver
    » Top half will start I/O to device, may put thread to sleep until finished
  – Bottom half: run as interrupt routine
    » Gets input or transfers next block of output
    » May wake sleeping threads if I/O now complete


Life Cycle of An I/O Request


[Figure: layers an I/O request passes through, top to bottom]
User Program
Kernel I/O Subsystem
Device Driver Top Half
Device Driver Bottom Half
Device Hardware

Communication between processes


• Can we view files as communication channels?

    write(wfd, wbuf, wlen);
    n = read(rfd,rbuf,rmax);

• Producer and Consumer of a file may be distinct processes
  – May be separated in time (or not)
• However, what if data written once and consumed once?
  – Don’t we want something more like a queue?
  – Can still look like File I/O!
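
A POSIX pipe is one such queue: bytes written at one end are consumed once at the other, using the same read and write calls. A minimal sketch with a child process as producer and the parent as consumer; the function name send_via_pipe is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes `msg` into a pipe; the parent reads it into buf.
 * Returns the number of bytes the parent read, or -1 on failure. */
ssize_t send_via_pipe(const char *msg, size_t len, char *buf, size_t cap) {
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: producer */
        close(p[0]);
        ssize_t w = write(p[1], msg, len);
        close(p[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(p[1]);                        /* parent: consumer */
    ssize_t n = read(p[0], buf, cap);   /* blocks until data or EOF */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```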


Communication Across the world looks like file IO

write(wfd, wbuf, wlen);

n = read(rfd,rbuf,rmax);

• Connected queues over the Internet
  – But what’s the analog of open?
  – What is the namespace?
  – How are they connected in time?

Request Response Protocol

Client (issues requests)                Server (performs operations)

write(rqfd, rqbuf, buflen);
                    -- requests -->
                                        n = read(rfd,rbuf,rmax);
wait                                    service request
                                        write(wfd, respbuf, len);
                    <-- responses --
n = read(resfd,resbuf,resmax);



Client-Server Models

Client 1

Client 2 Server

***

Client n

• File servers, web, FTP, Databases, …
• Many clients accessing a common server


Client-Server Communication

• Client “sometimes on”
  – Initiates a request to the server when interested
  – E.g., Web browser on your laptop or cell phone
  – Doesn’t communicate directly with other clients
  – Needs to know the server’s address
• Server is “always on”
  – Services requests from many client hosts
  – E.g., Web server for the www.cnn.com Web site
  – Doesn’t initiate contact with the clients
  – Needs a fixed, well-known address


Sockets
• Socket: an abstraction of a network I/O queue
  – Mechanism for inter-process communication
  – Embodies one side of a communication channel
    » Same interface regardless of location of other end
    » Could be local machine (called “UNIX socket”) or remote machine (called “network socket”)
  – First introduced in 4.2 BSD UNIX: big innovation at time
    » Now most operating systems provide some notion of socket
• Data transfer like files
  – Read / Write against a descriptor
• Over ANY kind of network
  – Local to a machine
  – Over the internet (TCP/IP, UDP/IP)
  – OSI, Appletalk, SNA, IPX, SIP, NS, …


Silly Echo Server – running example


Client (issues requests)                Server (performs operations)

gets(fd,sndbuf, …);
write(fd, buf,len);
                    -- requests -->
                                        n = read(fd,buf,);
wait                                    print
                                        write(fd, buf,);
                    <-- responses --
n = read(fd,rcvbuf, );
print


Echo client-server example
void client(int sockfd) {
  int n;
  char sndbuf[MAXIN]; char rcvbuf[MAXOUT];
  getreq(sndbuf, MAXIN);                          /* prompt */
  while (strlen(sndbuf) > 0) {
    write(sockfd, sndbuf, strlen(sndbuf));        /* send */
    memset(rcvbuf,0,MAXOUT);                      /* clear */
    n=read(sockfd, rcvbuf, MAXOUT-1);             /* receive */
    write(STDOUT_FILENO, rcvbuf, n);              /* echo */
    getreq(sndbuf, MAXIN);                        /* prompt */
  }
}

void server(int consockfd) {
  char reqbuf[MAXREQ];
  int n;
  while (1) {
    memset(reqbuf,0, MAXREQ);
    n = read(consockfd,reqbuf,MAXREQ-1);          /* Recv */
    if (n <= 0) return;
    n = write(STDOUT_FILENO, reqbuf, strlen(reqbuf));
    n = write(consockfd, reqbuf, strlen(reqbuf)); /* echo */
  }
}

What assumptions are we making?


• Reliable
  – Write to a file => Read it back. Nothing is lost.
  – Write to a (TCP) socket => Read from the other side, same.
  – Like pipes
• In order (sequential stream)
  – Write X then write Y => read gets X then read gets Y
• When ready?
  – File read gets whatever is there at the time. Assumes writing already took place.
  – Like pipes!
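
Because a read returns whatever is there at the time, a single read() on a pipe or socket may deliver fewer bytes than requested. Code that needs a fixed amount usually loops. A minimal sketch; the helper name read_exact is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until `len` bytes have arrived, the other side
 * closes (EOF), or an error occurs.
 * Returns the number of bytes actually read, or -1 on error. */
ssize_t read_exact(int fd, void *buf, size_t len) {
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n == 0)
            break;                  /* EOF: writer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The echo client and server above read once per message, which is fine for a demonstration but relies on each message arriving in one piece.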


Socket creation and connection


• File systems provide a collection of permanent objects in structured name space
  – Processes open, read/write/close them
  – Files exist independent of the processes
• Sockets provide a means for processes to communicate (transfer data) to other processes
• Creation and connection is more complex
• Form 2-way pipes between processes
  – Possibly worlds away
• How do we name them?
• How do these completely independent programs know that the other wants to “talk” to them?


Namespaces for communication over IP


• Hostname
  – www.eecs.berkeley.edu
• IP address
  – 128.32.244.172 (ipv6?)
• Port Number
  – 0-1023 are “well known” or “system” ports
    » Superuser privileges to bind to one
  – 1024 – 49151 are “registered” ports (registry)
    » Assigned by IANA for specific services
  – 49152–65535 (2^15+2^14 to 2^16−1) are “dynamic” or “private”
    » Automatically allocated as “ephemeral Ports”
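
The three ranges can be written down as a small classifier, which also makes the boundary arithmetic (2^15 + 2^14 = 49152) easy to check. The function name port_class is hypothetical.

```c
/* Classify a port number into the ranges listed above. */
const char *port_class(unsigned int port) {
    if (port <= 1023)
        return "well-known";    /* binding needs superuser privileges */
    if (port <= 49151)
        return "registered";    /* assigned by IANA for specific services */
    if (port <= 65535)
        return "dynamic";       /* ephemeral ports allocated by the OS */
    return "invalid";           /* does not fit in 16 bits */
}
```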


Socket Setup over TCP/IP


[Figure: the client socket sends a connection request to the server socket; the server creates a new connection socket, and the connection runs between the client socket and that new socket]

• Special kind of socket: server socket
  – Has file descriptor
  – Can’t read or write
• Two operations:
  1. listen(): Start allowing clients to connect
  2. accept(): Create a new socket for a particular client connection

Socket Setup over TCP/IP


[Figure: connection setup between the client socket and the server socket: connection request, new socket, connection]

• Server Socket: Listens for new connections
  – Produces new sockets for each unique connection
  – 3-way handshake to establish new connection
• Things to remember:
  – Connection involves 5 values:
    [ Client Addr, Client Port, Server Addr, Server Port, Protocol ]
  – Often, Client Port “randomly” assigned
    » Done by OS during client socket setup
  – Server Port often “well known”
    » 80 (web), 443 (secure web), 25 (sendmail), etc.
    » Well-known ports from 0-1023

Web Server using Sockets (in concept)


Client:
  Create Client Socket
  Connect it to server (host:port)
  [Connection Socket]
  write request
  read response
  Close Client Socket

Server:
  Create Server Socket
  Bind it to an Address (host:port)
  Listen for Connection
  Accept syscall() [Connection Socket]
  read request
  write response
  Close Connection Socket
  Close Server Socket

Client Protocol

char *host_name, *port_name;

// Create a socket
struct addrinfo *server = lookup_host(host_name, port_name);
int sock_fd = socket(server->ai_family, server->ai_socktype,
                     server->ai_protocol);

// Connect to specified host and port
connect(sock_fd, server->ai_addr, server->ai_addrlen);

// Carry out Client-Server protocol
run_client(sock_fd);

/* Clean up on termination */
close(sock_fd);

Client: getting the server address

struct addrinfo *lookup_host(char *host_name, char *port) {
  struct addrinfo *server;
  struct addrinfo hints;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;

  int rv = getaddrinfo(host_name, port, &hints, &server);
  if (rv != 0) {
    printf("getaddrinfo failed: %s\n", gai_strerror(rv));
    return NULL;
  }
  return server;
}

Server Protocol (v1)


// Create socket to listen for client connections
char *port_name;
struct addrinfo *server = setup_address(port_name);
int server_socket = socket(server->ai_family,
                           server->ai_socktype, server->ai_protocol);

// Bind socket to specific port
bind(server_socket, server->ai_addr, server->ai_addrlen);

// Start listening for new client connections
listen(server_socket, MAX_QUEUE);

while (1) {
  // Accept a new client connection, obtaining a new socket
  int conn_socket = accept(server_socket, NULL, NULL);
  serve_client(conn_socket);
  close(conn_socket);
}
close(server_socket);

Server Address - itself


struct addrinfo *setup_address(char *port) {
  struct addrinfo *server;
  struct addrinfo hints;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_PASSIVE;
  getaddrinfo(NULL, port, &hints, &server);
  return server;
}

• Simple form
• Internet Protocol, TCP
• Accepting any connections on the specified port
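
The client and server steps can be exercised end to end on one machine. The sketch below is an assumption-laden illustration, not the lecture's code: for brevity it fills in a struct sockaddr_in for the loopback address directly instead of calling getaddrinfo(), it binds to port 0 so the OS picks an ephemeral port, and the names echo_once and loopback_echo are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Server side (run in a child): accept one connection, echo it back. */
static void echo_once(int listen_fd) {
    char buf[256];
    ssize_t n;
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0)
        _exit(1);
    while ((n = read(conn, buf, sizeof buf)) > 0) {
        if (write(conn, buf, (size_t)n) != n)
            _exit(1);
    }
    close(conn);
    _exit(0);
}

/* Send `msg` to a one-shot echo server on the loopback interface and
 * collect the reply. Returns bytes received, or -1 on failure. */
ssize_t loopback_echo(const char *msg, char *reply, size_t cap) {
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    size_t len = strlen(msg), got = 0;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* OS picks an ephemeral port */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) {
        close(lfd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(lfd);
        return -1;
    }
    if (pid == 0)
        echo_once(lfd);                      /* child: never returns */
    close(lfd);                              /* parent acts as the client */

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        write(cfd, msg, len) != (ssize_t)len) {
        if (cfd >= 0)
            close(cfd);
        kill(pid, SIGKILL);                  /* child may be stuck in accept */
        waitpid(pid, NULL, 0);
        return -1;
    }
    shutdown(cfd, SHUT_WR);                  /* signal end of request */
    while (got < cap) {
        ssize_t n = read(cfd, reply + got, cap - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(cfd);
    waitpid(pid, NULL, 0);
    return (ssize_t)got;
}
```

Calling listen() before fork() means the connection request is queued even if the client connects before the child reaches accept().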


How does the server protect itself?


• Isolate the handling of each connection
• By forking it off as another process



Sockets With Protection


Client:
  Create Client Socket
  Connect it to server (host:port)
  [Connection Socket]
  write request
  read response
  Close Client Socket

Server:
  Create Server Socket
  Bind it to an Address (host:port)
  Listen for Connection
  Accept syscall() [Connection Socket]
  Child:
    Close Listen Socket
    read request
    write response
    Close Connection Socket
  Parent:
    Close Connection Socket
    Wait for child
  Close Server Socket

Server Protocol (v2)


// Start listening for new client connections
listen(server_socket, MAX_QUEUE);
while (1) {
  // Accept a new client connection, obtaining a new socket
  int conn_socket = accept(server_socket, NULL, NULL);
  pid_t pid = fork();              // New process for connection
  if (pid == 0) {                  // Child process
    close(server_socket);          // Doesn’t need server_socket
    serve_client(conn_socket);     // Serve up content to client
    close(conn_socket);            // Done with client!
    exit(EXIT_SUCCESS);
  } else {                         // Parent process
    close(conn_socket);            // Don’t need client socket
    wait(NULL);                    // Wait for our (one) child
  }
}
close(server_socket);

Concurrent Server
• Listen will queue requests
• Buffering present elsewhere
• But server waits for each connection to terminate before initiating the next

Sockets With Protection and Parallelism


Client:
  Create Client Socket
  Connect it to server (host:port)
  [Connection Socket]
  write request
  read response
  Close Client Socket

Server:
  Create Server Socket
  Bind it to an Address (host:port)
  Listen for Connection
  Accept syscall() [Connection Socket]
  Child:
    Close Listen Socket
    read request
    write response
    Close Connection Socket
  Parent:
    Close Connection Socket
  Close Server Socket

Server Protocol (v3)


// Start listening for new client connections
listen(server_socket, MAX_QUEUE);
signal(SIGCHLD, SIG_IGN);          // Prevent zombie children
while (1) {
  // Accept a new client connection, obtaining a new socket
  int conn_socket = accept(server_socket, NULL, NULL);
  pid_t pid = fork();              // New process for connection
  if (pid == 0) {                  // Child process
    close(server_socket);          // Doesn’t need server_socket
    serve_client(conn_socket);     // Serve up content to client
    close(conn_socket);            // Done with client!
    exit(EXIT_SUCCESS);
  } else {                         // Parent process
    close(conn_socket);            // Don’t need client socket
    // wait(NULL);                 // Don’t wait (SIGCHLD ignored, above)
  }
}
close(server_socket);
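
Ignoring SIGCHLD is one way to avoid zombies. A different technique, swapped in here only for comparison and not taken from the lecture, is to reap finished children without blocking by calling waitpid() with WNOHANG, for example once per iteration of the accept loop. The helper name reap_finished_children is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns how many children were reaped. */
int reap_finished_children(void) {
    int reaped = 0;

    /* waitpid returns > 0 for each exited child, 0 if children exist
     * but none has exited yet, and -1 if there are no children. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

Unlike wait(NULL) in v2, this never stalls the parent, so the server can still accept the next connection while earlier children are running.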


Conclusion (I)
• System Call Interface is “narrow waist” between user programs and kernel
• Streaming IO: modeled as a stream of bytes
  – Most streaming I/O functions start with “f” (like “fread”)
  – Data buffered automatically by C-library functions
• Low-level I/O:
  – File descriptors are integers
  – Low-level I/O supported directly at system call level
• STDIN / STDOUT enable composition in Unix
  – Use of pipe symbols connects STDOUT and STDIN
    » find | grep | wc …

Conclusion (II)

• Device Driver: Device-specific code in the kernel that interacts directly with the device hardware
  – Supports a standard, internal interface
  – Same kernel I/O system can interact easily with different device drivers
• File abstraction works for inter-processes communication (local or Internet)
• Socket: an abstraction of a network I/O queue
  – Mechanism for inter-process communication