Unit 1


Web Technologies

Unit I – Fundamentals

Text Book: Programming the World Wide Web

Robert W Sebesta
Edition: Seventh
Outline

1.1 A Brief Introduction to the Internet


1.2 The World Wide Web
1.3 Web Browsers
1.4 Web Servers
1.5 Uniform Resource Locators (URL)
1.6 Multipurpose Internet Mail Extensions (MIME)
1.7 The Hypertext Transfer Protocol (HTTP)

1.1 A Brief Introduction to the Internet

1.1.1 Origins
– Advanced Research Projects Agency Network (ARPAnet) - late 1960s
and early 1970s
• Text-based communication through e-mail
• For ARPA-funded research organizations
– BITNET (Because It's Time Network, begun at the City University of New York) and CSNET (Computer
Science Network) - late 1970s & early 1980s
• E-mail and file transfer for other institutions
– National Science Foundation Network (NSFnet) - 1986
• Originally for non-DOD (Department of Defense) funded places
• Initially connected five supercomputer centers
• By 1990, it had replaced ARPAnet for non-military uses
• Soon became the network for all (by the early 1990s)

–NSFnet eventually became known as the Internet


1.1.2 What the Internet is:
• The Internet is a huge collection of computers connected in a communications network.

• These computers are of every imaginable size, configuration and manufacturer.

• It is a network of networks rather than a network of computers.

• At the lowest level, since 1982, all connections use TCP (Transmission Control
Protocol)/IP (Internet Protocol).
• Normally the individual computers in an organization are connected to each other in a
local network. One node on this local network is physically connected to the Internet.

1.1.3 Internet Protocol (IP) Addresses

- People identify Internet nodes by name; computers identify them by
numeric address.

- The IP address of a machine connected to the Internet is a unique 32-bit number.

– Form: usually written as four 8-bit numbers separated by periods
– e.g., a small organization might be assigned 191.57.126.0 to 191.57.126.255 (256 IP addresses)

• Large organizations, such as the Department of Defense (DoD), may be assigned 16
million IP addresses, which include all IP addresses with one particular first 8-bit
number, such as 12.0.0.0 to 12.255.255.255.

• In late 1998, a new IP standard, IPv6, was approved, although it is still not
widely used. The most significant change was to expand the address size from 32
bits to 128 bits.
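
The address arithmetic above can be checked with a short sketch. It assumes Python's standard ipaddress module; the /24 and /8 prefix lengths are our way of writing the two address ranges quoted on the slide, not notation the slide itself uses.

```python
import ipaddress

# A 32-bit IPv4 address written as four 8-bit numbers separated by periods.
addr = ipaddress.IPv4Address("191.57.126.0")
as_int = int(addr)  # the same address as a single 32-bit integer
octets = [(as_int >> shift) & 0xFF for shift in (24, 16, 8, 0)]

# Fixing the first 24 bits leaves 8 bits free: 256 addresses.
small_org = ipaddress.ip_network("191.57.126.0/24")
# Fixing only the first 8 bits leaves 24 bits free: about 16 million addresses.
large_org = ipaddress.ip_network("12.0.0.0/8")

print(octets)                   # [191, 57, 126, 0]
print(small_org.num_addresses)  # 256
print(large_org.num_addresses)  # 16777216
print(small_org[-1])            # 191.57.126.255
# IPv6 widens the address from 32 bits to 128 bits.
print(ipaddress.IPv6Address("::1").max_prefixlen)  # 128
```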

1.1.4 Domain Names


• People have difficulty dealing with numbers, so machines on the
Internet also have textual names.
• These names begin with the name of the host machine, followed by
progressively larger enclosing collections of machines, called domains.
• There may be two, three or more domain names.
• The first domain name, which appears immediately to the right of the host name, is
the domain of which the host is a part.
• The second domain name gives the domain of which the first domain is a
part.
• The last domain name identifies the type of organization in which the
host resides.

• In the United States: edu - education, org - organization, gov -
government, com - company.
• In Sweden, the largest domain is the abbreviation of the country, i.e., se.
• Fully qualified domain name - the host name and all of the domain
names
• DNS (Domain Name System) servers - convert fully qualified domain
names to IP addresses
• Example: movies.marxbros.comedy.com - movies is the host name, marxbros
is movies' local domain, which is part of the comedy domain, which in turn
is part of the com domain
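
As a rough illustration of how a fully qualified domain name breaks down, the sketch below splits a name into its host name and its enclosing domains. The function name split_fqdn is ours, chosen for this example only. No DNS lookup is performed; an actual name-to-address conversion would need network access (for example through socket.gethostbyname).

```python
def split_fqdn(fqdn):
    """Split a fully qualified domain name into the host name and the
    enclosing domains, ordered from smallest to largest."""
    labels = fqdn.lower().rstrip(".").split(".")
    return labels[0], labels[1:]


host, domains = split_fqdn("movies.marxbros.comedy.com")
print(host)     # movies
print(domains)  # ['marxbros', 'comedy', 'com']
```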


Figure 1.1 Domain Name Conversion

• A fully qualified domain name is converted to an IP address by
software systems called name servers before the message is sent to its
destination through the Internet.

• Name servers serve a collection of machines on the Internet and are
operated by organizations.

• All document requests from browsers are routed to the nearest name
server.

• If the name server can convert the name to an IP address, it does so.

• If not, it sends the name to another name server for conversion.

Figure: Client and Server - Static HTML pages (labels in the figure: hello.html, 172.17.28.45)
1. The browser requests a particular HTML file
2. The server locates the file and sends it to the browser
3. The browser displays the file

• Along with TCP/IP, other protocols are used for the various uses of the Internet.
• By the mid-1980s, several different protocols had been invented and
were being used on the Internet, all with different user interfaces
(Telnet, FTP, Usenet, mailto).
• Telnet is used to log on to another computer on the Internet.
• FTP is used for file transfer.
• Usenet serves as an electronic bulletin board.
• mailto sends messages from one computer on the Internet to other
computers.

1.2 The World-Wide Web

1.2.1 Origins
– Tim Berners-Lee at CERN (The European Organization for
Nuclear Research) proposed the Web in 1989
• Purpose: to allow scientists to have access to many databases of
scientific work through their own computers
• http://www.w3.org/History/1989/proposal.html
– Document form: hypertext
– Pages? Documents? Resources?
• We’ll call them documents
– Hypermedia – more than just text – images, sound, etc.


1.2.2 Web or the Internet?


• The Web uses one of the protocols, HTTP, that runs on the
Internet--there are several others (telnet, FTP, mailto, etc.)

1.3 Web Browsers

• Browsers are clients - they always initiate, servers react (although
sometimes servers require responses)
• Mosaic - NCSA (the National Center for Supercomputing Applications,
a unit of the Univ. of Illinois), in early 1993
– First to use a GUI (Graphical User Interface), led to explosion
of Web use
– Initially for X-Windows, under UNIX, but was ported to other
platforms by late 1993
• Most requests are for existing documents, using HTTP
• Common browsers include IE (Internet Explorer), Firefox, Chrome, Opera, and Apple's
Safari.

Figure: Mosaic Beta version 0.4, Sep 9 1994

1.4 Web Servers

• Provide responses to browser requests, either
existing documents or dynamically built documents
• Browser-server connection is now maintained
through more than one request-response cycle
• All communications between browsers and servers
use HTTP


1.4.1 Web Server Operation


• Web servers run as background processes in the operating system
– Monitor a communication port on the host, accepting HTTP
messages when they appear
• All current Web servers came from either
1. The original from CERN
2. The second one, from NCSA

1.4.2 General Server Characteristics
• Web servers have two main directories:
1.Document root (servable documents)
2.Server root (server system software)
• Document root is accessed indirectly by clients
– Its actual location is set by the server configuration file (e.g., C:\xampp\apache\conf\httpd.conf)
– Requests are mapped to the actual location
• Virtual document trees
• Virtual hosts
• Proxy servers
• Web servers now support other Internet protocols: ftp, gopher,
news, mailto


1.4.3 Apache
• Apache (open source, fast, reliable)
– Directives (operation control):
ServerName
ServerRoot
ServerAdmin
DocumentRoot
Alias
Redirect
DirectoryIndex
UserDir

http://httpd.apache.org/


1.4.4 IIS
• Internet Information Server
- Operation is maintained through a program with a GUI
interface

1.5 Uniform Resource Locators (URLs)

1.5.1 URL Formats


scheme:object-address
– The scheme is often a communications protocol, such as http, gopher,
news, telnet or ftp
• For the http protocol, the object-address is: //fully-qualified-domain-name/doc-path
• For the file protocol, only the doc path is needed: file://doc-path



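• Host name may include a port number, as in zeppo:80 (80 is the
default for http)
• URLs cannot include spaces or any of a collection of other special
characters (semicolons, colons, ...); such a character must be written as a percent sign
followed by its two-digit hexadecimal ASCII code (e.g., %20 for a space)
• e.g. http://ic.payap.ac.th/index.php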
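
The parts of a URL can be inspected with Python's standard urllib.parse module, as in the sketch below. The host zeppo and port 80 come from the slide; the path /docs/index.html and the sample string passed to quote are made up for illustration.

```python
from urllib.parse import urlsplit, quote

# scheme:object-address, where the object-address is //host[:port]/doc-path
parts = urlsplit("http://zeppo:80/docs/index.html")
print(parts.scheme)    # http
print(parts.hostname)  # zeppo
print(parts.port)      # 80
print(parts.path)      # /docs/index.html

# When no port is written, none is reported; a client then uses the default.
print(urlsplit("http://ic.payap.ac.th/index.php").port)  # None

# Characters that may not appear literally are percent-encoded.
print(quote("San Jose; CA"))  # San%20Jose%3B%20CA
```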


1.5.2 URL Paths


• The doc path may be abbreviated as a partial path
– The rest is furnished by the server configuration
# C:\xampp\apache\conf\httpd.conf
DocumentRoot "C:\xampp\htdocs"

• If the doc path ends with a slash, it means it is a directory


http://www.payap.ac.th/
http://cis.payap.ac.th/index.php
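
A minimal sketch of how a server might map a URL's doc path onto the document root. This is not Apache's actual algorithm: the function name map_to_file is hypothetical, the default document index.html is an assumption (Apache sets it with the DirectoryIndex directive), and the handling of ".." segments is deliberately naive.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlsplit, unquote

DOCUMENT_ROOT = PureWindowsPath(r"C:\xampp\htdocs")  # the DocumentRoot value shown above
DIRECTORY_INDEX = "index.html"                       # assumed default document


def map_to_file(url):
    """Return the file under DOCUMENT_ROOT that a URL's doc path refers to."""
    path = unquote(urlsplit(url).path) or "/"
    if path.endswith("/"):
        # A trailing slash names a directory, so serve its default document.
        path += DIRECTORY_INDEX
    # Drop the root marker and any segments that could climb out of the root.
    segments = [s for s in PurePosixPath(path).parts if s not in ("/", ".", "..")]
    return str(DOCUMENT_ROOT.joinpath(*segments))


print(map_to_file("http://cis.payap.ac.th/index.php"))
print(map_to_file("http://www.payap.ac.th/"))
```

Pure path classes are used so that the sketch behaves the same on any operating system and never touches the file system.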

1.6 Multipurpose Internet Mail Extensions
(MIME)
• Originally developed for e-mail
• Used to specify to the browser the form of a file
returned by the server (attached by the server to the
beginning of the document)

1.6.1 Type specifications
– Form:
type/subtype
– Examples: text/plain, text/html, image/gif, image/jpeg

• Server gets type from the requested file name’s suffix (.html implies
text/html)

• Browser gets the type explicitly from the server
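
The suffix-to-type lookup a server performs can be imitated with Python's standard mimetypes module. This only illustrates the idea; the file names are made up, and a real server uses its own configuration tables.

```python
import mimetypes

# Guess the MIME type (type/subtype) from each file name's suffix.
for name in ("degrees.html", "notes.txt", "logo.gif", "photo.jpeg"):
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type)
```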

1.6.2 Experimental Document Types
• Subtype begins with x-
• e.g., video/x-msvideo
• Experimental types require the server to send a helper
application or plug-in so the browser can deal with the
file

1.7 The Hypertext Transfer Protocol (HTTP)

• The protocol used by ALL Web communications.

• Invented by Tim Berners-Lee in 1990.

• RFC 1945 (1996) - HTTP/1.0

• RFC 2068 (1997) - HTTP/1.1

• RFC 2616 (1999) - HTTP/1.1 (current version, available at the
W3C website)


• HTTP consists of two phases: the request and the response.

• Each HTTP communication (request or response) between the browser and a
web server consists of two parts: a header and a body.

• The header contains information about the communication.

• The body contains the data of the communication.

1.7.1 The Request Phase

• General Form:
1. HTTP method, document-path part of the URL, HTTP version
2. Header fields
3. Blank line
4. Message body
• An example of the first line of a request:
GET /degrees.html HTTP/1.1
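
The four-part general form can be assembled by hand, as in the sketch below. The helper build_request is hypothetical (it is not a library function), the Accept header is just a sample field, and the host name blanca.uccs.edu is the one used in this unit's telnet example. Nothing is sent over the network.

```python
def build_request(method, path, host, body=b"", extra_headers=None):
    """Assemble an HTTP/1.1 request: request line, header fields,
    blank line, then the (possibly empty) message body."""
    headers = {"Host": host}
    headers.update(extra_headers or {})
    if body:
        headers["Content-Length"] = str(len(body))
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the header
    return head.encode("ascii") + body


request = build_request("GET", "/degrees.html", "blanca.uccs.edu",
                        extra_headers={"Accept": "text/html"})
print(request.decode("ascii"))
```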

• Only a few request methods are defined by HTTP.
• The GET and POST methods are the most frequently used.

Table 1.1 HTTP Request Methods

• Following the first line of an HTTP communication is any number of header
fields, most of which are optional.

• The format of a header field is the field name followed by a colon and the
value of the field.

• Four categories of header fields:

General: for general information such as the date
Request: included in request headers
Response: included in response headers
Entity: included in both request and response headers

• Common request fields: Accept: text/plain, Accept: text/html, Accept:
image/gif, Accept: text/*
• Host: host name - request field that gives the name of the host.

• If-Modified-Since: date - request field specifying that the requested file be
sent only if it has been modified since the given date.

• Content-length: length of the body in bytes.

• GET, HEAD and DELETE methods do not have bodies.

• Can communicate with server without a browser
telnet blanca.uccs.edu http
This command creates a connection to the http port on the
blanca.uccs.edu server
The connection to the server is now complete, and HTTP commands
are given
GET /respond.html HTTP/1.1
Host: blanca.uccs.edu

1.7.2 The Response Phase
• General Form:
• Status line
• Response header fields
• blank line
• Response body
• Status line format: includes the HTTP version used, a three-digit
status code and a short textual explanation of the status code.
• HTTP version status code explanation
• e.g. HTTP/1.1 200 OK
(Current version is 1.1)


• Status code is a three-digit number; first digit specifies the general status
1 => Informational
2 => Success
3 => Redirection
4 => Client error
5 => Server error
• After the status line, the server sends a response header, which can
contain several lines of information about the response, each in the form
of a field. The only essential field of the header is Content-type.
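
A small sketch of reading a status line and classifying the code by its first digit. The function name parse_status_line is ours, and the sketch assumes the line always carries a textual explanation after the code.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def parse_status_line(line):
    """Split 'HTTP-version status-code explanation' into its parts and
    add the general status class given by the code's first digit."""
    version, code, reason = line.strip().split(" ", 2)
    code = int(code)
    return version, code, reason, STATUS_CLASSES[code // 100]


print(parse_status_line("HTTP/1.1 200 OK"))
# ('HTTP/1.1', 200, 'OK', 'Success')
print(parse_status_line("HTTP/1.1 404 Not Found"))
# ('HTTP/1.1', 404, 'Not Found', 'Client error')
```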


200 : OK
201 : Created
202 : Accepted
204 : No Content
301 : Moved Permanently
302 : Moved Temporarily
400 : Bad Request
401 : Unauthorized
403 : Forbidden
404 : Not Found
500 : Internal Server Error
503 : Service Unavailable
504 : Gateway Timeout
505 : HTTP Version Not Supported


HTTP Response Example

HTTP/1.1 200 OK
Date: Tue, 18 May 2004 16:45:13 GMT
Server: Apache (Red-Hat/Linux)
Last-modified: Tue, 18 May 2004 16:38:38 GMT
Etag: "841fb-4b-3d1a0179"
Accept-ranges: bytes
Content-length: 364
Connection: close
Content-type: text/html; charset=ISO-8859-1

• Both request headers and response headers must be followed by a blank line
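
The role of the blank line can be seen in a sketch that separates a response into status line, header fields and body. The function parse_response_head and the tiny sample response are made up for illustration; a real client would rely on a library such as http.client rather than splitting text by hand.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into status line, header fields and body.
    The first blank line marks the end of the header."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *field_lines = head.split("\r\n")
    headers = {}
    for line in field_lines:
        # Split at the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = ("HTTP/1.1 200 OK\r\n"
       "Date: Tue, 18 May 2004 16:45:13 GMT\r\n"
       "Content-length: 11\r\n"
       "Content-type: text/html\r\n"
       "\r\n"
       "<p>Hi!</p>\n")

status_line, headers, body = parse_response_head(raw)
print(status_line)              # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
print(len(body))                # 11
```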

Unit – 1 Continued

Chapter 2

2.1 Origins and Evolution of HTML and XHTML (Extensible
Hypertext Markup Language)

• HTML was defined with SGML (Standard Generalized Markup
Language), which is an ISO standard notation for describing text-
formatting languages.
• Original intent of HTML: General layout of documents that could be
displayed by a wide variety of computers
• The addition of style sheets to HTML in the late 1990s advanced its
capabilities closer to those of other text-formatting languages.
Versions of HTML & XHTML
• The original version of HTML was designed, along with the structure of the Web
and the first browser, at CERN (Conseil Européen pour la Recherche Nucléaire),
the European laboratory for particle physics.
• Mosaic, the first graphical Web browser, was released in 1993; it was
commercialized and marketed by Netscape.
• Microsoft began developing its own browser, IE (Internet Explorer).
• In late 1994, Berners-Lee started the World Wide Web Consortium (W3C),
whose purpose is to develop standards for Web technologies, starting with HTML.

• The first HTML standard, HTML 2.0, was released in 1995.
• It was followed by HTML 3.2 in early 1997.
• Recent versions:
• HTML 4.0 – 1997
• Introduced many new features and deprecated many older features
• HTML 4.01 – 1999 - approved by W3C – A cleanup of 4.0
• XHTML 1.0 – approved in early 2000.
• XHTML 1.0 is a redefinition of HTML 4.01 using XML.
• XHTML 1.1 was recommended by W3C in May 2001.
• XHTML 1.1 modularized 1.0 and dropped frames.
• We’ll stick to 1.1, except for frames.
• XHTML 2.0 is getting close to release.
• The latest versions of the most popular browsers are Microsoft Internet
Explorer 7 (IE7) and Firefox 2 (FX2).
• Reasons to use HTML, rather than XHTML:
1. HTML has lax syntax rules, so it is easier to write (at the cost of
sloppy and sometimes ambiguous documents)
2. Because a huge number of HTML documents are available on the
Web, browsers support it.
• Reasons to use XHTML:
1. Quality and consistency are high: XHTML syntax is much more strict,
leading to clean and clear documents in a standard form
2. It enforces strict syntax rules
3. The syntactic correctness of XHTML documents can be
validated, i.e., checked by an XML browser
2.2 Basic Syntax

• The fundamental syntactic units of HTML are called tags.
• Tags are used to specify categories of content.
• Elements are defined by tags (markers)
• Tag format: tag names must be written in lowercase letters
• Opening tag: <name>
• Closing tag: </name>
• The opening tag and its closing tag together specify a container for
the content they enclose
• Not all tags have content
• If a tag has no content, its form is <name />
• The container and its content together are called an element
Eg: <p> this is simple </p>
• If a tag has attributes (which specify alternative meanings of the tag), they appear
between its name and the right bracket of the opening tag
• Attributes are specified in keyword form
attribute-name = attribute-value
• Comment form: <!-- … -->
Eg: <!-- This is program 1 -->
• Browsers ignore comments, unrecognizable tags, line breaks, multiple
spaces, and tabs
• Tags are suggestions to the browser, even if they are recognized by the
browser
2.3 Standard HTML Document Structure
• The first line is a DOCTYPE command, which specifies the SGML document type
definition (DTD) with which the document complies
<!DOCTYPE html>
• Following the DOCTYPE, the html tag is used
<html lang = "en">
• <html>, <head>, <title>, and <body> tags are required in every
XHTML document
• The whole document must have <html> as its root.
• A document consists of a head and a body.
• The head contains the title and meta tags.
• The <title> tag is used to give the document a title, which is normally
displayed in the browser’s window title bar (at the top of the
display).
• The meta tag provides additional information about the document.
• <meta charset = "utf-8" /> - it has no content.
• The body specifies the content of the document.
2.4 Basic Text Markup

1. paragraphs
•Text is normally placed in paragraph elements
•Paragraph Elements
• The <p> tag breaks the current line and inserts a blank line; the content of the
paragraph begins on the new line
• The browser puts as many words of the paragraph’s content as will fit in each line

<!DOCTYPE html>
<!-- greet.html
A trivial document
-->
<html lang = "en">
<head> <title> Our first document </title>
<meta charset = "utf-8" />
</head>
<body>
<p>
Greetings from your Webmaster!
</p>
</body>
</html>

2.Line breaks
• The effect of the <br /> tag is the same as that of <p>, except for the blank line
• No closing tag! The slash indicates that the tag is both opening and closing

•Example of paragraphs and line breaks


On the plains of hesitation <p> bleach the
bones of countless millions </p> <br />
who, at the dawn of victory <br /> sat down
to wait, and waiting, died.

•Typical display of this text:


On the plains of hesitation

bleach the bones of countless millions


who, at the dawn of victory
sat down to wait, and waiting, died.
3. Preserving white spaces
•In some cases it is desirable to preserve the white space in text, which is done
using the <pre> tag.
Eg:
<p><pre>
mary
had a
pretty
</pre></p>
Output:
mary
had a
pretty
4.Headings
• Six levels of headings, 1 - 6, specified with <h1> to <h6>

• 1, 2, and 3 use font sizes that are larger than the default font size

• 4 uses the default size

• 5 and 6 use smaller font sizes

• Heading tags always break the current line, so their content always appears on a
new line; browsers usually insert some vertical space before and after all
headings.
<!DOCTYPE html>
<html lang = "en">
<head> <title> Our first document </title>
<meta charset = "utf-8" />
</head>
<body>
<h1> Aidan’s Airplanes (h1) </h1>
<h2> The best in used airplanes (h2) </h2>
<h3> "We’ve got them by the hangarful" (h3) </h3>
<h4> We’re the guys to see for a good used airplane (h4) </h4>
<h5> We offer great prices on great planes (h5) </h5>
<h6> No returns, no guarantees, no refunds,
all sales are final (h6) </h6>
</body>
</html>

5.Blockquotations
• Content of <blockquote>
• To set a block of text off from the normal flow in a document
• Browsers often indent, and sometimes italicize
6.Font Styles and Sizes (can be nested)
• Boldface - <b>
• Italics - <i>
There are a few tags called content-based style tags. These tags are not affected by
<blockquote>
•Emphasis tag, <em>: specifies that its content is special (usually displayed in italics)
•Strong tag, <strong>: specifies that its content is strongly emphasized (usually displayed in bold)
•<code> tag: specifies a monospace font, usually used for program code

• Superscripts and subscripts
•Subscripts with <sub>
•Superscripts with <sup>
Example: x<sub>2</sub><sup>3</sup>
Display: x₂³
Inline versus block elements
•The content of an inline tag appears on the current line, e.g. <em>, <strong>; the
exception is br, which is an inline tag whose purpose is to break the line
•A block tag breaks the current line so that its content appears on a new line, e.g. headings
& <blockquote>
•Inline tags cannot be nested directly in the body; only block tags can be so nested.
7. Character entities
Char.      Entity      Meaning
&          &amp;       Ampersand
<          &lt;        Less than
>          &gt;        Greater than
"          &quot;      Double quote
'          &apos;      Single quote
¼          &frac14;    One quarter
½          &frac12;    One half
¾          &frac34;    Three quarters
°          &deg;       Degree
(space)    &nbsp;      Non-breaking space
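
The entity table can be exercised with Python's standard html module, as a rough sketch. Note that html.escape writes the single quote as the numeric reference &#x27; rather than &apos;, so only the other four special characters are shown in the escaping direction. The sample strings are made up.

```python
import html

# Replace the characters that have special meaning in markup.
print(html.escape('<p> & "x"'))  # &lt;p&gt; &amp; &quot;x&quot;

# Convert entities back into the characters they stand for.
print(html.unescape("&frac14; &frac12; &frac34; &deg; &apos;"))  # ¼ ½ ¾ ° '

# &nbsp; becomes the non-breaking space character, U+00A0.
print(html.unescape("&nbsp;") == "\u00a0")  # True
```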
8. Horizontal rules
• <hr /> draws a line across the display, after a line break

9.The meta element.

•The meta element is used to provide additional information about a document, with attributes,
not content.
•The two attributes that are used are name and content. A commonly used value of the name attribute is
keywords: <meta name = "keywords" content = "binary trees, lists, stacks" />
