Date: 18.02.2021 Introduction To XHTML
Introduction to XHTML
The eXtensible Hypertext Markup Language (XHTML) is a commonly used reformulation of the Hypertext
Markup Language (HTML) in XML.
The XHTML form of HTML has stricter syntax rules.
Tags used to specify the presentation of text include those for line breaks, paragraph breaks,
headings, and block quotations, as well as tags for specifying the style and relative size of fonts.
The formats and uses of images in Web documents are described.
Next, hypertext links are introduced.
Three kinds of lists are covered: ordered, unordered, and definition.
The HTML tags and attributes used to specify tables are discussed.
Forms, which provide the means to collect information from Web clients, are presented.
The audio, video, organization, and time elements are described and illustrated.
Finally, the syntactic differences between HTML and XHTML are summarized.
Good Reference: http://www.w3schools.com
2.1.2 HTML versus XHTML
Until 2010, many Web developers used XHTML to gain the advantages of stricter syntax rules,
standard formats, and validation, but their documents were served as text/html and browsers used
HTML parsers.
Other developers willfully adhered to HTML. They are now enthusiastically climbing on the HTML5
bandwagon. Meanwhile, the XHTML crowd is disappointed and confused at the realization that the
W3C effort to force developers into using the stricter syntactic rules of XHTML to produce
documents less prone to errors is over. The W3C has capitulated, apparently willing to accept life in a
world populated by syntactically sloppy documents.
Now, the major browser vendors have all implemented at least some of the more important new
features of HTML5. Documents that use these features cannot be validated with the W3C XHTML 1.0
Strict validator.
There are strong reasons that one should use XHTML.
First: One of the most compelling is that quality and consistency in any endeavor, be it electrical
wiring, software development, or Web document development, rely on standards.
Second: HTML has few syntactic rules, and HTML processors (e.g., browsers) do not enforce the
rules it does have. Therefore, HTML authors have a high degree of freedom to use their own
syntactic preferences to create documents. Because of this freedom, HTML documents lack
consistency, both in low-level syntax and in overall structure. By contrast, XHTML has strict
syntactic rules that impose a consistent structure on all XHTML documents.
Third: The fact that there are a large number of poorly structured HTML documents
on the Web is a poor excuse for generating more.
Fourth: When you create an XHTML document, its syntactic correctness can be checked, either by an
XML browser or by a validation tool. This checking process may find errors that could otherwise go
undetected until after the document is posted on a site and requested by a client, and then possibly
only with a specific browser.
Fifth: The argument that XHTML is difficult to write correctly is weakened by the
availability of XHTML editors, which provide a simple and effective approach to creating
syntactically correct XHTML documents.
It is also possible to convert legacy HTML documents to XHTML documents using software tools.
E.g.: http://tidy.sourceforge.net
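The kind of cleanup such a conversion involves can be sketched by hand. The fragment below is only an illustration and is not the output of any particular tool; the file name jar.jpg and the text are hypothetical.

```html
<!-- Legacy HTML: uppercase names, unquoted values, unclosed elements -->
<P>Pete's Pickles are made fresh daily.
<P>Visit our shop<BR>
<IMG SRC=jar.jpg ALT=Jar WIDTH=200>

<!-- The same content as XHTML: lowercase names, quoted values, every element closed -->
<p>Pete's Pickles are made fresh daily.</p>
<p>Visit our shop<br />
<img src="jar.jpg" alt="Jar" width="200" /></p>
```

An HTML parser accepts both versions; only the second would pass a check for XHTML syntax.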
There are two issues in choosing between HTML and XHTML: First, one must decide whether the
additional discipline required to use XHTML is worth the gain in document clarity and uniformity in
display across a variety of browsers. Second, one must decide whether the possibility of validation
afforded by authoring XHTML documents is worth the trouble.
2.2 Basic Syntax
The fundamental syntactic units of HTML are called tags. In general, tags are used to specify
categories of content. For each kind of tag, a browser has default presentation specifications for the
specified content.
The syntax of a tag is the tag’s name surrounded by angle brackets (< and >). In XHTML, tag names
must be written in all lowercase letters. In HTML, tag names and attribute names can be written in
any mixture of uppercase and lowercase letters.
Most tags appear in pairs: an opening tag and a closing tag. The name of the closing tag, when one
is required, is the name of its corresponding opening tag with a slash attached to the beginning. E.g.:
if the tag name is p, its closing tag is </p>.
Whatever appears between a tag and its closing tag is the content of the tag. A browser display of
an HTML document shows the content of all the document’s tags; it is the information the
document is meant to portray. Not all tags can have content.
The opening tag and its closing tag together specify a container for the content they enclose. The
container and its content together are called an element. E.g.: Consider the following element:
<p> This is simple stuff. </p>
Attributes, which are used to specify alternative meanings of a tag, are written between the
opening tag name and its right-angle bracket. They are specified in keyword form, which means that
the attribute’s name is followed by an equals sign and the attribute’s value. In XHTML, attribute
names, like tag names, are written in lowercase letters, and attribute values must be delimited by
double quotes. In HTML, some attribute values, for example, numbers, need not be quoted.
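As a small illustration of the keyword form, the fragment below writes the same attributes first in the XHTML style and then in a looser style that only HTML parsers accept. The attribute values and the file name jar.jpg are made up for the example.

```html
<!-- XHTML style: lowercase attribute names, every value in double quotes -->
<p id="intro" lang="en"> This is simple stuff. </p>
<img src="jar.jpg" alt="Jar" width="200" />

<!-- HTML only: mixed-case names and an unquoted number are tolerated -->
<P ID="intro" Lang="en"> This is simple stuff. </P>
<IMG SRC="jar.jpg" ALT="Jar" WIDTH=200>
```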
Comments in programs increase the readability of those programs. Comments in HTML serve the
same purpose. They are written in HTML in the following form:
<!-- anything except two adjacent dashes -->
Browsers ignore HTML comments—they are for people only. Comments can be spread over as many
lines as are needed. For example, you could have the following comment:
<!-- PetesHome.html
This document describes the home document of
Pete's Pickles
-->
Documents sometimes have lengthy sequences of lines of markup that together produce some part
of the display. If such a sequence is not preceded by a comment that states its purpose, a document
reader may have difficulty determining why the sequence is there.
Commenting every line is both tedious and counterproductive. However, comments that precede
logical collections of lines of markup are essential to making a document (or a program) more
understandable.
Besides comments, several other kinds of text that are ignored by browsers may appear in an HTML
document. Browsers ignore all unrecognized tags. They also ignore line breaks. Line breaks that
show up in the displayed content can be specified, but only with tags designed for that purpose. The
same is true for multiple spaces and tabs.
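A short sketch of this behavior follows. It assumes the br and pre tags as the ones "designed for that purpose"; the text content is invented for the example.

```html
<!-- The extra spaces and source line breaks here are collapsed;
     only the br tag produces a line break in the display -->
<p>
  Pete's      Pickles
  are made
  fresh daily. <br />
  Visit us today.
</p>

<!-- Inside pre, spaces and line breaks are displayed as written -->
<pre>
Dill      $3
Sweet     $4
</pre>
```

A typical browser would display the paragraph as two lines, with the break falling after "fresh daily.", while the pre content keeps its column layout.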
HTML tags are treated more like suggestions to the browser. If a reserved word is misspelled in a
program, the error is usually detected by the language implementation system and the program is
not executed. However, a misspelled tag name usually results in the tag being ignored by the
browser, with no indication to the user that anything has been left out.
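For instance, in the sketch below the first tag name is deliberately misspelled (paragraf is not an HTML tag). A browser would normally still display the enclosed text, without paragraph formatting and without reporting any error.

```html
<!-- "paragraf" is an intentional misspelling of p -->
<paragraf> This text is displayed, but not as a paragraph. </paragraf>

<!-- The correctly spelled tag is recognized and formatted as a paragraph -->
<p> This is simple stuff. </p>
```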
Browsers are even allowed to ignore tags that they recognize. Furthermore, the user can configure
his or her browser to react to specific tags in different ways.
2.3 Standard HTML Document Structure
The first line of every HTML document is a DOCTYPE command, which specifies the particular SGML
Document-Type Definition (DTD) with which the document complies. For HTML, this declaration is
simply the following:
<!DOCTYPE html>
An HTML document must include the four tags <html>, <head>, <title> and <body>. (This is
another XHTML rule. Documents that do not include all of these are acceptable in HTML.)
The <html> tag identifies the root element of the document. So, HTML documents always have an
<html> tag following the DOCTYPE command and they always end with the closing html tag,
</html>.
The html element includes an attribute, lang, which specifies the language in which the document
is written, as shown in the following element:
<html lang = "en">
In the above example the language is specified as "en", which means English.
An HTML document consists of two parts, the head and the body. The head element provides
information about the document but does not provide its content. The body of a document
provides the content of the document, which itself includes tags and attributes.
The head element always contains two simple elements, a title element and a meta element.
The meta element is used to provide additional information about a document. It has no content;
rather, all the information provided is specified with attributes. At a minimum, the meta tag
specifies the character set used to write the document. The most popular international character
set used for the Web is the 8-bit Unicode Transformation Format (UTF-8). This character set uses
from 1 byte to 4 bytes to represent a character (the original design allowed up to 6 bytes), but is
backward compatible with the ASCII character set. This compatibility is accomplished by having all
the single-byte characters in UTF-8 correspond to the ASCII characters.
This meta element is required by HTML, but not by XHTML. E.g.: The meta element:
<meta charset = "utf-8" />
The slash at the end of this element indicates that it has no closing tag—it is a combined opening
and closing tag.
The content of the title element is displayed by the browser at the top of its display window,
usually in the browser window’s title bar. The body of a document provides its content.
Following is a skeletal document that illustrates the basic structure:
<!DOCTYPE html>
<!-- File name and document purpose -->
<html lang = "en">
  <head>
    <title> A title for the document </title>
    <meta charset = "utf-8" />
    ...
  </head>
  <body>
    ...
  </body>
</html>
Whenever an element is nested inside a preceding element, the nested element is indented. We will
indent nested elements two spaces, although there is nothing special about that number. As is the
case with programs, the indentation pattern is used to enhance readability.
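As one possible instance of this structure, the sketch below is a complete small document with nested elements indented two spaces. The title and paragraph text are illustrative only and reuse the Pete's Pickles example from the comment discussion.

```html
<!DOCTYPE html>
<!-- PetesHome.html
     A minimal home document for Pete's Pickles (illustrative content)
     -->
<html lang = "en">
  <head>
    <title> Pete's Pickles </title>
    <meta charset = "utf-8" />
  </head>
  <body>
    <p> This is simple stuff. </p>
  </body>
</html>
```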