Known XML Vulnerabilities Are Still A Threat To Popular Parsers and Open Source Systems
Abstract—The Extensible Markup Language (XML) is extensively used in software systems and services. Various XML-based attacks, which may result in sensitive information leakage or denial of service, have been discovered and published. However, due to development time pressures and limited security expertise, such attacks are often overlooked in practice. In this paper, following a rigorous and extensive experimental process, we study the presence of two types of XML-based attacks, BIL and XXE, in 13 popular XML parsers. Furthermore, we investigate whether open-source systems that adopt a vulnerable XML parser apply any mitigation to prevent such attacks. Our objective is to provide clear and solid scientific evidence about the extent of the threat associated with such XML-based attacks and to discuss the implications of the obtained results. Our conclusion is that most of the studied parsers are vulnerable and so are systems that use them. Such strong evidence can be used to raise awareness among software developers and is a strong motivation for developers to provide security measures to thwart BIL and XXE attacks before deployment when adopting existing XML parsers.
Index Terms—XML Vulnerabilities (BIL, XXE), XML Parsers, Security Testing.

I. INTRODUCTION
XML is the enabling technology used in most of today’s web services for exchanging data between service providers and consumers. XML is also widely used to store data and configuration files that govern the operation of software systems. However, a dozen XML vulnerabilities have been recently uncovered and reported [5], [14]; they provide opportunities for denial of service attacks or malicious data access and manipulation. As a consequence, many systems that rely on XML are at risk if they do not properly mitigate these vulnerabilities. Such systems include (i) web services, (ii) XML processors, and (iii) other systems that read XML input data or configurations.
The popularity of XML and its wide adoption in software systems make it an attractive target for attackers. A recent study has revealed that web-based systems experience up to 26 attacks per minute [1]. Given the fact that XML is a core technology of web services and is adopted in many other systems, attacks that target XML vulnerabilities are, therefore, likely to be frequent. In general, this large number of attacks and their successful exploitations are attributed to the lack of a secure coding practice, insufficient training and expertise of developers, and inadequate security testing before deployment.
In this paper, we test popular parsers for two of the most common XML-based attacks, XML Billion Laughs (BIL) and XML External Entities (XXE). First, among publicly available parsers, 13 of them were picked that are widely used by projects hosted at GitHub and Google Code, the two most popular open source repositories. We then submitted to each parser a set of XML files carefully selected according to a systematic test strategy. These test files can detect if the parser is vulnerable to the two XML-based attacks. Finally, we observed the behaviour of the parsers in terms of memory consumption, CPU time, and parsing results in order to assess their vulnerability. Moreover, we also investigated, based on 628 open source projects that use a vulnerable parser, whether developers properly configured the parser to thwart these XML-based attacks or adopted other mitigation measures. The obtained results are very alarming: most of the selected parsers are vulnerable to BIL and XXE attacks, and no measures are taken to prevent such attacks from harming the systems using these parsers.
The key contribution of this paper is a large-scale, systematic experimental assessment of widely-used and well-known XML parsers and a large number of systems that use those parsers, with respect to two common XML-based attacks. The obtained experimental results provide unbiased and extensive evidence of the lack of mitigation for such attacks. In turn, this can help raise awareness among software developers that appropriate security measures are required for using such vulnerable XML parsers.
The remainder of the paper is structured as follows. Section II provides detailed background on the two studied XML-based attacks and discusses related work. Section III describes our study: procedure, results, and discussion. Finally, Section IV concludes the work.

II. BACKGROUND AND RELATED WORK
A. Background
Standardised by the W3C [2], Document Type Definition (DTD) is a mechanism to define legal building blocks (e.g., elements, types, or content) of XML documents [4]. Many XML parsers support DTD. However, when these parsers are used improperly or the developers are unaware of such a DTD support feature, the resulting software systems might be vulnerable to DTD-based attacks. XML Billion Laughs (BIL) and XML External Entities (XXE) are two such attacks. This section describes these attacks and their impact in detail.
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ELEMENT lolz (#PCDATA)>
<!ENTITY lol "lol">
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">]>
<lolz>&lol9;</lolz>
Fig. 1. Example of an XML Bomb, an attack that uses the reference mechanism in XML.
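To make the growth in Fig. 1 concrete, the size of the fully expanded document can be computed directly: each of the nine levels lol1 to lol9 repeats the previous entity ten times, so &lol9; expands to 10^9 copies of the three-character string "lol". The following is a minimal sketch of that arithmetic; the class name BombSize and the method expandedChars are hypothetical names introduced here for illustration and do not come from the study.

```java
import java.math.BigInteger;

/** Estimates how many characters a Billion Laughs payload yields once fully expanded. */
final class BombSize {
    /**
     * @param levels  number of nested entity levels above the base entity (9 in Fig. 1: lol1..lol9)
     * @param fanOut  number of references inside each entity definition (10 in Fig. 1)
     * @param baseLen length of the base entity's replacement text (3 for "lol")
     * @return fanOut^levels * baseLen, the character count after full expansion
     */
    static BigInteger expandedChars(int levels, int fanOut, int baseLen) {
        return BigInteger.valueOf(fanOut).pow(levels).multiply(BigInteger.valueOf(baseLen));
    }

    public static void main(String[] args) {
        // Fig. 1: nine levels, ten references per level, base text "lol".
        System.out.println(expandedChars(9, 10, 3));
    }
}
```

The input file itself is under one kilobyte, which is why a parser that expands entities without a limit can be exhausted by a very small request.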
carried out to study, in a systematic and rigorous manner, the presence of XXE and BIL vulnerabilities in modern XML parsers and open source systems, which is the goal of this paper.

III. EXPERIMENTAL STUDY
This section describes our study about the security of the most popular parsers and open source software systems that use them, with respect to XML-based attacks. First, we introduce our research questions and justify our selection of subject parsers and systems. Second, we discuss the procedure that we follow to conduct the experiments. Finally, we discuss the implications of the findings and provide recommendations for developers.
A. Objectives
We investigate two research questions:
• RQ1: To what extent are BIL and XXE attacks successful in modern XML parsers?
• RQ2: Do software systems, which use one of the vulnerable parsers, apply mitigation techniques for BIL and XXE attacks?
We scoped our research by focusing on parsers that are integrated with modern programming languages and are popular in open source systems. We expect such parsers to be widely used in practice. We focus on parsers since, when XML inputs are submitted to a system, an XML parser used in the system needs to treat those inputs first. And if the parser is vulnerable, the impact can be escalated to its encompassing system. In fact, there exist many other proprietary parsers. These parsers can potentially also be vulnerable to BIL and XXE if their developers lack knowledge about such attacks. However, they are out of the scope of our study.
B. Subject Selection
We first selected the parsers that come with modern programming languages, including Java, Python, PHP, Perl, C#.
XML parsers. In total, we selected the 13 most commonly used parsers, e.g., the standard Java DOM, Python ETree, Microsoft XML parser (MSXML). Table I lists these parsers, their current versions and languages, and provides short descriptions.
We evaluated the adoption of the selected parsers in GitHub (https://github.com) and Google Code (https://code.google.com) to assess how widely they are used in practice. GitHub and Google Code are highly popular open source hosting systems. Though there exist a few more project hosting systems, such as sourceforge.net, we focused on GitHub and Google Code since they index source code very well, thus making it easier to query for the use of XML parsers in the source code of hosted projects.
On GitHub, for the Java parsers we used its search feature to query for the XML parsing classes. For the other parsers, the queries are conjunctions of the name of the corresponding XML processing classes or libraries and the names of the methods that parse XML inputs, e.g., “xml.etree.ElementTree” AND “parse”. On Google Code, we used Google Search with a site directive to narrow the search to solely code.google.com, and the queries were similar to those for GitHub. For both repositories, we filtered the results to the specific language that a parser supports. Table II shows the frequency with which these parsers on GitHub and Google Code were adopted. These numbers might be over-approximated since the search can return code that was commented out or unused (discussed in Section III-E). The total number of adoptions of the parsers in both repositories goes above half a million. Except for WOODSTOX, which is adopted about 500 times, the others are much more frequently used, ranging from a few thousand to a hundred thousand times. This clearly shows that the selected parsers are widely used.
C. Experimental Procedure
Our experimental procedure consisted of the following steps: (i) writing code to invoke each parser and pass XML
TABLE I
List of 13 popular XML parsers selected for our assessment.
TABLE II
The use of the parsers in open source systems, data collected from GitHub and Google Code as of August 12th, 2014.
reused to test new parsers for BIL and XXE attacks.
D. Results on the Parsers
XXE: Concerning XXE, we manually inspected the results
obtained from each parser when they were fed with the three
XXE attacks. We found that the vulnerable parsers attempt
to expand the parsing results to include the content of the
referred files, specified in the XXE attacks. If the expansion
is successful, the content of the referred file (e.g., /etc/passwd)
is included in the parsing result. Otherwise, an exception
stating an access permission error is returned. Nevertheless,
both cases indicate an XXE vulnerability because the parsers
try to access the content of the referred file. The other non-
vulnerable parsers blocked the entity expansion and returned
an error reporting the issue.
As a concrete example, Figure 2 depicts the parsing result
of a vulnerable parser where the parser was able to acquire
the content of the file referred to by the entity in the XML
file. In this example, this content has been included within the
tag <foo>.
Fig. 3. An output example of Google Chrome that recognises an input XML Bomb and raises an exception.
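The behaviour of a non-vulnerable configuration can be sketched with the standard Java DOM parser. The example below is a hedged illustration, not one of the three XXE test files used in the study: the payload XXE_DOC, the class name XxeProbe, and the method rejectsDoctype are assumptions introduced here. It relies only on the disallow-doctype-decl feature, which the JDK's built-in parser supports; with that feature enabled, the parser is expected to fail with a SAXParseException as soon as it meets the DOCTYPE declaration, so the referred file is never opened.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

/** Checks whether a DOM parser with DOCTYPE declarations disabled refuses a document. */
final class XxeProbe {
    // Illustrative XXE payload (an assumption, not taken from the study's test set).
    static final String XXE_DOC =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>\n"
      + "<foo>&xxe;</foo>";

    /** Returns true if parsing fails with a parse error, false if the document is accepted. */
    static boolean rejectsDoctype(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Any DOCTYPE declaration becomes a fatal error, which blocks both BIL and XXE.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXParseException e) {
            return true;
        } catch (Exception e) {
            // Configuration or I/O problems are not the outcome being probed here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XXE document rejected: " + rejectsDoctype(XXE_DOC));
    }
}
```

Rejecting every DOCTYPE is the bluntest option; applications that legitimately need DTDs would instead have to disable external entities selectively.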
of different sizes. We observe that the amount of memory required to parse XML bombs increases exponentially, for all the vulnerable parsers. For the XML bomb of size 6x10, the parsers require at least 33 MB of RAM to parse the input. When the size is equal to or greater than 7x10, the amount of memory consumed increases significantly, from 200 MB up to 8 GB, which is the limit we set for each run.

TABLE III
Summary of BIL and XXE vulnerabilities in the parsers. We report which parsers are vulnerable to BIL and XXE.

Parser                  Vul. to BIL   Vul. to XXE
JDOM2                   Yes           Yes
NanoXML                 Yes           Yes
NanoXML-LITE            No            No
Std-DOM                 Yes           Yes
Std-SAX                 Yes           Yes
Std-STAX                No            No
WOODSTOX                No            No
XERCES-JDOM             Yes           Yes
LXML-ETREE              No            No
Std-ETREE               Yes           No
PERL(XML::LibXML)       Yes           Yes
PHPDOM                  No            No
MSXML (DOMDocument)     Yes           Yes
Total                   8             7
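Test inputs of different sizes, like those discussed for the BIL measurements, can be produced mechanically. The sketch below generates a document in the style of Fig. 1 for a chosen number of levels and references per level. It assumes that a size written as "k x 10" means k nested entity levels with ten references each; that reading, the class name BombGenerator, and the method generate are assumptions made here, and the study's actual test files may have been built differently.

```java
/** Builds Billion Laughs test documents in the style of Fig. 1. */
final class BombGenerator {
    /**
     * @param levels number of nested entities lol1..lolN (assumed meaning of "k" in "k x 10")
     * @param fanOut number of references inside each entity definition
     */
    static String generate(int levels, int fanOut) {
        if (levels < 1 || fanOut < 1) {
            throw new IllegalArgumentException("levels and fanOut must be at least 1");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<!DOCTYPE lolz [\n");
        sb.append("<!ELEMENT lolz (#PCDATA)>\n");
        sb.append("<!ENTITY lol \"lol\">\n");
        for (int i = 1; i <= levels; i++) {
            // Level i refers fanOut times to the entity one level below it.
            String previous = (i == 1) ? "lol" : "lol" + (i - 1);
            sb.append("<!ENTITY lol").append(i).append(" \"");
            for (int j = 0; j < fanOut; j++) {
                sb.append('&').append(previous).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n");
        sb.append("<lolz>&lol").append(levels).append(";</lolz>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints a small instance; feed such files only to parsers under test.
        System.out.print(generate(2, 3));
    }
}
```

Under the stated reading, generate(9, 10) reproduces the structure of Fig. 1, and smaller values of levels give the intermediate sizes needed to observe how memory use grows.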
TABLE IV
A sample of 99 open source projects among those selected in our study. The projects are accessible by appending these names to github.com, as of August 2014.
of our selected parsers by software developers, our inspection of the 1000 Java source files demonstrates that approximately 75% (749 out of 1000) of these classes actually use the Std-DOM parser, while the remaining 25% do not. Assuming 25% is an accurate estimate for the remaining parsers in Table I, the total number of source classes that use our selected parsers should still number more than 400,000. Also, note that our search results are based on only two repositories (GitHub and Google Code) among other repositories where these selected parsers could be used. This increases the number of their adoptions. Therefore, we are certain that our selected parsers are widely used by software developers.
Our assessment on whether a system that makes use of Std-DOM deals appropriately with BIL and XXE vulnerabilities is based on the application of known fixes, i.e., properly setting the attributes of the parser (through the DocumentBuilderFactory class) before using it to parse an XML input. For example:

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

In our assessment, we look for the presence of any of the following attributes and their values in the source code of the selected Java classes.

Attribute                                                     Value
“http://apache.org/xml/features/disallow-doctype-decl”        true
“http://xml.org/sax/features/external-general-entities”       false
“http://xml.org/sax/features/external-parameter-entities”     false
“JDK_ENTITY_EXPANSION_LIMIT”                                  a numeric value
“FEATURE_SECURE_PROCESSING”                                   true

Out of 749 selected Java source files (belonging to 628 GitHub projects) that use Std-DOM to parse XML inputs, we found only one file that properly sets one of the above properties to avoid being attacked through BIL and XXE vulnerabilities. Among the remaining files, 735 classes (98.13%) are clearly vulnerable. The other 14 classes cannot be confirmed to be vulnerable with certainty since they use a DocumentBuilder object created elsewhere in their corresponding projects and security properties might be set from there. Nevertheless, these results indicate that developers (at least the owners of the selected projects) have neglected to address these vulnerabilities.
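The attribute check described in this section lends itself to a simple textual scan. The sketch below is an approximation written for illustration, not the procedure or tooling used in the study: the class MitigationScan, its Verdict enum, and the classify method are hypothetical, and the two constant names are written with underscores on the assumption that this is their spelling in source code. The scan only tests whether an attribute name occurs in a source file that mentions DocumentBuilder; it does not verify the value passed, nor whether the factory is configured in another class.

```java
/** Rough textual check for known BIL/XXE mitigations in a Java source file. */
final class MitigationScan {
    // Attribute names listed in the assessment; underscore spelling is an assumption.
    static final String[] MARKERS = {
        "http://apache.org/xml/features/disallow-doctype-decl",
        "http://xml.org/sax/features/external-general-entities",
        "http://xml.org/sax/features/external-parameter-entities",
        "JDK_ENTITY_EXPANSION_LIMIT",
        "FEATURE_SECURE_PROCESSING"
    };

    enum Verdict { NOT_APPLICABLE, MITIGATION_PRESENT, NO_MITIGATION_FOUND }

    /** Classifies one source file given its full text. */
    static Verdict classify(String javaSource) {
        // Files that never mention DocumentBuilder do not use the Std-DOM entry points.
        if (!javaSource.contains("DocumentBuilder")) {
            return Verdict.NOT_APPLICABLE;
        }
        for (String marker : MARKERS) {
            if (javaSource.contains(marker)) {
                return Verdict.MITIGATION_PRESENT;
            }
        }
        return Verdict.NO_MITIGATION_FOUND;
    }

    public static void main(String[] args) {
        String sample = "DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();";
        System.out.println(classify(sample));
    }
}
```

A file reported as NO_MITIGATION_FOUND would still need manual inspection, since a DocumentBuilder created elsewhere in the project may already carry the security settings.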
TABLE V
Tested applications that are vulnerable to BIL and XXE.

Application               Description
websphere-portal-plugin   A plugin for WebSphere Portal for deploying WAR, EAR, PORTLETS, EXPORT/IMPORT XMLACCESS. https://github.com/JuanyongZhang/websphere-portal-plugin
File-Archiver-Main        An application to combine a number of files together into one archive file. https://github.com/DymaKulia/FileArchiverMain
AppDF                     A Project to facilitate easy uploading of an android application along with its supporting files to several appstores by creating a single archive AppDF file. https://github.com/onepf/AppDF
source2XMI                Convert the Java source code to XMI file. https://github.com/wbssyy/source2XMI
jbm-to-hornetq            A tool to facilitate migration from JBM to HornetQ messaging platform. HornetQ is an open source asynchronous messaging project from JBoss. https://github.com/gaohoward/jbm-to-hornetq
fastcatsearch             An open source distributed search engine. https://github.com/fastcatsearch-/fastcatsearch
bimoku Crawler            A web crawler. https://github.com/cncduLee/bbks-crawer/tree/master/crawler/bimoku/crawler
blog                      A Java blog engine. https://github.com/IgorInger/blog
In addition, speculating that there could be workarounds to deal with vulnerabilities, we downloaded eight random systems (Table V) and analysed their entire source code. We found that one of them had a vulnerable Java class which used the parser but the class was not used elsewhere in the project (i.e., orphan code), and the other seven were vulnerable: there was no mitigation along the control flow from reading XML inputs until they are parsed. Therefore, we conclude it is unlikely that developers made use of other methods to deal with BIL and XXE vulnerabilities.
Regarding research question RQ2 we found that:

    It is highly likely that systems that use a BIL- or XXE-vulnerable XML parser do not apply any proper mitigating measure and are hence vulnerable.

F. Discussion and Recommendations
Our extensive study has demonstrated that BIL and XXE attacks are in most cases neglected by the developers of XML parsers and software systems that adopt them. Since these attacks are well-known and applying them is straightforward (it is easy to create XML test files and send them to a target system), leaving them unaddressed before deployment might have severe consequences. As demonstrated by our results, a vulnerable XML parser can consume a huge amount of memory and CPU time as the result of an attack. This typically renders the system running the XML parser unavailable for legitimate users. Similarly, the confidentiality of information residing on the system running the vulnerable XML parser is at risk. Exploiting XXE, an attacker can get access to such information.
Recommendations for Software Developers: Because software systems that improperly use vulnerable parsers are also vulnerable, we recommend that developers of such systems pay special attention to preventing such attacks if they decide to adopt a third-party XML parser, even if it is provided by a high-profile vendor, such as Oracle or Microsoft. In order to block BIL and XXE attacks, software developers should gain full understanding of the XML parser that they are considering adopting and avoid its insecure features (e.g., using Schema instead of DTD). If external entity references are required, they should refer to trusted sources only. Known vulnerabilities of the parser and their fixes should be investigated and input sanitisation should be done before parsing XML content. Adequate security testing of the parser should also be performed.
Recommendations for Parser Developers: Developers of XML parsers need to be fully aware of all potential XML-based attacks and should be able to provide countermeasures wherever possible. It was observed, during our experiment, that some vulnerabilities can be exploited because of the features allowed in the default configurations of XML parsers. Parser developers should provide Secure Default Configurations and provide alerts when any potentially insecure feature is enabled via making changes to the default configurations. Parser developers should perform security testing of their parsers. They should also provide better documentation including the potential risks of enabling any feature. This would guide software developers in using their parser in a secure way.
Threats to Validity: Regarding the validity related to whether or not experimental results can be considered generalisable and representative, we have selected a large number of parsers from various programming languages and domains, and considered in our study their latest versions. Furthermore, we also evaluated their adoption on GitHub and Google Code, the two most popular open source repositories, to make sure that the selected parsers are used in practice. As a result, we are fairly confident that the vulnerabilities we detected both in parsers and the systems using them suggest a worrying but representative state of practice.
Moreover, although we consider only one parser in the evaluation of 628 open source projects, it is the most popular one; and given the clear results we obtained (only one of the projects properly deals with BIL and XXE attacks), it is unlikely that mitigation techniques are implemented in other projects adopting other vulnerable parsers.

IV. CONCLUSION
In this paper, we study the potential of two major types of XML-based attacks: XML Billion Laughs (BIL) and XML External Entities (XXE) that may undermine today’s XML
parsers and systems making use of those parsers. We conducted a systematic and large-scale experiment to test the most popular XML parsers for these attacks by measuring their impact on CPU time and memory consumption. Our main objective is to provide representative, unbiased results of the extent of the problem in popular parsers and open source systems. We designed our experiment to achieve these objectives and reported the results in great detail.
We have studied 13 XML parsers that are widely used in open source systems hosted on GitHub and Google Code. Each was tested against BIL and XXE test cases. Executing these tests on the vulnerable parsers consumed exceptionally large amounts of CPU time and memory, and could not have been carried out efficiently without our HPC platform [23]. The obtained results show that most of the selected parsers are vulnerable to BIL and XXE exploits. Furthermore, we extended our experiment to evaluate more than 700 classes from 628 open source systems that use a vulnerable XML parser and found that all but one of them are vulnerable as well, thus showing that parsers’ vulnerabilities are not properly addressed by the systems using them.
Such alarming results call for software developers to take appropriate security measures before using these vulnerable XML parsers in their software development projects. Moreover, parser developers need to fix the problems and/or provide better documentation to help developers configure such parsers to secure their usage. In future work, we will focus on testing XML parsers and open source systems for more XML-based vulnerabilities, XML Injections or Signature for example.

ACKNOWLEDGEMENTS
This work was supported by the National Research Fund, Luxembourg (FNR/P10/03 and FNR6024200). We thank the testing and security team of our industry partner, CETREL, for their collaboration in this project.

REFERENCES
[1] Imperva Web Application Attack Report. http://www.imperva.com/docs/HII_Web_Application_Attack_Report_Ed4.pdf. Accessed: 2014-07-05.
[2] W3C XML Standard. http://www.w3.org/TR/REC-xml/. Accessed: 2014-07-01.
[3] XML External Entity Injection. http://securityhorror.blogspot.com/2012/03/what-is-xxe-attacks.html. Accessed: 2014-06-27.
[4] XML Terminology. http://en.wikipedia.org/wiki/XML/. Accessed: 2014-02-20.
[5] XML Vulnerabilities Introduction. http://resources.infosecinstitute.com/xml-vulnerabilities/. Accessed: 2014-06-22.
[6] N. Antunes and M. Vieira. Comparing the effectiveness of penetration testing and static code analysis on the detection of SQL injection vulnerabilities in web services. In Dependable Computing, 2009. PRDC ’09. 15th IEEE Pacific Rim International Symposium on, pages 301–306, Nov 2009.
[7] E. Bertino, L. Martino, F. Paci, and A. Squicciarini. Security for Web Services and Service Oriented Architectures. Springer, 2010.
[8] M. R. Brenner and M. R. Unmehopa. Service-oriented architecture and web services penetration in next-generation networks. Bell Labs Technical Journal, 12(2):147–159, 2007.
[9] R. Chang, G. Jiang, F. Ivancic, S. Sankaranarayanan, and V. Shmatikov. Inputs of coma: Static detection of denial-of-service vulnerabilities. In Computer Security Foundations Symposium, 2009. CSF ’09. 22nd IEEE, pages 186–199, July 2009.
[10] J. Chen, Q. Li, C. Mao, D. Towey, Y. Zhan, and H. Wang. A web services vulnerability testing approach based on combinatorial mutation and SOAP message mutation. Service Oriented Computing and Applications, 8:1–13, 2014.
[11] W. Chunlei, L. Li, and L. Qiang. Automatic fuzz testing of web service vulnerability. In Information and Communications Technologies (ICT 2014), 2014 International Conference on, pages 1–6, May 2014.
[12] Y. Demchenko, L. Gommans, C. de Laat, and B. Oudenaarde. Web services and grid security vulnerabilities and threats analysis and model. In The 6th IEEE/ACM International Workshop on Grid Computing. IEEE, 2005.
[13] A. Falkenberg, C. Mainka, J. Somorovsky, and J. Schwenk. A new approach towards DoS penetration testing on web services. In Web Services (ICWS), 2013 IEEE 20th International Conference on, pages 491–498, June 2013.
[14] A. N. Gupta and D. P. S. Thilagam. Attacks on web services need to secure XML on web. Computer Science and Engineering, An International Journal (CSEIJ), 3(5), 2013.
[15] M. Jensen, N. Gruschka, and R. Herkenhöner. A survey of attacks on web services. Computer Science - Research and Development, 24(4):185–197, 2009.
[16] C. Mainka, J. Somorovsky, and J. Schwenk. Penetration testing tool for web services security. In Services (SERVICES), 2012 IEEE Eighth World Congress on, pages 163–170, June 2012.
[17] R. Oliveira, N. Laranjeiro, and M. Vieira. WSFAggressor: An extensible web service framework attacking tool. In Proceedings of the Industrial Track of the 13th ACM/IFIP/USENIX International Middleware Conference, MIDDLEWARE ’12, pages 2:1–2:6. ACM, 2012.
[18] S. Orrin. The SOA/XML threat model and new XML/SO/Web 2.0 attacks and threats. In DEFCON 15, 2007.
[19] S. Padmanabhuni, V. Singh, K. Senthil Kumar, and A. Chatterjee. Preventing service oriented denial of service (PreSODoS): A proposed approach. In Web Services, 2006. ICWS ’06. International Conference on, pages 577–584, Sept 2006.
[20] V. Patel, R. Mohandas, and A. R. Pais. Attacks on web services and mitigation schemes. In Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on, pages 1–6, July 2010.
[21] S. Suriadi, A. Clark, and D. Schmidt. Validating denial of service vulnerabilities in web services. In Network and System Security (NSS), 2010 4th International Conference on, pages 175–182, Sept 2010.
[22] S. Tiwari and P. Singh. Survey of potential attacks on web services and web service compositions. In Electronics Computer Technology (ICECT), 2011 3rd International Conference on, volume 2, pages 47–51, April 2011.
[23] S. Varrette, P. Bouvry, H. Cartiaux, and F. Georgatos. Management of an academic HPC cluster: The UL experience. In Proc. of the 2014 Intl. Conf. on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy, July 2014. IEEE.
[24] X. Ye. Countering DDoS and XDoS attacks against web services. In Embedded and Ubiquitous Computing, 2008. EUC ’08. IEEE/IFIP International Conference on, volume 1, pages 346–352, Dec 2008.