A Reference Architecture For Web Browsers
onto the conceptual architectures of two other systems. Section 6 discusses related work, and Section 7 presents conclusions.

2 The Web Browser Domain

2.1 Overview

The World Wide Web (WWW) is a shared information system operating on top of the Internet. It consists of documents, as well as other types of files such as images and video clips, which are transmitted via HyperText Transfer Protocol (HTTP), a stateless and anonymous means of information exchange. A web browser is a program that retrieves these documents from remote servers and displays them on screen, either within the browser window itself or

[Figure omitted: timeline of web browser releases from 1993 to 2003, covering Mosaic, Internet Explorer, Netscape, Opera, Mozilla, Konqueror, Galeon, Safari, and Firefox, with closed-source and hybrid browsers distinguished and the founding of the W3C marked.]
Figure 2. Extraction process for concrete architecture (diagram omitted; tools shown: addschema, addcontain, liftfile, jgrok script, lsedit)

Figure 3. Reference architecture for web browsers (diagram omitted; subsystems shown: User Interface, Browser Engine, Rendering Engine, Networking, JavaScript Interpreter, XML Parser, Display Backend, Data Persistence)
[Figures omitted: two architecture diagrams mapped onto the reference architecture subsystems. Surviving component labels for Mozilla: UI Toolkit (XPFE), Gecko, Necko, SpiderMonkey, Expat, Security (NSS/PSM), GTK+ Adapter, GTK+ / X11 Libraries. Surviving component labels for Konqueror: KHTML, KJS, KIO, KWallet, PCRE, Qt / X11 Libraries.]
it would be difficult to reuse the Rendering Engine by itself.

• All graphical elements in the user interface and web pages are specified in Extensible User Interface Language (XUL), which abstracts away the details of different platform-specific display and widget libraries. XUL is then mapped onto each of these libraries using specially written adapter components. This architecture distinguishes Mozilla from other browsers, in which the platform-specific display and widget libraries are used directly, and allows Mozilla to be ported to different platforms with minimal difficulty.

4.2 Konqueror

Konqueror[10] is the official web browser of the K Desktop Environment (KDE)[9]. It can also serve as a file manager and a general-purpose file viewer. The project was started in January 1999, and its main design goals are speed, standards-compliance, and integration with KDE. We examined release 3.3.2, which consists of approximately 613 kLOC, including the required KDE libraries. Konqueror is written entirely in C++, as is most of the code in KDE.

We found Konqueror’s codebase to be extremely well organized. Modules were split up cleanly into subdirectories and there was often a concise design document included with the code explaining the main abstractions and design decisions. This may be due in part to the extensive documentation provided by the KDE Quality Team, which details various design guidelines and best practices for KDE application development. This group also makes a conscious effort to involve nondevelopers in areas such as documentation, user interface design, issue tracking, and testing.

The mapping of Konqueror’s conceptual architecture onto the reference architecture is shown in Figure 5. Konqueror makes extensive use of various KDE libraries: KHTML performs parsing, layout, and rendering of web pages; KJS interprets embedded JavaScript code; KWallet stores data such as passwords, cookies, and form data with strong encryption and error detection; and KIO is an asynchronous virtual file system which automatically provides encoding and decoding over common protocols. We note the following observations about the conceptual-to-reference architecture mapping:

• The XML Parser and Display Backend subsystems are both provided by the Qt[21] toolkit, which serves as the basis for all KDE applications. That is, these subsystems are external to the browser itself.

• The Perl Compatible Regular Expressions (PCRE) library is used as a backend for the regular expression functionality of the JavaScript Interpreter. PCRE is a mature and well-tested component used in many other high-profile open source projects, including Python and Apache.

• Data Persistence is provided at three levels. First, some high-level data such as bookmarks and history are stored by Konqueror itself. Second, other high-level data such as form completions are stored by KHTML. Third, secure data such as passwords are stored by KWallet, which allows this data to be shared with other KDE applications.
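The mapping described in the observations above can be written down as a small lookup table. The following is a minimal sketch in Python, not part of Konqueror or of the extraction tooling: the names Subsystem, KONQUEROR, subsystems_by_component, shared_components, and unmapped_subsystems are hypothetical and chosen only for illustration. The User Interface and Browser Engine subsystems are left out of the table because the text above does not name the components that provide them.

```python
from enum import Enum


class Subsystem(Enum):
    """The eight subsystems named in the reference architecture (Figure 3)."""
    USER_INTERFACE = "User Interface"
    BROWSER_ENGINE = "Browser Engine"
    RENDERING_ENGINE = "Rendering Engine"
    NETWORKING = "Networking"
    JAVASCRIPT_INTERPRETER = "JavaScript Interpreter"
    XML_PARSER = "XML Parser"
    DISPLAY_BACKEND = "Display Backend"
    DATA_PERSISTENCE = "Data Persistence"


# Hypothetical table: reference subsystem -> components named in the text.
# Only providers stated in the Konqueror discussion are listed.
KONQUEROR = {
    Subsystem.RENDERING_ENGINE: ["KHTML"],
    Subsystem.JAVASCRIPT_INTERPRETER: ["KJS", "PCRE"],
    Subsystem.NETWORKING: ["KIO"],
    Subsystem.XML_PARSER: ["Qt"],
    Subsystem.DISPLAY_BACKEND: ["Qt"],
    Subsystem.DATA_PERSISTENCE: ["Konqueror", "KHTML", "KWallet"],
}


def subsystems_by_component(mapping):
    """Invert the table: component name -> list of subsystems it serves."""
    inverted = {}
    for subsystem, components in mapping.items():
        for component in components:
            inverted.setdefault(component, []).append(subsystem)
    return inverted


def shared_components(mapping):
    """Return components that contribute to more than one subsystem."""
    return sorted(
        name
        for name, subsystems in subsystems_by_component(mapping).items()
        if len(subsystems) > 1
    )


def unmapped_subsystems(mapping):
    """Return reference subsystems with no component listed in the table."""
    return [subsystem for subsystem in Subsystem if subsystem not in mapping]


if __name__ == "__main__":
    print(shared_components(KONQUEROR))
    print([s.value for s in unmapped_subsystems(KONQUEROR)])
```

Inverting the table makes two of the observations easy to see: Qt serves both the XML Parser and the Display Backend, and KHTML appears under both the Rendering Engine and Data Persistence.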
Overall, we found that Konqueror’s developers have made a conscious effort to implement the browser on top of existing libraries which take care of difficult tasks. In contrast, Mozilla has developed almost all of these libraries in-house, delegating to other libraries only when necessary. A consequence of this is that Konqueror is closely tied to UNIX-like operating systems and the Qt toolkit, while Mozilla supports several different operating systems and display toolkits. However, as we will see in the next section, Apple was able to adapt Konqueror to their own needs by removing many of its dependencies.

5 Validating the Reference Architecture

[Figure omitted: architecture diagram whose surviving labels are User Interface, Data Persistence, Browser Core, Browser Engine, Rendering Engine, wwwlib, JavaScript Interpreter, XML Parser, and Curses.]
Figure 7. Architecture of Safari (diagram omitted; components shown: Web Kit, KWQ adapter, KHTML, KJS, PCRE, Expat, Core Foundation, Cocoa / Carbon, Keychain)

5.2 Safari

Safari[22] is a web browser developed by Apple Computer for its Mac OS X operating system. The first version was released in January 2003. The main design goals for Safari are usability, speed, standards-compliance, and integration with OS X. Safari reuses the KHTML rendering engine and the KJS JavaScript interpreter from the KDE project. Apple’s modified version is called WebCore and is released under the GNU Lesser General Public License (LGPL). However, the rest of Safari’s code is proprietary, including the browser engine (WebKit) and the user interface. We examined the source code of release 125 of WebCore and JavaScriptCore, which consists of 114 kLOC of C++ code and 22 kLOC of Objective C++. Since we could not extract the proprietary parts, their structure was inferred from Apple’s developer documentation[1].

The conceptual-to-reference architecture mapping for Safari is shown in Figure 7. We note the following observations about Safari’s architecture:

• The Rendering Engine is composed of the KHTML core engine wrapped in the KWQ adapter. KWQ is written in Objective C++, which allows it to present an Objective C API to KHTML, which is written in C++. This was needed for integrating Safari into OS X.

• Networking functionality is provided by OS X’s Core Foundation networking library, used in place of KIO.

• The XML Parser subsystem is provided by the Expat XML parser, used in place of the XML parser provided by the Qt toolkit.

• [The Display Backend is provided by two] complementary libraries: Carbon and Cocoa. Carbon provides a lower-level C API for display routines, while Cocoa provides a higher-level Objective C API.

• Persistent data is handled by three separate system-wide services that are built into OS X: Preferences, Keychains, and Caches. The use of these services allows Safari to integrate smoothly with other OS X applications.

Overall, Safari’s conceptual architecture corresponds well with our reference architecture. Safari reuses the core engine from Konqueror, substitutes a Mac OS X look and feel, and makes use of other components and libraries native to OS X in place of the Linux- and KDE-specific components of Konqueror.

5.3 Summary

There are several reasons why a web browser’s architecture would differ from our reference architecture. Some of the subsystems in the reference architecture may be implemented as a single subsystem for simplicity, while others may be implemented across multiple subsystems for greater flexibility. Furthermore, new subsystems may be added to provide additional capabilities not found in traditional web browsers, while other subsystems may be omitted to make the browser more lightweight.

Lynx’s conceptual architecture is much simpler than our reference architecture. Some subsystems are missing because they correspond to relatively modern features which either are not applicable to text-only browsers, or simply are not supported yet in Lynx. Other subsystems are tightly coupled as a result of Lynx’s overall lack of modularity.

Safari’s conceptual architecture corresponds quite closely to our reference architecture. This makes sense because Safari is based on the same rendering engine and JavaScript interpreter as Konqueror; furthermore, it seems as though Apple has used Konqueror as a blueprint for Safari, substituting OS X technologies for the corresponding KDE technologies. Additionally, we observe that Safari uses the Expat XML parser, which is also found in Mozilla.

Table 1 shows various statistics about the different web browsers studied. We note the following observations:

• Konqueror achieves nearly the same degree of standards-compliance as Mozilla with one-quarter of the amount of code. This may be due to the fact that Mozilla supports many different platforms, while Konqueror only supports UNIX-like systems running X11 with the Qt toolkit.

• Lynx, while smaller than the other browsers, is nonetheless very large for a text-based browser. For
comparison’s sake, Links[11], a more recent text-only browser with a comparable feature set, consists of only 26 kLOC, approximately one-fifth the size of Lynx. This may be due to the large amount of legacy code in Lynx.

• We are unable to obtain complete size information for Safari because a large portion of the code is closed source. The numbers shown correspond only to the WebCore engine, and thus represent a lower bound on the total size.

Table 1. Approximate web browser statistics

Project   Rel.    Lang.        Files     kLOC     Size*   Start
Mozilla   1.7.3   C++          10,500    2,400    29      1998
Konq.     3.3.2   C++          3,145     600      17      1996
Lynx      2.8.5   C            200       122      2.1     1992
Safari    1.2     C++, Obj C   >750      >136     >2.1    2003

*Represents the compressed tarball size in megabytes.

We are currently investigating how the conceptual architectures of the Mosaic[14], Dillo[5], and Galeon[8] web browsers correspond to our reference architecture. We would also like to examine web browsers designed specifically for embedded devices, but at the present time we do not know of any mature open source implementations.

6 Related Work

There has been some previous research involving reference architectures. Eixelsberger has recovered a reference architecture from a family of embedded, real-time train control systems, each around 150 kLOC[27]. He used a formal Architectural Description Language (ADL) to describe each system, and then performed commonality analysis. Batory, Coglianese, Goodwin, and Shafer have defined a reference architecture for avionics as part of a project to build a domain-specific software architecture (DSSA) environment for assisting the development of avionics software[24]. Hassan and Holt have defined a reference architecture for web servers, and shown how it maps to the conceptual architectures of three systems[31].

A product line architecture specifies the architecture for a group of products sharing a common, managed set of features[25, 26]. Product line architectures are similar to reference architectures, although they generally represent a group of systems intended to be produced by a single organization, while reference architectures represent the entire spectrum of systems in a domain.

Finally, there have been some previous case studies examining various aspects of Mozilla’s architecture and development process. Godfrey and Lee have extracted Mozilla’s architecture as part of a study investigating data exchange between different reverse engineering tools[30]. Mockus, Fielding, and Herbsleb have used Mozilla as part of a case study of open source software projects[32]. Fischer, Pinzger, and Gall have analyzed the proximity of features in Mozilla based on data in its bug-tracking database, Bugzilla[28].

7 Conclusions

We have examined the history and evolution of the web browser domain, developed a reference architecture for web browsers based on two existing implementations, and validated this reference architecture by mapping it onto two additional implementations. Furthermore, we have observed several interesting evolutionary phenomena while studying web browsers; namely, emergent domain boundaries, convergent evolution, and tension between open and closed source development approaches.

As the web browser domain has evolved, its conceptual boundaries, both external and internal, have become increasingly well defined. However, there are still discrepancies as to the nature of these boundaries. For example, Microsoft has claimed that Internet Explorer is a fundamental part of the Windows operating systems, providing rendering functionality to other applications such as help browsers and wizards. This extended boundary posed a problem for third-party browsers such as Netscape, which sought to compete with IE. In a similar example, we have seen email and usenet client functionality integrated with the web browser starting with Netscape, and continuing with the Mozilla Suite. This integration has potentially made it more difficult for external clients to compete. Further examples of domain integration include FTP clients and local file managers. It will be interesting to observe how the web browser domain adapts to support embedded devices, such as cell phones and PDAs; these platforms often have limited amounts of memory, making it undesirable to have multiple competing applications installed at once.

The large amount of effort devoted to creating high-quality open source browser implementations has had a tremendous influence on the domain. During the “browser wars,” core browser components included proprietary extensions in order to attract customers. Today, increased standardization and pressure to comply with these standards have led to reuse of core browser components. Rather than duplicate effort, browsers often attempt to differentiate themselves by providing interface enhancements; however, these features seem to be easily duplicated. For example, after tabbed browsing was pioneered by NetCaptor, it quickly began appearing in other browsers such as Opera and Mozilla. Similarly, popup blocking and automatic web form filling are now commonplace, suggesting
that the web browser domain is exhibiting a form of convergent evolution[29].

The availability of mature browser components has also resulted in tension between open and closed source development approaches. The Mozilla project was founded with the intention of creating a mature, open source browser platform that could be used as the basis for other browsers, both open and closed. Indeed, the last two releases of Netscape have been based on Mozilla and have been closed source. A similar situation has occurred with Apple’s Safari, which is a closed source browser based on Konqueror’s open source engine. Although not required by the license, Apple has voluntarily contributed their changes to open source code back to the community. Conversely, Internet Explorer represents a closed source browser component that can potentially be embedded in an otherwise open source product. Interestingly enough, the upcoming version of Netscape promises to embed both the Mozilla and IE engines, allowing users to switch on the fly.

While we have seen applications composed of both open and closed source components before, the interaction usually takes place on the perimeter, as is the case with closed source binary modules for the Linux kernel. We believe the heterogeneous combination of core open and closed source software components within individual systems makes the web browser domain unique and interesting.

Acknowledgements

We thank Ali Echihabi for his contributions to an earlier project out of which this paper has grown, as well as Ric Holt for his feedback and advice.

References

[1] Apple developer documentation. http://developer.apple.com/documentation.
[2] Avant web browser home page. http://www.avantbrowser.com.
[3] Camino web browser home page. http://caminobrowser.org.
[4] Cascading Style Sheets home page. http://www.w3.org/Style/CSS.
[5] Dillo web browser home page. http://dillo.org.
[6] An early history of Lynx. http://www.cc.ku.edu/~grobe/early-lynx.html.
[7] ECMAScript language specification. http://www.ecma-international.org/publications/standards/Ecma-262.htm.
[8] Galeon web browser home page. http://galeon.sourceforge.net.
[9] K Desktop Environment home page. http://kde.org.
[10] Konqueror web browser home page. http://konqueror.org.
[11] Links web browser home page. http://links.sourceforge.net.
[12] Lynx web browser home page. http://lynx.isc.org.
[13] Maxthon web browser home page. http://www.maxthon.com.
[14] Mosaic web browser home page. http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/NCSAMosaicHome.html.
[15] Mozilla application suite transition plan. http://www.mozilla.org/seamonkey-transition.html.
[16] Mozilla project home page. http://www.mozilla.org.
[17] Netcaptor web browser home page. http://www.netcaptor.com.
[18] Omniweb web browser home page. http://www.omnigroup.com/applications/omniweb.
[19] Opera web browser home page. http://www.opera.com.
[20] QLDX reverse engineering toolkit home page. http://swag.uwaterloo.ca/qldx.
[21] Qt application development framework home page. http://www.trolltech.com/products/qt.
[22] Safari web browser home page. www.apple.com/safari.
[23] Webcore framework home page. http://developer.apple.com/darwin/projects/webcore.
[24] D. Batory, L. Coglianese, M. Goodwin, and S. Shafer. Creating reference architectures: an example from avionics. In Proceedings of the 1995 Symposium on Software Reusability (SSR ’95), pages 27–37, 1995.
[25] J. Bosch. Design and use of software architectures: adopting and evolving a product-line approach. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000.
[26] P. Clements and L. M. Northrop. Software product lines: practices and patterns. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2001.
[27] W. Eixelsberger, M. Ogris, H. Gall, and B. Bellay. Software architecture recovery of a program family. In Proceedings of the 20th International Conference on Software Engineering (ICSE ’98), pages 508–511, 1998.
[28] M. Fischer, M. Pinzger, and H. Gall. Analyzing and relating bug report data for feature tracking. In Proceedings of the 10th Working Conference on Reverse Engineering (WCRE ’03), pages 90–99, 2003.
[29] D. J. Futuyma. Evolutionary Biology. Sinauer Associates, Sunderland, MA, USA, 3rd edition, 1998.
[30] M. Godfrey and E. H. S. Lee. Secrets from the monster: Extracting Mozilla’s software architecture. In Second International Symposium on Constructing Software Engineering Tools (CoSET ’00), June 2000.
[31] A. E. Hassan and R. C. Holt. A reference architecture for web servers. In Proceedings of the 7th Working Conference on Reverse Engineering (WCRE ’00), pages 150–160, 2000.
[32] A. Mockus, R. T. Fielding, and J. Herbsleb. A case study of open source software development: the Apache server. In Proceedings of the 22nd International Conference on Software Engineering (ICSE ’00), pages 263–272, 2000.