Sec23summer - 181 Wen Prepub
• We are the first to propose effective techniques for CFG and symbol recovery in Qt binary RE (§3), by leveraging Qt’s unique signal and slot as well as its dynamic introspection.
• We designed (§4) and implemented QtRE (§5), an open-source tool to facilitate Qt binary analysis. It uses a source-aware class inference to resolve indirect call targets of function callbacks, and a unit-level symbolic execution to recover semantic symbols.
• We evaluated QtRE on KDE and Tesla Model S firmware, where it additionally recovered 10,867 callbacks and 24,973 symbols (§6). From the Tesla firmware, it discovered 12 hidden commands with five new to the public (§7).

3 Overview

3.1 Objective

Reverse engineering (RE) of binaries is fundamental in computer security. In addition to binary code comprehension [48, 68], RE has been the fundamental building block of many security applications, including but not limited to vulnerability discovery [27, 28, 61, 71, 78], malware analysis [25, 33, 35, 73, 74], binary retrofitting [21, 29, 54, 55, 70, 72, 75], and exploit generation [24, 36, 37, 61]. In this paper, we present QtRE, a static binary analysis tool to facilitate reverse engineering of Qt binaries. In particular, it
attempts to address two fundamental challenges of RE: (1) control flow graph (CFG) recovery and (2) symbol recovery.

3.2 Key Insights

The recovery of CFG and symbols is fundamentally challenging due to indirect control flow transfer and code stripping [27, 53, 67]. Moreover, while existing analysis tools [3–5, 61] can be applied to Qt binary analysis, they will miss many Qt-specific callbacks and stripped symbols. Interestingly, we observe that two unique Qt mechanisms, (1) signal and slot and (2) dynamic introspection, can be leveraged for QtRE’s objective, leading to two key insights. First, Qt’s signal and slot mechanism provides a unique way to efficiently implement function callbacks, which can also be used to identify function callback targets for CFG recovery. Second, while Qt’s dynamic introspection is for run-time variable query and update, we surprisingly find that such a process can be repurposed to recover symbols. In the following, we present the details of these two key insights.

Insight 1: Leveraging Qt’s Signal and Slot. Function callbacks are extremely common in GUI applications due to the handling of UI events and asynchronous function calls. However, C++ provides neither standard APIs nor an official guide for callback implementation, and thus programmers have to use function pointers to implement callbacks, which is error-prone and makes the code much less readable and maintainable. As such, Qt introduces the signal and slot mechanism [19] to address this issue. Essentially, signals and slots are functions defined with special macros (signals and slots), where a signal represents an event that an object fires, and a slot captures the event of its interest.

We further illustrate exactly how Qt’s signal and slot work with a running example in Figure 1. At lines 9-13, the program registers two callback functions by invoking a Qt library function connect. Using the call site at line 9 as an example, the function takes five parameters as input, including the signal class object (a QLineEdit object), the signal function signature (2textChanged(QString))¹, the slot class object (a MainWindow instance pointed to by a this pointer), the slot function signature (1updateText(QString)), and the connection type (0, indicating that the callback is synchronous). After the connection is established, when a user enters some text in the QLineEdit UI widget, the slot function (lines 16-20) will be automatically notified to update the text variable. Fundamentally, the registered callbacks are stored in a connection list (i.e., a linked list) of the involved class objects. When a signal is emitted, the class object internally triggers a Qt library function activate to invoke the slot function from the connection list [16].

     1 MainWindow::MainWindow() {
     2   ...
     3   // Create lineEdit instance
     4   v0 = operator.new(0x30)
     5   QLineEdit(v0)
     6   *(this + 0x30) = v0
     7   ...
     8   // Register callbacks
     9   connect(*(this+0x30), "2textChanged(QString)"
    10     , this, "1updateText(QString)", 0)
    11
    12   connect(*(this+0x30), "2editingFinished()"
    13     , this, "1handleInput()", 0)
    14   ...
    15 }
    16 MainWindow::updateText(QString v1) {
    17   // Slot
    18   if (v1 != null)
    19     *(this + 0x48) = v1 // this->text
    20 }
    21 MainWindow::handleInput() {
    22   // Slot
    23   v1 = *(this + 0x48) // this->text
    24   if (v1 == "secret") {
    25     // Dynamic introspection
    26     this->setProperty("text", "test")
    27     qDebug() << v1 // Will print out "test"
    28   }
    29 }
    30 MainWindow::qt_metacall(… int v1, void** v2) {
    31   ...
    32   if (v1 == 0) {
    33     // Set property value by index
    34     *(this + 0x48) = (QString) v2
    35   }
    36 }

[Diagram annotations: the signals textChanged and editingFinished lead to the two slots; the metadata consists of a property table whose single entry holds name index 0, index 0, and type QString, and a string table whose index 0 holds the string text; setProperty queries the index and invokes qt_metaCall with v1 = 0 and v2 = "test".]

Figure 1: An example illustrating Qt binary internals.

The above example shows that Qt’s signal and slot can be used to recover function callbacks, which cannot be identified by other generic binary RE tools [4, 5]. Essentially, the task is to analyze the standard connect function and resolve the pair of the signal (i.e., caller) and the slot (i.e., callee) from the function parameters to establish the callback connection.

Insight 2: Repurposing Qt’s Dynamic Introspection. Another distinctive feature of Qt is its dynamic introspection, a feature useful for run-time query and update of class attributes. To use dynamic introspection, a class member first needs to be registered as a property² by using a Q_PROPERTY macro [10]. Fundamentally, to support dynamic introspection, the MoC will collect the necessary meta-information and generate the corresponding code during compilation, which will be invoked at run-time for introspection [14, 20].

¹ Constants 1 and 2 are macros to indicate a slot or a signal function.
² A property is essentially a class member with additional features.

We use lines 21-29 of Figure 1 to illustrate how dynamic introspection works in Qt. First, the slot handleInput at line 21 is triggered by the signal editingFinished when the user finishes entering a string from the GUI. The slot then takes the input variable at memory location this+0x48 (line 23), and compares it with a constant string “secret” (line 24). If the variable matches the string, setProperty is invoked to set the property value of text as “test” using dynamic introspection at line 26. Since such an update occurs internally in the Qt library (not directly visible to programmers) with complicated procedures, we explain it in a simplified manner. Specifically, the program first uses the property name “text” to query its index from the metadata tables. By associating the name index 0 from the string and property table, it
obtains the property index which is also 0. Next, the function qt_metacall is invoked along with the property index 0 and the updated value “test” as arguments, and the purpose is to store the value to the property’s memory address at this+0x48 (line 34). Finally, the program at line 27 will print out “test” on the console even though the input string is “secret”, indicating that the variable has been updated.

Based on how dynamic introspection works, we notice that unlike C++ binaries, Qt binaries must preserve semantic symbols to support this feature, which makes the recovery of actual semantic symbols possible. More specifically, by repurposing the introspection process, we can reveal the semantic symbols from the special data structures (e.g., metadata tables) and functions (e.g., qt_metacall) in Qt. Compared with existing symbol recovery approaches [38, 44, 45, 59, 62, 66, 69, 70], our solution recovers the symbols instead of inferring them, as the recovered symbols are indeed the ones from the source code.

3.3 Scope and Assumption

We focus exclusively on Qt binaries and assume these binaries are not stripped, and no anti-RE techniques are deployed so that they can be disassembled using existing RE tools such as Ghidra. For CFG recovery, we aim to identify callbacks implemented by Qt’s signal and slot as they are Qt-specific, whereas other CFG recovery challenges such as indirect function calls are handled by existing approaches [46, 53, 67] and are not unique to Qt. For symbol recovery, QtRE recovers symbols that can only be extracted using Qt’s dynamic introspection.

4 QtRE Design

This section presents the design of QtRE. First, we illustrate how Qt’s signal and slot are used to recover function callbacks (§4.1). Next, we describe how Qt’s dynamic introspection is repurposed to recover semantic symbols (§4.2).

4.1 Identification of Function Callback Target

Challenges. Recall from §3.2 that the essence of function callback identification is to resolve the signal and slot [19] from the connect function parameters. As shown in Table 1, the connect function has five parameters: (1) the signal class instance, (2) the signal function, (3) the slot class instance, (4) the slot function, and (5) the connection type. Although the signal and slot functions can be easily resolved as they are hardcoded strings that represent the function signatures, this is still not sufficient due to the polymorphism in C++ [53]. For instance, if class A has a function foo, then any class inherited from A can override function foo. Thus, only knowing the function signatures cannot accurately resolve the callback target, and we must use both the signature and class to uniquely identify the target. However, as shown in the running example, there are various ways to derive a class instance. For example, the signal class object at line 9 is initialized by a new operator (essentially a heap variable), whereas the slot class object is pointed to by a this pointer.

    Type  Name     Param.0       Param.1      Param.2     Param.3    Param.4
                   Signal Class  Signal Sig.  Slot Class  Slot Sig.  Type
    1     connect  QObject*      fptr*        QObject*    fptr*      int
    2     connect  QObject*      char*        QObject*    char*      int

Table 1: Connect functions and argument types.

Solution. While there are many ways to derive a class object, we observe that there are a finite number of sources. In summary, there are six different sources, including (1) this pointer, (2) function parameter, (3) function return value, (4) global variable, (5) heap variable, and (6) stack variable. Therefore, to resolve the signal and slot classes, we first trace the use-def [22] chains of the corresponding parameters in the connect function. Next, based on the data definition of the class object, we use a set of source-aware inference rules to infer the class. In the following, we illustrate in greater detail how QtRE identifies function callbacks.

4.1.1 The connect Call Sites Identification

The first step of callback identification is to locate the connect function call sites. To achieve comprehensiveness, we exhaustively looked into Qt’s official documentation and found that there are only two types of connect functions (i.e., type-1 and type-2) [19], as presented in Table 1. For both functions, the parameters 0 to 3 correspond to the signal class object, the signal function signature, the slot class object, and the slot function signature, respectively. The last integer parameter indicates the type of connection (whether the connection is synchronous or asynchronous). These two functions are different in parameters 1 and 3, as type-1 directly uses function pointers to denote the functions, whereas type-2 uses strings as function signatures. Note that type-1 is only available starting from Qt 5 [19]. As a result, QtRE locates these two types of connect functions in the binary based on their signatures and then identifies their call sites.

4.1.2 Source-aware Class Inference

QtRE then resolves the signal and slot from the parameters of connect. Since there are two types of connect functions, QtRE uses two strategies accordingly: (1) for type-1 connect, by resolving the addresses to which the function pointer points, QtRE can determine the signal and slot. Therefore, it only needs to resolve parameters 1 and 3 in Table 1; (2) for type-2 connect, QtRE has to resolve all the pointer types (i.e., parameters 0-3) including the class object pointers, since a callback target is determined by both the class and the function signature due to the polymorphism.
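As a rough illustration of §4.1.1 and §4.1.2, the following Python sketch decodes a type-2 signature string and lists which connect parameters have to be traced backward for each connect type. The helper names decode_member and params_to_trace are hypothetical and are not part of QtRE; the only facts taken from the text are the 1/2 prefix convention (footnote 1) and the parameter positions in Table 1.

```python
from typing import Tuple

# Prefix digits of type-2 signature strings, per footnote 1 of the text.
MEMBER_KINDS = {"1": "slot", "2": "signal"}


def decode_member(encoded: str) -> Tuple[str, str]:
    """Split a type-2 connect string into (kind, signature)."""
    kind = MEMBER_KINDS.get(encoded[:1])
    signature = encoded[1:]
    # A usable signature looks like name(argument types).
    if kind is None or "(" not in signature or not signature.endswith(")"):
        raise ValueError(f"not an encoded signal/slot string: {encoded!r}")
    return kind, signature


def params_to_trace(connect_type: int) -> Tuple[int, ...]:
    """Return the connect parameter positions that need backward tracing."""
    if connect_type == 1:
        # Function pointers already identify the signal and the slot.
        return (1, 3)
    if connect_type == 2:
        # Class objects and signature strings are all needed.
        return (0, 1, 2, 3)
    raise ValueError(f"unknown connect type: {connect_type}")


if __name__ == "__main__":
    print(decode_member("2textChanged(QString)"))
    print(decode_member("1updateText(QString)"))
    print(params_to_trace(2))
```

The two strings passed to decode_member are the ones used at lines 9-10 of Figure 1. Keeping the decoding separate from the choice of parameters mirrors the split between the two connect types in Table 1.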
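\[
\frac{p = \mathit{this}}{\mathit{this} \mapsto \mathit{class}}\ \textsc{ThisPointer}
\qquad
\frac{p \mapsto v \quad v \in \mathrm{Parameter}(f)}{\mathrm{Type}(v)}\ \textsc{FuncParam}
\qquad
\frac{p \mapsto v \quad v = f(\ldots)}{\mathrm{ReturnType}(f)}\ \textsc{FuncRetVal}
\]
\[
\frac{p \mapsto v \quad v \in \mathrm{GlobalVariables}}{\mathrm{Type}(v)}\ \textsc{GlobalVar}
\qquad
\frac{p \mapsto \mathrm{HeapAlloc}(v, \mathit{size})}{\mathrm{Constructor}(v)}\ \textsc{HeapVar}
\]
\[
\frac{p \mapsto v \quad p \in \mathrm{Stack}}{\mathrm{Type}(v)}\ \textsc{StackVar}
\qquad
\frac{\exists!\, f,\ \mathrm{Signature}(f) = \mathit{signature}}{\mathrm{Class}(f)}\ \textsc{SigMatching}
\]
Figure 2: Formal representation of source-aware class inference rules for callback target identification.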
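The rules in Figure 2 can be read as a small decision procedure. The Python sketch below is one possible rendering under simplifying assumptions: the Definition record, its field names, and the classes_by_signature index are hypothetical stand-ins for what a use-def analysis would produce, and they do not describe QtRE's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Definition:
    """Hypothetical summary of the data definition reached by use-def tracing."""
    source: str  # "this", "param", "retval", "global", "heap", "stack", "unknown"
    enclosing_class: Optional[str] = None     # class owning the method (this pointer)
    declared_type: Optional[str] = None       # Type(v) or ReturnType(f)
    constructor: Optional[str] = None         # constructor seen on the def-use chain
    points_to: Optional["Definition"] = None  # definition of the pointed-to variable


def infer_from_source(d: Definition) -> Optional[str]:
    """Apply the six source rules; return None when no rule yields a class."""
    if d.source == "this":
        return d.enclosing_class
    if d.source in ("param", "retval", "global"):
        return d.declared_type
    if d.source == "heap" and d.constructor:
        # The class is taken from the constructor name, e.g. QLineEdit::QLineEdit.
        return d.constructor.split("::")[0]
    if d.source == "stack" and d.points_to is not None:
        # Stack pointers are resolved recursively through the pointed-to variable.
        return infer_from_source(d.points_to)
    return None


def infer_class(d: Definition, signature: str,
                classes_by_signature: Dict[str, List[str]]) -> Optional[str]:
    """Source rules first, then signature matching as the fallback."""
    inferred = infer_from_source(d)
    if inferred is not None:
        return inferred
    candidates = classes_by_signature.get(signature, [])
    # Only a unique match is accepted; ambiguity yields no callback.
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    heap = Definition("heap", constructor="QLineEdit::QLineEdit")
    stack = Definition("stack", points_to=heap)
    print(infer_class(stack, "textChanged(QString)", {}))
```

Returning None instead of guessing corresponds to the stated policy of not establishing a callback when the class cannot be resolved uniquely.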
However, it is not so straightforward to resolve these parameters, and QtRE has to perform a static analysis and trace back to the data definitions. Specifically, for each parameter, QtRE recursively traverses the use-def chains backward until the data definition is reached. As such, it easily obtains the signatures of the signal and slot functions of type-2 connect as most of the char pointers point to hardcoded strings. For the signal and slot class objects, QtRE further infers their classes based on the data definitions. In summary, there are six sources that can derive a class object according to our observation, and QtRE also has one additional signature matching inference rule (inspired by TypeArmor [67]) when it fails to infer the class from these six sources. The formal representations of the source-aware class inference rules are presented in Figure 2 and explained in the following.
• This pointer. If the class pointer p is a this pointer (e.g., the slot class object at line 10 in Figure 1), then the corresponding class of the this pointer is used as the class type.
• Function parameter. If p points to a parameter of function f, the class type is inferred from the corresponding parameter type.
• Function return value. If p points to the return object of f, QtRE uses the return type of f.
• Global variable. If p points to an object located in data segments where global variables reside (e.g., bss), QtRE traces the definition of the global variable to infer its type.
• Heap variable. If p points to a heap object v initialized through heap allocators such as new (e.g., line 4 in Figure 1) or malloc family, QtRE searches the def-use chain of v for any constructor functions, since a heap variable must be initialized through a constructor (e.g., line 5 in Figure 1). The class is thus inferred from the constructor name.
• Stack variable. If p is a pointer located on the stack, QtRE traces the definition of its pointed variable v and uses the above rules to recursively resolve the class type.
• Signature matching. If none of the above rules works (e.g., the class object is returned from a function that cannot be concretely resolved), QtRE uses an additional rule by matching the function signature, which is inspired by TypeArmor [67]. Specifically, if there exists only one function f that has exactly the same signature, the class of f is our target. Otherwise, if there are multiple such functions, QtRE will not establish a callback to ensure soundness.

The type information is obtained directly from the symbols (e.g., the types of function parameters and return values) if available. If the binary is stripped, our approach still works by recursively applying the inference rules (e.g., tracing the definition of a function parameter at the call site). Additionally, the symbols can be contributed from QtRE’s recovery approach (§4.2) as those cannot be fundamentally stripped. After QtRE infers the class types and function signatures using the inference rules, it constructs the concrete function callback targets (i.e., signal and slot) by connecting the inferred class (e.g., QLineEdit) with the function signature (e.g., textChanged(QString)). Finally, to ensure the soundness of the results, QtRE establishes a callback connection only when both the signal and slot are resolved.

4.2 Semantic Symbol Recovery

Challenges. Next, QtRE recovers semantic symbols by repurposing Qt’s dynamic introspection. First, as illustrated in our running example (Figure 1), the symbol strings are stored in metadata tables, and thus QtRE interprets them to extract the symbol strings, as demonstrated in an existing tool [14]. Afterwards, with the symbol strings extracted, we still need to map the property symbols to the corresponding memory addresses (e.g., text is mapped to this+0x48 in Figure 1). However, this is challenging for two reasons. First, the symbol addresses are not available from the metadata tables, and are hidden in deep program branches of the Qt binary code (e.g., line 34 of Figure 1). Thus, we infer such a symbol-address association using a sophisticated binary analysis. Second, the symbol addresses are dynamically computed, which are derived from a base pointer this (lines 9 and 12 of Figure 1), making them non-trivial to solve statically.

Solution. Motivated by the running example, our key insight is that the symbol to address mapping must exist in the Qt library function qt_metacall to support the run-time query of class properties, which can be leveraged to compute the symbol addresses associated with the symbol strings. Specifically, in Figure 1 the setProperty function invokes qt_metacall which guides the program to update variable text at this+0x48. Although some static analysis approaches, such as data flow analysis [23], could be applied, they fall short due to the excessive number of branches in the qt_metacall function and dynamically computed values
QLineEdit (I)
    this+0x00: 58 e0 98 00   *QLineEdit::vtable
    this+0x04: d4 63 9a 00   *QLayoutPrivate::vtable
    this+0x08: 38 e1 98 00   *QLineEdit::~QLineEdit()
    ...
    0x0993c90: 40 e0 98 00   *QLineEdit::staticMetaObject
    ...
    0x098e05c: d0 48 58 00   *QLineEdit::qt_metaCast()
    0x098e060: 24 6e 58 00   *QLineEdit::qt_metaCall()
    ...

QLineEdit::staticMetaObject (III)
    0x098e040: 9c 93 98 00   *QWidget::staticMetaObject
    0x098e048: 0c db 85 00   *qt_meta_data_counter
    ...

qt_meta_data_counter (Metadata table) (IV)
  Content
    0x085db0c: 05 00 00 00   // Qt version
    0x085db10: 00 00 00 00   // class name
    ...
  Signal
    0x085db44: 01 00 00 00   // name
    0x085db48: 00 00 00 00   // argc
    0x085db4c: 00 00 00 00   // parameters
    0x085db50: 02 00 00 00   // tag
    ...
  Slot
    0x085db90: B0 00 00 00   // parameters
    0x085db94: 02 00 00 00   // tag
    0x085db98: 06 00 00 00   // flags
    ...
  Param
    0x085dbbc: 0A 00 00 00   // type
    0x085dbc0: 03 00 00 00   // name
    ...
  Property
    0x085dbe8: 0A 00 00 00   // type
    0x085dbec: 03 50 09 00   // flags
    ...

qt_meta_data_stringdata_counter.data (String table) (V)
    0x085d700: FF FF FF FF   // metadata str[0]
    0x085d704: 09 00 00 00   // length str[0]
    ...
    0x085d718: FF FF FF FF   // metadata str[1]
    0x085d71c: 11 00 00 00   // length str[1]
    ...
    0x085d74c: 04 00 00 00   // length str[3]
    ...
  String
    0x085d88c: "QLineEdit"           // str[0]
    0x085d896: "editingFinished()"   // str[1]
    0x085d8a8: "setText(QString)"    // str[2]
    0x085d8b9: "text"                // str[3]
    ...

Figure 3: Tables and data structures that store the metadata information of the QLineEdit class in our running example.
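To make the layout in Figure 3 more concrete, the following Python sketch decodes one five-word signal entry and resolves its name through the string table. It is a simplified illustration rather than QtRE's parser: the helper names are hypothetical, the string data is split on NUL bytes instead of being located through the per-string headers, and the flags word of the signal entry is not shown in Figure 3, so the value used here is a placeholder.

```python
import struct
from typing import Dict, List


def split_string_data(raw: bytes) -> List[str]:
    """Split consecutive NUL-terminated strings, as laid out in panel (V).

    Simplification: empty strings are dropped, which a real parser that
    follows the per-string length fields would not do.
    """
    return [part.decode("ascii") for part in raw.split(b"\x00") if part]


def decode_method_entry(table: bytes, offset: int,
                        strings: List[str]) -> Dict[str, object]:
    """Decode one signal or slot entry made of five little-endian UInt words."""
    name, argc, parameters, tag, flags = struct.unpack_from("<5I", table, offset)
    return {
        "name": strings[name],     # the name word is an index into the string table
        "argc": argc,
        "parameters": parameters,  # offset of the parameter entry, 0 if none
        "tag": tag,
        "flags": flags,
    }


# String data as listed in panel (V) of Figure 3.
STRING_DATA = b"QLineEdit\x00editingFinished()\x00setText(QString)\x00text\x00"
# Signal entry words from panel (IV); 0x06 for flags is an assumed placeholder.
SIGNAL_ENTRY = struct.pack("<5I", 1, 0, 0, 2, 0x06)

if __name__ == "__main__":
    strings = split_string_data(STRING_DATA)
    print(strings[0])
    print(decode_method_entry(SIGNAL_ENTRY, 0, strings))
```

Index 0 of the decoded string list yields the class name and index 1 the signal name, which matches the lookups described for the content and signal sections.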
(e.g., this pointer), making them difficult to design and implement. In contrast, we find that a light-weight unit-level symbolic execution [42] is well-suited for this problem, as we can leverage the code transition and arithmetic computation logic in function qt_metacall to efficiently compute the relative symbol addresses. Back to Figure 1, we apply this idea by executing qt_metacall with the property index 1 (v1) and a symbolic value assigned to the this pointer, which guides the execution to line 34 and computes the desired symbol address. In the following, we describe how QtRE extracts the symbol strings and computes the symbol addresses.

4.2.1 Symbol String Extraction

To extract the symbol strings, QtRE interprets 3 key data structures and 2 metadata tables. In Figure 3, we present a real-world example of the QLineEdit class to explain how its symbol strings are extracted from these five elements:

(I) Object hierarchy. For each instance of class QLineEdit, the hierarchy starts with its virtual function table pointer, followed by other pointers, functions, and member variables. Note that the class also has a pointer pointing to a static member staticMetaObject, which contains pointers pointing to the metadata tables of the class, as illustrated in Figure 3.

(II) Virtual function table. Whenever a class defines a virtual function (i.e., a function defined at the base class but overridden in child classes), a virtual function table will be generated and shared among all class instances. All classes in Qt that inherit from the base class QObject will have a virtual table, which contains built-in Qt functions such as qt_metacast and qt_metacall [16].

(III) Static meta object. There is a static member named staticMetaObject shared among all instances of the same class. This member is created automatically by the MoC when a class instance inherits from QObject [11]. As shown, staticMetaObject is a data structure that consists of three pointers [16]. The first pointer points to the staticMetaObject of its parent class QWidget. The second pointer points to the data section of a structure called qt_meta_stringdata_Counter, which contains the metadata about the strings used by this class. The third pointer points to the qt_meta_data_Counter structure, which involves the metadata of its signals, slots, parameters, and properties in different data sections. For simplicity, we call the latter two string table and metadata table, respectively.

(IV) Metadata table. The metadata table consists of five sections [16] and stores the metadata of the class, signals, slots, parameters, and properties that are used in the class.
• Content section includes information about the Qt version, class name, number of methods, properties, and signals in the class, etc. For instance, the class name is represented by an index 0. By looking up the index 0 at the string table, we can know the class name is QLineEdit. Interpreting the content table is quite simple, as each element is of type UInt, and the size of the table is fixed.
• Signal section is right after the content section, which stores the metadata of the signals in the class. It consists of multiple signal entries, each of which has five UInt elements to describe a signal. For simplicity, we only show one entry in Figure 3 (the same as below). As shown, this entry has five elements, representing the signal name, argument count, parameter index, tag, and flags. Similarly, the signal name is represented by an index 1, which is interpreted as editingFinished() at the string table. In addition, if the signal function has parameters, the parameter entry offset (relative to the entry of the metadata table) will be specified. Since the signal does not have parameters, the parameter index is 0. The remaining two elements represent the tags and flags of the signal. Note that each signal is also indexed according to the order in which it appears (e.g., editingFinished has an index of 0).
• Slot section is identical to the signal section, except that each entry represents a slot. Thus, QtRE uses the same strategy to interpret the slot table. As in the example, we can interpret the slot as setText(QString) with one parameter. In addition, the offset for parameter entry is [...] name index 3 and type 0x0A, which are interpreted as text and QString (based on the special type encoding in the Qt source code [16]). Combining with the extracted slot

     1 int QLineEdit::qt_metaCall (int call, int index, void** a) {
     2   ...
     3   if (call == 0) {
     4     // call function by index
     5     ...
     6   }
     7   else if (call == 1) {
     8     // get property by index
     9     switch (index) {
        [...]
    24     // set property by index
    25     ...
    26   }
    27   return index
    28 }

    Symbolic state 1: call = 1, index = 0, this = λ, **a = *(*(*(λ+4)+300)+8)
    Symbolic state 2: call = 1, index = 1, this = λ, **a = *(*(*(λ+4)+300)+56)
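The idea of running qt_metacall with a concrete property index and a symbolic this pointer can be imitated at toy scale. In the Python sketch below, toy_qt_metacall is a hand-written stand-in for lines 30-36 of Figure 1, not real Qt code, and the Sym class and property_addresses helper are hypothetical names. An actual tool would symbolically execute the binary code of qt_metacall rather than a Python function.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sym:
    """A symbolic address expression kept as text, e.g. 'this+0x48'."""
    text: str

    def __add__(self, offset: int) -> "Sym":
        return Sym(f"{self.text}+{offset:#x}")


def toy_qt_metacall(this: Sym, index: int, value: Sym,
                    writes: List[Tuple[Sym, Sym]]) -> None:
    """Stand-in for lines 30-36 of Figure 1; records stores instead of doing them."""
    if index == 0:
        writes.append((this + 0x48, value))  # *(this + 0x48) = value


def property_addresses(names: List[str]) -> Dict[str, str]:
    """Run the unit once per property index and collect the store address."""
    this = Sym("this")  # the base pointer stays symbolic
    resolved: Dict[str, str] = {}
    for index, name in enumerate(names):
        writes: List[Tuple[Sym, Sym]] = []
        toy_qt_metacall(this, index, Sym("value"), writes)
        if len(writes) == 1:
            # Exactly one store was observed, so the mapping is unambiguous.
            resolved[name] = writes[0][0].text
        # Otherwise the property is left unresolved rather than guessed.
    return resolved


if __name__ == "__main__":
    print(property_addresses(["text"]))
```

Because the property index is concrete, only one branch of the unit is followed per run, while the address stays an expression over the symbolic base pointer. This is the combination the text relies on to obtain relative addresses such as this+0x48.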
Table 2: Validated result of function callback recovery and semantics recovery in KDE ground truth evaluation.
long QtRE takes to analyze the KDE and Tesla binaries (§6.2). To answer RQ4, we compare QtRE with two state-of-the-art RE tools, Ghidra [4] and angr [61] (§6.3).

Qt binary acquisition. KDE is an open-source Linux GUI desktop environment [6], and we use the KDE Plasma desktop image of version 21.04, which is based on Qt 5.79. We extracted 1,018 Qt binaries (including libraries) from the image, and further filtered them by scanning whether connect or qt_metacall function is used (as the binary code is not obfuscated), which confirms that each of them has at least one function callback or symbol. Finally, 80 binaries are selected for evaluation. The Tesla firmware was extracted from a real CID device (originally as a part of a Tesla Model S). The CID runs 2.52.22 (v8.0) firmware version based on Qt 4.7.2, which was released in 2017. The Tesla firmware contains 54 Qt binaries in total. We applied the same filtering strategy as KDE, and got 43 binaries for evaluation. In total, we obtain 123 binaries from the two binary suites. We notice that more KDE binaries were filtered due to the absence of Qt callbacks and symbols, compared with the Tesla binaries, because these two binary suites were engineered differently. Specifically, among the 1,018 binaries in KDE, most of them are generic C++ binaries and do not contain Qt classes, which are mainly for non-GUI functions (e.g., back-end logic). In contrast, Tesla binaries tend to use Qt for both front-end and back-end logic.

Experiment environment. QtRE’s analysis was conducted on an Ubuntu 18.04.4 LTS desktop. The machine is equipped with 12 Intel i7-8700 CPU cores and 32 GB RAM.

6.1 Effectiveness

6.1.1 Quantifying FP and FN with KDE Programs

Before presenting how many callbacks and symbols QtRE can recover, we first evaluate its effectiveness, i.e., the false positive (FP) and false negative (FN) rates. As such, we use the open-source KDE binary suite [6] as our benchmark with 80 Qt binaries in total. While we wish to use all of them and develop an automatic validation tool, manual effort is unfortunately inevitable. The major challenge lies in the validation of symbol addresses that are only available in the binary code and are often derived through convoluted program logic such as nested function calls and branch statements, which is difficult to automate even with debug symbols (e.g., DWARF) available. Therefore, we select a portion of the binaries for manual validation using the following criteria. First, we exclude standard Qt binaries (e.g., libQt5Svg) and only include those directly compiled from the KDE source code. Second, we choose binaries that have both at least one callback and a recovered symbol, so that they can be used for both callback and symbol validations. By applying these two criteria, there are eventually 16 binaries selected for validation, which took us two weeks to complete. The results are reported in Table 2 and we detail them as follows.

Function callbacks validation. Function callbacks are validated by comparing the recovered callback targets with the corresponding parameters of the connect function in the source code. According to the bottom row of Table 2, among the 16 binaries, QtRE successfully identified 598 callback instances in total, and all of them are correct. However, there are 796 callback instances in total among the 16 binaries (by counting call sites of the connect function), indicating that there are 198 callback instances that QtRE did not identify (i.e., a 25% FN rate). However, later (§6.1.2) we show that the FN rate is significantly lower when we measure the FN rate for all 123 binaries in our dataset.

We further investigate the root causes of these FN cases, and find that QtRE failed to accurately infer either their signal or their slot classes, and thus did not construct callback relationships as described in §4.1. However, the function signatures in these cases were successfully recovered (as they are mostly hardcoded strings), which are still useful, as they can significantly narrow down the possible callback targets based on the signature strings. To summarize, there are two major causes that fail the source-aware class inference: (1) memory aliasing where QtRE could not find the data source (e.g., QtRE resolves the pointer of a class object which is
initialized by another pointer pointing to it); (2) indirect calls where a class object to be inferred is the return value of a function that is indirectly called (e.g., through a CALL EAX instruction). We note that these are common limitations in binary analysis (not unique for Qt binaries), and there have been many solutions such as aliasing analysis (e.g., [30]) and Multi-Layer Type Analysis (e.g., [41, 46]), which could be integrated in QtRE in future work. Therefore, in short, there is no false positive in the function callback recovery among the 598 validated callbacks, while there exist 25% false negatives due to memory aliasing and indirect calls.

Semantic symbols validation. Semantic symbols are validated by comparing the symbols of signals, slots, parameters, and properties with those in the corresponding source code, including their names, types, and relative addresses. However, since the relative addresses are available only in binaries, we have to use both the source code and the decompiled binary code for the validation. As presented in Table 2, among the 16 binaries, there are 1,164 semantic symbols in total (by counting the entries at all symbol tables), and QtRE successfully recovered 1,161 of them, in which all of them are correct, including the symbol names and relative addresses. For the remaining three (0.3%) cases, they are all property symbols, and QtRE did not successfully recover their relative addresses (though their symbol strings are still correctly recovered). The root cause is that the addresses of these three properties are returned from a virtual function call [32] where QtRE cannot precisely locate the function address as it is dynamically computed (derived from the this pointer). We consider these three cases as FNs since QtRE did not attempt to generate incorrect property addresses but instead left them unresolved. As these FN cases only account [...] the 1,161 symbols are all validated. In summary, among the 1,164 validated symbols, there is no false positive and there are only three (0.3%) false negatives in the recovered symbol addresses due to indirect function calls.

6.1.2 Real-World Qt Binaries

We evaluated QtRE using real-world binaries from both KDE and Tesla, as summarized in Table 3. In general, QtRE identified 10,867 callback instances and 24,973 semantic symbols from the 123 binaries in the two binary suites. According to row 1 of the table, there are 80 binaries from KDE and 43 from Tesla firmware. The detailed results of all binaries from KDE and Tesla are presented in Table 12 and Table 13 in the Appendix, respectively, for interested readers. In the following, we zoom in on the results of function callback target identification and semantic symbol recovery.

    Result                       KDE               Tesla
                                 #       %         #       %
    # Total Binary               80      100%      43      100%
    Callback Recovery
    # Total callback             3,972   100%      8,845   100%
    # Recovered callback         3,323   83.7%     7,544   85.3%
    # Type-1 connect             2,992   75.3%     0       0
    # Type-2 connect             331     8.3%      7,544   85.3%
    Source of Class Objects
    # This pointer               275     4.1%      7,433   49.3%
    # Function parameters        113     1.7%      295     2.0%
    # Function return value      88      1.3%      1,371   9.1%
    # Global variable            83      1.2%      3,509   23.3%
    # Stack variable             5,984   90.0%     389     2.6%
    # Heap variable              88      1.3%      652     4.3%
    # Signature matching         15      0.2%      1,439   9.5%
    Symbol Recovery
    # Total recovered symbols    4,362   100%      20,611  100%
    # Property                   817     18.7%     951     4.6%
    # Signal                     1,182   27.1%     9,266   45.0%
    # Slot                       841     19.3%     3,326   16.1%
    # Function parameter         1,522   34.9%     7,068   34.3%

Table 3: Results of callback and semantics recovery.

    Type        Symbol String
    QString     text, key, ssid, plainText, carName, country, name, reason, response, message, location, keyword, command, phoneNumer, debugMessage, errorMessage
    QUrl        url, baseUrl, requestedUrl
    QByteArray  data, userData
    int         sec, id, pId, deviceId, securityType, canID, canData
    bool        useCarLocation

Table 4: Selected symbols recovered by QtRE that potentially contain sensitive information.

Recovered function callbacks. According to row 4 in Table 3, among the 80 binaries from KDE and 43 Tesla binaries, QtRE recovered 3,323 and 7,544 function callback instances, respectively. However, the recovered callbacks account for 83.7% and 85.3% among all identified callbacks in KDE and Tesla (obtained by counting the call sites of connect), and correspondingly the FN rates among all binaries are 16.3% and 14.7% respectively, which are much lower than the 25% FN rate of the validated callbacks in §6.1.1. These FNs are also caused by the same reasons as in our validation.

In terms of the callback types defined in Table 1, most of the callbacks in the KDE binaries use type-1 connect, while the Tesla binaries use type-2 connect because type-1 connect is only available starting from Qt 5. Furthermore, to show that our source-aware class inference is useful, we further show the contribution of the seven sources (as described in §4.1) in Table 3. It can be inferred from the statistics that stack variable is the dominating source among the KDE binaries, and we find that it is frequently used by type-1 connects to store function pointers of signals and slots. In contrast, Tesla binaries tend to use this pointers to derive classes.

Recovered semantic symbols. According to row 14 in Table 3, QtRE recovered 4,362 symbols from 80 KDE binaries and 20,611 symbols from 43 Tesla binaries, showing
for a small portion of our results (three out of 96 properties that the Tesla binaries use Qt’s metaobject system more
in total), addressing the indirect call limitation is thus left to frequently than KDE. The recovered symbols include four
future work. On the other hand, there is no false positive as types: signals, slots, function parameters, and properties.
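The headline figures quoted in §6.1.2 can be cross-checked with simple arithmetic. The sketch below only restates numbers that appear in the text (binary counts, recovered callbacks and symbols, and the recovered percentages). The dictionary layout and variable names are illustrative assumptions, and the estimated number of connect call sites is a rough back-calculation from rounded percentages, not a figure reported in the paper.

```python
# Figures quoted in the text for the two binary suites.
suites = {
    "KDE":   {"binaries": 80, "callbacks": 3323, "symbols": 4362,  "recovered_pct": 83.7},
    "Tesla": {"binaries": 43, "callbacks": 7544, "symbols": 20611, "recovered_pct": 85.3},
}

# Totals across both suites.
total_binaries = sum(s["binaries"] for s in suites.values())
total_callbacks = sum(s["callbacks"] for s in suites.values())
total_symbols = sum(s["symbols"] for s in suites.values())

# The FN rate is the complement of the recovered share of callbacks.
fn_rates = {name: round(100 - s["recovered_pct"], 1) for name, s in suites.items()}

# Rough estimate only: the percentages are rounded, so the implied number of
# connect call sites is approximate and is NOT a number stated in the text.
estimated_call_sites = {
    name: round(s["callbacks"] / (s["recovered_pct"] / 100))
    for name, s in suites.items()
}

print(total_binaries, total_callbacks, total_symbols)
print(fn_rates)
print(estimated_call_sites)
```

The totals and FN rates reproduce the values stated in the text (123 binaries, 10,867 callbacks, 24,973 symbols, and FN rates of 16.3% and 14.7%).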
Note that one slot can be registered to receive multiple signals, which explains why the numbers of signals and slots are not always identical. To show that the recovered symbols are indeed useful for security analysis, we also present in Table 4 manually selected semantic symbols that potentially contain sensitive information. As shown, these symbols indicate sensitive information for confidentiality concerns, including personally identifiable information (e.g., id), confidential data (e.g., canData), and cryptography parameters (e.g., key). One use case of these symbols would be identifying privacy-sensitive data leakage through taint analysis [23, 27]. In addition, we show the top 10 most frequent symbols of properties, parameters, and functions in Table 9, Table 10, and Table 11, respectively, in Appendix.
6.2 Efficiency
On average, it takes QtRE 1.7 minutes to analyze a binary. The average time to analyze KDE and Tesla binaries is 1.5 and 3.6 minutes, respectively, as Tesla binaries have more classes and functions. In the worst case, QtRE spent 77 minutes analyzing the most complicated binary libQtCarGUI, which has almost 2,000 classes and more than 40K functions. To better evaluate QtRE’s efficiency on each binary, we present the analysis time of each binary in Table 12 and Table 13 in Appendix for KDE and Tesla binaries, respectively.
6.3 Comparison with Other RE Tools
We compare QtRE with two state-of-the-art open-source RE tools that can also be applied to Qt binary analysis: Ghidra (v9.2.2) [4] and Angr (v9.1) [61]. The comparison was conducted to evaluate each tool’s capability of recovering CFG (including callbacks) and symbols. The results are summarized in Table 5 and the detailed statistics per binary are presented in Table 12 and Table 13 in Appendix for KDE and Tesla binaries, respectively. Table 5 shows the total number of function call graph edges and symbols recovered by Angr, Ghidra, and Ghidra with QtRE. As expected, neither Angr nor Ghidra can recover any of the callbacks and symbols identified by QtRE, because they are not specifically tailored for Qt binary analysis (i.e., they cannot recognize and interpret Qt-specific code generated by the MoC). To clearly show QtRE’s contribution atop Ghidra, we present the total number of callbacks and symbols QtRE has recovered, which are essentially the 10,867 callbacks and 24,973 symbols (as reported in §6.1.2) from two binary suites. Note that these callbacks and symbols cannot be identified by generic binary RE tools such as Angr and Ghidra.

Result                               KDE        Tesla        Total
# Total call graph edges
# Recovered by Angr [61]             791,907    1,395,093    2,187,000
# Callback recovered                 0          0
# Recovered by Ghidra [4]            432,843    987,263      1,420,106
# Callback recovered                 0          0
# Recovered by Ghidra [4] w/ QtRE    436,166    994,807      1,430,973
# Callback from QtRE                 3,323      7,544        10,867
# Total symbols
# Recovered by Angr [61]             97,109     1,171,990    1,269,099
# Recovered by Ghidra [4]            97,109     1,171,990    1,269,099
# Recovered by Ghidra [4] w/ QtRE    101,471    1,192,601    1,294,072
# Symbol from QtRE                   4,362      20,611       24,973
Table 5: Comparison of QtRE, Ghidra and Angr.

7 Applications
Being an RE tool, QtRE can be useful for real-world security problems. In particular, we demonstrate using QtRE to perform input validation analysis and extract hidden commands from a Tesla Model S firmware, because Tesla vehicles are known to contain many Easter eggs [57]. To perform the analysis, we make use of the recovered callbacks and symbols in §6.1.2. These results are crucial to our analysis as we leverage the recovered symbols to identify the user input variables (e.g., the class member text). Additionally, as the hidden commands are triggered by UI operations, the recovered callbacks are required to construct a complete CFG; otherwise we cannot identify those hidden commands.
Input Validation Analysis. First, we define a set of rules to identify user-controllable input variables by using recovered symbols. Next, starting from the identified input variables, we perform an automated taint analysis [23] to extract the input validations and resolve the corresponding compared variables.
• Symbol-guided input variable identification. According to our observation, the input variables are members of the UI widget classes (e.g., QLineEdit), and thus they can appear in (1) return values of “get” functions (e.g., getText()) and (2) class member variables (e.g., QLineEdit.text). In addition, they can be members of either (1) standard Qt library classes or (2) programmer-defined classes, and we need to focus on both. For standard Qt library classes, we manually investigate the official Qt documentation [11] to find the function and the class members that convey user inputs, resulting in six such elements in the Tesla Qt binaries, as shown in rows 1-6 in Table 6. There are one function and five class members, with the corresponding relative addresses shown in the 3rd column. However, for the programmer-defined classes, it is not straightforward to recognize the input variables, as there is no documentation to rely on. Thus, we define two criteria to identify the desired classes and locate the input variables by utilizing the recovered symbols. The first criterion is that the class of our interest should have implemented signals that can monitor the input status. For instance, the standard library class QLineEdit defines a signal textChanged() to notify the callback function to update the text on the screen. Note that all child classes that inherit from these classes will be of our interest since they also hold these variables. The second criterion is that the desired classes should have a member variable to hold the user input, which should
be of a string type (e.g., QString and char*). In addition, these variables should have names implying that they are of string type and represent a text variable, such as text. As shown in rows 7-12, we identify six such variables.
• Taint analysis. Finally, the standard taint analysis [23] is automatically performed to analyze the input validation, which is implemented based on Ghidra’s P-Code IR [8]. Prior to the analysis, we need to define the sources and sinks, which determine where the analysis starts and ends. As shown in Table 6, the input variables (i.e., function return values and the class properties) identified are the sources, and the comparison instructions (e.g., operator== and QString.compare()) are the sinks. During the analysis, we also consider the control flow transition between the callback targets of the function identified by QtRE. When the taint analysis is completed, QtRE obtains a set of input validation instructions. We then need to further resolve the compared variables, as they may not be hardcoded and may require computation. This problem can be solved in the same way as we resolve the function parameters as described in §4.1. Specifically, QtRE traverses the use-def chains of the compared variable to find the data definition, and further resolves its concrete value.

Class Name            Var./Func. Name   Symbolic Address        In Qt Lib?
QLineEdit             text()            N/A                     ✔
QLineEdit             text              *(*(*(λ+4)+300)+8)      ✔
QAbstractSpinBox      text              *(*(λ+4)+452)           ✔
QDoubleSpinBox        text              *(*(λ+4)+452)           ✔
QSpinBox              text              *(*(λ+4)+452)           ✔
QDateTimeEdit         text              *(*(λ+4)+452)           ✔
TextField             text              *(*(λ+796))             ✗
PasswordTextField     text              *(*(λ+796))             ✗
WebEntryField         text              *(*(λ+796))             ✗
NavigationSearchBox   text              *(*(λ+796))             ✗
CompleterTextField    text              *(*(λ+796))             ✗
ExtEntryField         text              *(*(λ+796))             ✗
Table 6: Taint analysis sources and their computed symbolic variable addresses (λ denotes this pointer).

Experiment Result. By performing taint analysis, QtRE was able to identify seven Easter eggs from the Tesla firmware, four access tokens, and one master password, as presented in Table 7. Although the seven Easter eggs are already known to the public, the remaining five are actually new, to our knowledge. We detail each type of these hidden commands as follows.
• Easter egg. The seven Easter eggs can be entered from the CID screen to trigger hidden behaviors on the vehicle, such as changing the CID GUI. These Easter eggs are often benign and do not have much security implication, as they are intentionally designed to entertain users. For example, by entering a string “mars” in the AccessPopup UI, the navigation map will become the surface of Mars. Other Easter eggs such as “showroom” and “performance” can enable hidden modes such as service mode.
• Access token. The four access tokens can trigger developer or diagnostic mode, which contain security-sensitive options such as limiting the vehicle speed, resetting the vehicle modules, and reading system logs. Compared to Easter eggs, access tokens are not intentionally left over to users, but are possibly for developers and technicians to debug and diagnose. As shown in rows 8-11 of Table 7, there are four such tokens. The first two can trigger diagnostic mode, and are different security tokens in the local storage, which are unique for each vehicle. The other two tokens that can enable developer mode (with more security-critical functions) are slightly more complicated, as their crc32 checksum needs to match specific hexadecimal values (i.e., 0x18e5a977 and 0x73bbee22). To search for a feasible input, we wrote a simple brute-force script that only took 30 minutes to generate a valid string “987090324273775”.
• Master password. QtRE also reported one master password (i.e., “3500”) that can exit the valet mode regardless of the password set by the vehicle owner. Specifically, the valet mode is designed to preserve user privacy and vehicle safety when the vehicle is parked by a valet driver. When this mode is enabled, normal functions are no longer accessible (e.g., speed will be limited, and personal data will not be shown on the screen), unless the user enters the 4-digit PIN code to exit the valet mode. Apparently, the master password is also for testing and development purposes and should not be available to any users.

Category       Content                    Description                   Vehicle Agnostic
Easter Egg     “007”                      Submarine Easter egg          ✔
Easter Egg     “modelxmas”                Show holiday lights           ✔
Easter Egg     “42”                       Change car name               ✔
Easter Egg     “mars”                     Turn map into Mars surface    ✔
Easter Egg     “transport”                Transport mode                ✔
Easter Egg     “performance”              Performance mode              ✔
Easter Egg     “showroom”                 Showroom mode                 ✔
Access Token   SecurityToken1             Enable diagnostic mode        ✗
Access Token   SecurityToken2             Enable diagnostic mode        ✗
Access Token   crc(token)==0x18e5a977     Enable developer mode         ✔
Access Token   crc(token)==0x73bbee22     Enable developer mode         ✔
Master Pwd     “3500”                     Exit valet mode               ✔
Table 7: Extracted hidden commands in Tesla binaries.

Exploitation. We have successfully validated all the hidden commands on a real CID device extracted from a Tesla Model S. To exploit them, we consider any attackers on board with physical access to the CID, such as a hotel valet, a repair shop technician, or a designated driver. For instance, one can leverage the master password to escape the valet mode, or use access tokens to manipulate critical settings (e.g., speed limit). However, these hidden commands require some preconditions. The latter two access tokens and the master password require setting an environment variable GUI_isDevelopmentCar as true, and the former two access tokens require dumping the vehicle-specific tokens from the file system. These preconditions can be satisfied by using prior exploits (e.g., remote browser exploits [51, 52]) targeting unpatched Tesla firmware.
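The brute-force search mentioned for the developer-mode access tokens can be sketched as follows. This is a minimal illustration and not the script used in this work: it assumes the checksum is the standard CRC-32 as implemented by Python's zlib.crc32, a digits-only alphabet, and a short maximum length. The function name brute_force_crc32 and the demo target are hypothetical; the target is derived from a known string so that the search finishes quickly, and it is not one of the firmware constants.

```python
import itertools
import string
import zlib


def brute_force_crc32(target, alphabet=string.digits, max_len=4):
    """Return the first string over `alphabet` (shortest first) whose
    CRC-32 equals `target`, or None if no string up to `max_len` matches."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
            if zlib.crc32(candidate.encode("ascii")) == target:
                return candidate
    return None


# Demo target computed from a known input (assumption for illustration only).
demo_target = zlib.crc32(b"4271")
found = brute_force_crc32(demo_target)
print(found, hex(demo_target))
```

If the firmware's checksum is indeed the standard CRC-32, searching for an input that matches one of the constants in Table 7 would only change the target and widen the alphabet and length bound, at a correspondingly higher search cost.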
8 Discussions
False positive and false negative. As described in §6.1.1, we did not observe any FPs among the validated results. However, we did observe 15% FNs among the recovered callbacks and three FNs in symbol recovery, due to indirect calls and memory aliasing, which prevent QtRE from accurately recovering the callback target and symbol address. Another potential source of FP is that the source-aware class inference may not accurately infer the actual class type. More specifically, when QtRE infers a class type A from function parameters or return types, the actual class type at run-time can be any class B inherited from A. However, this does not frequently occur, and we did not observe such a case in our validation.
Future work. As a domain-specific RE tool, QtRE can be applied to many other security-critical Qt applications, such as medical and automotive systems where Qt has played an essential role. Meanwhile, the callback and symbol recovery techniques of QtRE can enable many other security applications. In addition to the symbol-guided input validation analysis demonstrated in this paper, an immediate future work is to integrate QtRE with state-of-the-art fuzzers for GUI-fuzzing [65] of Qt binaries, which can help identify vulnerabilities triggered by external user input.
Responsible disclosure. We reported our findings in §7 to Tesla in November 2021, and received a response in April 2022. The Tesla security team acknowledged our findings, and claimed that they have eliminated the feasible paths for exploiting these hidden commands, such as removing the HTTP API for setting the GUI_isDevelopmentCar variable since version 2021.44. The other commands, which do not require this precondition, require invasive physical access (e.g., to leak access tokens from the file system) and thus are difficult to exploit in practice. In conclusion, the best security practice is to keep the firmware up-to-date.
9 Related Work
C++ binary analysis. MARX [53] uses vtables to recover the hierarchy of classes, and DeClassifier [31] extends it to optimized C++ binaries. OOAnalyzer [59] leverages coding patterns to recover classes and methods. VirtAnalyzer [32] reverse-engineers virtual inheritance among C++ classes. OBJDIGGER [40] and PhASAR [58] use inter-procedural data flow analysis for C++ binary analysis. Howard [62], TIE [44], Rewards [45], DSIBin [56], Lego [64] perform dynamic analysis to recover data structures. HexType [39] and TCD [77] are two static analysis tools for detecting type confusion bugs in C++ binaries (including Qt), but they do not make use of any of Qt’s unique mechanisms.
Qt binary analysis. Although not presented in formal publications, there have been some tools for Qt binary RE [9, 14, 18], two of which are open-source [14, 18]. Specifically, one provides an IDA-Python script to parse metadata tables of Qt classes as QtRE does [14], but it requires identifying metadata table entrances manually and does not attempt to recover the relative addresses. Another tool uses heuristics to extract the Qt class hierarchy from memory at run-time [18]. However, as it targets an ancient Qt version (v2), it cannot analyze binaries in our datasets based on Qt4 or Qt5, due to significant changes in class layout [11, 15]. The remaining one [9] briefly mentions some RE observations, including using the qt_metacall function to analyze how Qt signals are dispatched. QtRE instead makes use of the qt_metacall function logic to compute the relative symbol addresses. In summary, while these tools provide some insight for Qt binary RE, they fail to (1) make any attempts to take advantage of these insights to solve fundamental binary RE challenges and (2) target any newer Qt versions.
Hidden behavior detection. In relation to the extraction of hidden commands, some works also detect hidden behaviors in binary programs. For instance, SUPOR [34] and UiPicker [49] detect sensitive user inputs in the Android app UI. InputScope [76] and FirmAlice [60] reveal backdoors through input validation analysis. AsDroid [35] uncovers stealthy behaviors by contrasting the intended behavior with the descriptions of the user interface. Many other works detect malware by analyzing their hidden behaviors, such as TriggerScope [33], IntelliDroid [73], and MineSweeper [25].
Security analysis of Tesla vehicles. The credentials used in the early versions of Tesla firmware were vulnerable [47], and the over-the-air (OTA) firmware update could be intercepted [17]. In addition, the CID browser and kernel had vulnerabilities that could lead to a remote root shell [51], which enabled attacks on the CAN bus and the autopilot system [43, 52]. Most recently, it has also been shown that Model X’s keyless entry system can be hacked for car theft [2], and the autopilot’s autonomous driving system is vulnerable to phantom attacks [50]. Meanwhile, none of them attempted to reverse engineer the Qt binaries for the GUI attack surface.
10 Conclusion
In this paper, we take a first look at the reverse engineering of Qt binaries, and develop QtRE, a static binary analysis tool that is capable of recovering function callback targets and semantic symbols based on Qt’s unique mechanisms, including signal and slot, and dynamic introspection. We have tested QtRE with two suites of Qt binaries: Linux KDE and Tesla Model S firmware, from which QtRE additionally recovered 10,867 instances of callbacks and identified 24,973 semantic symbols among 123 binaries in total. We further demonstrate an application of using QtRE to extract hidden commands from the Tesla Model S firmware, in which 12 unique hidden commands are discovered with five new to the public.
Acknowledgment
We would like to thank the anonymous reviewers and shepherd for their constructive feedback. This research was supported in part by ARO award W911NF2110081, DARPA award N6600120C4020, and NSF award 2118491.
References
[1] Awesome c++. https://github.com/fffaraz/awesome-cpp.
[2] Belgian security researchers from ku leuven and imec demonstrate serious flaws in tesla model x keyless entry system. https://www.imec-int.com/en/press/belgian-security-researchers-ku-leuven-and-imec-demonstrate-serious-flaws-tesla-model-x.
[3] Binary ninja. https://binary.ninja/.
[4] Ghidra. https://ghidra-sre.org/.
[5] Ida pro - hex rays. https://www.hex-rays.com/idapro.
[6] Kde github mirror. https://github.com/KDE.
[7] Language bindings - qt wiki. https://wiki.qt.io/Language_Bindings.
[8] P-code. https://ghidra.re/ghidra_docs/api/ghidra/program/model/pcode/package-summary.html.
[9] Picturoku: Qt 4 you. http://picturoku.blogspot.com/2011/08/qt-4-you.html.
[10] The property system | qt core 5.15.3. https://doc.qt.io/qt-5/properties.html.
[11] Qobject class | qt core 5.15.7. https://doc.qt.io/qt-5/qobject.html.
[12] Qt | cross-platform software development for embedded & desktop. https://www.qt.io/.
[13] The qt company. https://www.qt.io/company.
[14] Qt internals & reversing. http://www.ntcore.com/files/qtrev.htm.
[15] Qt toolkit - porting to qt 2.x. https://qt.developpez.com/doc/2.3/porting/.
[16] qt5/ source tree. https://code.woboq.org/qt5.
[17] Reverse engineering the tesla firmware update process. https://www.pentestpartners.com/security-blog/reverse-engineering-the-tesla-firmware-update-process/.
[18] Reversing qt applications - and part ii | reversing.org. https://web.archive.org/web/20080703162127/http://www.reversing.org/node/view/7.
[19] Signals & slots | qt core 5.15.7. https://doc.qt.io/qt-5/signalsandslots.html.
[20] Using the meta-object compiler (moc) | qt 4.8. https://doc.qt.io/archives/qt-4.8/moc.html.
[21] Ioannis Agadakos, Di Jin, David Williams-King, Vasileios P Kemerlis, and Georgios Portokalidis. Nibbler: debloating binary shared libraries. In Proceedings of the 35th Annual Computer Security Applications Conference, pages 70–83, 2019.
[22] Frances E. Allen and John Cocke. A program data flow analysis procedure. Communications of the ACM, 19(3):137, 1976.
[23] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. Acm Sigplan Notices, 49(6):259–269, 2014.
[24] Thanassis Avgerinos, Sang Kil Cha, Alexandre Rebert, Edward J Schwartz, Maverick Woo, and David Brumley. Automatic exploit generation. Communications of the ACM, 57(2):74–84, 2014.
[25] David Brumley, Cody Hartwig, Zhenkai Liang, James Newsome, Dawn Song, and Heng Yin. Automatically identifying trigger-based behavior in malware. In Botnet Detection, pages 65–88. Springer, 2008.
[26] David Brumley, Ivan Jager, Thanassis Avgerinos, and Edward J Schwartz. Bap: A binary analysis platform. In International Conference on Computer Aided Verification, pages 463–469. Springer, 2011.
[27] Yinzhi Cao, Yanick Fratantonio, Antonio Bianchi, Manuel Egele, Christopher Kruegel, Giovanni Vigna, and Yan Chen. Edgeminer: Automatically detecting implicit control flow transitions through the android framework. In NDSS, 2015.
[28] Drew Davidson, Benjamin Moench, Thomas Ristenpart, and Somesh Jha. Fie on firmware: Finding vulnerabilities in embedded systems using symbolic execution. In 22nd USENIX Security Symposium (USENIX Security 13), pages 463–478, 2013.
[29] Sushant Dinesh, Nathan Burow, Dongyan Xu, and Mathias Payer. Retrowrite: Statically instrumenting cots binaries for fuzzing and sanitization. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1497–1511. IEEE, 2020.
[30] Amer Diwan, Kathryn S McKinley, and J Eliot B Moss. Type-based alias analysis. ACM Sigplan Notices, 33(5):106–117, 1998.
[31] Rukayat Ayomide Erinfolami and Aravind Prakash. Declassifier: Class-inheritance inference engine for optimized c++ binaries. In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, pages 28–40, 2019.
[32] Rukayat Ayomide Erinfolami and Aravind Prakash. Devil is virtual: Reversing virtual inheritance in c++ binaries. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 133–148, 2020.
[33] Yanick Fratantonio, Antonio Bianchi, William Robertson, Engin Kirda, Christopher Kruegel, and Giovanni Vigna. Triggerscope: Towards detecting logic bombs in android applications. In 2016 IEEE symposium on security and privacy (SP), pages 377–396. IEEE, 2016.
[34] Jianjun Huang, Zhichun Li, Xusheng Xiao, Zhenyu Wu, Kangjie Lu, Xiangyu Zhang, and Guofei Jiang. Supor: Precise and scalable sensitive user input detection for android apps. In 24th USENIX Security Symposium (USENIX Security 15), pages 977–992, 2015.
[35] Jianjun Huang, Xiangyu Zhang, Lin Tan, Peng Wang, and Bin Liang. Asdroid: Detecting stealthy behaviors in android applications by user interface and program behavior contradiction. In Proceedings of the 36th International Conference on Software Engineering, pages 1036–1046, 2014.
[36] Shih-Kun Huang, Min-Hsiang Huang, Po-Yen Huang, Chung-Wei Lai, Han-Lin Lu, and Wai-Meng Leong. Crax: Software crash analysis for automatic exploit generation by modeling attacks as symbolic continuations. In 2012 IEEE Sixth International Conference on Software Security and Reliability, pages 78–87. IEEE, 2012.
[37] Shih-Kun Huang, Min-Hsiang Huang, Po-Yen Huang, Han-Lin Lu, and Chung-Wei Lai. Software crash analysis for automatic exploit generation on binary programs. IEEE Transactions on Reliability, 63(1):270–289, 2014.
[38] Alan Jaffe, Jeremy Lacomis, Edward J Schwartz, Claire Le Goues, and Bogdan Vasilescu. Meaningful variable names for decompiled code: A machine translation approach. In Proceedings of the 26th Conference on Program Comprehension, pages 20–30, 2018.
[39] Yuseok Jeon, Priyam Biswas, Scott Carr, Byoungyoung Lee, and Mathias Payer. Hextype: Efficient detection of type confusion errors for c++. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 2373–2387, 2017.
[40] Wesley Jin, Cory Cohen, Jeffrey Gennari, Charles Hines, Sagar Chaki, Arie Gurfinkel, Jeffrey Havrilla, and Priya Narasimhan. Recovering c++ objects from binaries using inter-procedural data-flow analysis. In Proceedings of ACM SIGPLAN on Program Protection and Reverse Engineering Workshop 2014, pages 1–11, 2014.
[41] Sun Hyoung Kim, Cong Sun, Dongrui Zeng, and Gang Tan. Refining indirect call targets at the binary level. In Network and Distributed System Security Symposium (NDSS), 2021.
[42] James C King. Symbolic execution and program testing. Communications of the ACM, 19(7):385–394, 1976.
[43] Tencent Keen Security Lab. Experimental security research of tesla Offensive techniques in binary analysis. In 2016 IEEE Symposium on
autopilot, 2019. Security and Privacy (SP), pages 138–157. IEEE, 2016.
[44] JongHyup Lee, Thanassis Avgerinos, and David Brumley. Tie: [62] Asia Slowinska, Traian Stancescu, and Herbert Bos. Howard: A
Principled reverse engineering of types in binary programs. In 18th dynamic excavator for reverse engineering data structures. In 18th
Network and Distributed Systems Security Symposium (NDSS), 2011. Network and Distributed Systems Security Symposium (NDSS), 2011.
[45] Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Automatic reverse [63] Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan
engineering of data structures from binary execution. In Proceedings of Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin
the 17th Annual Network and Distributed System Security Symposium Poosankam, and Prateek Saxena. Bitblaze: A new approach to
(NDSS’10), San Diego, CA, February 2010. computer security via binary analysis. In International Conference on
[46] Kangjie Lu and Hong Hu. Where does it go? refining indirect-call Information Systems Security, pages 1–25. Springer, 2008.
targets with multi-layer type analysis. In Proceedings of the 2019 ACM [64] Venkatesh Srinivasan and Thomas Reps. Recovery of class hierarchies
SIGSAC Conference on Computer and Communications Security, pages and composition relationships from machine code. In International
1867–1881, 2019. Conference on Compiler Construction, pages 61–84. Springer, 2014.
[47] Kevin Mahaffey. Hacking a tesla model s: What we found and what [65] Michael Sutton, Adam Greene, and Pedram Amini. Fuzzing: brute
we learned. Lookout Blog, 2015. force vulnerability discovery. Pearson Education, 2007.
[48] Alessandro Mantovani, Simone Aonzo, Yanick Fratantonio, and Davide [66] Hieu Tran, Ngoc Tran, Son Nguyen, Hoan Nguyen, and Tien N
Balzarotti. Re-mind: a first look inside the mind of a reverse engineer. Nguyen. Recovering variable names for minified code with usage
In 31st USENIX Security Symposium (USENIX Security 22), pages contexts. In 2019 IEEE/ACM 41st International Conference on
2727–2745, 2022. Software Engineering (ICSE), pages 1165–1175. IEEE, 2019.
[49] Yuhong Nan, Min Yang, Zhemin Yang, Shunfan Zhou, Guofei Gu, and [67] Victor Van Der Veen, Enes Göktas, Moritz Contag, Andre Pawoloski,
XiaoFeng Wang. Uipicker: User-input privacy identification in mobile Xi Chen, Sanjay Rawat, Herbert Bos, Thorsten Holz, Elias Athana-
applications. In 24th USENIX Security Symposium (USENIX Security sopoulos, and Cristiano Giuffrida. A tough call: Mitigating advanced
15), pages 993–1008, 2015. code-reuse attacks at the binary level. In 2016 IEEE Symposium on
[50] Ben Nassi, Dudi Nassi, Raz Ben-Netanel, Yisroel Mirsky, Oleg Drokin, Security and Privacy (SP), pages 934–953. IEEE, 2016.
and Yuval Elovici. Phantom of the adas: Phantom attacks on driver- [68] Daniel Votipka, Seth Rabin, Kristopher Micinski, Jeffrey S Foster,
assistance systems. IACR Cryptol. ePrint Arch., 2020:85, 2020. and Michelle L Mazurek. An observational investigation of reverse
[51] Sen Nie, Ling Liu, and Yuefeng Du. Free-fall: Hacking tesla from engineers’ processes. In 29th USENIX Security Symposium (USENIX
wireless to can bus. Briefing, Black Hat USA, 25:1–16, 2017. Security 20), pages 1875–1892, 2020.
[52] Sen Nie, Ling Liu, Yuefeng Du, and Wenkai Zhang. Over-the-air: How we remotely compromised the gateway, BCM, and autopilot ECUs of Tesla cars. Briefing, Black Hat USA, 2018.
[53] Andre Pawlowski, Moritz Contag, Victor van der Veen, Chris Ouwehand, Thorsten Holz, Herbert Bos, Elias Athanasopoulos, and Cristiano Giuffrida. Marx: Uncovering class hierarchies in C++ programs. In NDSS, 2017.
[54] Chenxiong Qian, Hong Hu, Mansour Alharthi, Pak Ho Chung, Taesoo Kim, and Wenke Lee. Razor: A framework for post-deployment software debloating. In 28th USENIX Security Symposium (USENIX Security 19), pages 1733–1750, 2019.
[55] Anh Quach, Aravind Prakash, and Lok Yan. Debloating software through piece-wise compilation and loading. In 27th USENIX Security Symposium (USENIX Security 18), pages 869–886, 2018.
[56] Thomas Rupprecht, Xi Chen, David H White, Jan H Boockmann, Gerald Lüttgen, and Herbert Bos. Dsibin: Identifying dynamic data structures in C/C++ binaries. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 331–341. IEEE, 2017.
[57] David Schmidt. 20 clever easter eggs in Tesla cars people don’t know about. https://www.hotcars.com/clever-easter-eggs-in-tesla-cars-people-dont-know-about/.
[58] Philipp Dominik Schubert, Ben Hermann, and Eric Bodden. Phasar: An inter-procedural static analysis framework for C/C++. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 393–410. Springer, 2019.
[59] Edward J Schwartz, Cory F Cohen, Michael Duggan, Jeffrey Gennari, Jeffrey S Havrilla, and Charles Hines. Using logic programming to recover C++ classes and methods from compiled executables. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 426–441, 2018.
[60] Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. Firmalice-automatic detection of authentication bypass vulnerabilities in binary firmware. In 22nd Network and Distributed Systems Security Symposium (NDSS), 2015.
[61] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, et al. Sok:(state of) the art of war:
[69] Ruoyu Wang, Yan Shoshitaishvili, Antonio Bianchi, Aravind Machiry, John Grosen, Paul Grosen, Christopher Kruegel, and Giovanni Vigna. Ramblr: Making reassembly great again. In NDSS, 2017.
[70] Shuai Wang, Pei Wang, and Dinghao Wu. Reassembleable disassembling. In 24th USENIX Security Symposium (USENIX Security 15), pages 627–642, 2015.
[71] Tielei Wang, Tao Wei, Zhiqiang Lin, and Wei Zou. Intscope: Automatically detecting integer overflow vulnerability in x86 binary using symbolic execution. In NDSS. Citeseer, 2009.
[72] Richard Wartell, Vishwath Mohan, Kevin W Hamlen, and Zhiqiang Lin. Securing untrusted code via compiler-agnostic binary rewriting. In Proceedings of the 28th Annual Computer Security Applications Conference, pages 299–308, 2012.
[73] Michelle Y Wong and David Lie. Intellidroid: A targeted input generator for the dynamic analysis of Android malware. In NDSS, volume 16, pages 21–24, 2016.
[74] Lok Kwong Yan and Heng Yin. Droidscope: Seamlessly reconstructing the OS and Dalvik semantic views for dynamic Android malware analysis. In 21st USENIX Security Symposium (USENIX Security 12), pages 569–584, 2012.
[75] Mingwei Zhang and R Sekar. Control flow integrity for COTS binaries. In 22nd USENIX Security Symposium (USENIX Security 13), pages 337–352, 2013.
[76] Qingchuan Zhao, Chaoshun Zuo, Brendan Dolan-Gavitt, Giancarlo Pellegrino, and Zhiqiang Lin. Automatic uncovering of hidden behaviors from input validation in mobile apps. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1106–1120. IEEE, 2020.
[77] Changwei Zou, Yulei Sui, Hua Yan, and Jingling Xue. Tcd: Statically detecting type confusion errors in C++ programs. In 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), pages 292–302. IEEE, 2019.
[78] Chaoshun Zuo and Zhiqiang Lin. Playing without paying: Detecting vulnerable payment verification in native binaries of Unity mobile games. In 31st USENIX Security Symposium (USENIX Security 22), pages 3093–3110, 2022.
A Empirical Measurement of the Prevalence of C++ Frameworks

This appendix details the empirical measurement study we conducted to understand the prevalence of C++ frameworks. First, we obtained a set of C++ frameworks from a comprehensive list [1]. Because this list also contains a large number of utility libraries that cannot be used directly to develop C++ applications (e.g., the STL libraries), we excluded them. This left 106 frameworks in 5 categories, namely framework, game engine, GUI, robotics, and web, as classified by the list. We also manually added a few popular C++ frameworks (e.g., MFC) that are missing from the list because they are not available on GitHub. Next, we searched GitHub with each framework’s name as a keyword and counted the resulting C++ repositories; the top 20 results are shown in Table 8 (as of Jan 2022). Our results show that Qt is dominant among all the studied C++ frameworks: it has 45,635 C++ repositories on GitHub, nearly 3X as many as the second-place ROS.

| Name (KDE) | Type | Count | Name (Tesla) | Type | Count |
|---|---|---|---|---|---|
| position | double | 10 | currentIndex | int | 8 |
| pressed | bool | 9 | text | QString | 8 |
| title | QString | 9 | orientation | Qt::Orientation | 8 |
| count | int | 8 | count | int | 7 |
| palette | QPalette | 7 | readOnly | bool | 7 |
| visualPosition | double | 7 | icon | QIcon | 7 |
| contentHeight | double | 6 | alignment | Qt::alignment | 7 |
| hovered | bool | 6 | enabled | bool | 7 |
| font | QFont | 6 | title | QString | 6 |
| horizontal | bool | 6 | iconSize | QSize | 6 |

Table 9: Top 10 property names with types recovered.

| Name (KDE) | Type | Count | Name (Tesla) | Type | Count |
|---|---|---|---|---|---|
| index | int | 55 | ctx | ServiceCallContext* | 3,114 |
| current | int | 43 | _rval_ | int& | 691 |
| text | QString | 42 | result | bool | 144 |
| url | QUrl | 37 | reason | QString | 136 |
| previous | int | 37 | index | int | 116 |
| printerName | QString | 32 | status | QString | 49 |
| printerUri | QString | 23 | routeID | int | 41 |
| role | QByteArray | 21 | id | int | 38 |
| item | QVariant | 20 | text | QString | 35 |
| msg | QByteArray | 18 | success | bool | 34 |

Table 10: Top 10 parameter names with types recovered.
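
The "nearly 3X" comparison in Appendix A can be checked directly against the repository counts reported in Table 8. The following is a minimal sketch, not part of the study's tooling: the dictionary and function names are illustrative, and only the first five rows of Table 8 are included.

```python
# Repository counts copied from the first five rows of Table 8.
REPO_COUNTS = {
    "Qt": 45_635,
    "ROS": 16_796,
    "Boost": 6_205,
    "MFC": 4_409,
    "Cocos2d": 3_587,
}


def lead_over_runner_up(counts):
    """Return (leader, runner_up, ratio) for the two largest counts."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    (leader, top), (runner_up, second) = ranked[0], ranked[1]
    return leader, runner_up, top / second


leader, runner_up, ratio = lead_over_runner_up(REPO_COUNTS)
print(f"{leader} has {ratio:.2f}x the repositories of {runner_up}")
```

With these counts the ratio is roughly 2.7, which is consistent with the "nearly 3X" statement.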
| Name | Category | # Repository | % |
|---|---|---|---|
| Qt | Framework | 45,635 | 35.70% |
| ROS | Robotics | 16,796 | 13.14% |
| Boost | Framework | 6,205 | 4.85% |
| MFC | Framework | 4,409 | 3.45% |
| Cocos2d | Game Engine | 3,587 | 2.81% |
| OpenFrameworks | Framework | 3,264 | 2.55% |
| JUCE | Framework | 2,204 | 1.72% |
| PCL | Robotics | 1,719 | 1.34% |
| imgui | GUI | 1,557 | 1.22% |
| wxWidgets | GUI | 1,076 | 0.84% |
| Cinder | Framework | 1,042 | 0.82% |
| Allegro | Game Engine | 958 | 0.75% |
| Godot | Game Engine | 682 | 0.53% |
| GamePlay | Game Engine | 561 | 0.44% |
| dlib | Framework | 547 | 0.43% |
| FLTK | GUI | 518 | 0.41% |
| GTK++ | GUI | 436 | 0.34% |
| LibU | Framework | 425 | 0.33% |
| raylib | Game Engine | 376 | 0.29% |
| gtkmm | GUI | 349 | 0.27% |

Table 8: Top 20 C++ frameworks from our empirical measurement study.

B Recovered Symbols from QTRE

In this section, we show the top 10 most frequent symbol names of properties and function parameters in Table 9 and Table 10. As shown, the top 3 most frequent properties in KDE are position, pressed, and title, and the top 3 parameter names are index, current, and text. In addition, we show the top 10 most frequent names of signals and slots from KDE and the Tesla firmware in Table 11. As shown, the function names often end with Changed or Completed, which indicates callbacks that monitor the state of a specific variable. For example, whenever a URL changes, the corresponding slot urlChanged will be invoked to perform specific updates, allowing one to quickly locate the handling logic of the variable.

| Name (KDE) | Count | Name (Tesla) | Count |
|---|---|---|---|
| positionChanged | 14 | invokeObjectMethodCompleted | 22 |
| orientationChanged | 9 | findViewCompleted | 22 |
| pressedChanged | 9 | takeScreenshotOfViewCompleted | 22 |
| changed | 9 | flashViewCompleted | 22 |
| countChanged | 8 | changed | 20 |
| urlChanged | 7 | set_valet_modeCompleted | 17 |
| visualPositionChanged | 7 | set_tds_modeCompleted | 17 |
| selectionChanged | 7 | pop_questionCompleted | 17 |
| activated | 7 | reset_valet_pinCompleted | 17 |
| iconChanged | 7 | auto_conditioning_stopCompleted | 17 |

Table 11: Top 10 function (signal & slot) names recovered.

C Detailed Results of Callback and Symbol Recovery in KDE and Tesla Binaries

We present the detailed experiment results of callback and symbol recovery for all KDE and Tesla binaries in Table 12 and Table 13, including the analysis time, the number of symbols, and the call graph edges recovered by ANGR, GHIDRA, and GHIDRA with QTRE. Note that ANGR raised exceptions when analyzing five Tesla binaries for call graph generation; we use N/A to denote these results in the tables. In addition, we show the detailed statistics of the callbacks and symbols contributed by QTRE, which cannot be identified by either ANGR or GHIDRA. For callbacks, we show the number of callbacks recovered by QTRE as well as the total number of callbacks (obtained by counting the connect call sites). For symbols, we further present the statistics for each category including signals, slots, parameters, and function arguments. As shown in the tables, a Tesla binary has approximately 480 Qt symbols and 206 callbacks on average, which also indicates that the Tesla developers tend to use Qt’s callback and meta object system more frequently than KDE.
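
The naming pattern noted in Appendix B (function names ending in Changed or Completed) suggests a simple way to triage recovered signal and slot names. The sketch below is illustrative only: the helper name and the choice of suffixes are assumptions, and the input is a small sample of names taken from Table 11 rather than the full QTRE output.

```python
from collections import Counter

# A small sample of signal/slot names taken from Table 11.
NAMES = [
    "positionChanged", "orientationChanged", "urlChanged", "activated",
    "findViewCompleted", "flashViewCompleted", "set_valet_modeCompleted",
]

# Suffixes discussed in the text; matching is case-sensitive.
SUFFIXES = ("Changed", "Completed")


def classify(name):
    """Return the suffix that `name` ends with, or "other" if none matches."""
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return suffix
    return "other"


by_suffix = Counter(classify(name) for name in NAMES)
for suffix, count in by_suffix.items():
    print(f"{suffix}: {count}")
```

Grouping names this way gives an analyst a quick list of state-monitoring callbacks to inspect first.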
| Binary Name | Time(s) | # Symbol | # CGE (ANGR) | # CGE (GHIDRA) | # CGE (GHIDRA+QTRE) | Callback From QTRE: # Recovered | Callback From QTRE: % | Callback From QTRE: # Total | Symbols From QTRE: # Prop. | Symbols From QTRE: # Signal | Symbols From QTRE: # Slot | Symbols From QTRE: # Param. | Symbols From QTRE: # Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| libKF5KHtml | 965 | 5,931 | 140,805 | 66,208 | 66,282 | 74 | 38.5% | 192 | 9 | 14 | 96 | 68 | 187 |
| libQt5MultimediaWidgets | 37 | 434 | 1,888 | 869 | 869 | 0 | 0% | 22 | 14 | 11 | 16 | 22 | 63 |
| libkonsoleprivate | 269 | 3,118 | 24,702 | 15,322 | 15,715 | 393 | 99.2% | 396 | 0 | 0 | 0 | 0 | 0 |
| libkbolt | 4 | 275 | 2,571 | 1,200 | 1,205 | 5 | 100% | 5 | 0 | 0 | 0 | 0 | 0 |
| libkworkspace5 | 181 | 391 | 4,119 | 2,494 | 2,511 | 17 | 100% | 17 | 9 | 20 | 12 | 3 | 44 |
| libKF5Plasma | 48 | 1,539 | 12,445 | 7,474 | 7,533 | 59 | 78.7% | 75 | 0 | 0 | 0 | 0 | 0 |
| libKF5GlobalAccel | 244 | 271 | 2,355 | 1,514 | 1,515 | 1 | 25% | 4 | 8 | 1 | 3 | 10 | 22 |
| libQt5Svg | 73 | 1,092 | 5,722 | 3,103 | 3,107 | 4 | 57.1% | 7 | 6 | 1 | 7 | 11 | 25 |
| libpolkit-qt5-core-1 | 4 | 370 | 2,145 | 1,073 | 1,073 | 0 | 0% | 1 | 0 | 0 | 0 | 0 | 0 |
| libQt5TextToSpeech | 62 | 192 | 758 | 367 | 369 | 2 | 100% | 2 | 6 | 8 | 9 | 14 | 37 |
| libpackagekitqt5 | 12 | 416 | 5,166 | 2,230 | 2,236 | 6 | 50% | 12 | 0 | 0 | 0 | 0 | 0 |
| libsignon-extension | 2 | 268 | 934 | 414 | 414 | 0 | 0% | 3 | 0 | 0 | 0 | 0 | 0 |
| libplasmacomicprovidercore | 2 | 151 | 554 | 279 | 285 | 6 | 100% | 6 | 0 | 2 | 0 | 2 | 4 |
| libkdeinit5_klipper | 39 | 996 | 6,251 | 3,693 | 3,764 | 71 | 94.7% | 75 | 0 | 0 | 0 | 0 | 0 |
| liboxygenstyleconfig5 | 4 | 281 | 834 | 452 | 457 | 5 | 33.3% | 15 | 0 | 0 | 0 | 0 | 0 |
| libKF5KIOCore | 96 | 2,828 | 27,104 | 17,570 | 17,707 | 137 | 93.8% | 146 | 6 | 42 | 9 | 46 | 103 |
| libkcupslib | 62 | 1,029 | 5,770 | 3,564 | 3,622 | 58 | 89.2% | 65 | 11 | 39 | 43 | 230 | 323 |
| libKScreenLocker | 20 | 544 | 4,733 | 2,873 | 2,919 | 46 | 86.8% | 53 | 0 | 0 | 0 | 0 | 0 |
| libgwenviewlib | 433 | 4,084 | 24,933 | 14,486 | 14,729 | 243 | 89.7% | 271 | 0 | 0 | 0 | 0 | 0 |
| libKF5ModemManagerQt | 27 | 818 | 8,597 | 5,464 | 5,478 | 14 | 46.7% | 30 | 0 | 0 | 0 | 0 | 0 |
| libKF5Torrent | 84 | 3,524 | 34,338 | 16,785 | 16,890 | 105 | 98.1% | 107 | 0 | 0 | 0 | 0 | 0 |
| libKF5ItemModels | 26 | 640 | 4,368 | 2,521 | 2,563 | 42 | 82.4% | 51 | 13 | 19 | 29 | 61 | 122 |
| libKF5IconThemes | 43 | 796 | 4,212 | 2,445 | 2,453 | 8 | 38.1% | 21 | 3 | 4 | 13 | 9 | 29 |
| libqaccessibilityclient-qt5 | 81 | 337 | 3,881 | 2,638 | 2,641 | 3 | 9.4% | 32 | 0 | 0 | 0 | 0 | 0 |
| libKF5Style | 1 | 138 | 459 | 178 | 179 | 1 | 100% | 1 | 0 | 0 | 0 | 0 | 0 |
| libpowerdevilconfigcommonprivate | 3 | 279 | 989 | 488 | 492 | 4 | 100% | 4 | 0 | 1 | 3 | 1 | 5 |
| libKF5CompactDisc | 4 | 283 | 2,263 | 1,136 | 1,137 | 1 | 33.3% | 3 | 0 | 9 | 17 | 17 | 43 |
| libKF5WaylandClient | 48 | 2,672 | 12,625 | 6,659 | 6,784 | 125 | 100% | 125 | 0 | 0 | 0 | 0 | 0 |
| libKF5Activities | 16 | 397 | 3,379 | 2,036 | 2,068 | 32 | 72.7% | 44 | 0 | 0 | 0 | 0 | 0 |
| libQt5QuickTemplates2 | 1,778 | 4,196 | 25,336 | 14,735 | 14,871 | 136 | 92.5% | 147 | 477 | 498 | 105 | 102 | 1,182 |
| libKF5Sane | 57 | 671 | 5,554 | 3,117 | 3,163 | 46 | 63.9% | 72 | 0 | 0 | 0 | 0 | 0 |
| libKF5Solid | 44 | 828 | 13,389 | 8,434 | 8,475 | 41 | 50.6% | 81 | 0 | 0 | 0 | 0 | 0 |
| libkdeinit5_kcalc | 84 | 728 | 11,607 | 5,159 | 5,328 | 169 | 99.4% | 170 | 0 | 0 | 0 | 0 | 0 |
| libKF5Cddb | 10 | 466 | 3,991 | 2,153 | 2,168 | 15 | 93.8% | 16 | 0 | 0 | 0 | 0 | 0 |
| libKF5Su | 3 | 254 | 1,364 | 826 | 826 | 0 | 0% | 1 | 0 | 0 | 0 | 0 | 0 |
| libKF5KDEGamesPrivate | 34 | 1,586 | 9,134 | 5,130 | 5,167 | 37 | 80.4% | 46 | 0 | 44 | 25 | 101 | 170 |
| libkcardgame | 9 | 800 | 3,531 | 1,942 | 1,951 | 9 | 100% | 9 | 0 | 13 | 8 | 11 | 32 |
| libKF5UnitConversion | 9 | 220 | 1,834 | 1,134 | 1,134 | 0 | 0% | 0 | 0 | 0 | 0 | 0 | 0 |
| libQt5WebKitWidgets | 104 | 1,154 | 5,268 | 2,419 | 2,428 | 9 | 31% | 29 | 40 | 58 | 22 | 64 | 184 |
| libKF5Baloo | 97 | 532 | 3,244 | 1,812 | 1,815 | 3 | 75% | 4 | 20 | 10 | 0 | 0 | 30 |
| libkImageAnnotator | 55 | 3,386 | 13,386 | 6,876 | 7,002 | 126 | 100% | 126 | 0 | 0 | 0 | 0 | 0 |
| libKF5Purpose | 4 | 412 | 1,924 | 1,037 | 1,044 | 7 | 100% | 7 | 0 | 0 | 0 | 0 | 0 |
| libKF5KDEGames | 52 | 1,341 | 6,411 | 3,444 | 3,468 | 24 | 85.7% | 28 | 25 | 17 | 16 | 20 | 78 |
| libksgrd | 5 | 208 | 1,146 | 585 | 592 | 7 | 77.8% | 9 | 0 | 0 | 0 | 0 | 0 |
| libkdsoap | 11 | 905 | 5,185 | 2,477 | 2,477 | 0 | 0% | 12 | 0 | 5 | 7 | 16 | 28 |
| libvclplug_qt5lo | 25 | 1,526 | 12,692 | 6,270 | 6,282 | 12 | 60% | 20 | 0 | 2 | 2 | 6 | 10 |
| libOkular5Core | 42 | 2,531 | 17,767 | 10,400 | 10,435 | 35 | 100% | 35 | 0 | 0 | 0 | 0 | 0 |
| libKF5PlasmaQuick | 33 | 982 | 6,101 | 3,334 | 3,369 | 35 | 59.3% | 59 | 0 | 0 | 0 | 0 | 0 |
| libKF5Parts | 16 | 1,335 | 5,914 | 3,392 | 3,412 | 20 | 87% | 23 | 0 | 0 | 0 | 0 | 0 |
| libQt5MultimediaQuick | 35 | 344 | 1,976 | 1,045 | 1,046 | 1 | 16.7% | 6 | 9 | 7 | 14 | 10 | 40 |
| libKF5People | 10 | 444 | 2,672 | 1,564 | 1,570 | 6 | 66.7% | 9 | 0 | 2 | 2 | 5 | 9 |
| libQt5Gui | 862 | 9,964 | 85,017 | 45,416 | 45,423 | 7 | 38.9% | 18 | 108 | 132 | 58 | 150 | 448 |
| libdolphinprivate | 156 | 3,480 | 20,525 | 12,644 | 12,871 | 227 | 100% | 227 | 16 | 119 | 147 | 281 | 563 |
| libmilou | 8 | 555 | 2,387 | 1,284 | 1,302 | 18 | 100% | 18 | 0 | 0 | 0 | 0 | 0 |
| libKF5PulseAudioQt | 25 | 697 | 5,062 | 2,838 | 2,870 | 32 | 97% | 33 | 0 | 0 | 0 | 0 | 0 |
| libplasma-geolocation-interface | 1 | 88 | 344 | 168 | 169 | 1 | 100% | 1 | 0 | 2 | 2 | 4 | 8 |
| libkdeinit5_ksysguard | 65 | 1,139 | 9,514 | 5,882 | 5,969 | 87 | 84.5% | 103 | 0 | 0 | 0 | 0 | 0 |
| libFcitxQt5DBusAddons | 34 | 314 | 2,537 | 1,019 | 1,022 | 3 | 75% | 4 | 6 | 10 | 13 | 35 | 64 |
| libKF5ItemViews | 13 | 787 | 3,625 | 2,011 | 2,022 | 11 | 40.7% | 27 | 7 | 6 | 17 | 25 | 55 |
| libKF5Package | 6 | 387 | 2,861 | 1,626 | 1,629 | 3 | 100% | 3 | 0 | 0 | 0 | 0 | 0 |
| libKF5DBusAddons | 4 | 277 | 1,407 | 750 | 755 | 5 | 83.3% | 6 | 0 | 8 | 2 | 11 | 21 |
| libplasmanm_internal | 33 | 892 | 5,662 | 3,443 | 3,488 | 45 | 97.8% | 46 | 12 | 5 | 22 | 32 | 71 |
| libphonon4qt5 | 36 | 1,201 | 8,808 | 5,039 | 5,056 | 17 | 25.8% | 66 | 0 | 0 | 0 | 0 | 0 |
| libReviewboardHelpers | 3 | 224 | 1,159 | 633 | 639 | 6 | 100% | 6 | 0 | 0 | 0 | 0 | 0 |
| libKF5IdleTime | 2 | 170 | 722 | 341 | 342 | 1 | 33.3% | 3 | 0 | 5 | 14 | 9 | 28 |
| libKF5KrossUi | 8 | 623 | 1,957 | 1,033 | 1,044 | 11 | 50% | 22 | 0 | 0 | 0 | 0 | 0 |
| libKUserFeedbackWidgets | 31 | 329 | 1,167 | 632 | 647 | 15 | 78.9% | 19 | 0 | 0 | 0 | 0 | 0 |
| libkf5be1lo | 3 | 134 | 912 | 460 | 461 | 1 | 100% | 1 | 0 | 0 | 0 | 0 | 0 |
| libdebconf-kde | 8 | 343 | 2,508 | 1,467 | 1,475 | 8 | 66.7% | 12 | 0 | 0 | 0 | 0 | 0 |
| libKWaylandServer | 272 | 2,725 | 30,498 | 18,326 | 18,422 | 96 | 100% | 96 | 0 | 0 | 0 | 0 | 0 |
| libQt5XcbQpa | 55 | 2,180 | 22,374 | 13,190 | 13,201 | 11 | 42.3% | 26 | 0 | 1 | 3 | 4 | 8 |
| libKF5GuiAddons | 6 | 481 | 2,334 | 1,141 | 1,149 | 8 | 100% | 8 | 5 | 18 | 0 | 21 | 44 |
| libprocessui | 93 | 1,049 | 5,220 | 2,931 | 2,966 | 35 | 89.7% | 39 | 7 | 2 | 23 | 27 | 59 |
| libKF5Bookmarks | 17 | 956 | 5,225 | 3,216 | 3,254 | 38 | 86.4% | 44 | 0 | 10 | 6 | 16 | 32 |
| libqca-qt5 | 44 | 2,438 | 20,023 | 9,898 | 9,965 | 67 | 100% | 67 | 0 | 0 | 0 | 0 | 0 |
| libKF5WaylandServer | 64 | 3,061 | 17,154 | 10,029 | 10,186 | 157 | 100% | 157 | 0 | 0 | 0 | 0 | 0 |
| libpolkit-qt5-gui-1 | 2 | 146 | 635 | 277 | 281 | 4 | 66.7% | 6 | 0 | 0 | 0 | 0 | 0 |
| libkhotkeysprivate | 30 | 963 | 4,796 | 2,762 | 2,768 | 6 | 40% | 15 | 0 | 0 | 0 | 0 | 0 |
| libKF5KIOFileWidgets | 199 | 2,297 | 17,111 | 10,499 | 10,701 | 202 | 99% | 204 | 0 | 37 | 76 | 78 | 191 |
| libQt5HunspellInputMethod | 4 | 266 | 2,037 | 1,068 | 1,069 | 1 | 100% | 1 | 0 | 0 | 0 | 0 | 0 |

Table 12: Detailed results of callback and semantics recovery in KDE binaries (CGE stands for call graph edges).
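
To make the callback columns of Table 12 concrete, the following sketch recomputes the recovery percentage and the number of call graph edges added on top of GHIDRA for three rows copied from the table. The tuple layout and function name are illustrative assumptions, not QTRE code, and the equality between added edges and recovered callbacks is only checked for the rows sampled here.

```python
# (binary, CGE from GHIDRA, CGE from GHIDRA+QTRE,
#  callbacks recovered, total connect call sites), copied from Table 12.
ROWS = [
    ("libKF5KHtml", 66_208, 66_282, 74, 192),
    ("libkonsoleprivate", 15_322, 15_715, 393, 396),
    ("libKF5Plasma", 7_474, 7_533, 59, 75),
]


def recovery_rate(recovered, total):
    """Percentage of connect call sites whose callback was recovered."""
    return 100.0 * recovered / total if total else 0.0


for name, ghidra_edges, combined_edges, recovered, total in ROWS:
    added_edges = combined_edges - ghidra_edges
    rate = recovery_rate(recovered, total)
    print(f"{name}: +{added_edges} edges, {rate:.1f}% of {total} callbacks")
```

For these rows, each recovered callback corresponds to one additional call graph edge, and the recomputed percentages match the "%" column.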
| Binary Name | Time(s) | # Symbol | # CGE (ANGR) | # CGE (GHIDRA) | # CGE (GHIDRA+QTRE) | Callback From QTRE: # Recovered | Callback From QTRE: % | Callback From QTRE: # Total | Symbols From QTRE: # Prop. | Symbols From QTRE: # Signal | Symbols From QTRE: # Slot | Symbols From QTRE: # Param. | Symbols From QTRE: # Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| libQtSql | 0 | 671 | 4,206 | 2,207 | 2,207 | 0 | 0 | 0 | 0 | 5 | 9 | 13 | 27 |
| QtCarCluster | 337 | 40,712 | 69,025 | 23,503 | 23,636 | 133 | 84.2% | 158 | 0 | 373 | 192 | 357 | 922 |
| QtCarParrot | 661 | 58,175 | 72,324 | 21,296 | 21,302 | 6 | 15.8% | 38 | 0 | 673 | 185 | 378 | 1,236 |
| QtCarNavServer | 248 | 15,154 | 40,223 | 17,981 | 18,262 | 281 | 96.2% | 292 | 0 | 160 | 82 | 223 | 465 |
| libQtCarSim | 298 | 45,882 | 57,169 | 21,430 | 21,464 | 34 | 100% | 34 | 0 | 528 | 3 | 280 | 811 |
| libQtCarServiceMgr | 22 | 3,232 | 7,124 | 3,449 | 3,491 | 42 | 89.4% | 47 | 0 | 14 | 12 | 11 | 37 |
| libQtXmlPatterns | 168 | 703 | 72,277 | 17,893 | 17,897 | 4 | 28.6% | 14 | 0 | 0 | 0 | 0 | 0 |
| libQtCore | 200 | 4,191 | 34,100 | 18,773 | 18,783 | 10 | 24.4% | 41 | 52 | 75 | 50 | 85 | 262 |
| libQtGui | 1,726 | 14,534 | N/A | 88,111 | 88,413 | 302 | 48.6% | 622 | 649 | 349 | 602 | 546 | 2,146 |
| QtCarMonitor | 30 | 8,500 | 11,106 | 6,660 | 6,660 | 0 | 0% | 3 | 0 | 114 | 0 | 87 | 201 |
| QtCarNetManager | 440 | 53,842 | 69,943 | 23,493 | 23,559 | 66 | 69.5% | 95 | 0 | 626 | 49 | 355 | 1,030 |
| libQtCarGUI | 4,636 | 138,461 | N/A | 146,878 | 149,833 | 2,955 | 90% | 3,282 | 15 | 845 | 1,160 | 1,022 | 3,042 |
| QtCarSpeechRecognizer | 410 | 51,547 | 68,072 | 24,277 | 24,322 | 45 | 95.7% | 47 | 0 | 585 | 50 | 347 | 982 |
| libQtCarCANData | 0 | 12,712 | N/A | 1,539 | 1,539 | 0 | 0 | 0 | 0 | 12 | 0 | 18 | 30 |
| libQtSvg | 12 | 714 | 5,680 | 2,197 | 2,201 | 4 | 36.4% | 11 | 5 | 1 | 7 | 11 | 24 |
| QtCarEbServerIC | 486 | 43,050 | 54,939 | 25,909 | 26,106 | 197 | 90.4% | 218 | 0 | 449 | 37 | 228 | 714 |
| libQtCarMediaV2 | 267 | 35,362 | 75,291 | 20,901 | 21,252 | 351 | 72.4% | 485 | 0 | 64 | 18 | 78 | 160 |
| QtCarEVLogService | 194 | 34,420 | 38,690 | 15,775 | 15,781 | 6 | 75% | 8 | 0 | 382 | 9 | 188 | 579 |
| libQtNetwork | 102 | 1,816 | N/A | 10,273 | 10,364 | 91 | 56.5% | 161 | 1 | 96 | 78 | 73 | 248 |
| QtCarScreenshot | 0 | 465 | 421 | 219 | 219 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| QtCarMediaServerV2 | 64 | 13,811 | 20,655 | 11,705 | 11,758 | 53 | 84.1% | 63 | 0 | 150 | 0 | 124 | 274 |
| libQtCarUtils | 99 | 17,161 | 32,439 | 12,650 | 12,695 | 45 | 67.2% | 67 | 0 | 63 | 55 | 98 | 216 |
| QtCarVehicle | 1,567 | 41,296 | 48,564 | 29,204 | 30,779 | 1,575 | 100% | 1,575 | 0 | 437 | 88 | 283 | 808 |
| libQtDBus | 17 | 726 | 9,858 | 4,769 | 4,783 | 14 | 51.9% | 27 | 3 | 12 | 12 | 26 | 53 |
| libQtOpenGL | 16 | 1,477 | 8,479 | 4,716 | 4,720 | 4 | 66.7% | 6 | 0 | 1 | 0 | 1 | 2 |
| libQtCarAlerts | 5 | 1,096 | 1,220 | 711 | 715 | 4 | 50% | 8 | 0 | 12 | 5 | 13 | 30 |
| libQtDesigner | 43 | 6,724 | 73,650 | 36,919 | 36,919 | 0 | 0 | 0 | 7 | 31 | 29 | 51 | 118 |
| libQtHelp | 40 | 1,015 | 10,180 | 3,874 | 3,901 | 27 | 32.9% | 82 | 3 | 20 | 16 | 20 | 59 |
| libQtMultimedia | 5 | 419 | 1,289 | 615 | 631 | 16 | 100% | 16 | 0 | 11 | 0 | 2 | 13 |
| libQtCarPower | 18 | 4,594 | 8,060 | 4,914 | 4,918 | 4 | 100% | 4 | 0 | 65 | 3 | 55 | 123 |
| libQtWebKit | 1,301 | 214,219 | N/A | 159,019 | 159,059 | 40 | 61.5% | 65 | 38 | 57 | 23 | 46 | 164 |
| QtCarGpsManager | 359 | 48,125 | 63,677 | 23,226 | 23,242 | 16 | 84.2% | 19 | 0 | 507 | 9 | 281 | 797 |
| QtCarBrowser | 59 | 14,966 | 23,097 | 10,580 | 10,606 | 26 | 92.9% | 28 | 0 | 186 | 4 | 146 | 336 |
| libQtCarUIFramework | 653 | 61,679 | 80,410 | 49,863 | 49,991 | 128 | 84.8% | 151 | 0 | 450 | 215 | 290 | 955 |
| libQtMultimediaKit | 193 | 13,970 | 12,977 | 6,835 | 6,835 | 0 | 0% | 1 | 78 | 201 | 123 | 194 | 596 |
| libQtLocation | 38 | 2,192 | 11,654 | 6,649 | 6,661 | 12 | 12.9% | 93 | 0 | 0 | 0 | 0 | 0 |
| libQtDeclarative | 64 | 3,652 | 57,123 | 29,555 | 29,555 | 0 | 0 | 0 | 100 | 83 | 38 | 47 | 268 |
| libQtTest | 0 | 276 | 1,773 | 940 | 940 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| QtCarServer | 940 | 75,816 | 126,496 | 38,481 | 38,567 | 86 | 85.1% | 101 | 0 | 878 | 117 | 744 | 1,739 |
| libQtCarVAPI | 308 | 46,571 | 54,498 | 21,917 | 21,936 | 19 | 95% | 20 | 0 | 369 | 28 | 184 | 581 |
| QtCarAudiod | 305 | 35,108 | 41,079 | 25,310 | 25,405 | 95 | 88% | 108 | 0 | 376 | 17 | 162 | 555 |
| QtCarSimService | 2,169 | 2,076 | 1,247 | 701 | 1,553 | 852 | 100% | 852 | 0 | 5 | 0 | 0 | 5 |
| libQtScript | 131 | 878 | 26,078 | 11,346 | 11,347 | 1 | 33.3% | 3 | 0 | 1 | 1 | 1 | 3 |

Table 13: Detailed results of callback and semantics recovery in Tesla binaries (CGE stands for call graph edges; N/A indicates unavailable results due to exceptions when analyzing the binary).
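
The per-binary averages quoted in Appendix C (approximately 480 Qt symbols and 206 callbacks per Tesla binary) can be recomputed from Table 13. The sketch below assumes that these figures refer to the two "# Total" columns (total connect call sites and total symbols from QTRE); that reading is our inference from the numbers, and the variable names are illustrative.

```python
# (total connect call sites, total symbols from QTRE), one pair per
# Tesla binary, listed in the row order of Table 13.
TESLA_ROWS = [
    (0, 27), (158, 922), (38, 1236), (292, 465), (34, 811), (47, 37),
    (14, 0), (41, 262), (622, 2146), (3, 201), (95, 1030), (3282, 3042),
    (47, 982), (0, 30), (11, 24), (218, 714), (485, 160), (8, 579),
    (161, 248), (0, 0), (63, 274), (67, 216), (1575, 808), (27, 53),
    (6, 2), (8, 30), (0, 118), (82, 59), (16, 13), (4, 123),
    (65, 164), (19, 797), (28, 336), (151, 955), (1, 596), (93, 0),
    (0, 268), (0, 0), (101, 1739), (20, 581), (108, 555), (852, 5),
    (3, 3),
]

n_binaries = len(TESLA_ROWS)
avg_callbacks = sum(cb for cb, _ in TESLA_ROWS) / n_binaries
avg_symbols = sum(sym for _, sym in TESLA_ROWS) / n_binaries

print(f"{n_binaries} binaries: {avg_callbacks:.1f} callbacks "
      f"and {avg_symbols:.1f} symbols on average")
```

Under this reading, both averages land within one unit of the values stated in the text.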