Software Reverse Engineering Education
Software Reverse Engineering Education
Software Reverse Engineering Education
SJSU ScholarWorks
Master's Theses Master's Theses and Graduate Research
2009
Recommended Citation
Cipresso, Teodoro, "Software reverse engineering education" (2009). Master's Theses. Paper 3734. http://scholarworks.sjsu.edu/etd_theses/3734
This Thesis is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Theses by an authorized administrator of SJSU ScholarWorks. For more information, please contact [email protected].
A Thesis Presented to The Faculty of the Department of Computer Science San Jose State University
All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.
UMT
Dissertation Publishing
UMI 1478574 Copyright 2010 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code.
ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106-1346
SAN JOSE STATE UNIVERSITY The Undersigned Thesis Committee Approves the Thesis Titled SOFTWARE REVERSE ENGINEERING EDUCATION by Teodoro Cipresso APPROVED FOR THE DEPARTMENT OF COMPUTER SCIENCE
'(kkJ^
Jr. Mark Stamp, Department of Computer Science Dr. David Taylor, Department of Computer Science
5~
Date
m
Date
S/zo/a?
Associate Dean
Date
ABSTRACT SOFTWARE REVERSE ENGINEERING EDUCATION by Teodoro Cipresso Software Reverse Engineering (SRE) is the practice of analyzing a software system, either in whole or in part, to extract design and implementation information. A typical SRE scenario would involve a software module that has worked for years and carries several rules of a business in its lines of code. Unfortunately the source code of the application has been lost; what remains is "native" or "binary" code. Reverse engineering skills are also used to detect and neutralize viruses and malware as well as to protect intellectual property. It became frighteningly apparent during the Y2K crisis that reverse engineering skills were not commonly held amongst programmers. Since that time, much research has been undertaken to formalize the types of activities that fall into the category of reverse engineering so that these skills can be taught to computer programmers and testers. To help address the lack of software reverse engineering education, several peer-reviewed articles on software reverse engineering, re-engineering, reuse, maintenance, evolution, and security were gathered with the objective of developing relevant, practical exercises for instructional purposes. The research revealed that SRE is fairly well described and most of the related activities fall into one of two categories: software development related and security related. Hands-on reverse engineering exercises were developed in the spirit of these two categories with the goal of providing a baseline education in reversing both Wintel machine code and Java bytecode.
ACKNOWLEDGEMENTS I would like to thank Dr. Mark Stamp for his enduring patience as I struggled to flush out the details of this work. I would also like to thank my committee members, Dr. David Taylor and Dr. Robert Chun, for their support in this effort. Last but not least, I would like to thank my wife Karyn, who has encouraged me throughout my graduate career to persevere through the rough patches, and my cat Freddy, who always kept me company as I typed many suns to sleep.
Table of Contents
1 2 3 4 Introduction Reverse Engineering in Software Development Reverse Engineering in Software Security Reversing and Patching Wintel Machine Code 4.1 Decompilation and Disassembly of Machine Code 4.2 Wintel Machine Code Reversing and Patching Exercise 4.3 Recommended Reversing Tool for the Wintel Exercise 4.4 Animated Solution to the Wintel Reversing Exercise 5 Reversing and Patching Java Bytecode 5.1 Decompiling and Disassembling Java Bytecode 5.2 Java Bytecode Reversing and Patching Exercise 5.3 Recommended Reversing Tool for the Java Exercise 5.4 Animated Solution to the Java Reversing Exercise 6 Basic Anti-Reversing Techniques 7 Applying Anti-Reversing Techniques to Wintel Machine Code 7.1 Eliminating Symbolic Information in Wintel Machine Code 7.2 Basic Obfuscation of Wintel Machine Code 7.3 Protecting Source Code Through Obfuscation 7.4 Advanced Obfuscation of Machine Code 7.5 Wintel Machine Code Anti-Reversing Exercise 7.6 Solution to the Wintel Anti-Reversing Exercise 7.6.1 Encryption of String Literals 7.6.2 Obfuscating the Numeric Representation of the Record Limit 7.6.3 Control Flow Obfuscation for the Record Limit Check 7.6.4 Analysis of the Control Flow Obfuscation Using Run Traces 8 Applying Anti-Reversing Techniques to Java Bytecode 8.1 Eliminating Symbolic Information in Java Bytecode 8.2 Preventing Decompilation of Java Bytecode 8.3 A Java Bytecode Code Anti-Reversing Exercise 8.4 Animated Solution to the Java Bytecode Anti-Reversing Exercise 9 Reengineering and Reuse of Legacy Software Applications 9.1 Legacy Software Reengineering and Reuse Exercise 9.2 Legacy Software Reengineering and Reuse Exercise Solution 10 Identifying, Monitoring, and Reporting Malware 10.1 Malware Identification and Monitoring Exercise 10.2 Malware Identification and Monitoring Exercise Solution Conclusion References 1 3 6 9 11 14 15 17 20 21 25 26 27 29 31 31 35 40 42 44 44 45 47 48 53 56 58 63 68 69 70 84 86 98 106 106 107 109
vi
List of Tables Table 4.1 Result of decompiling HelloWorld.exe using Boomerang. Table 4.2 Quick reference for panes in CPU window of OllyDbg. Table 5.1 Source listing for ListArguments.java. Table 5.2 Java bytecode contained in ListArguments.class. Table 5.3 Jad decompilation of ListArguments.class. Table 7.1 Debugging information inserted into machine code. Table 7.2 Listing of VerifyPassword.cpp and disassembly ofVerifyPassword.exe. Table 7.3 Simple substitution cipher used to protect string constants. Table 7.4 VerifyPasswordObfuscated.cpp and corresponding disassembly. Table 7.5 COBF obfuscation results for VerifyPassword.cpp. Table 7.6 Encrypted strings are decrypted each time they are displayed. Table 7.7 Using a function of the record limit to obfuscate the condition. Table 7.8 Implementation of the control flow obfuscation in Fig. 7.3. Table 7.9 Statistical data gathered for randomized control-flow obfuscation. Table 8.1 Unobfuscated source listing of CheckLimitation.java. Table 8.2 Jad decompilation of ProGuard obfuscated bytecode. Table 8.3 Jad decompilation of SandMark (and ProGuard) obfuscated bytecode. Table 8.4 Listing of DateTime.Java. Table 8.5 Jad decompilation of DateTime.class obfuscated by Zelix Klassmaster. Table 9.1 Sample business logic component to reuse and reengineer. Table 9.2 Interface data structure SMPLCALC-INTERFACE in SMPLCALC.cpy. Table 9.3 XML Schema generated the from COBOL data structure. Table 9.4 Partial listing of SmplCalcJaxbMarshaller.java interaction with JAXB. Table 9.5 Updates to JSimpleCalculator.java in support of JAXB marshalling. Table 9.6 Example native method declaration for the JNI XML bridge. Table 9.7 Example implementation of the Java to COBOL JNI XML bridge. Table 9.8 Implementation of a COBOL XML layer to the legacy application Table 9.9 Example run of the solution code with debug statements turned on. vn 13 16 22 23 24 33 36 38 39 41 45 47 51 56 60 61 62 67 68 76 86 87 90 91 92 93 95 96
List of Figures
Figure 2.1 Figure 2.2 Figure 3.1 Figure 4.1 Figure 4.2 Figure 5.1 Figure 5.2 Figure 7.1 Figure 7.2 Figure 7.3 Figure 7.4 Figure 8.1 Figure 8.2 Figure 9.1 Figure 9.2 Figure 9.3 Figure 9.4 Figure 9.5 Development process for maintaining legacy software. Development related software reverse engineering scenarios. Security related software reverse engineering scenarios. The five panes of the OllyDbg graphical workbench. Sample slide from the machine code reversing animated tutorial. Execution of Java bytecode versus machine code. FrontEnd Plus workbench session for ListArguments, class. Result of obfuscating all string literals in the program. Obfuscated control flow logic for testing the password record limit. Edit-distances between three run traces of the trial limitation check. Usage of opaque predicates to prevent decompilation. Sample slide from the Java antireversing animated tutorial. Layers of a well-structured legacy software application. Mapping legacy functional discriminators to an object-oriented design. Example JCA implementation for accessing a legacy application. Architecture for legacy application reengineering and reuse from Java. Console-based Java interface to the legacy COBOL program. 4 5 8 16 19 21 27 46 51 55 65 70 73 75 79 83 89 102 104 105
Figure 10.1 Process Monitor session for the Password Vault application. Figure 10.2 Example ThreatExpert report summary for submitted malware. Figure 10.3 Console-based UI for the Alarm Clock example software Trojan.
Vlll
1 Introduction
From very early on in life we engage in constant investigation of existing things to understand how and even why they work. The practice of Software Reverse Engineering (SRE) calls upon this investigative nature when one needs to learn how and why, often in the absence of adequate documentation, an existing piece of software helpful or maliciousworks. The sections that follow cover the most popular uses of SRE and, to some degree, the importance of imparting knowledge of them to those who write, test, and maintain software. More formally, SRE can be described as the practice of analyzing a software system to create abstractions that identify the individual components and their dependencies, and, if possible, the overall system architecture [1], [2]. Once the components and design of an existing system have been recovered, it becomes possible to repair and even enhance them. Events in recent history have caused SRE to become a very active area of research. In the early nineties, the Y2K problem spurred the need for the development of tools that could read large amounts of source or binary code for the 2-digit year vulnerability [2]. Shortly after the preparation for the Y2K problem, in the mid to late nineties, the adoption of the Internet by businesses and organizations brought about the need to understand in-house legacy systems so that the information held within them could be made available on the Web [3]. The desire for businesses to expand to the Internet for what was promised to be limitless potential for new revenue caused the creation of many Business to Consumer (B2C) web sites. 1
Today's technology is unfortunately tomorrow's legacy system. For example, the Web 2.0 revolution sees the current crop of web sites as legacy Web applications comprised of multiple HTML pages; Web 2.0 envisions sites where a user interacts with a single dynamic pagerendering a user experience that is more like traditional desktop applications [2]. Porting the current crop of legacy web sites to Web 2.0 will require understanding the architecture and design of these legacy sitesagain requiring reverse engineering skills and tools. At first glance, it may seem that the need for SRE can be lessened by simply maintaining good documentation for all software that is written. While the presence of that ideal would definitely decrease the need; it just has not become a reality. For example, even a company that has brought software to market may no longer understand it because the original designers and developers may have left, or components of the software may have been acquired from a vendor who is no longer in business [1]. Going forward, the vision is to include SRE incrementally, as part of the normal development, or "forward engineering" of software systems. At regular points during the development cycle, code would be reversed to rediscover its design so that the documentation can be updated. This would help avoid the typical situation where detailed information about a software system such as its architecture, design constraints, and trade-offs are found only in the memory of its developer [1].
Design Recovery No
Software Engineer
/ \
[ Depl Legacy
^^mSm*! 3t*-^B"
<H| System
L
Deploy .
111
Figure 2.1. Development process for maintaining legacy software, understanding, [3] advises that "practice with reverse engineering techniques improves ability to understand a given system quickly and efficiently." Even though several tools already exist to aid software engineers with the program understanding process, the tools focus on transferring information about a software system's design into the mind of the developer [1]. The expectation is that the developer has enough skill to efficiently integrate the information into their own mental model of the system's architecture. It's not likely that even the most sophisticated tools can replace experience with building mental models of existing software; [4] states "commercial reverse engineering tools produce various kinds of output, but software engineers usually don't how to interpret and use these pictures and reports." The lack of reverse engineering skills in most programmers is a serious risk to the long-term viability of any organization that employs information technology. The problem of software maintenance cannot be dispelled with some clever technique, [7] argues "re-engineering
code to create a system that will not need to be reverse engineered again in the futureis presently unattainable." According to [5], there are four software development related reverse engineering scenarios; the scenarios cover a broad spectrum of activities that include software maintenance, reuse, re-engineering, evolution, interoperability, and testing. Fig. 2.2 summarizes the software development related reverse engineering scenarios.
Figure 2.2. Development related software reverse engineering scenarios. The following are tasks one might perform in each of the reversing scenarios [5]: > Achieving Interoperability with Proprietary Software: Develop applications or device drivers that interoperate (use) proprietary libraries in operating systems or applications. > Verification that Implementation Matches Design: Verify that code produced during the forward development process matches the envisioned design by reversing the code back into an abstract design.
> Evaluating Software Quality and Robustness: Ensure the quality of software before purchasing it by performing heuristic analysis of the binaries to check for certain instruction sequences that appear in poor quality code. > Legacy Software Maintenance, Re-engineering, and Evolution: Recover the design of legacy software modules when source is not available to make possible the maintenance, evolution, and reuse of the modules.
workings and witness first-hand how vulnerabilities find their way into computer programs. Reversing software that has been infected with a virus, is a technique used by the developers of anti-virus products to identify and neutralize new viruses or understand the behavior of malware. Programming languages like Java, which do not require computer programmers to manage low-level system details, have become ubiquitous. As a result, computer programmers have increasingly lost touch with what happens in a system during execution of programs. [3] suggests that programmers can gain a better and deeper understanding of software and hardware through learning reverse engineering concepts. Hackers and crackers have been quite vocal and active in proving that they possess a deeper understanding of low-level system details than their professional counterparts [3]. According to [5], there are four software security related reverse engineering scenarios. Similar to development related reverse engineeringthe scenarios cover a broad spectrum of activities: ensuring that software is safe to deploy and use, protecting clever algorithms or business processes, preventing pirating of software and digital media such as music, movies, and booksand making sure that cryptographic algorithms are not vulnerable to attacks. Fig. 3.1 summarizes the software security related reverse engineering scenarios. The following are tasks one might perform in each of the reversing scenarios [5]: > Detecting and Neutralizing Viruses and Malware: Detect, analyze, or neutralize (clean) malware, viruses, spyware, and adware.
7
Testing Cryptographic Algorithms for Weaknesses: Test the level of data security provided by a given cryptographic algorithm by analyzing it for weaknesses. Testing DRM or License Protection (anti-reversing): Protect software and media digital-rights through application and testing of anti-reversing techniques. Auditing the Security of Program Binaries: Audit a program for security vulnerabilities without access to the source code by scanning instruction sequences for potential exploits.
i
Security Related Software Reverse Engineering
To relieve a high-level language compiler from the difficult task of generating machine instructions, some compilers do not generate machine code directly, instead they generate code in a low-level language such as assembly [8]. This allows for a separation of concerns where the compiler doesn't have to know how to encode and format machine instructions for every target platform or processorit can instead just concentrate on generating valid assembly code for an assembler on the target platform. Some compilers, such as the C and C++ compilers in the GNU Compiler Collection (GCC), have the option to output the intermediate assembly code that the compiler would otherwise feed to the assemblerallowing advanced programmers to tweak the code [9]. Therefore the C and C++ compilers in GCC are examples of compilers that translate high-level language programs to assembly code instead of machine code; they rely on an assembler to translate their output into instructions the target processor can understand. [9] outlines the compilation process undertaken by GCC compiler to render an executable file is as follows: > Preprocessing: Expand macros in the high-level language source file. > Compilation: Translate the high-level source code to assembly language. > Assembly: Translate assembly language to object code (machine code). > Linking (Create the final executable): > Statically or dynamically link together the object code with the object code of the programs and libraries it depends on.
10
> Establish initial relative addresses for the variables, constants, and entry points in the object code.
decompilers for binary executables, but recognizes that other experts disagreeraising the point that some information is "irretrievably lost during the compilation process." Boomerang is a well-known open-source decompiler project that seeks to one day be able to decompile machine code to high-level language source code with respectable results [11]. For those reverse engineers interested in recovering the source code of a program, decompilation may not offer much hope because as [11] states "a general decompiler does not attempt to reverse every action of the compiler, rather it transforms the input program repeatedly until the result is high level source code. It therefore won't recreate the original source file; probably nothing like it." To get a sense of the effectiveness of Boomerang as a reversing tool, a simple program, HelloWorld.c was compiled and linked using the GNU C++ compiler for Microsoft Windows and then decompiled using Boomerang. The C code generated by the Boomerang decompiler when given HelloWorld.exe as input was quite disappointing: the generated code looked like a hybrid of C and assembly language, had countless syntax errors, and ultimately bore no resemblance to the original program. Table 4.1 contains the source of HelloWorld.c and some of the code generated by Boomerang. Incidentally, the Boomerang decompiler was unable to produce any output when HelloWorld.exe, was built using Microsoft's Visual C++ 2008 edition compiler. The full length of the C code generated by Boomerang for the HelloWorld.exe program contained 180 lines of confusing, nonsensical control structures and function calls to undefined methods. It is surprising to see such a poor decompilation result, but as 12
Table 4.1. Result of decompiling HelloWorld.exe using Boomerang. HelloWorld.c: 01 02 03 04 05 06 #include <stdio.h> int main(int argc, char *argv[]) { printf("Hello Boomerang World\n"); return 0;
Boomerang decompilation of HelloWorld.exe (abbreviated): 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 union { size32[] x83; unsigned int x84; } globallO; size32 global3 = - 1 ; / / 4 bytes // address: 0x401280 void _start() { set__app_type () ; procl(); } // address: 0x401150 void procl() { size32 eax; // r24 size32 ebp; // r29 size32 ebx; // r27 int ecx; // r25 int edx; // r2 6 int esp; // r28 SetUnhandledExceptionFilter(); proc5(pc, pc, 0x401000, ebx, ebp, eax, ecx, edx, ebx, esp - 4, SUBFLAGS32(esp - 44, 4, esp - 4 8 ) , esp - 48 == 0, (unsigned int) (esp - 44) < 4 ) ;
22:
[11] states: "Machine code decompilation, unlike Java/.NET decompilation, is still a very immature technology." To ensure that decompilation was given a fair trial, another decompiler was tried on the HelloWorld.exe executable. The Reversing Engineering Compiler or REC is both a compiler and a decompiler that claims to be able to produce a "C-like" representation of machine code [12]. Unfortunately, the results of the decompilation using REC were similar to that of Boomerang. Based on the current state 13
of decompilation technology for machine code, using a decompiler to recover the highlevel language source of an executable doesn't seem feasible; however, because of the one-to-one correspondence between machine code and assembly language statements [10], we can obtain a low-level language representation. Fortunately there are graphical tools available that not only include a disassembler, a tool which generates assembly language from machine code, but also allow for debugging and altering the machine code during execution.
Password Vault application defines a constant named "TRIALVERSION" which causes the resulting executable to limit the number of password records a user may create to only five, using conditional compilation. This limitation is very similar to limitations found in many shareware and trialware applications that are available on the Internet.
15
Disassembler
Registers
-J_
Plugins Options Window Help
Oily Dbg PassworriVault.exe - [CPU - main thread, module Password] F M ! 3 t 3 | C | File View\Oebug
.__.
...
6jjlix
liMMfe!:M B84012S1 00481283 004S1286 0040128D 00401293 00401298 004EU299 004012R9 004012fil 00401203
* ss
. . . . . . . . . .
PUSH EBP now EBP,ESP 89E5 SUB ESP 8 83EC 08 C70424 010000 MOU DWORD PTR SS:[ESP],1 FF1S 9892S900 CALL DWORD PTR DS: [<&msucrt._ E8 B8FEFFFF CALL Passuord.00401150 90 NOP 8DB426 000000 LEfl ESI,DWORD PTR DSlCESI] PUSH EBP SS MOU EBP,ESP 89ES 83EC 08 SUB ESP, 8
MOU niiinRn P T B <W. T F ^ P I ?
mLMMMM
EBP=3022FFF0
EBX 80000000 ECX 0022FFB0 EDX 7C90E4F4 r v EBX 7FFDDB00 ESP 6022FFC4 EBP S022FFF0 ESI FFFFFFFF EDI 7C910208 n EIP 00401280 P C 0 ES 8023 3 P 1 CS 001B 3 i! SSjafiS
Dump
Information
Stack
Pane Disassembler
Capabilities > Edit, debug, test, and patch a binary executable using actions available on a popup menu. > Patch an executable by copying edits to the disassembly back to the binary. > Display the contents of memory or a file in one of 7 predefined formats: byte, text, integer, float, address, disassembly, or PE Header. > Set memory breakpoints (triggered when a particular memory location is read from or written to). > Locate references to data in the disassembly (executable code). > Decode and resolve the arguments of the currently selected assembly instruction in the Disassembler pane. > Modify the value of register arguments. 16
Dump
Information
Registers
> View memory locations referenced by each argument in either the Disassembler of Dump panes. > Decodes and displays the values of the CPU and FPU (FloatingPoint Unit) registers for the currently executing thread. > Floating point register decoding can be configured for MMX (Intel) or 3DNow! (AMD) multimedia extensions. > Modify the value of CPU registers. > Display the stack of the currently executing thread. > Trace stack frames. In general, stack frames are used to: Restore the state of registers and memory on return from a call statement. Allocate storage for the local variables, parameters, and return value of the called subroutine. Provide a return address.
Stack
For instructional purposes, an animated tutorial that demonstrates the complete end-to-end reverse engineering of the C/C++ Password Vault application was created using Qarbon Viewlet Builder and can be viewed using Macromedia Flash Player. The tutorial begins with the Password Vault application and OllyDbg already installed on a Windows XP machine. Fig. 4.2 contains an example slide from the animated tutorial. The animated tutorial, source, and installer for the machine code version of Password Vault can be downloaded from the following locations: > Wintel Reversing & Patching Animated Solution: http://reversingproject.info/repository.php?fileID=4_l_l > Password Vault C/C+ + Source code: http://reversingproject.info/repository.php?filelD=4_l _2 > Password Vault C/C++ Windows installer: http://reversingproject.info/repository.php?filelD=4_l_3 Begin viewing the animated tutorial by extracting passwordjvault_cpp _reversing_exercise.zip to a local directory and either running password_vault_cpp_reversing_exercise.exe which should launch the standalone version
18
OlIvDbg - PasswordVault.e e - [Dump - Password: rdala 0 0 5 5 D 0 0 0 [ p j File View Debug Plugins Options Window Help
---
00590FFF)
HHQ
klfii*]
i
il
a la c
The Dump window allows for searching through the data and setting a breakpoint that is triggered when data beginning at a specified address is read or modified.
/4 7w3idu2aj7M 6m6w6m.6 F 7W 6W 4&fa)-& AI 4 fl 0 4 5 F 41 2 F 0 72 65
73 26 53 74 6F 20 6C 74 63 63 27 2C 74 6F 20 72 6F 2R 73 69 63 6E 6F 65 6E 6F 6E 68 72 6F 6E 6D ?? 6F SO 6E 20 54 t<> 69 20 20 69 2B 00 74 65 6F 74 2C 20 29 72 61 64 6E 7D 77 74 6F 29 72 29 61 74 67 65 65 72 63 20 69 72 20 75 77 68 -7A 6E 6 7 7 0 20 28 20 32 35 36 6F:6E 29 2 8 2@ 3 5 43 68 6 1 20!50 61 29161 20 72 64 00 65i 72 7 3 20 27 22 6 1 6E 6 4 6 1 6C 6C 64 28 64 29 50 61 80 54 68 29 77 61 2 2 2E 0 0 6F 7 2 6 4 2 0 61 2 8 72 64 09 65 78 69 64 20 76 73 70 6S 60i65 20 2 0 6 6 6F 2 0 6E 6 5 20 73 70 6E 74 20 64 20 65 6F 7 2 7 2 7 7 6 9 6C 74!68 6F 20^73 61 9 9 ! 4 1 6E 29!6F 70 61 73 20 6 5 ; 2 0 6E fiC -Sffl 'il 72 20 2D 75 2E 6E 73 50 00 20 27 20 6F 61 73 65 73 44 28 50 5B 73 61 63 22 75 77 65 76 6E 65 6C 75 66 20 74 73 61 7D 6F;6fl 29;7C 42:69 73!69 3512E 67;65 73 77 61- 73 09 54 27 30 2C 2 8 2 7 ; 3E 77: 65 74!61 73177 29!64 20! 73 69 73 52 65 61 73 45 72 74:69 75!6C 69!66 7E!2fl 6E! 6 4 20 76 63 69 61 75 74 65 63 74 20!6E 74!29 65! 74 69L6E 69:6F 79 65 6D 6 5 "51-1 7 n 65 63 90 00 74 2 0 6E 6 7 32 20 20 74 6F 7 2 73 77 68 65 20 72 27 2F 27 20 64 20 2E 0 9 6F 7 2 65 73 65 74 7 0 6C 6 3 6F 73 77 7 2 6F 6E 6 7 74 20 69 65 7D 2 2 2D 2D 61 75 66 69 6C 7 4 72 65 2E 2 9 6F 77 73 61 7 9 2E 76 61 6E 2 9 63 69 20 77 ->-3 1C 7 4 2E 6 9 0 0 7C 2 0 45 6E 63 29 43 72 20;7C 00 68,65 20 64;00 43 6F:72 64 20;63 68 65 74 75 27,2C 29 6 1 : 72 6 5 69;6E 29 4 4 6 5 6C 64 20 52 6 3 i 7 2 69 20s 74 6F 61i 79 29 72 64 73 6F; 72 6 4 72'5D 29 20;70 61 66.6F 72 6 4 ; 2 0 75 29;77 61 61s 7 3 7 3 6 C ! 7 4 2E 65,64 20 20;70 61 64 28 69 59 7 2 6F 29^65 78 7 6 6 9 6E 06;5B 49 6C!69 64 6 E ! 7 5 6D 66 69 65 6 1 ; 7 320 Cta AC CC
3
1
0 0 OBSSITRJB 1 00=bCHO 63 Q85ED120 7 2 8 0 i S L l : O 6F 88550140 45 : (305501 "0 7 0 :00550160 74 : 8855D178 29 80S5D130 75 O055D138 6 1 00b5DlP8 65 ' C1055C1B0 6 1 .6855D3C0 20 86?5D1D0 27 8 0 5 5 0 i E O 6F eSE^DlFO 63 , 00550JO0 65 , 0G5SD218 6 F GO5SD220 6 9 O05SD2 30 ?B 88550240 73 0055029? 64 88SSD260 6 5 SOtEDl^O 41 85502f:n 77 00E^:J:::<O 68 0855D2R9 7 2 1 O 0 5 5 D 2 B B 6E .0O5SD2Ce 6 9 i OO55D2D0 5 4 0055D2E0 72 ,09S50ZFO 77 0 8 5 5 0 : 0 0 69 0855?^10 61 88SSD320 29 0055O3EO 6 6 0 8 5 5 0 3 4 8 6F on^DBse 6 5 GQSSCSuB 7 2 O u S S O ? ^ 90 tA
C IS i xy ;... ; ion /A M I 76 65 c t : t eodoro@reve 6E 6 6 r s i n g p r o j e c t . i n f 20 41 0 I... ! fi 7 2 7 9 ES 2 5 6 - B i t E n c r y 7 9 7 8 Dt Ion u s i n g C r y p 28 00 to++ 5 . 5 . 2 ! . ( . 56 61 ) .Change t h e U a 72 65 u l t Password.Cre 26 52 a t e a Password R 61 72 e c o r d . . . T h e char 7 2 6E a c t e r s ' : r e t u r n 2 7 3C 2 0 6E ' , a n d ' > ' a r e n 72 65 ot a l l o w e d in r e 65 74 c o r d d a t a . . D e l e t 6 5 6 3 e a P a s s w o r d Rec 70 74 ord.The de-script 2 0 2 2 i o n was s e t t o " 5 0 6 1 { # > " . . D i s p l a y Pa 0 0 45 s s w o r d R e c o r d s . E 20 52 d i t a Password R 00 90 e c o r d . [ E r r o r ] . . 7 3 7 3 fin en i s t i n g p a s s 2 0 74 w o r d v a u I t f o r t 7 3 6 5 h e s p e c i f Led u s e 7 3 2 9 rnaroe " < # > " was 7 5 6D n o t f o u n d a s s u c i 8 8 8 9 i n g new v a u i t . . . 6 3 7 5 T h e s p e c i f i e d CM 73 73 r r e n t v a u l t pass 73 29 word e n t e r e d i s Progr 67 72 i n c o r r e c t . 6 9 7 4 a n u 1 1 L now en i t without saving 67 29 safety..[Inf 6E 66 f o r 2 0 6D 0 ] . O n i n v a t i d n 6 2 65 enu o p t i o n nunbe 6 4 2E r was s p e c i f l e d . 7 3 6 5 . T h e nacie w a s s e It* -Vi 1_ ; . - ; _ " .';-
fca
*v
<>V.
tons
Float Disassemble Special
Appearance
j/^achedpracesspaused^rtol.Dl^t^.Ppint'
.; J
'
-J Paused
Figure 4.2. Sample slide from the machine code reversing animated tutorial.
19
top-level public class to be defined per *.java source file and requires that the bytecode be stored in a file with whose name matches the pattern TopLevelClassName.class.
Machine instruction
SSHilBH
Machine code
CPU
Figure 5.1. Execution of Java bytecode versus machine code.
21
Table 5.1. Source listing for ListArguments.java. 01 02 03 04 05 06 07 08 09 package info.reversingproject.listarguments; public class ListArguments { public static void main(String[] arguments){ for (int i = 0; i < arguments.length; i++) { System.out.println("Argument[" + i + " ] : " + arguments[i])
Bytecode is stored in a binary format that is not human-readable and therefore must be "disassembled" in order to be read. Recall that the result of disassembling machine code is assembly language that can be converted back into machine code using an assembler; unfortunately, the same does not hold for disassembling Java bytecode. Sun Microsystem's Java Development Toolkit (JDK) comes with javap a command-line tool for disassembling Java bytecode; to say that javap "disassembles" bytecode is a bit of a misnomer since the output of javap is unstructured text which cannot be converted back into bytecode. The output of javap is nonetheless useful as a debugging and performance tuning aid since one can see which JVM instructions are generated from high-level Java language statements. Table 5.2 lists the Java bytecode for the main method of ListArguments class; notice that the fully qualified name of each method invoked by the bytecode is preserved. It may seem curious that while ListArguments.java contains no references to the class java.lang.StringBuilder, there are many references to it in the bytecode; this is because the use of the "+" operator to concatenate strings is a convenience offered by the Java language that has no direct representation in bytecode. To perform the concatenation, the 22
bytecode creates a new instance of the StringBuilder class and invokes its append method for each occurrence of the "+" operator in the original Java source code (there are three). A loss of information has indeed occurred, but we'll see that it's still possible to generate Java source code equivalent to the original in function, but not in syntax.
Table 5.2. Java bytecode contained in ListArguments.class. 0 1 2 3 4 5 8 i:.:
11 1:
iconst 0 istore 1 iload 1 aload 0 arraylength if icmpge getstatic new #3; dup invokespecial ldc #5; invokevirtual iload 1 invokevirtual ldc #8; invokevirtual aload 0 iload_l aaload invokevirtual invokevirtual invokevirtual iinc 1, 1 goto 2 return
50 #2;
Table 5.3 lists the result of decompiling ListArguments.class using Jad; while the code is different from the original List Arguments.Java program, it is functionally equivalent and syntactically correct, which is a much better result than that seen earlier with decompiling machine code.
23
02 03 04 05 06 07 08 09 10 11 12:
package info.reversingproject.listarguments; import Java.io.PrintStream; public class ListArguments { public static void main(String args[]) { for (int i = 0; i < args.length; i++) System.out.printIn((new StringBuilder()).append("Argument[") .append(i) .append("] :") .append(args[i]) .toString ()); }
An advanced programmer who is fluent in the Java Virtual Machine specification could use a hex editor or a program to modify Java bytecode directly, but this is similar to editing machine code directly, which is error-prone and difficult. In Section 4, which covered reversing and patching of machine code, it was determined through discussion and an animated tutorial that one should work with disassembly to make changes to a binary executable. However, the result of disassembling Java bytecode is a pseudoassembly language, a language that cannot be compiled or assembled but serves to provide a more abstract, readable representation of the bytecode. Being that directly editing bytecode is difficult, and that disassembling bytecode results in pseudo-assembly which cannot be compiled, it would seem that losing Java source code is more dire of a situation than losing C/C++ source code, but of course this is not the case because, as we've seen using Jad, Java bytecode can be successfully decompiled to equivalent Java source code.
24
25
1*1 FiontEnd Plus v1.04 - ListArguments Java - [ListArgu men ts. Java] File Edit Search Window Tools and Options Feedback Homepages Help i i
7]5|x)
! : A i 4 i c P | [ b i ^ a B f i i a M | #4 ft V
!: jjL |3* I B ^
! Courier New
..
A
I10
i
- l ^ s m l i f ^ l
0001 0002 B - O Imports ! 0003 - -*l java.io.PrintStream; 0004 B-CS Methods 0005 % ListArguments() 0006 $ static void main(String argsfl) 0 0 0 7 0008 0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 0021
class
info.reversingproject.listarguments;
ListArguments
p u b l i c L i s t A r g u m e n t s ()
{
iJ
args[])
>
public s t a t i c void main(String
{
")
>
>
Martin Cowley
<l 1
j
zJ 1
27
end-to-end reverse engineering of the Java Password Vault application was created using Qarbon Viewlet Builder and can be viewed using Macromedia Flash Player. The tutorial begins with the Java Password Vault application, FrontEnd Plus, and Sun's Java JDK vl.6 installed on a Windows XP machine. Fig. 5.3 contains an example slide from the animated tutorial. The animated tutorial, source, and installer for the Java version of Password Vault can be downloaded from the following locations: > Java Bytecode Reversing & Patching Animated Solution: http://reversingproject.info/repository. php?fileID=5_4_l > Password Vault Java Source code: http://reversingproject.info/repository.php?fileID=5_4_2 > Password Vault (Java Version) Windows installer: http://reversingproject.info/repository.php?fileID=5_4_3 Begin viewing the tutorial by extractmgpassword_vaultjava_reversing_exercise.zip local directory and either running password_vaultjava_reversing_exercise.exe which to a
should launch the standalone version of Macromedia Flash Player, or by opening the file password_vaultjava_reversing_exercise_viewlet._swf.html in a Web browser.
28
[5] describes three basic anti-reversing techniques: > Eliminating Symbolic Information: The first and most obvious step in preventing reverse engineering of a program is to render unrecognizable, all symbolic information in machine code or bytecode because such information can be quite useful to a reverse engineer. Symbolic information includes class names, method names, variable names, and string constants that are still readable after a program has been compiled down to machine code or bytecode. > Obfuscating the Program: Obfuscation includes eliminating symbolic information, but goes much further. Obfuscation strategies include: modifying the layout of a program, introducing confusing non-essential logic or control flow, and storing data in difficult to interpret organizations or formats. Applying all of these techniques can render a program difficult to reverse, however care must be taken to ensure the original functionality of the application remains intact. > Embedding Antidebugger Code: Static analysis of machine code is usually carried out using a disassembler and heuristic algorithms that attempt to understand the structure of the program. Active or live analysis of machine code is done using an interactive debugger-disassembler that can attach to a running program and allow a reverse engineer to step through each instruction and observe the behavior of the program at key points during it's execution. Live analysis is how most reverse engineers get the job done, so it's common for developers to want to implement guards against binary debuggers.
30
those methods or functions will appear in the .idata (import data) section of the Windows PE header. In production versions of a program, the machine code doesn't directly contain any symbolic information from the original source codesuch as method names, variable names, or line numbers; the executable file only contains the machine instructions that were produced by the compiler [9]. This lack of information about the connection between the machine instructions and the original source is unacceptable for purposes of debuggingthis is why most modern compilers, like GCC, include an option to generate debugging information into the executable file that allow one to trace a failure occurring at a particular machine instruction back to a line in the original source code [9]. To show the various kinds of symbolic information that are inserted into machine code to enable debugging of an application, the GNU C++ compiler was directed to compile the program Calculator.cpp with debugging information but to generate assembly language instead of machine code. The source code for Calculator.cpp and the generated assembly language equivalent are given in Table 7.1. The GNU compiler stores debug information in the symbol tables (.stabs) section of the Windows PE header so that it will be loaded into memory as part of the program image. It should be clear from the generated assembly in Table 7.1 that the debugging information inserted by GCC is by no means a replacement for the original source code of the program. A source-level debugger, like the GNU Project Debugger (GDB), must be able to locate the original source code file to make use of the debugging information embedded in the executable. Nevertheless, debugging information can give plenty of hints to a reverse
32
engineer, such as the count and type of parameters one must pass to a given method. An obvious recommendation to make here, assuming there is an interest in protecting machine code from being reverse engineered, is to ensure that source code is not compiled for debugging when generating machine code for use by customers.
Table 7.1. Debugging information inserted into machine code. Calculator.cpp: int main(int argc, char *argv[]) 01 02 { 03 string input; int opl, op2; char fnc; long res 04 cout << "Enter integer 1: "; 05 getline (cin, input); opl = atoi(input.c_str()) 06 cout "Enter integer 2: "; 07 getline (cin, input); op2 = atoi(input.c_str()) 08 cout "Enter function [+|-|*]: "; 09 getline (cin, input); fnc = input.at (0); 10 switch (fnc) 11 { 12 case ' + ' : 13 res = doAdd(opl, op2); break; 14 case '-': 15 res = doSub(opl, op2); break; 16 case '*': 17 res = doMul(opl, op2); break; 18 } 19 cout "Result: " << res endl; 20 return 0; 21 } 22 23 long doAdd(int opl, int op2) { return opl + op2; } 24 long doSub(int opl, int op2) { return opl - op2; } long doMul(int opl, int op2) { return opl * op2; } Calculators (abbreviated assembly): "Calculator.cpp" 01 .file "C:/SRECD/MiscCPPSource/Calculator/",100,0,0,LtextO 02 .stabs 03 .stabs "Calculator.cpp",100,0,0, LtextO 04 .stabs "main:F(0,3)",36,0,12,_main 05 .stabs "argc:p (0,3)",160,0,12,8 06 .stabs "argv:p(40,35)",160,0,12,12 06 main: 07 .stabs "Calculator.cpp",132,0,0,Ltext 08 call Z5doAddii 09 call Z5doSubii 10 call Z5doMulii
33
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
.stabs "_Z5doAddii:F(0,18)",36,0,33, .stabs "opl:p(0,3)",160,0,33,8 .stabs Mop2:p(0,3)",160,0,33,12 Z5doAddii: movl 12(%ebp), %eax addl 8(%ebp), %eax .stabs "_Z5doSubii:F(0,18)",36,0,34, .stabs "opl:p(0,3)",160,0,34,8 .stabs "op2:p(0,3)",160,0,34,12 Z5doSubii: .stabn 68,0,34,LM33- Z5doSubii movl 8(%ebp), %eax subl %edx, %eax .stabs "_Z5doMulii:F(0,18)",3 6,0,35, .stabs "opl:p(0,3)",160,0,35,8 .stabs "op2:p(0,3)",160,0,35,12 Z5doMulii: .stabn 68,0,35,LM35- Z5doMulii movl 8(%ebp), %eax imull 12(%ebp), %eax
Z5doAddii
Z5doSubii
Z5doMulii
The hunt for symbolic information doesn't end with information embedded by debuggers, it continues on to include the most prolific author of such helpful information the programmer. Recall that in the animated tutorial on reversing Wintel machine code (see Section 4) the key piece of information that led to the solution was the trial limitation message found in the .rdata {read-only) section of the executable. One can imagine that something as simple as having the Password Vault application load the trial limitation message from a file each time time it's needed and immediately clearing it from memory would have prevented the placement of a memory breakpoint on the trial message, which was an anchor for the entire tutorial. An alternative to moving the trial limitation message out of the executable would be to encrypt it so that a search of the dump would not turn up any hits; of course encrypted symbolic information would need to be decrypted before it is used. Encryption of symbolic information, as was discussed
34
in relation to the Wintel animated tutorial, is an activity related to the obfuscation of a program, which we discuss next.
to the machine code for the Password Vault application rendered it extremely difficult to understand. The transformations performed by EXECryptor caused such extreme differences in the machine code, including having compressed parts of it, that it was not possible to line up the differences between the original and obfuscated versions of the machine code to show evidence of the obfuscations. Therefore, to demonstrate machine code obfuscations in a way that is easy to follow, we'll perform obfuscations at the source code level and observe the differences in the assembly language generated by the GNU C++ compiler. The key idea here is that the obfuscated program has the same functionality as the original, but is more difficult to understand during live or static analysis. There are no standards for code obfuscation, but it's relatively important to ensure that the obfuscations applied to a program are not easily undone because deobfuscation tools can be used to eliminate easily identified obfuscations [5]. Table 7.2 contains the source code and disassembly of VerifyPassword.cpp, a simple C++ program that contains an insecure password check that is no weaker than the implementation of the Password Vault trial limitation check. To find the relevant parts of .text and .rdata sections that are related to the password check, the now familiar technique of setting a breakpoint on a constant in the .rdata section was used.
Table 7.2. Listing of VerifyPassword.cpp and disassembly ofVerifyPassword.exe. VerifyPassword.cpp: 01: i n t m a i n ( i n t argc, char *argv[]) 02: { 03: const char *password = "juplter"; 04: string specified; 05: cout << "Enter password: ";
36
06 07 08 09 10 11 12 13 14
getline (cin, specified); if (specified.compare(password) == 0) { cout << "[OK] Access granted." << endl; } else { cout << "[Error] Access denied." << endl;
# "j up Iter"
004014A7 MOV DWORD PTR SS:[ESP+4],VerifyPa.00443019 004014CD MOV DWORD PTR SS:[ESP+4],VerifyPa.0044302E RDATA SECTION 00443000 6A75702174657200456E746572207061 00443010 7373776F72643A20005B4F4B5D204163 00443020 63657373206772616E7465642E005B45 00443030 72726F725D204163636573732064656E 00443040 6965642E000000000000000000000000 jup!ter.Enter pa ssword: .[OK] Ac cess granted..[E rror] Access den ied
Using the simple program VerifyPassword.cpp, we now investigate applying obfuscations to make machine code more difficult to reverse engineer. The first obfuscation that will be applied is a data transformation technique which [5] calls "Modifying Variable Encoding". Essentially this technique prescribes that all meaningful and sensitive constants in a program be stored or represented in an alternate encoding, such as ciphertext. For numerics, one can imagine storing or working with a function of a number instead of the number itself; for example, instead of testing for a <
37
10, we can obscure the test by checking if 1.2 < 1.210 instead. To make string constants unreadable in a dump of the .rdata section we can employ a simple substitution cipher whose decryption function would become part of the machine code. A simple substitution cipher is an encryption algorithm where each character in the original string is replaced by another using a one-to-one mapping [20]. Substitution ciphers are easily broken because the algorithm is the secret [21], so while we will use one for ease of demonstration, stronger encryption algorithms should be used in real-world scenarios. Table 7.3 contains the definition of a simple substitution cipher that shifts each character 13 positions to the right in the local 8-bit ASCII or EBCDIC character set. Ciphertext is generated or read in printable hexadecimal to allow all members of the character set, including control characters, to be used in the mappings. Note: unlike ROT13 [22], this cipher is not it's own inversemeaning that shifting each character an additional 13 positions to the right will not perform decryption.
Table 7.3. Simple substitution cipher used to protect string constants. SubstitutionCipher.h: 01 class SubstitutionCipher 02 03 public: SubstitutionCipher (); 04 string encryptToHex(string plainText); 05 string decryptFromHex(string cipherText) 06 07 private: unsigned char encryptTable[256]; 08 09 unsigned char decryptTable[256]; 10 char hexByte[2]; 11
38
Using the substitution cipher given in Table 7.3, we replace each string constant in VerifyPassword, cpp with its equivalent ciphertext. Even strings with format modifiers such as "%s" and "%d" can be encrypted as these inserts are not interpreted by methods such as printf and sprint/until execution time. Table 7.4 contains the source code and disassembly for VerifyPasswordObfuscated.exe, where each string constant in the program is stored as ciphertext; when the program needs to display a message, the ciphertext is passed to the bundled decryption routine. The transformation we've manually applied removes the helpful information the string constants provided when they were stored in the clear. Given that modern languages have well-documented grammars, it should be possible to develop a tool that automatically extracts and replaces all string constants with ciphertext that is wrapped by a call to the decryption routine.
Table 7.4. VerifyPasswordObfuscated.cpp and corresponding disassembly. VerifyPasswordObfuscated.cpp: 01: #include "substitutioncipher.h" 02: using namespace std; 03: static const char *password = "77827D2E81727F"; 04: static const char *enter_password = "527B81727F2D7D6E8080847C 7F71472D"; 05: static const char *password_ok = "685C586A2D4E70707280802D747 F6E7B8172713B"; 06: static const char *password_bad = "68527F7F7C7F6A2D4E70707280 802D71727B7672713B"; 07 int main(int argc, char *argv[]) 08 SubstitutionCipher cipher; 09 string specified; 10 11 cout << cipher . decryptFromHex (enter__password) ; 12 getline(cin, specified); 13 if (specified.compare(cipher.decryptFromHex(password)) == 0) 14 { 15 cout << cipher.decryptFromHex(password_ok) << endl; 16 } else 17 {
39
Once all constants have been stored in an alternate encoding, the next step one could take to further protect the VerifyPassword.cpp program would be to obfuscate the condition in the code that tests for the correct password. Applying transformations to disguise key logic in a program is an activity related to the anti-reversing technique Obfuscating the Program. For purposes of demonstration we'll implement some obfuscations to the trial limitation check in the C++ version of the Password Vault application, which was introduced in Section 4, but first we discuss an additional application of the technique Obfuscating the Program that helps protect intellectual property when proprietary software is shipped as source code.
40
intellectual property that is worth protecting, one can perform transformations to the source code which make it difficult to read, but have no impact on the machine code that would ultimately be generated when the program is compiled. To demonstrate source code obfuscation, COBF [23], a free C/C++ source code obfuscator was configured and given VerifyPassword.cpp as input; the results of which are displayed in Table 7.5.
Table 7.5. COBF obfuscation results for VerifyPassword.cpp. COBF invocation: 01 C:\cobf_l.0 6\src\win32\release\cobf.exe 02 @C:\cobf_l.06\src\setup_cpp_tokens.inv -o cobfoutput -b -p C: 03 \cobf_l.0 6\etc\pp_eng_msvc.bat VerifyPassword.cpp COBF obfuscated source for VerifyPassword.cpp: 01 #include"cobf.h" 02 Is lp Ik;If lo(lf ln,ld*lj[]){11 Id*lc="\x6a\x75\x70\x21\x74 03 \x65\x72";lh la;Ib"\x4 5\x6e\x7 4\x65\x72\x2 0\x7 0\x61\x7 3\x7 3 04 \x7 7\x6f\x7 2\x64""\x3a\x20";li(lq,la) ; lm (la. lg (lc) ==0) {lb"\x5b 05 \x4f\x4b\x5d\x20\x41" "\x63\x63\x65\x7 3\x73\x20\x67\x7 2\x61\x6e 06 \x7 4\x65\x64\x2e"le; } lr {Ib"\x5b\x4 5\x72\x72\x6f \x72\x5d 07 \x2 0\x41\x63\x63\x65\x73\x7 3\x2 0\x64" "\x65\x6e\x6 9\x65 08 \x64\x2e"le; } } COBF generated header (cobf.h): 01 02 03 04 05 06 07 08 #define #define #define #define #define #define define #define Is lp Ik If lo Id 11 lh using namespace std int main char const string 09 10 11 12 13 14 15 #define #define #define #define #define #define #define lb li lq lm lg le lr cout getline cin if compare endl else
COBF replaces all user-defined method and variables in the immediate source file with meaningless identifiers. In addition, COBF replaces standard language keywords and library calls with meaningless identifiers, however these replacements must be undone before compilation; for example, the keyword "if cannot be left as "lm". 41
Therefore, COBF generates the cobf.h header file which includes the necessary substitutions to make the obfuscated soure compilable. Through this process, all userdefined method and variable names within the immediate file are lost, rendering the source code difficult to understand, even if one performs the substitutions prescribed in cobf.h. Since COBF generates obfuscated source as a continuous line, any formatting in the source code that served to make it more readable is lost. While the original formatting cannot be recovered, a code formatter such as Artistic Style can be used to format the code using ANSI formatting schemes so that methods and control structures can again be identified via visual inspection. Source code obfuscation is a fairly weak form of intellectual property protection, but it does serve a purpose in real-world scenarios where a given application needs to be built on the end-user's target computer instead of being pre-built and delivered on installation media.
of a program through tracing or stepping through instructions, we can employ control flow obfuscations, which introduce confusing, randomized, benign logic that serves to make live and static analysis (debugging and tracing) difficult. The often randomized and recursive nature of effective control flow obfuscations can make traces more difficult to understand and interactive debugging sessions less helpful: randomization makes the execution of the program appear different each time it's run, while recursion makes stepping through code more difficult because of deeply nested procedure calls. In [5], three types of control flow transformations are introduced: computation, aggregation, and ordering. Computation transformations reduce the readability of machine code and, in the case of opaque predicates, can make it difficult for a decompiler to generate equivalent high-level language source code. Aggregation transformations destroy the high-level language structure of a program. For example, if a programmer used the structured programming technique of functional decomposition, inlining the code of many functions into a single function in the machine code would make it impossible to recover the original program structure. Ordering transformations randomize the order of operations in a program to make it more difficult to follow the logic of a program during live or static analysis (debugging or tracing). To provide an example of how control flow obfuscations can be applied to protect a non-trivial program, we'll apply both a computation and ordering control flow obfuscation to the trial limitation check in the Password Vault application, and analyze their potential effectiveness, by gathering some statistics during execution of the obfuscated code.
43
44
The net effect of encrypting the literals is shown in Fig. 7.1 where a dump of the .rdata section of the Password Vault program image no longer yields the clues it once did. Since the literals are no longer readable, one cannot simply locate and set a breakpoint on the trial limitation messageas was done in the solution to the Wintel machine code
45
reversing exercisecausing a reverser to choose an alternate strategy. Note that more than just the trial limitation message would need to be encrypted otherwise it would look quite suspicious in a memory dump alongside other non-encrypted strings!
30400300 00401000 005SC000 00550000 00591080 00592000 00599000 0059B000 00001008 00156600 00001000 00034333 00301003 00007003 00301003 33802033 Password Password Password Password Password Password Password Password .text .data . i-data PE header code data I nag I nag I nag Inag I nag Inag Inag Inag
R R R
R
RUIE
RUE
RUIE FttUE RUIE RUIE
/4
imports resources
R R R R
RUE RUE
73 22 64 50 69 76 2E 54 79 75 61 6D 72 68 6E
73 7B 20 61 6C 65 00 68 69 6C 63 29 64 69 2E
77 2P, 73 73 65 64 5B 61 6E 74 68 6E 73 73 00
6F>?2 7D122 75163 73:77 20122 20:73 57:61 6E;6B 67 20 21 20 65|64 75*6D 20:61 20:74 4D: 65
64 20 63 6F 7B 75 72 20 50 59 20 62 6C 72 73
20 77 65 72 2P, 63 6E 79 61 6F 74 65 6C 69 73
66! 6F 61,73 73:73 64:20 7D; 22 63:65 69! 6E 6F!75 73173 75:20 68|6S 72:20 6FI77 61.6C 61-67
72 20 66 76 20 73 67 20 77 68 20 6F 65 20 65
20 63 75 61 77 73 SD 66 6F 61 6D 66 64 76 20
75 73 68 61 6C 6C 75 6C 61 73 66 75 20 00 6F 72 72 64 76i65 61 78 20 72 20 69 65 72 4E 6F 38 37 32 38 37 38 35 32 37 32 38 37 33 42 37 mi 37 32 37 32 37 37 38 37 37 36 37 38 37 35 35
65 6E 79 74 20 6C 00 20 20 20 69 65 6E 73 74 30 43 46 30 31 30 44 44 36 46 33 32 42 37 38 46 44 31 44 32 31 35 41 46 45 36 31 46 41 42
72 67 2E 20 73 6C 00 74 56 72 6D 63 20 69 20 38 37 38 32 32 37 36 38 37 32 37 38 00 34 32 32 35 32 36 32 32 37 36 37 37 37 37 38 37 37
20 65 00 66 61 79 00 72 61 65 75 6F 74 6F 46
ssword for user "C*>" was change d successfuIly.. Password vauIt f ile "{#}" was sa ved successfully ..CUarn ing3 .... Thank you for tr y ing Password Va u It? Vou have re ached the nax inu n number- of reco rds a I Lowed in t h is trial versio n..Message Not F 34 44 37 30 30 32 30 45 32 34 31 30 36 32 36 31 45 33 43 46 31 41 32 30 39 44 36 36 30 31 37 38 38 37 38 37 38 38 32 36 32 37 34 44 37 37 38 36 38 37 37 38 37 37 37 38 36 37 38 32 43i 37 32i38 41 32 35136 32137 39 37 30 38 32 37 44 32 45 38 44 38 33 38 36 45 00 36 43 38 46 38 30 38 45|38 32132 32136 3Sj37 321 37 461 32 43 37 43 38 31 37 45 37 43 37 30 36 44! 35 46 30 46 45 30 39 34 39 46 30 30 32 37 31 32 36 30 32 44 45 32 41 44 46 34 35 39 42 45 33 37 37 32 37 37 38 37 38 38 32 38 37 46 37 32 37 38 37 37 37 32 32 37 37 37 37 32 33 37 37 31 ! 32 32 37 44 38 42:37 30 37 36 33 43|37 31 32 38-33 44:38 32; 37 39: 37 37: 42 35 36 4437 36-37 34:37 39, 38 35 36 30 37 44:37 44: 37 43 37 31i 38 32! 37 36:38 44; 38 42:00 34-37 43:38 44 46 34 34 32 42 46 44 37 30 30 39 37 45 33 42 43 31 45 35 41 42 33 30 31 30 33 00 32 32 37 32 36 37 38 00 37 37 38 36 37 38 36 37 37 37 37 32 38 37 36 38 32 32 32 32 37 00 32 37 33 44 45 32 30 00 31 33 41 45 38 36 37 42 43 34 46 45 33 32 45 32 44 44 44 44 32 00 44 42 8080847C7F712D73 7C7F2D8280727F2D 2F88378FI2F2D346E 802D70756E7B7472 712D808278707280 8073827979863B.. SD6E8080847C7F71 2D836E8279812D73 7679722D2F88378P, 2F2D846E8B2D806E 8372712D80827070 7280807382797986 3B.68646E7F7B767 B746R2D.617S6E7B 782D867C822D737C 7F2D817F86767B74 2D5D6E8080847C7F 712D636E8279812E 2D667C822D756E83 722D7F726E707572 712D817572207P.6E 857670827R2D7B82 7B6F727F2D7C732D 7F72707C7F71802D 6E79797C8472712D 767B2D817S76802D 817F766E792D8372 7F80767C7B3B 5R7280806E74722D 5B7C812D537C827B
30138 46:32 38:33 44 37 44 38 33 38 45:38 33 36 39 37 44:38 32:37 30:38 36:38 36:41 44:38 44:38 44:36 44:36 36:37 44:37 44:38 36:37 46:37 32137 39i37 42:32 46!37 30!37 32:38 43:38
46
The effects of the source code changes in Table 7.7 on the machine code are shown in Fig. 7.2. A function of the record limit is referenced during execution instead of 47
the limit itself. This type of obfuscation is as strong as the function used to obscure the actual condition is to unravel. Keep in mind that a reverse engineer will not have the non-obfuscated machine code for reference, so even a very weak function, like the one used in this solution, may be effective at wasting some of a reverser's time. The numeric function used here is very simple; more complex functions can be devised that would further decrease the readability of the machine code.
48
if (passwordStore.getRecords().size()
084078E0 83BD B0FFFFFF 04 884070E7 -76 21 00487BE9 C74424 08 03000008 084078F1 C74424 04 00000000 004078F9 C70424 36000000 08407180 E8 01B3FFFF 80407105 -E9 BR070000 0040718B 8D45 D8 004071SD S90424 00407110 C785 08FFFFFF FFFFFFFF 0040711A E8 3BBCFFFF
>= T R I f t L R E C O R D L I M I T )
CMP DWORD PTR SS:[EBP-1003,4 JBE SHORT Password.0040710A MOU DWORD PTR SS:CESP+S],3 MOU DWORD PTR SS:[ESP+4],0 MOU DWORD PTR SS:CESP],36 CALL Password.00401406 JMP Password.004078C4 LEA EAX,DWORD PTR SS:CEBP-28] MOU DWORD PTR SS:[ESP],EAX MOU DWORD PTR SS:CEBP-F8],-1 CALL Password.00402D5H
if ((pou(2.0,
Computation Obfuscation
DS:C00SEE420D=32.00000000000000 Stack SS:[0022FBR0]=32.00000000008000 00407100 08407106 8040710E 08407118 00407111 80487113 . DD05 20E45S00 . D08S F8FEFFFF FLD QWORD FLD QWORD FUCOMPP FSTSW RX SRHF JNB SHORT JMP SHORT PTR DS:[SSE4203 PTR SS:[EBP-1083
,,:mm :
Password.004071IS Password.00407140
The record limit of 5 is obscured by the use of the value 32.0 (2A5) when the operands are loaded and the condition is tested.
Figure 7.2. Record limit comperands are represented as exponents with a base of 2. The record limit check is abstracted out into the method isRecordLimitReached which returns whether or not the record limit is reached after having invoked the method isRecordLimitReached_0. The method isRecordLimitReached_0 invokes itself recursively a random number of times, growing the call stack by a minimum of 16 and a maximum of 64 frames. Each invocation of isRecordLimitReached 0 tests whether the record limit has been reached, locally storing the result, before randomly invoking one of
49
isRecordLimitReached_2, or
finally returns whether or not the record limit is reached in the method isRecordLimitReached. Table 7.8 shows the required code changes to implement the control flow obfuscation. Note that a sum of random numbers returned from methods isRecordLimitReached_1, isRecordLimitReachedJ2, and isRecordLimitReached'_3 is stored in randCallSum, a private attribute of the class; this is to protect against a compiler optimizer discarding the calls because they would otherwise have no effect on the state of any variables in the program.
50
reached
Figure 7.3. Obfuscated control flow logic for testing the password record limit. Table 7.8. Implementation of the control flow obfuscation in Fig. 7.3. PasswordVault.cpp: if (passwordStore.getRecords() .size () >= TRIAL_RECORD_LIMIT) ===> if (isRecordLimitReached()) 01 02 03 04 05 06 bool PasswordVault::isRecordLimitReached() { srand(time(NULL)); controlFlowAltRemain = max (4, abs(randO) % 64) return isRecordLimitReached 0();
51
bool PasswordVault::isRecordLimitReached_0() { while (controlFlowAltRemain > 0) { controlFlowAltRemain--; isRecordLimitReached 0() ; bool reached = (pow(2.0, (double)passwordStore.getRecords().size()) >= pow(2.0, 5.0)); 17: randCallSum = 0; switch (abs(randO) % 3) { case 0: randCallSum += isRecordLimitReached_l() ; break; case 1: randCallSum += isRecordLimitReached_2() ; break; case 2: randCallSum += isRecordLimitReached_3() ; break; return reached; unsigned int PasswordVault::isRecordLimitReached_l() return abs(rand()); unsigned int PasswordVault::isRecordLimitReached_2() return abs(rand());
52
to filter the run tracesleaving only instructions executed by the "Password" module. To analyze the effectiveness of the ordering (control flow) obfuscation, statistics on the differences between three different run traces were gathered using a modification of Levenshtein Distance (LD), a generalization of Hamming Distance, to compute the edit-distancethe number of assembly instruction insertions, deletions, or substitutions needed to transform one trace into the other; we've modified LD to consider each instruction instead of each character in the run traces. Fig. 7.4 illustrates the significant differences that exist between the traces at the point of the obfuscated trial limitation check. The randomized control-flow obfuscation causes significant differences in subsequent executions of the trial limitation checkhopefully creating enough of a deterrent for a reverse engineer by hampering live and static analysis efforts. Table 7.9 contains the statistical data that was gathered for the analysis. A C++ implementation of Levenshtein Distance, written for this solution, can be downloaded from http://reversingproject.info/repository.php?fileID=7_6_l. Note that computing the edit-distance between two large files of any type can take many hours a modern PC. For reference, the average size of three traces analyzed in this section is 10MB, and to compute the edit-distance between two of them required an average of-20 hours of CPU time on an Intel Pentium 1.6GHz Dual-core processor. The LD implementation employed in this analysis uses a dynamic-programming approach that requires O(m) space; note that some reference implementations of LD require O(mn) space since they use a(m+ l ) x ( n + 1) matrix which is impractical for large files [25].
54
The ~20 hour execution time for the LD implementation is mainly because the dynamic programming algorithm is quite naive; perhaps an approximation algorithm would perform significantly better.
Analysis of Run Trace Levenshtein Edit Distances Code path: obfuscated thai limitation check * Edit Distance
Trace # 2 - T r a c e # 3 Trace #1 Trace #2 Trace #1 Trace #3 Figure 7.4. Edit-distances between three run traces of the trial limitation check.
55
Trace Comparison Trace #1 -> Trace #2 Trace #2 - Trace #3 Trace #1 -> Trace #3 Mean Edit Distance Trace Comparison Trace #1 -* Trace #2 Trace #2 - Trace #3 Trace #1 - Trace #3
Edit Distance 101414 67590 168892 112632 Standard Deviation 7932.32 31849.5 39781.83
56
constructs result in compatible bytecode with optionally-utilized metadata provides the benefit of allowing legacy Java bytecode to run on newer JVMs, however if a decompiler doesn't know to look for the metadata, some information is lost; for example, the fact that a program used generics would not be recovered and all collections would be of type Object (with cast statements of course). Recall that in Section 4.1 the Boomerang decompiler failed to decompile the machine code for a simple C/C++ "Hello World" program, however in Section 5.1, the Jad decompiler produced correct Java source code for a slightly larger program. Given these results, one does need to be concerned with with protecting Java bytecode from decompilation if there is significant intellectual property in the program. The techniques used to protect machine code in the anti-reversing exercise solution, detailed in section 7.6, can also be applied to Java source code to produce bytecode that is obfuscated. Since Java bytecode is standardized and well-documented there are many free Java obfuscation tools available on the Internet such as SandMark [27], ProGuard [29], and RetroGuard [28] which perform transformations directly on the Java bytecode instead of on the Java source code itself. Obfuscating bytecode is inherently easier than obfuscating source code because bytecode has a significantly more strict and organized representation than source codemaking it much more easy to parse. For example, instead of parsing through Java source code looking for string constants to encrypt (protect), one can easily look in the constant pool section of the bytecode. The constant pool section of a Java Class File, unlike the .rdata section of Wintel machine code, contains a well-documented
57
table data structure that makes available the name and length of each constant; on the other hand, the .rdata section of Wintel machine code simply contains all the constants in the program in a contiguous, unstructured bytestream. The variable names, method names, and string literals, in the constant pool section of Java bytecode provide a wealth of information to a reverse engineer regarding the structure and operation of the bytecode and hence should be obfuscated to protect the software. Therefore, we now look at applying the technique Eliminating Symbolic Information in the context of Java bytecode.
a Java bytecode watermarking and obfuscation research tool, is capable of applying transformation (2), although not easily. Experimentation with SandMark V3.4 was not promising since its "String Encoder" obfuscation function only worked on a trivial Java program; it failed when given more substantial input such as some of the classes that implement the Java version of the Password Vault application. It's clear from a survey of existing Java bytecode obfuscators that a full-function, robust, open-source bytecode obfuscator is sorely needed. Zelix Klassmaster, a commercial product capable of all the three transformations mentioned above, is said to be the best overall choice of Java bytecode obfsucator in [19]. A 30-day evaluation version of Zelix Klassmaster can be downloaded from the company's web site. Of course one can always make small-scale modifications to Java bytecode with a bytecode editor such as CafeBabe [30]. Incidentally, CafeBabe gets its catchy name from the fact that the hexadecimal value OxCAFEBABE comprises the first four bytes of every Java class file; this value is known as the "magic number" which identifies every valid Java class file. To demonstrate applying transformations to Java bytecode, we'll target the bytecode for program CheckLimitation.java whose source code is given in Table 8.1. For this demonstration, assume that a reverse engineer is interested in eliminating the limit on the number of passwords and that we are interested in protecting the software. Begin obfuscating CheckLimtiation.java by applying transformation (1) Name Obfuscation: rename all variables and methods in the bytecode so they no longer provide hints to a reverser when the bytecode is decompiled or edited. Using ProGuard, 59
obfuscate the bytecode and then decompile it using Jad to observe the effectiveness of the obfuscation; the result of decompiling the obfuscated bytecode using Jad is given Table 8.2. As expected, all user-defined variable and method names have been changed to meaningless ones; of course the names of Java standard library methods must be left asis. ProGuard seems to use a different obfuscation scheme for local variables within a
60
Table 8.2. Jad decompilation of ProGuard obfuscated bytecode. public class CheckLimitation 02 03 private static int a = 5; 04 private ArrayList b; 05 06 public CheckLimitation () 07 { 08 b = new ArrayList(); 09 10 11 public boolean a(String s) 12 { 13 if (b.size () >= a) 14 15 System.out.println("[Error] The maximum number of passwords has been exceeded!"); 16 return false; 17 else 18 19 b.add(s); 20 System.out.println((new StringBuilder()).append("[Info] password(") .append(s) .append(") added successfully.") .toString ()); 21: return true; 22: } 23: } 24: public static void main(String args[]) 25 26 27 CheckLimitation checklimitation = new CheckLimitation(); 28 boolean flag = true; 29 for (int i = 0; i < args.length && flag; i++) 30 if(!checklimitation.a(args[i])) flag = false; 31 32 33
01
method; it's not clear why the variable "loop" in the main method has been changed to "flag" since it's still a very descriptive name. Next we further obfuscate the bytecode by applying transformation (2) String Encryption, and we do so by employing the "String Encoder" obfuscation in SandMark to protect the string literals in the program from being understood by a reverser. The "String 61
Encoder" function in SandMark implements an encryption strategy for literals in the bytecode that is similar to the one which was demonstrated at the source code level in the Wintel machine code anti-reversing background section: each string literal is stored in a weakly encrypted form and decrypted on-demand by a bundled decryption function. Table 8.3 contains the Jad decompilation result for the CheckLimitation.java bytecode that was first obfuscated using ProGuard and subsequently obfuscated using the "String Encoder" functionality in SandMark.
Table 8.3. Jad decompilation of SandMark (and ProGuard) obfuscated bytecode.
01 public class CheckLimitation { 02: 03: private static int a = 5; 04 private ArrayList b; 05: public CheckLimitation() 06: 07: { 08: b = new ArrayList(); 09: } 10: public boolean a(String argO) 11: 12: { 13: if(b.size() >= a) 14: 15: System.out.println(Obfuscator.DecodeString("\253\315\253\315\uFF9E\u2A3 Du5D69\u2AA5\u3884\u91CF\u5341\u5604\uDF5B\uA902\uB6C8\u0C8E\u67 61\ulF3 5\u35 9D\uBD96\uADA4\u94 6F\u8 5EE\uE8A0\u9274\u58 67\u2C9F\u3 07 7\u5E67\u2A 0B\u90D2\uB83 9\u58FC\uBE95\u0EBA\uDDF4\u313C\uB7 51\uFA9D\ul6 6C\u42A3\u6 DlD\uB25A\uA15E\u02 6E\u6ECE\u908C\u557B\u6ABD\uC5D5\u80 0C\uD38A\u3D97\u FB5E\uC4C2\uBBAC\u9ADC\u253E\u769E\u4D32\u4FB3\u0CC7")); 16 return false; 17 else 18 19 b.add(argO); 20 System.out.println((new StringBuilder()).append(Obfuscator.DecodeString("\253\315\253\315\uFF9E \u2A31\u5D7 5\u2ABl\u3 884\u91E0\u533C\u5 654\uDF6E\uA919\uB6DE\u0CD9\u67 6 3\ulF26\u3581\uBDDF\uADEl")).append(argO).append(Obfuscator.DecodeStrin g("\253\315\253\315\uFFEC\u2A58\u5D7A\u2AB3\u388F\u91D8\u5378\u5604\uDF 7C\uA91F\uB6CE\u0CCD\u67 69\ulF27\u35 96\uBD9 9\uADBC\u947 6\u85EF\uE8F9\u9
62
234") ) .toStringO ) ; 21 return true; 22 23 24 public static void main(String arg0[]) 25 26 27 CheckLimitation checklimitation = new CheckLimitation() 28 boolean flag = true; 29 for(int i = 0; i < argO.length && flag; i++) 30 if(!checklimitation.a(argO[i])) flag = false; 31 32
Note that each string literal is decrypted using the Obfuscator class which was generated by SandMark. Since Obfuscator is a public class, it must be generated into a separate file named Obfuscator.classmaking it very straight-forward for a reverser to isolate, decompile, and learn the encryption algorithm. The danger of giving away the code for the string decryption algorithm is that it could then be used to programmatically update the constants pool section of the bytecode to contain the plaintext versions of each string literal, essentially undoing the obfuscation. Ideally, we would like to prevent a reverser from being able to successfully decompile the obfuscated bytecode; this can be accomplished through control flow obfuscations which we explore next.
conditions "if ( 1 == 1 )" and "if ( 1 == 2 )" implement opaque predicates because the first always evaluates to true, and the second always to false. The essential element in preventing decompilation with opaque predicates is to insert invalid instructions in the else branch of an always-true predicate (or the if-body of an always false predicate). Since the invalid instructions will never be reached during normal operation of the program there is no impact on the program's operation. The obfuscation only interferes with decompilation, where a naive decompiler will evaluate both "possibilities" of the opaque predicate and fail on attempting to decompile the invalid, unreachable instructions. Fig. 8.1 illustrates how opaque predicates would be used to protect bytecode from decompilation. Unfortunately, this technique, often used in protecting machine code from disassembly, cannot be used with Java bytecode because of the presence of the Java Bytecode Verifier in the JVM. Before executing bytecode, the JVM performs the following checks using single-pass static analysis to ensure that the bytecode has not been tampered with; to understand why this is beneficial, imagine bytecode being executed as it's received over a network connection. [31] documents the following checks made by the Java Bytecode Verifier: > Type correctness: arguments of an instruction, whether on the stack or in registers, should always be of the type expected by the instruction. > No stack overflow or underflow: instructions which remove items from the stack should never do so when the stack is empty (or does not contain at least the number of arguments that the instruction will pop off the stack). Likewise, 64
instructions should not attempt to put items on top of the stack when the stack is full (as calculated and declared for each method by the compiler). > Register initialization: Within a single method any use of a register must come after the initialization of that register (within the method). That is, there should be at least one store operation to that register before a load operation on that register. > Object initialization: Creation of object instances must always be followed by a call to one of the possible initialization methods for that object (these are the constructors) before it can be used. > Access control: Method calls, field accesses, and class references must always adhere to the Java visibility policies for that method, field, or reference. These policies are encoded in the modifiers (private, protected, public, etc.).
Opaque Predicate Template
doWorkO;
Figure 8.1. Usage of opaque predicates to prevent decompilation. Based on the high-level of bytecode integrity expected by the JVM, introducing 65
garbage or illegal instructions into bytecode is not feasible. However, this technique does remain viable for machine code, though there is some evidence that good disassemblers, such as IDA Pro, do check for rudimentary opaque predicates [5]. The authors of SandMark claim that the sole presence of opaque predicates in Java bytecode, without garbage bytes of course, can make decompilation more difficult. Therefore, SandMark implements several different algorithms for sprinkling opaque predicates throughout bytecode. For example, SandMark includes an experimental "irreducibility" obfuscation function which is briefly documented as "insert jumps into a method via opaque predicates so that the control flow graph is irreducible. This inhibits decompilation." Unfortunately this was not the case with the program DateTime.java shown in Table 8.4 as Jad was still able to decompile DateTime.class without any problems despite the changes made by SandMark's "irreducibility" obfuscation. The bytes of the unobfuscated and obfuscated class files were compared to verify that SandMark did make significant changes; perhaps SandMark does work for special cases, so more investigation is likely warranted. In any event, opaque predicates seem to be far more effective when inserted into machine code because of the absence of any type of verifier that validates all machine instructions in a native binary before allowing it to execute. SandMark's approach of using control flow obfuscations that leverage opaque predicates in an attempt to the confuse decompilers is not unique because Zelix Klassmaster, a commercial product, implements this approach as well. When Zelix Klassmaster V5.2.3a was given DateTime.class as input with both "aggressive" control
66
Table 8.4. Listing of DateTime.java Listing of DateTime.java (abbreviated): 01 public static void main(String arguments[]) 02 new DisplayDateTime().doDisplayDateTime(); 03 04 05 06 public void doDisplayDateTime() 07 08 Date date = new Date(); 09 System.out.println(String.format(DATE_TIME_MASK, date.toString())); 10: }
flow and "String Encryption" selected, some interesting results were observed in the corresponding Jad decompilation. Table 8.5 lists the Jad decompilation of Zelix's attempt at obfuscating DateTime. class. Zelix performed the same kind of name obfuscation seen with ProGuard, except it went a little too far and renamed the main method; this was corrected by manually adding an exception for methods named "main" in the tool. The results of the decompilation show that Zelix's control flow obfuscation and use of opaque predicates is somewhat effective for this particular example because even though Jad was able to decompile most of the logic in DateTime.class; Zelix's obfuscation caused Jad to lose the value of the constant DATE TIME MASK when using it on line 12, and generate a large block of static, invalid code starting at line 22. In the next two sections (8.3 and 8.4), a Java anti-reversing exercise with a complete animated solution is provided. In the solution, decompilation of Java bytecode is prevented through the use of a class encryption obfuscation implemented by SandMark. Issues regarding the use of this obfuscation technique are discussed in the animated solution.
67
Table 8.5. Jad decompilation of DateTime.class obfuscated by Zelix Klassmaster. Listing of Jad decompilation of DateTime.class (abbreviated): 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 public class a public static void main(String as[]) { (new a()) .a ();
public void a() { boolean flag = c; Date date = new Date(); System.out.println(String.format (a, new Object[] date.toString ()})); if(flag) b = !b;
private static final String a; public static boolean b; public static boolean c; static { "'?X@MA%O\005@@wY\001ZQw\\\016J\024#T\rK\024>N@\013Gy"; -1; goto _L1 _L5: a; break MISSING_BLOCK_LABEL_116; _L1: JVM INSTR swap ; toCharArray(); JVM INSTR dup ;
attempting to implement a custom control flow obfuscation to inhibit static and dynamic analysis as was done in the solution to the Wintel machine code anti-reversing exercise, apply one or more of the control flow obfuscations available in SandMark and observe their impact by decompiling the obfuscated bytecode using Jad. Show that the Java bytecode reversing solution illustrated in the animated tutorial in Section 5.4 can no longer be carried out as demonstrated.
69
SandMaik 3.4.0 (Mystiqi Help f Decompile f Quick Protect f Static Birthmark | Dynamic Birthmark Dynamic Watermark Static Watermark Obfuscate Algorithm: j Class Encrypter Optimize
File
Diff j View
C:PassworcfVaultTrialJavd'obfuscatertPassworifVaiiltjar iC:\PasswordVaultTrialJava\oWuscated\PasswordVaultEncrypted.jar
^ i
Browse
Encryption Key ||
7 ^ Specify an encryption key that will be used to encrypt and decrypt the ['".class] files in the Java archive. SandMark will bundle a new class in the output Java archive which overrides Java's default class loader to support loading the encrypted classes by first decrypting them.
Help
Obfuscate
Class Encrypter encrypts class files and causes them to be decrypted at runtime.
Figure 8.2. Sample slide from the Java anti-reversing animated tutorial.
diagrams that illustrate the components of a system as well as their interactions during execution. The hope is that these diagrams will be literally translated in to program code, with a perfect correlation between the envisioned system and the implementation. For small projects, it might be fairly easy to check for consistency between the envisioned design, and what has been implemented, but this is not likely so for large projects. When reverse engineering is continuously used during software development, the information gained could be used to update the design diagrams at all levels of granularity [2]. The challenge here is for the computer programmer to interpret the information gathered from these reverse engineering tools. This will require the programmer to draw upon a skillset that ranges from low-level system concepts to high-level design. Unfortunately, the future offers little help in undoing the mistakes of the past. The problem of identifying concrete, reusable components within a software system is especially difficult because as [7] states, "engineers do not know how to design and build truly modular systems from scratch, let alone when starting from legacy code." In [7], Weide and Hollingsworth's main thesis is that while reverse engineering of legacy software is inherently intractable, some of us will inevitably find ourselves in a situation where no other option is available because the cost of rewriting a large, complex software system is prohibitive. In addition, should one choose to absorb the cost of rewriting a system from scratch, there are no known software development techniques that can guarantee a newly-designed system that will not need to be reengineered at some point down the road.
71
The question of whether to reengineer or reuse components of a software system most often arises in the context of large business or government organizations. Over time the processes and procedures of a business or organization will inevitably be reflected in the software systems that enable efficient, day-to-day operations [32]. Therefore, it is not possible to change processes and procedures without adjusting or enhancing the software systems that implement them. If good development practices were followed, a legacy software application is typically composed of three layers [32]: > Presentation Layer: components of a software system that accept input and generate output using various types of hardware devices. Input and output can be entered or analyzed by a human or another by another program. > Business Logic Layer: implementation of some subset of the processes and procedures of the business or organization that is relevant to the application. It is unusual for the business logic of one application to implement all of the processes and procedures. For example, the order processing and payroll applications are not likely to have much business logic in common. > Data Access Layer: this layer is responsible for servicing requests to store or retrieve data on behalf of the presentation and business logic layers. The nature of the code in this layer varies depending on the database technology being used. Technology choices range from simple sequential files to industrial-strength relational or hierarchical databases.
72
Fig. 9.1 illustrates the program architecture one would hope was used when when looking to update, reengineer, or reuse a legacy software application.
Frontend Presentation Layer Business Logic Layer Data Access Layer Backend
Figure 9.1. Layers of a well-structured legacy software application. Legacy applications that are not sufficiently componentized, such that their general organization resembles the three layers, are not good candidates for reengineering and reuse. More often than not, most software development projects in business are done under fairly aggressive time constraints, therefore it it not uncommon to find an interleaving of the layersbusiness logic in the presentation logic, and data access logic in the business logic. The most widely accepted technique to reuse legacy application components is that of Wrappering [32], where a new piece of code provides an interface to a legacy application component or layer without requiring code changes to it. This
73
technique is employed even when the complete source code of a legacy application is available for several reasons: (1) the number of lines of code in any one component or layer is extensive and poorly documentedmaking the cost of understanding the code well enough to make changes too high, (2) modifying the legacy application to be reusable without a wrapper would require locating all of the application's dependencies so that it can be recompiled and tested (3) application modernization, where a nontraditional interface to the application such as XML in the case of Web or RESTful services is desired. Creating a wrapper to a legacy machine code application can be quite challenging especially if all of the source code for the application has been lost. Unless enough of an application's source code remains such that it's possible to identify the names of reusable entry points (procedures) and their I/O data structures, attempting to reuse the application is haphazard at best. While it is possible to learn the names of entry points that have been explicitly exported by an application in the case of a DLL, the names don't indicate the layout of the expected I/O data structures. Probably the best way to discover the entry points and I/O data structures in legacy machine code is to read the source code of other applications which depend on it. For example, if a program a calls procedure 9 of program (3 passing an I/O data structure 5, and a produces correct results, there is good reason to believe that procedure 0 in program p can be reused by a third program p using signature 8. The COBOL programming language is most often associated with legacy 74
software applications. Typically, COBOL programs have a single entry point, which makes the process of identifying reusable methods all but unnecessary because, instead of declaring multiple entry points, it is general practice for legacy COBOL programs to include functional discriminators in their I/O data structure(s) that indicate the desired action(s) to be taken by the program on behalf of the caller. For example, a field called "TRANSACTION-TYPE" with the possible values "DEP", "WTH", and "BAL" would serve the same purpose in a COBOL program as the methods "doDeposit(8)", "doWithdrawl(S)", and "getBalance(8)" would serve in a Java program.
BankAccount doDeposit(...) doWithdrawlf...) getBalance(...)
01
BANK-ACCOUNT-INTERFACE. 02 TRANSACTION-TYPE-CODE P I C XXX. 83 DEPOSIT VALUE 'DEP ' . _-' 88 WITHDRAWL VALUE ' W T H 1 . " 88 BALANCE VALUE 'BAL ' . 02 AC COUNT-NUMBER P I C X ( 3 2 ) .
Fig. 9.2 illustrates how a functional discriminator in a legacy COBOL data structure maps to modern programming strategies such as object-oriented design. In a real-world situation, we would be looking to reuse legacy components whose machine code is the result of thousands of lines of high-level language statements (COBOL) that implement a particular business process. Instead of going through the error-prone process of rewriting the legacy component, which is likely decades old, we wish to reuse and reengineer it so that it is easily consumed by modern programs and
75
interfaces. Since our focus is more on reuse and reengineering of legacy code at a basic level, it's not necessary to encumber ourselves with a very large program in order to learn strategies for reuse and reengineering. Therefore, for purposes of demonstration, an example COBOL program SMPLCALC.cbl, which implements a simple calculator for integer-only arithmetic, was written to simulate a potentially useful component found in the business logic layer of a legacy business application. The source code for SMPLCALC.cbl is given Table 9.1; the program has single entry point that operates on the I/O data structure SMPCALC-INTERFACE.
Table 9.1. Sample business logic component to reuse and reengineer.
01
02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
****************************************************************** ** Simple COBOL program that performs integer arithmetic ** ****************************************************************** IDENTIFICATION DIVISION. PROGRAM-ID. 'SMPLCALC. DATA DIVISION. WORKING-STORAGE SECTION. 77 MSG-NUMERIC-OVERFLOW PIC X(25) VALUE 'Numeric overflow occurred'. 77 MSG-SUCCESSFUL PIC X(22) VALUE 'Completed successfully'. LINKAGE SECTION. * Input/Output data structure 01 SMPLCALC-INTERFACE. 02 SI-OPERAND-1 PIC S9(9) COMP-5. 02 SI-OPERAND-2 PIC S9(9) COMP-5. 02 SI-OPERATION PIC X. 88 DO-ADD VALUE '+'. 8 8 DO-SUB VALUE '-'. 88 DO-MUL VALUE '*'. 02 SI-RESULT PIC S9(18) COMP-3. 02 SI-RESULT-MESSAGE PIC X(128). PROCEDURE DIVISION USING BY REFERENCE SMPLCALC-INTERFACE. MAINLINE SECTION. * Perform requested arithmetic INITIALIZE SI-RESULT SI-RESULT-MESSAGE EVALUATE TRUE WHEN DO-ADD
76
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
COMPUTE SI-RESULT = SI-OPERAND-1 + SI-OPERAND-2 ON SIZE ERROR PERFORM HANDLE-SIZE-ERROR END-COMPUTE WHEN DO-SUB COMPUTE SI-RESULT = SI-OPERAND-1 - SI-OPERAND-2 ON SIZE ERROR PERFORM HANDLE-SIZE-ERROR END-COMPUTE WHEN DO-MUL COMPUTE SI-RESULT = SI-OPERAND-1 * SI-OPERAND-2 ON SIZE ERROR PERFORM HANDLE-SIZE-ERROR END-COMPUTE END-EVALUATE * Successful return MOVE MSG-SUCCESSFUL TO SI-RESULT-MESSAGE MOVE 2 TO RETURN-CODE GOBACK
** Handle numeric overflow and end the program HANDLE-SIZE-ERROR. MOVE MSG-NUMERIC-OVERFLOW TO SI-RESULT-MESSAGE MOVE 16 TO RETURN-CODE GOBACK END PROGRAM 'SMPLCALC'.
**
Looking at the source code for the COBOL program SMPLCALC.cbl, we can easily determine the entry point name and the data layout of the I/O data structure. However, even knowing the full details of the application's interface does not solve the problem of making it easily reusable from Java or C because of the differences in the language data type systems. For example, Packed Decimal (Computational-3) is a numeric type that is commonly found in COBOL mainframe programs, but is not directly supported in the Java and C/C++ languages. Even floating-point numbers can be problematic because some COBOL compilers, including IBM's, do not use the standard
77
IEEE floating point representation; they instead use decimal floating point [44]. Without detailing all the differences between the COBOL, Java, and C/C++ data models, it suffices to say that writing custom code to convert between COBOL's data model and the language we wish to invoke it from is error-prone and tedious and there are better alternatives. The problem of mating disparate data models so that new programs, written in modern languages, can interact with legacy software systems, is far from new. There are many commercial tools on the market that can import a COBOL data structure and generate Java helper classes that a programmer can use to build to meet the legacy binary interface using familiar getters and setters. A great many of these tools, including IBM's Rational Application Developer (RAD) [33], leverage Sun Microsystems J2EE Connector Architecture (JCA) [34] to provide a tightly coupled integration between a Java application running in a J2EE container (server) and an enterprise application (likely written in COBOL or PL/I) running on a mainframe. The JCA architecture requires a good deal of middleware to exist between a calling Java application running in the J2EE container and the COBOL application running on a legacy software system. While this middleware is powerful because of its capability to marshall Java data into COBOL and PL/I data, it cannot easily be reused for a local scenario where no server runtimes are involved. Fig. 9.3 illustrates how the JCA architecture is used by commercial products to enable legacy business applications to be accessed from Java applications running on distributed J2EE application servers.
78
J2EE Server
J Sim DleCpIri ilatnr
inframe
Transaction Server
l >
c
A
J A
^ , SMPLCALC
IT1 T
^ >
JSimpleCalculator: Java application that provides a new front-end to the SMPLCALC COBOL application. SmplCalcInterfaceHelper: helper class for building the interface COBOL data structure, can be generated by a commercial product such as RAD. Java to COBOL Marshaller: class library that implements marshalling of Java data types to/from COBOL data types, likely comes with a commercial J2EE server such as WebSphere Application Server (WAS).
>
Figure 9.3. Example JCA implementation for accessing a legacy application. A popular alternative to using the JCA architecture to reengineer and reuse legacy applications is to implement a Service Oriented Architecture (SOA) [38]. When migrating a legacy software system to an SOA, application programs that are candidates for reuse are identified. Typically, candidate applications should be well structured such that the business logic can be isolated, encapsulated, and made into reusable components. These SOA components become capable of communicating without the tight and fragile coupling of traditional binary interfaces because they are wrappered with a platformneutral interface such as XML and Web services. Once a business or organization has created a collection of reusable components from stable and well tested code, it becomes possible to quickly assemble new applications without having to rewrite and test the 79
underlying business logic. When XML is used as envisioned, all data, both of type character and numeric are represented as printable text, completely divorced from any platform-specific representation or encoding. The net effect is that two entities or programs can interact without having to know the data structures that comprise each other's binary interface. Of course, the XML that is exchanged cannot be arbitrary, so industry standards such as XML Schema (XSD) [39], [40] and Web Services Definition Language (WSDL) [41], [42] were developed. XML Schema is used to formally describe XML documents, while WSDL is used to describe services and the operations they support. Operations in Web services are akin to public methods in the object-oriented programming paradigm. A Web service is considered to be WS-I compliant [43], or generally interoperable, if it meets many criteria, not the least of which is using XML documents for the input and output of each operation. There are many criteria defined by WS-I that apply to a Web service definition, but this particular facet, where XML is the interoperable interface of choice, sets the stage for a meaningful exercise where the focus is on the activity of making a component from a COBOL program that is reusable from Java using XML in a light-weight, local environment. In recent history, the ability to parse and generate XML documents has been added to the COBOL language in many implementations including the Micro Focus and IBM COBOL compilers and runtimes [37], [44]. XML parsing in COBOL is accomplished through the use of the XML PARSE statement, which performs an event80
driven parse of an XML document. In a event-driven parse, the initiator registers a handler which the XML parser invokes with each XML construct found in the document. For example, the start and end of an XML element would be reported as two separate events. XML generation in COBOL is accomplished through the use of the XML GENERATE statement, which, given a COBOL data structure and an output buffer, will generate XML that has the same hierarchical organization as the data structure [37, 44]. By default, the XML GENERATE statement will form XML element and attribute names using the name of each member in the COBOL data structure. This can be less than ideal in circumstances where data structure members have cryptic names that don't conform to the spirit of XML where each XML element and attribute is given a name that describes its content. Fortunately, Micro Focus COBOL provides the capability to assign custom XML element and attribute names to each data structure member, which allows for defining an XML Schema that has meaningful element and attribute names [37]. In the exercise which accompanies this section, we are asked to create a languageneutral XML interface to the "legacy" SMPCALC.cbl application program and invoke it from a Java program which incidentally makes it reusable to other Java programs. To describe an XML interface to the legacy COBOL program so that other programs may consume it, an XML Schema must be created; this can be done with a tool that can generate XML Schema from a COBOL data declaration, or by hand using an XML editor. Once an XML interface has been described using XML Schema, it is necessary to implement XML marshalling layers between the calling Java program and the legacy
81
COBOL program. In the example exercise, the XML marshalling layer for each program is implemented in the target language itself. So that the Java program can generate and consume XML based on the XML Schema that describes the interface to the COBOL program, we employ the Java Architecture for XML Binding (JAXB) [35]. JAXB facilitates the conversion of Java objects to XML and vice versa. Sun's Java JDK includes a command-line utility xjc which generates Java marshalling code from an XML Schemamaking it quite easy to write a Java program which consumes and generates XML based on an XML Schema. While generation of XML is nicely handled in COBOL by the XML GENERATE statement, consuming XML involves coding an event handler for the XML PARSE statement. Of course, complete code for both the Java and COBOL XML marshalling layers is included in the solution to the exercise, so if COBOL is a foreign language to you, there's no need for concern. Once the XML marshalling layers are in place, there's one more loose end that needs to be tied up; and that is to figure out how to pass XML documents between the two layers. Since we are in a local scenario, TCP/IP is not an option, therefore a thin Java Native Interface (JNI) layer is needed through which the Java and COBOL marshalling layers can exchange XML; note that the COBOL XML marshalling layer invokes the legacy COBOL application. Fig. 9.4 illustrates the program architecture for the exercise.
82
Java
ISirnpleCalculator JSimpleCalculator longdoAdd(int, int) longdoSub(int, int) JAXB longdoMul(int, int) XMLi
1
XML ^PARSE JNI ^XML XML GEN. &
88 88 88
SMPLCALC XML
DO-ADD DO-SUB DO-MUL
| COBOL Figure 9.4. Architecture for legacy application reengineering and reuse from Java.
In order to try out the code in this section and complete the exercise that accompanies it, a COBOL compiler and runtime environment are needed. The COBOL programs in this section, and in the solution to the exercise which accompanies it, were written, compiled, and run using a student version of Micro Focus Net Express [37]. At the time of this writing, no reasonably functional open source COBOL compiler was available that could compile, link, and run even the most simple COBOL program given in this section; this may have to do with the fact that COBOL remains a very lucrative enterprise for many businesses, so there is little interest in giving away implementations to the open source community. For example, the COBOL for GCC project has not made significant progress yet on the code generation part of the compiler [36]. When and if an open source COBOL compiler gets off the ground, it will be interesting to see what features of the commercial COBOL compilers are implemented.
83
84
6) Write a small C/C++ JNI program JavalCblXmlBridge.cpp which exports a method "Java2SmplCalc" which: a) Invokes XML2CALC.cbl (see Step 7), passing the XML document received from JSimpleCalculator.java. b) Returns the XML document generated by XML2CALC.cbl (see Step 7) on return from SMPLCALC.cbl to JSimpleCalculator.java. 7) Write a COBOL program XML2CALC.cbl which: a) Marshalls XML received from the Java2CblXmlBridge.cpp, based on the XML Schema created in Step 2, into SMPLCALC-INTERFACE. b) Invokes SMPLCALC.cbl, passing SMPLCALC-INTERFACE by reference. c) Marshalls SMPLCALC-INTERFACE back to XML before returning to Java2CblXmlBridge. cpp. 8) Compile XML2CALC.cbl and link it with the machine/object code for SMPLCALC.cbl (SMPLCALC.obj). a) To simulate a situation where only partial source code for an application is available, do not recompile SMPLCALC.cbl; use the object file (machine code) that comes with this exercise instead. 9) Create a DLL that can be loaded an used by JSimpleCalculator.java by compiling and linking Java2CblXmlBridge.cpp with the object code for XML2CALC.cbl. 10) Update JSimpleCalculator.java to use the XAC-generated marshalling code to send/receive XML through the JNI method defined in Step 8 and display the results of the computations performed downstream by SMPLCALC.cbl.
85
! 1) Locate the interface data structure for SMPLCALC.cbl in the copybook (source include file) SMPLCALC.cpy. There is only one data structure in the copybook. The interface data structure for SMPLCALC.cbl is located in SMPLCALC.cpy and is named SMPLCALC-INTERFACE (see Table 9.2). COBOL data structures begin with a level 01 declaration and are usually hierarchical but can be elementary.
Table 9.2. Interface data structure SMPLCALC-INTERFACE in SMPLCALC.cpy. 01 02 03 04 05 06 07 08 09 10 * Input/Output data structure 01 SMPLCALC-INTERFACE. 02 SI-OPERAND-1 PIC S9(9) COMP-5. 02 SI-OPERAND-2 PIC S9(9) COMP-5. 02 SI-OPERATION PIC X. 88 DO-ADD VALUE '+'. 88 DO-SUB VALUE '-'. 8 8 DO-MUL VALUE '*'. 02 SI-RESULT PIC S9(18) COMP-5. 02 SI-RESULT-MESSAGE PIC X(128).
86
2) Create an XML Schema which represents all of the data in the SMPLCALCINTERFACE COBOL data structure. Instead of writing this by hand, you can use the Micro Focus Net Express CBL2XML wizard [3 7]. The CBL2XML wizard in Micro Focus Net Express conveniently generates an XML Schema from a COBOL data structure. The result of using SMPLCALC.cpy as input to the CBL2XML wizard is given in Table 9.3.
Table 9.3. XML Schema generated from the COBOL data structure. <?xml version="l.0" encoding="UTF-8"?> <schema xmlns="http://www.w3.org/2 0 01/XMLSchema" elementFormDefault="qualified"> <element name="SMPLCALC-INTERFACE"> <complexType> <sequence> <element name="SI-OPERAND-l"> <simpleType> <restriction base="integer"> <totalDigits value="9" /> </restriction> </simpleType> </element> <element name="SI-OPERAND-2"> <simpleType> <restriction base="integer"> <totalDigits value="9" /> </restriction> </simpleType> </element> <element name="SI-OPERATION"> <simpleType> <restriction base="string"> <enumeration value="+" /> <enumeration value="-" /> <enumeration value="*" /> </restriction> </simpleType> </element> <element name="SI-RESULT"> <simpleType> <restriction base="integer"> <totalDigits value="18" /> </restriction> </simpleType> </element>
87
27 28 29 30 31 32 33 34 35 36 37
<element name="SI-RESULT-MESSAGE"> <simpleType> <restriction base="string"> <maxLength value="128" /> </restriction> </simpleType> </element> </sequence> </complexType> </element> </schema>
4) Write a Java class JSimpleCalculator.java that implements the interface defined in ISimpleCalculator.java and provides a user interface for: a) Specifying which computation (add, sub, mul) is desired. b) Specifying the operands to the computation. c) Displaying the result of the computation (can be an error). There is a great deal of flexibility in this part of the exercise. Some examples of the types of user interfaces that can be implemented include: command-line interactive (console-based), graphical, Java servlet (Web-based). A command-line interactive interface was implemented for the solution. A screen capture of the interface is given Fig. 9.5. Notice that a debugging mode is available to trace the various steps in the process of exchanging XML between the Java and COBOL XML marshalling layers.
88
**************************************************** ** Program: Java Front-end to COBOL Calculator ** ** Purpose: Demonstrate reengineering and reuse ** ** of a COBOL program from Java by ** ** establishing an XML bridge leveraging ** ** JAXB, JNI, and COBOL XML support. -* ** Author: Teodoro Cipresso ** ** [email protected] ** **************************************************** Select a task from the following menu: (1) (2) (3) (4) (5) Addition Subtraction Multiplication Toggle Debug ON Quit Program
Specify selection: 3 Specify integer operand #1: 12 Specify integer operand #2: 12 [***] COBOL multiplication result: 144
5) Use the Java command-line utility xjc, in combination with the XML Schema created in Step 2, to generate Java to XML marshalling code (JAXB). Update JSimpleCalculator.java to call this marshalling code. The xjc command-line utility generates two types of artifacts for each global (top level) element in an XML Schema: (1) Java classes that expose getters and setters for the data contained in instances of the XML Schema (XML documents), (2) Java classes that serve as metadata for the JAXB XML marshalling engine. In the solution archive file, the two classes generated by JAXB are: SmplCalcJaxbFactory.java (getters and setters) and SmplCalcJaxbMarshaller.java (JAXB XML marshalling metadata). Note these are not the default class names generated by xjc.
89
To cleanly integrate the JAXB marshalling with JSimpleCalculator.java, SmplCalcJaxbMarshaller.java, which encapsulates the interaction with the JAXB, was created. Table 9.4 gives an abbreviated listing of this class.
Table 9.4. Partial listing of SmplCalcJaxbMarshaller.java interaction with JAXB. private static JAXBContext jaxbContext = null; 02 private static Marshaller marshaller = null; 03 private static Unmarshaller unmarshaller = null; 04 05 static 06 { 07 try 08 { 09 jaxbContext = JAXBContext.newlnstance(SMPLCALCINTERFACE.class) ; 10 marshaller = jaxbContext.createMarshaller(); 11 unmarshaller = jaxbContext.createUnmarshaller(); 12 } catch (JAXBException _je) {...} 13 14 15 public static String serializeXML(SMPLCALCINTERFACE request) 16 { ByteArrayOutputStream xmlBytes = new ByteArrayOutputStream(); 17 try 18 { 19 20 marshaller.marshal(request, xmlBytes); 21 } catch (JAXBException _je) {...} 22 String xmlDoc = new String(xmlBytes.toByteArray()); 23 return xmlDoc; 24 25 26 public static SMPLCALCINTERFACE loadXML(String xmlDoc) 27 { 28 SMPLCALCINTERFACE response = null; 29 ByteArraylnputStream xmlBytes = new ByteArraylnputStream(xmlDoc.getBytes ()); 30 try 31 { 32 response = (SMPLCALCINTERFACE)unmarshaller.unmarshal(xmlBytes); 33 } catch (JAXBException _je) {...} 34 return response; 35
01
Next we need to update the add, subtract, and multiply methods in JSimpleCalculator.java to use SmplCalcJaxbMarshaller.java to generate and consume 90
XML in preparation to use the JNI XML bridge to the legacy COBOL application. Table 9.5 contains an abbreviated listing of the updated to JsimpleCalculator.java. Note that the call to method smplCalcXmllnterface is commented out. This is a call to the JNI XML bridge which will be implemented in a later step.
Table 9.5. Updates to JSimpleCalculator.java in support of JAXB marshalling.
01 public long doAddfint _lstOp, int _2ndOp) 02 { SMPLCALCINTERFACE addResult = invokeXmllnterface("+", IstOp, 03 2ndOp); return addResult.getSIRESULT().longValue(); 04 05 06 07 public SMPLCALCINTERFACE invokeXmllnterface(String calcType, int _lstOp, int _2ndOp) 08: { 09: SMPLCALCINTERFACE inputData = new SmplCalcJaxbFactory(). createSMPLCALCINTERFACE(); 10: inputData.setSIOPERATION(calcType); 11: inputData.setSIOPERANDl(Biglnteger.valueOf(_lstOp) ) ; 12: inputData.setSIOPERAND2(Biglnteger.valueOf(_2ndOp)); 13: inputData.setSIRESULTMESSAGE(""); 14: inputData.setSIRESULT(Biglnteger.valueOf(0)); 15: String inputXml = SmplCalcJaxbMarshaller.serializeXML(inputData);
16:
17:
outputXml
smplCalcXmllnterface(inputXml);
loadXML(outputXml);
18: return outputData; 19: }
6) Write a small C/C++ JNI program Java2CblXmlBridge.cpp which exports a method "Java2SmplCalc" which: a) Invokes XML2CALC.cbl (see Step 7), passing the XML document receivedfrom JSimpleCalculator.java. b) Returns the XML document generated by XMLlCALC.cbl (see Step 7) on return from SMPLCALC.cbl to JSimpleCalculator.java Sun's Java SDK includes the command-line utility javah that generates
appropriate C/C++ header files for a native method declaration in a Java class. The 91
generated header file will contain a function prototype that reflects the fully qualified name and signature of the method. Using the function prototype, it is the responsibility of the programmer to write a C/C++ method that conforms to it and interacts properly with the JVM. Please note that garbage collection does not apply to any memory allocated by the native code, so be sure to free it. To generate the JNI header file, we must first declare a native method in JsimpleCalculator.java that we wish to implement in C/C++. In addition, we must also indicate the name of the DLL Java will need to load in order to call it. Table 9.6 contains the needed additions to JsimpleCalculator.java to declare the native method. Note that on the System.loadLibrary call, the file extension of the DLL file is not specified.
Table 9.6. Example native method declaration for the JNI XML bridge. 01: public class JSimpleCalculator implements ISimpleCalculator 02: { 03: native String smplCalcXmllnterface(String xmldoc); 04: static 05: { 06: System.loadLibrary("Java2CblXmlBridge") ; 07: } 08: ... 09: }
When using the javah command-line utility, keep in mind that it operates on *.class files instead of *.java files; this is because the Java reflection APIs are used to get the qualified name and signature of the native method declaration instead of having to parse the source file. To generate a C/C++ header file from the JSimpleCalculator.class file, issue the command "javah -jni info.reversingproject.jsimplecalculator.JSimpleCalculator. " Table 9.7 gives the source 92
for the JNI program Java2CblXmlBridge.cpp, which implements the JNI method described in the generated header file.
Table 9.7. Example implementation of the Java to COBOL JNI XML bridge. 01: #include "package_JSimpleCalculator.h" 02: #include "cobcall.h" /* * Class : info_reversingproject_jsimplecalculator_JSimpleCalculator * Method: smplCalcXmllnterface * Signature: (Ljava/lang/String;)Ljava/lang/String; */ 03: jstring JNICALL Java_info_reversingproject_j simplecalculator__JSimpleCalculator_smplCalc Xmllnterface (JNIEnv *env, jobject parent_obect, jstring xml_doc) 04 { 05 // Get input XML document passed from Java 06 jboolean iscopy; 07 jstring output_xml; 08 char *xml_buffer = NULL; 09 char *xml_buffer_ptr = NULL; 10 const char *xml_input = (*env)->GetStringUTFChars(env, xml_doc, Siscopy); 11 int xml_len = strlen(xml_input); 12 // Allocate XML I/O buffer and copy input XML 13 xml_buffer = (char*)malloc(32767); 14 memset(xml_buffer, 0x00, 32767); // initialize 15 memcpy(xml_buffer, xml_input, xml_len); 16 // Free JNI memory used for MBCS to SBCS conversion 17 (*env)->ReleaseStringUTFChars(env, xml_doc, &iscopy); 18 // call COBOL to XML marshalling layer, passing XML I/O buffer 19 cobinitO; // Initialize Micro Focus COBOL runtime 20 XML2CALC(&xml_len, xml_buffer); // Call COBOL 21 // Null terminate XML returned from COBOL 22 xml_buffer_ptr = xml_buffer; 23 xml_buffer_ptr += xml_len; 24 *(xml_buffer_ptr) = 0x00; 25 // Allocate UTF version of XML to return to Java 26 output_xml = (*env)->NewStringUTF(env, xml_buffer); 27 // Free XML I/O buffer 28 free(xml_buffer); 29 // Return XML generated by COBOL as Java String 30 return output_xml; 31
93
7) Write a COBOL program XML2CALC.cbl which: a) Marshalls XML received from the Java2CblXm.lBridge.cpp, based on the XML Schema created in Step 2, into SMPLCALC-INTERFACE. b) Invokes SMPLCALC.cbl, passing SMPLCALC-INTERFACE by reference. c) Marshalls SMPLCALC-INTERFACE back into XML document before returning to Java2CblXmlBridge.cpp. Using the recently added XML support in COBOL [37, 44], parsing and generation of XML is fairly straight-forward. Two statements in the COBOL language, XML PARSE and XML GENERATE, are used to implement the program XML2CALC.cbl. Note that the XML GENERATE statement only allows assignment of non-default XML element names to data structure members when reading or writing from an XML file. Since we are working with XML in a stream, the XML Schema defined in the solution to Step 2 uses the default XML element names generated by the Micro Focus Net Express CBL2XML wizard. Table 9.8 gives the source code for XML2CALC.cbl, the XML layer to the legacy COBOL application.
94
Table 9.8. Implementation of a COBOL XML layer to the legacy application. $set preprocess(prexml) o(foo.pp) warn endp
K i t * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
** Wrapper program that provides an XML interface to SMPLCALC IDENTIFICATION DIVISION. PROGRAM-ID. 'XML2 CALC' . DATA DIVISION. WORKING-STORAGE SECTION.
* Input/Output data structure
**
i t * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
01 SMPLCALC-INTERFACE. 02 SI-OPERAND-1 PIC S9(9) COMP-5. 02 SI-OPERAND-2 PIC S9(9) COMP-5. 02 SI-OPERATION PIC X. 8 8 DO-ADD VALUE '+'. 8 8 DO-SUB VALUE 8 8 DO-MUL VALUE '*'. 02 SI-RESULT PIC S9(18) COMP-5. 02 SI-RESULT-MESSAGE PIC X(128).
* XML parsing state
01 XML-PARSE-STATE. 02 CURR-ELE-NAME PIC X(256). 02 CURR-ELE-CONT PIC X(256). LINKAGE SECTION. 01 XML-DOC-LEN PIC S9(9) COMP-5. 01 XML-DOC-TXT PIC X(32767). PROCEDURE DIVISION USING XML-DOC-LEN XML-DOC-TXT. MAINLINE SECTION.
* Parse XML into SMPLCALC-INTERFACE
GOBACK *+ * | XML event handler for marshalling XML into COBOL data * + XML-HANDLER. EVALUATE XML-EVENT WHEN 'START-OF-ELEMENT' MOVE XML-TEXT TO CURR-ELE-NAME WHEN 'CONTENT-CHARACTERS' EVALUATE CURR-ELE-NAME
+
I +
95
50 51 52 53 54 55 56 57 58 59 60 61
WHEN 'SI-OPERAND-1' MOVE FUNCTION NUMVAL(XML-TEXT) TO SI-OPERAND-1 WHEN 'SI-OPERAND-2' MOVE FUNCTION NUMVAL(XML-TEXT) TO SI-OPERAND-2 WHEN 'SI-OPERATION' MOVE XML-TEXT TO SI-OPERATION END-EVALUATE WHEN 'END-OF-ELEMENT' INITIALIZE CURR-ELE-NAME END-EVALUATE END PROGRAM 'XML2CALC.
\ 10) Update JSimpleCalculator.java to use the X/C-generated marshalling code to send/receive XML through the JNI method defined in Step 8 and display the results | of the computations performed downstream by SMPLCALC.cbl.
To begin using the JNI XML bridge, create or uncomment a line in your code that corresponds to the bolded line in Table 9.5. Essentially, code a call to method Java2CblXmlBridge.smplCalcXmlInterface(inputXmlDoc), passing the JAXB generated XML document, to invoke the legacy COBOL application SMPLCALC.cbl through JNI and the XML layers. Table 9.9 lists the results of running the complete solution code for the exercise with debug tracing turned on.
Table 9.9. Example run of the solution code with debug statements turned on. 01 02 03 04 05 06 07 08 09 10 11 12 **************************************************** ** Program: Java Front-end to COBOL Calculator ** ** Purpose: Demonstrate reengineering and reuse ** ** of a COBOL program from Java by ** ** establishing an XML bridge leveraging ** ** JAXB, JNI, and COBOL XML support. ** ** Author: Teodoro Cipresso ** ** [email protected] ** **************************************************** Select a task from the following menu:
96
(1) Addition 13 (2) Subtraction 14 (3) Multiplication 15 (4) Toggle Debug OFF 16 17 (5) Quit Program 18 19 Specify selection: 3 20 21 Specify integer operand #1: 16 22 23 Specify integer operand #2: 32 24 [D] JSimpleCalculator.doMultiply(16, 32) 25 26 [D] JSimpleCalculator.invokeXmllnterface(*, 16, 32) 27 28 [D] SmplCal c JaxbMar sha H e r .serial izeXML () 29 30 31: [D] SmplCalcJaxbMarshaller.serializeXML().xmlDoc[<?xml version="l.0" encoding="UTF-8" standalone="yes" 7 X S M P L C A L C INTERFACEXSI-OPERAND-l>16</SI-OPERAND-lXSI-OPERAND-2>32</SI-OPERAND2><SI-OPERATION>*</SI-OPERATIONXSI-RESULT>0</SI-RESULTXSI-RESULTMESSAGEX/SI-RESULT-MESSAGEX/SMPLCALC-INTERFACED 32: 33: [D] JSimpleCalculator.invokeXmllnterface(): Before call to Java2CblXmlBridge 34: 35: [D] JSimpleCalculator.invokeXmllnterface(): After call to Java2CblXmlBridge 36: 37: [D] SmplCalcJaxbMarshaller.loadXMLO .xmlDoc[<SMPLCALCINTERFACEXSI-OPERAND-l>16</SI-OPERAND-lXSI-OPERAND-2>32</SI-OPERAND2><SI-OPERATION>*</SI-OPERATIONXSI-RESULT>512</SI-RESULTXSI-RESULTMESSAGE>Completed success fully</SI-RESULT-MESSAGEX/SMPLCALCINTERFACE>] 38: 39: [***] COBOL multiplication result: 512
97
To experiment with most of the types of malware listed here is dangerous. Therefore, if one decides to try one's hand at analyzing real-life malware, using the machine code and bytecode reversing techniques demonstrated in this paper, one should do so in a carefully prepared environment. One should not install any malware on a computer that must remain in operating condition. Worms and backdoors can be especially dangerous because they can propagate to other systems on computer networks. Be aware that using virtualization tools such as VMware to create secondary operating system images on which to install malware can still result in the infection of the primary operating system, especially if the VMware-hosted image has connectivity enabled. The goal of this section is to help you become familiar with using software tools to identify, monitor, report, and securely delete software that you suspect to be malicious. Since it's not practical to ask that you install a virus, worm, backdoor, or rabbit on your machine, we are left with the possibility of a guaranteed benign software Trojan. It's important to note here that malware usually isn't of just one type; for example, 3 of the top 10 malicious codes families reported in 2008 were Trojans with a backdoor component [45]. It turns that focusing on software Trojans is appropriate because as Symantec's 2009 Global Internet Security Threat Report [45] states, "Trojans made up 68 percent of the volume of the top 50 malicious code samples reported in 2008", and "Five of the top 10 staged downloaders in 2008 were Trojans." For the vast majority of us, the story of the Trojan horse from antiquity is quite familiar. Essentially, the Greeks, in a 10-year siege against the city of Troy, devised a 99
brilliant plan of putting 40 of their best soldiers into the body of a large wooden horse while the rest of the army sailed away out of sight. The Trojans, assuming that the Greeks had given up, pulled the horse into their city as a trophy of their victory. As night fell over the city of Troy, the Greek army sailed back to shore. Meanwhile, the soldiers in the Trojan horse silenced some guards and opened the gatesallowing the Greek army to flood in and take the city by surprise. So what does all this have to do with software? Not too surprising, a Trojan software program is one that is not entirely what it seems. For example, imagine a program is offered for free on the Internet that claims to be able to convert audio files between different formats. The program fits the needs of many, and is definitely the right price, so it has a large install base. What users of the program are not told is that while the program is performing its advertised functions, it will perform other annoying or malicious tasks in the background such as: scanning the system for sensitive information and uploading it to a rogue site, affecting the stability and performance of the system by doing repeated expensive operations. In 1996, Mark Russinovich founded a company called "Winternals Software" where he was the chief software architect on a comprehensive suite of tools for diagnosing, debugging, and repairing Windows systems and applications [46]. Mark's company has since been purchased by Microsoft and his suite of tools have been rebranded "Windows Sysinteraals" and are offered for free on Microsoft Technet. An example of one of the more powerful tools in the Sysinternals suite is the Process 100
Monitor. The Process Monitor can capture detailed information about any running process in a Windows system including: filesystem, registry, and network activity. Just the Process Monitor alone is helpful in analyzing the behavior of an application when making the determination of whether or not it is malicious. As an aside, Mark's story is an interesting one because he is recognized as a true expert on the internals of Windows even though he did not participate in its developmenta true testament to what can be learned about software through reverse engineering. At the time of this writing, the Sysinteraals suite contained 66 different utilities, but we'll focus on the most useful one in this context of analyzing the behavior of malware: Process Monitor. In the exercise that accompanies this section, it is recommended that you use Process Monitor to complete it. If you have the opportunity to experiment with other tools in the Sysinternals suite, you are encouraged to do so. The following description of Process Monitor is given on the Windows Sysinternals web site [46]: "Process Monitor is an advanced monitoring tool for Windows that shows real-time file system, Registry and process/thread activity. It combines the features of two legacy Sysinternals utilities, Filemon and Regmon, and adds an extensive list of enhancements including rich and non-destructive filtering, comprehensive event properties such session IDs and user names, reliable process information, full thread stacks with integrated symbol support for each operation, simultaneous logging to a file, and much more. Its uniquely powerful features will make Process Monitor a core utility in your system troubleshooting and malware hunting toolkit. " Fig. 10.1 contains a capture of a Process Monitor session where the filesystem activity of the Password Vault application is recorded. When using Process Monitor, you can selectively monitor registry, filesystem, network, and thread activity.
101
File
Edit
Event
Filter
Tools
Options
Help
? H
i ^ m E> i v
PIP 5072 5072 5072 5072 5072 5072 5072 5072 5072 5072 5072 5072 5072 5072 5072
i1
Process Name 1 PasswordVault. exe IPasswordVault.exe 6 3 PasswordVault. exe IPasswordVault.exe jPasswordVault.exe 6 3 PasswordVault. exe JPasswordVault.exe JPasswordVault.exe JPasswordVault.exe JPasswordVault.exe 1 Pass wordVault. exe ] Pass wordVault. exe JPasswordVault.exe JPasswordVault.exe IPasswordVault.exe
|; Operation 0JRP_MJ .CREATE C:\PasswordVaultTrialCpp\user01 ;dat' y*FAST 10. .QUE RY_... C APasswordVauItT rialCpp\user01d a t | EMRP_MJ. .READ CAPasswordVaultTrialCpp\user01 dat p. 0URP_MJ_ CLEANUP CAPasswordVaultTrialCpp\user01 dat|: ''&.. yjjRP_MJ .Q UE RY_... C APasswordVauItT rialCpp m 0URP_MJ CR EAT E C: \PasswordVaultT rialCpp\user01datss*: 0JRP_MJ .CREATE C APasswordVauItT rialCpp 0URP_MJ_ CLEANUP C APasswordVauItT rialCpp 0URP_MJ_ .CLOSE C APasswordVauItT rialCpp &.IRP_MJ_ WRITE C APasswordVauItT rialCpp\user01 dat BURP_MJ .CLEANUP CAPasswordVauItTrialCpp\user01 dat 0URP_MJ CLO SE C APasswordVauItT rialCpp\user01 dat @URP_MJ READ C: /: 0JRP_MJ CLEANUr gklRP_MJ CL0SE
JU
^4
Figure 10.1. Process Monitor session for the Password Vault application. Most of the malicious operations carried out by Trojans can be detected using Process Monitor, including those that contain Backdoors. Of course, Process Monitor itself doesn't identify malware, it simply reports what a process is doing. With a little bit of ingenuity, one can identify activities that don't seem to fit with the advertised functionality of a program. For example, a program that accesses registry keys, files, or network locations that are unrelated to it, is probably malicious. It's common practice these days for users to download free software from the Internet, and because we've been convinced that open-source software, which is sometimes confused with free software, should have the fewest number of vulnerabilities, we do it without much afterthought. Incidentally, the data on the number of vulnerabilities found in popular Internet browsers
102
does not support this belief. [45] reports that "Mozilla browsers were affected by 99 new vulnerabilities in 2008, more than any other browser; there were 47 new vulnerabilities identified in Internet Explorer, 40 in Apple Safari, 35 in Opera, and 11 in Google Chrome." It seems counter-intuitive that an open-source browser would have twice as many security holes than a closed-source browser like Internet Explorer. Mozilla is not malware, but it's interesting to note that in the case of software, open-source doesn't guarantee security. Becoming familiar with the Windows Sysinternals suite can help you evaluate whether the software on your Windows machine is acting in your best interest. If you suspect a particular program to be malware, it can be submitted online to a service called ThreatExpert [47]. ThreatExpert is a Web-based tool that supports submission of software executables that are to be evaluated against an on-line malware database. The tool analyzes the instruction sequences in submitted executables and attempts to match them against those of known malware. Matching against existing malware is just one part of ThreatExpert's automated engine; the service actually tries to execute suspected malware in an isolated environment in order to perform heuristic analysis of its actions. An example of a report generated by ThreatExpert for a particularly dangerous piece of malware is shown in Fig. 10.2. The figure contains only the top-level summary of the report whereas the full report contains much more detail, such as filesystem, memory, registry, network and other activity. Note that all of the malicious behaviors of the submitted executable could have been learned by
103
m ThreatExpert
Submission Summary:
a Submission details: Submission received: 2 May 2009, 1 3 : 5 3 : 2 5 Processing time: 6 min 33 sec S u b m i t t e d sample: File MDS: 0xD5D9730AF3DE7006C9940791E96B20CE File S H A - 1 : Alias: Virus,Win32.Parite.b [Kaspersky Lab] Virus,Win32,Parite [Ikarus] Summary o f t h e findings: What's been found Severity Level
Bssssssms]
OxC4AD816CC3AD6206735E24903DC58729AAB6B388
Filesize: 4 0 6 , 7 7 1 b y t e s
A n e t w o r k - a w a r e worm t h a t uses known exploit(s) in order to replicate across vulnerable n e t w o r k s . M S 0 4 - 0 1 1 : LSASS Overflow exploit - replication across TCP 445 (common for Sasser, Bobax, Kibuv, Kongo, Gaobot, S p y b o t , Randex, o t h e r IRC Bots). : Replication across networks by exploiting weakly r e s t r i c t e d shares ! (common for Randex family of worms). 1 Communication w i t h a remote IRC server. I Downloads/requests other files from I n t e r n e t , I Creates a s t a r t u p registry e n t r y . ; There were some s y s t e m executable files modified, which might 1 indicate t h e presence o f a PE-file infector.
;:i@gg@iSe@i';
filflQQJl
I
| Contains c h a r a c t e r i s t i c s of an identified security risk.
[9@HSS3@SEI9i
Figure 10.2. Example ThreatExpert report summary for submitted malware. monitoring it using Process Monitor, though it would have taken much more time. To facilitate the exercise which accompanies this section, a benign Java software
104
Trojan named "Alarm Clock" was written. The Alarm Clock program is a multithreaded, console-based application that allows you to interact with it while it continually checks whether or not to sound the alarm. Obviously, the Alarm Clock program does a bit more than its advertised function, and the goal of the exercise is to help build familiarity with the Windows Systinternals tool suite through attempting to figure out what the additional actions taken by the program are. Keep in mind that malware will not necessarily accomplish its goals as quickly possible, it may spread out or pace malicious activity in order to use fewer system resourceshelping it stay under the radar of the user. The user interface of the Alarm Clock application is shown in Fig. 10.3.
+ I (1) (2) (3) (4) + |
Alarm Clock VI.0 Display the current date and time. Display the alarm date and time. Set the alarm date and time. Quit.
>> Type an option number and press Enter: 1 [INFO] The current time is (05/02/09 13:49:48). + I + (1) (2) (3) (4) + | +
Alarm Clock VI.0 Display the current date and time. Display the alarm date and time. Set the alarm date and time. Quit.
>> Type an option number and press Enter: 3 >> Specify the alarm date and time...(mm/dd/yy HH:MM:SS). The current date and time is (05/02/09 13:49:53). >> Type the alarm date and time to set ==> 05/03/09 08:00:00 [INFO] Alarm set is successful.
Figure 10.3. Console-based Ul for the Alarm Clock example software Trojan.
105
106
"Documents and Settings" folder. > IP addresses of random Internet/Intranet hosts that respond to an ICMP ping.
Conclusion
Unless something is done to include a required amount of reverse engineering instruction in computer science and software engineering programs of study, new engineers will remain ill-equipped to work with legacy software systems as well as be unable to ensure that software is secure and safe to deploy. Most large companies have existing software systems that have been the underpinning of their business for years. It's highly difficult, not to mention cost-prohibitive, to rip and replace mission-critical software systems in response to the emergence of a new technology. As a result, organizations are always looking for candidates that can help them understand what they have and how it can be evolved to interact with the latest technologies. Students and practicing engineers need reverse engineering skills to be able to help organizations, both large and small, understand their current technology stack and recommend an integration strategy for new technologies. Software security issues, such as how the latest virus or worm infects computer systems, also require extensive reverse engineering knowledge. Since students and engineers need to learn reverse engineering, instructors need to be able to teach it to them. At the present time, even experienced computer science and software engineering instructors may not have enough knowledge of reverse engineering to teach a course on it. Compounding the problem is the fact that materials for teaching a course on reverse engineering may be difficult to find in a format that is compatible with 107
classroom delivery. Several books exist on reverse engineering that cater to industry professionals or those interested in self-study. However, in a university setting, instructors engage students in ordered learning through exercises, quizzes, and exams. Since SRE is not a standard part of the computer science curriculum, instructors will be mostly on their own to create a course that they feel gives an adequate education on the subject. Since the uses of software reverse engineering have been well documented in the literature, it is certainly feasible to provide education on the topic, though coming up with good exericses is challenging. The importance of making this education available was emphasized by El-Ramly at the 28th International Conference on Software Engineering when he stated "Reengineering skills are survival skills for those who have to carry out software renovation and modernization projects" [48]. The integration of reverse engineering techniques as part of learning in traditional computer science courses has been tried at the University of Missouri-Rolla [3]. When students were polled, 77% indicated that applying reverse engineering techniques to their normal programming assignments reinforced concepts taught during lectures [3]. Furthermore, 82% of students wanted reverse engineering to be blended in future courses, especially those that dealt with design [3]. Given these promising trials, universities should continue to work toward establishing standard content for software reverse engineering and software maintenance courses.
108
References
H. A. Miiller, J. H. Jahnke, D. B. Smith, M. Storey, S. R. Tilley, and K. Wong, "Reverse engineering: a roadmap," in Proc. Conf. Future of Software Engineering, Limerick, Ireland, 2000, pp. 47-60. G. Canfora and M. Di Penta, "New Frontiers of Reverse Engineering," in Proc. Future of Software Engineering, Minneapolis, MN, 2007, pp. 326-341. M. R. Ali, "Why teach reverse engineering?" ACM SIGSOFT SEN, v.30, n.4, pp. 1-4, Jul 2005. A. V. Deursen, J. Favre, R. Koschke, and J. Rilling, "Experiences in Teaching Software Evolution and Program Comprehension," in Proc. 11th IEEE Int. Workshop on Program Comprehension, Washington, DC, 2003, pp. 2834-284. E. Eliam, Secrets of Reverse Engineering, Indianapolis, IN: Wiley, 2005. L. Cunningham. (2008, Jul 9). COBOL Reborn [Online]. Available: http://it.toolbox.com/blogs/oracle-guide/cobol-reborn-25896 B. W Weide, W D. Heym, J. E. Hollingsworth, "Reverse engineering of legacy code exposed," in Proc. 17th Int. Conf. Software Engineering, Seattle, Washington, WA, 1995, pp. 327-331. Wikipedia contributors. (2008, Sept 9). Compiler [Online]. Available: http://en.wikipedia.org/w/index.php?title=Compiler&oldid=237244781 B. Gough,^ introduction to GCCfor the GNU Compilers gcc and g++, Bristol, United Kingdom: Network Theory Limited, 2005. K. Irvine, Assembly Language: For Intel-Based Computers, Upper Saddle River, NJ: Prentice Hall, 2007. Boomerang Decompiler Project. (2006), Boomerang: a general, open source, retargetable decompiler of machine code programs (Version 0.3.2) [Online]. Available: http://boomerang.sourceforge.net Backer Street Software. (2007). REC: Reverse Engineering Compiler (Version 2.1) [Online]. Available: http://www.backerstreet.com/rec/rec.htm O. Yuschuk. (2000). OllyDbg: 32-bit assembler level analysing debugger for Microsoft Windows (Version 1.1) [Online]. Available: http://www.ollydbg.de Wikipedia contributors. (2008, Oct 2008). Machine code [Online]. Available: http://en.wikipedia.org/w/index.php?title=Machine_code&oldid=246690032 P, Haggar. (2001, Jul 1). Java bytecode: Understanding bytecode makes you a better programmer [Online]. Available: http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode
109
[16] P. Kouznetsov. (2001). J ad: J ad is a Java decompiler, i.e. program that reads one or more Java class files and converts them into Java source files which can be compiled again (Version 1.5.8g) [Online]. Available: http://www.kpdus.com/jad.html [17] Wei Dai. (2008). Crypto++ Library, Crypto+ + Library is a free C+ + class library of cryptographic schemes (Version 5.5.2) [Online]. Available: http ://ww w. cryptopp. com [18] G.M.Weinberg, The Psychology of Computer Programming, New York, New York: Dorset House Publishing, 1998. [19] A. Kalinovsky, Covert Java: Techniques for Decompiling, Patching, and Reverse Engineering, Indianapolis, IN: Sam's Publishing, 2004. [20] A. Sinkov, Elementary Cryptanalysis: A Mathematical Approach. Washington, DC: The Mathematical Association of America, 1980. [21] M. Stamp, Information Security: Principles and Practice, Hoboken, NJ: John Wiley & Sons, 2006. [22] Wikipedia contributors. (2009, Feb 9). ROT13 [Online]. Availble: http://en.wikipedia.org/w/index.php?title=ROT13&oldid=269492700 [23] B. Baier. (2006). COBF: the Freeware C/C++ Sourcecode Obfuscator (Version 1.06) [Online]. Available: http://home.arcor.de/bernhard.baier/cobf [24] T.J. McCabe, "A Complexity Measure," IEEE Trans. Softw. Eng, vol. 2, no. 4, pp. 308-320, July 1976. Available: http://www.literateprogramming.com/mccabe.pdf [25] Wikipedia contributors. (2008, Sept 26). Levenshtein distance [Online]. Available: http://en.wikipedia.Org/w/index.php? title=Levenshtein_distance&oldid=273450805 [26] Zelix Pty Ltd. (2009). Zelix Klassmaster: Java Bytecode Obfuscator (Version 5.2) [Online]. Available: http://www.zelix.com/klassmaster/features.html [27] The University of Arizona, Department of Computer Science. (2004). SandMark: A Tool for the Study of Software Protection Algorithms (Version 3.4) [Online]. Available: http://sandmark.cs.arizona.edu [28] Retrologic Systems. (2007). RetroGuardfor Java Obfuscation (Version 2.3.1) [Online]. Available: http://www.retrologic.com/retroguard-main.html [29] E. Lafortune. (2008). ProGuard v4.3: a Free Java bytecode Shrinker, Optimizer, Obfuscator, andPreverifier (Version 4.3) [Online]. Available: http://proguard.sourceforge.net [30] A. G. Shvets. (1999). CafeBabe: Graphical Classfile Disassembler, Editor, Stripper, Migrator, Compactor and Obfuscator (Version 1.2.7.a) [Online]. Available: http://www.geocities.com/CapeCanaveral/Hall/2334/programs.html
110
[31] M. R. Batchelder, "Java Bytecode Obfuscation", M.S. Thesis, Dept. Comp Sci., McGill Univ., Montreal, Canada, 2007. Available: http://digitool.library.mcgill.ca: 1801/webclient/StreamGate? folder_id=0&dvs=1236657408333~988 H. M. Sneed, "Encapsualtion of legacy software: A technique for reusing legacy software components", in Ann. Software Engineering, v.9, n.4, pp.293-313, 2000. IBM, (2008). IBM Rational Application Developer for WebSphere Software (Version 7.5.1) [Online]. Available: http://www01 .ibm.com/software/awdtools/developer/application Sun Microsystems. (2005, May 11). J2EE Connector Architecture [Online]. Available: http://java.sun.com/j2ee/connector Wikipedia contributors. (2009, Mar 24). Java Architecture for XML Binding [Online]. Available: http://en.wikipedia.Org/w/index.php? title=Java_Architecture_for_XML_Binding&oldid=279402856 Free Software Foundation. (2000). COBOL For GCC: a project to produce a free COBOL compiler compliant with the COBOL 85 Standard, integrated into the GNU Compiler Collection (GCC) (Version 0.1.2) [Online]. Available: http://cobolforgcc.sourceforge.net Micro Focus Ltd (2008). Net Express Personal Edition: a complete environment for quickly building and modernizing COBOL enterprise components and business applications (Version 5.1) [Online]. Available: http://www.microfocus.com/Resources/Communities/Academic World Wide Web Consortium contributors. (2004, Feb 11). Web Services Architecture [Online] .Available: http://www.w3.org/TR/2004/NOTE-ws-arch20040211 World Wide Web Consortium contributors. (2004, Oct 28). XML Schema Part 1: Structures (2nd ed.) [Online]. Available: http://www.w3.org/TR/xmlschema-l World Wide Web Consortium contributors. (2004, Oct 28). XML Schema Part 2: Datatypes (2nd ed.) [Online]. Available: http://www.w3.org/TR/xmlschema-2 World Wide Web Consortium contributors. (2004, Jun 26). Web Services Description Language (WSDL) Part 1: Core Language (Version 2.0) [Online]. Available: http://www.w3.org/TR/wsdl20 World Wide Web Consortium contributors. (2004, Jun 26). Web Services Description Language (WSDL) Part 2: Adjuncts (Version 2.0) [Online]. Available: http://www.w3.org/TR/wsdl20-adjuncts Web Services Interoperability Organization. (2007, Oct 24). Basic Profile (Version 1.2) [Online]. Available: http://www.ws-i.org/Profiles/BasicProfilel_2(WGAD).html
111
[44] IBM. (2007). Enterprise COBOL for z/OS: Language Reference V4R1. (1st ed.) [Online]. Available: http://publibfp.boulder.ibm.com/epubs/pdf/igy31r40.pdf [45] Symantec Corp. (2009 Apr). Symantec Global Internet Security Threat Report (1st ed.) [Online]. Volume 14(1). Available: http://eval.symantec.com/mktginfo/enterprise/white_papers/bwhitepaper_internet_security_threat_report_xiv_04-2009.en-us.pdf [46] Microsoft TechNet. (2009, May 7). Windows Sysinternals: utilities to help manage, troubleshoot and diagnose Windows systems and applications. [Online]. Available: http://technet.microsoft.com/en-us/sysinternals/default.aspx [47] ThreatExpert Ltd. (2009) ThreatExpert: ThreatExpert is an advanced automated threat analysis system designed to analyze and report the behavior of computer viruses, worms, trojans, adware, spyware, and other security related risks in a fully automated mode. [Online]. Available: http://www.threatexpert.com [48] M. El-Ramly, "Experience in teaching a software reengineering course," in Proc. 28th Int. Conf on Software Engineering. Shanghai, China, 2006, pp. 699-702.
112