PHD Proposal

Research Proposal
An Effective Semantic Knowledge Representation Using Ontology To

Support Redocumentation Process for Legacy System in Software
Maintenance

SUGUMARAN A/L NALLUSAMY
PC073041

Supervisor : Dr.Suhaimi Ibrahim

School of Graduate Studies
Universiti Teknologi Malaysia
2010
ii

TABLE OF CONTENTS

1 INTRODUCTION .............................................................................. 1
1.1 Background of the problem ................................................................................. 1
1.2 Statement of the Problem .................................................................................... 3
1.3 Objectives ............................................................................................................ 4
1.4 Scope of Study ..................................................................................................... 4
1.5 Significant of Study ............................................................................................. 5
2 CHAPTER 2 LITERATURE REVIEW .......................................... 7
2.1 Introduction ......................................................................................................... 7
2.2 Definition of Software Maintenance ................................................................... 8
2.3 Definition of Legacy Systems ............................................................................. 9
2.4 Definitions of Software Documentation .............................................................. 9
2.5 Definitions of Reverse Engineering .................................................................. 11
2.5.1 GoalofReverseEngineering................................................................................12
2.5.2 Abstraction...........................................................................................................12
2.5.3 LevelsofReverseEngineering..............................................................................13
2.5.4 ReverseEngineeringMethods,TechniquesandTools........................................14
2.6 Definitions of Redocumentation........................................................................ 17
2.6.1 RedocumentationProcess...................................................................................18
2.7 Document Quality ............................................................................................. 23
2.8 Definition of Knowledge Representation .......................................................... 26
iii

2.9 Definition of Reasoning .................................................................................... 27
2.10 Definition of Ontology .................................................................................. 28
2.10.1 GeneralityofOntologies......................................................................................29
2.10.2 MethodologyforconstructingOntologies...........................................................30
2.10.3 TypeofOntology..................................................................................................31
2.10.4 OntologiesLanguages..........................................................................................31
2.10.5 ApplicationofOntologies.....................................................................................33
2.11 Redocumentation Approaches and Tools ...................................................... 34
2.11.1 XMLBasedApproach...........................................................................................34
2.11.2 ModelOrientedRedocumentationApproach.....................................................35
2.11.3 IncrementalRedocumentationApproach...........................................................37
2.11.4 IslandGrammarApproach...................................................................................40
2.11.5 DocLikeModularizedGraphApproach................................................................41
2.12 Redocumentation Tools ................................................................................. 43
2.12.1 Rigi........................................................................................................................43
2.12.2 HaddockToolforHaskellDocumentation...........................................................44
2.12.3 ScribbleTool.........................................................................................................46
2.12.4 UniversalReport..................................................................................................46
2.13 Summary of Redocumentation approaches and tools .................................... 48
3 COMPARATIVE EVALUATION OF THE STATE OF ART
APPROACHES ........................................................................................... 51
3.1 Analysis Criteria ................................................................................................ 51
3.1.1 DocumentQuality................................................................................................51
3.1.2 KnowledgeRepresentationCriteria.....................................................................54
3.2 Discussion ......................................................................................................... 56
3.3 Related Work ..................................................................................................... 58
iv

4 CHAPTER 4 RESEARCH METHODOLOGY .......................... 59
4.1 Research Design and Procedure ........................................................................ 59
4.2 Operational Framework ..................................................................................... 64
4.3 Assumptions and Limitation ............................................................................. 66
4.4 Assumptions and Limitations ............................................................................ 66
4.5 Research Planning and Schedule ....................................................................... 67
4.6 Summary ........................................................................................................... 68
5 CHAPTER 5 PRELIMINARY FINDINGS AND EXPECTED
RESULTS ..................................................................................................... 69
5.1 Preliminary Findings ......................................................................................... 69
5.2 Definition of the Proposed Framework ............................................................. 69
5.2.1 Input.....................................................................................................................71
5.2.2 Knowledgeautomationprocess..........................................................................73
5.2.3 Output..................................................................................................................75
5.3 Components of the Proposed Framework ......................................................... 76

v

LIST OF FIGURES
Figure 2.1 Level of abstraction in a software system[13] ................................................ 14
Figure 2.2 : Sample of reverse engineering methods and tools [31] ................................ 15
Figure 2.3: Reverse Engineering Tools Architecture [1] ................................................. 16
Figure 2.4: Redocumentation Process .............................................................................. 19
Figure 2.5 : Geographic visualization for geographic ontology [54] ............................... 29
Figure 2.6 : RDF Triples in graph model ......................................................................... 32
Figure 2.7: Model Oriented Framework [7] ..................................................................... 36
Figure 2.8: Final Documentation Produced using MIP tool [7] ...................................... 37
Figure 2.9 : Example on using Island Grammar Approach[37] ....................................... 40
Figure 2.10 DocLike Viewer Prototype Tool[6] .............................................................. 42
Figure 2.11 Programming Language Supported by Universal Report ............................. 47
Figure 4.1 : Research Procedure ...................................................................................... 62
Figure 4.2 : Research Flow Chart .................................................................................... 63
Figure 5.1: The Proposed of Redocumentation Process Framework ............................... 71

LIST OF TABLES

Table 2-1: Sample Documentation for Different Level of Abstraction 13
Table 2-2: Key Product Attributes (KPA) and the description for each maturity level 26
Table 2-3 : Summary of Redocumentation Approaches 49
Table 2-4: Summary of Redocumentation Tools 50
Table 3-1: Comparing redocumentation approaches 52
Table 3-2 : Comparing redocumentation tools 53
Table 3-3 : The Evaluation using Knowledge Representation Criteria for
Redocumentation Approaches 55
Table 3-4 : The evaluation using Knowledge Representation Criteria for
Redocumentation Tools 56
Table 4-1 : Operational Framework 64
Table 5-1 : User Interface Description 76

CHAPTER 1

INTRODUCTION

1 INTRODUCTION
In this chapter an introduction to research proposal is provided. First of all, the
background of the problem to be solved is described. After that, the problem statement,
and also objective, scope, and importance of the study are described respectively.

1.1 Background of the problem

Software documentation is an important component in the software development
and software engineering as a general. It is one the best resources to improve the
development and maintain the understanding of the program and oldest practices that
continues till now[2]. However, such documentation suffer from the following
problems: not up-to-date, poor quality, not standardize, lack of interest of the
programmers in the documentation, provides just single perspective and produced in the
format which is not suitable for the maintenance [3]. Most of the legacy software
2

system faces this problem in providing the understanding on software system for
software evolution or software maintenance. The situation get worst if the people
involve maintaining the software system did not participate in the development of the
system. Ignoring the legacy system by creating a new system will be not a good idea in
most of the situation because the current system may have accumulated critical
knowledge which is not presented elsewhere.

Software redocumentation is one the approach used as aiding for program
understanding to support the maintenance and evolution. It helps to extract the artifacts
from the sources via reverse engineering technique and presented in form of
documentation needed. They are some research have been done related to software
redocumentation to improve the understanding of the software system. Most of
solutions, emphasis to link source code and documentation, incremental
redocumentation for program comprehension, flexible browsing the system design and
transform legacy system to UML diagram. There is very few solutions focus for high
level abstraction. In other words, current solutions emphasis on representation switches
such as providing alternate views [4], annotations [5] ,structural documentation[6] and
model oriented documentation [7]. The tools developed for redocumentation are slow
and unable to answer questions in software maintenance such as what is the underlying
principle or purpose for this code? and what is the relation between diagrams to the
requirements. The tools developed also become doubtful in producing the standard
documents. The cause is the content and the styles of the software documentation differ
among programmers which become time consuming for the maintainer to digest and
understand the content of the documents.

The main issues need to be addressed here is the software documentation produced
from redocumentation process need to emphasis on the important of explicit
documenting domain knowledge to improve the program comprehension in software
maintenance and presented in standard documentation. The maintainer needs a better
understanding about the semantic relationships among the component from real world
3

domain point of view especially if the maintainer is the new member in the program
domain. The benefits of generating the document not only realize by the maintainer but
also other personnel such as programmer, software engineer and technical manager to
use as a communication tool.

1.2 Statement of the Problem

This research is intended to deal with the problems related to software
documentation using redocumentation process on legacy system for program
understanding in software maintenance. The main question is How to produce an
effective software documentation using redocumentation model and approach that can
show domain knowledge on legacy system for program understanding to improve the
software maintenance.
The sub questions of the main research question are as follows:
i. What are the current redocumentation approaches and tools?
ii. Why the current software redocumentation approaches and tools are still not able
to show domain knowledge in the software documentation using
redocumentation process?
iii. What is the best way to show the domain knowledge for the documentation
produced using reverse engineering technique?
iv. How to measure the software documentation produced from software
redocumentation process?
v. How to measure the schema representation produce from the redocumentation
process?
vi. How to map the schema representation produced with the standard
documentation?
vii. How to validate the usefulness of software documentation for program
understanding in software maintenance?
4

1.3 Objectives

The research objectives are mentioned based on the problem statement, as follows:

1. To study and investigate the current issues related to redocumentation.
2. To analyse the existing redocumentation approaches.
3. To formulate new redocumentation approach to represent the higher level
abstraction and integrate with standard documentation.
4. To develop prototype tool that supports the proposed approach.
5. To evaluate the proposed approach based on the evaluation criteria.

1.4 Scope of Study

Software redocumentation process used to produce software documentation in
different type of form such as text[8] and graphics[9]. The text form also can be shown
in electric form such as HTML[10] or XML files[11]. Directed graph is one of the
model uses to represent the artifacts in the graphics[12]. The techniques and approaches
used may be different from one to another due to different requirement and objective.
Some may be geared to understanding the software system while others used as aiding to
take decision for system evolution.

In this scope of the research, it needs to explore the redocumentation process to
produce the standard software documentation to describe the context of the software
system or in terms of domain specific concepts. The models and techniques used should
be able to produce documentation from various types of SWP. It needs to capture the
artifacts from the latest version of the software system by establishing the reverse
engineering environment. This will involve the developing tool to extract the artifact
5

from the source code and translated to the component which can present the high level
abstraction of the system in standard documentation.

The new work should able to view the documentation as a result from the reasoning
tool which supports the search and select mechanism to extract needed knowledge. The
software engineers or maintainers personnel should consider these results as an aiding to
understand the program for the maintenance task.

1.5 Significant of Study

Software maintenance identified as a most expensive phase in software lifecycle which
is estimated 40-70% of the costs of entire development of software system[13]. Software
maintenance becomes multi-dimensional problem that creates ongoing challenges for
both research community and tool developers. The challenges exists because the
different representation and interrelationships that exist among software artifact and
knowledge base. In addition, the software documentation which is describing the
artifacts presented in non-formal documentation that have problem for the maintainers
team to have a common understanding on the software system. Redocumentation
process needed to solve the problems which have mention above. There are some of the
redocumentation tools and methods exist to solve the above problem. However, those
approaches only focus on the representation switches and lack of semantic
representation. Thus, there is need for further research in this field from various
perspectives as listed below:
a. There is a need to present the domain component and describe the relevant
concept to establish the semantic relationship among the component.
b. There is a need to integrate the standard document templates and the software
component to have better communication between maintainers on the
software system.
6

c. Maintainer able to find the relevant information within a reasonable time
frame.

The important of having the domain specific tools to support program
comprehension describe by [14]. Therefore, the software redocumentation is important
process to improve the software program understanding in the software maintenance.

CHAPTER 2

LITERATURE REVIEW

2 CHAPTER 2 LITERATURE REVIEW

2.1 Introduction

In this chapter, some background on software maintenance, software
documentation, reverse engineering, redocumentation process and the repository used in
redocumentation process for knowledge presentation are provided. In addition, the
redocumentation approaches and the tools also have been described. The purpose of this
chapter is to describe the basic concepts and present current approaches for
redocumentation.

8

2.2 Definition of Software Maintenance

The definition of software maintenance is the modification of a software product
after delivery, to correct faults, to improve performance or other attributes, or to adapt
the product to a modified environment [15]. According to [2], maintenance define in
more broader scope which is traditionally define as any modification made on a system
after its delivery. Studies show that software maintenance is the predominant activity in
software engineering which consist of 75-80% total cost of the typical software system
[16].The important of the software maintenance painted clearly when most of the
companies enthusiastic to maintain the software system even the cost is high.
To understand software maintenance we need to be clear about what is meant by
'software'. It is a common misconception to believe that software is programs. This can
lead to a misunderstanding of terms that include the word 'software'. For instance, when
thinking about software maintenance activities, there is a temptation to think of activities
carried out exclusively on programs. This is because many software maintainers are
more familiar with, or rather are more exposed to programs than other components of a
software system[13].
McDermid's definition [17] makes clear that software comprises not only
programs - source and object code - but also documentation of any facet of the program,
such as requirements analysis, specification, design, system and user manuals, and the
procedures used to set up and operate the software system.
The documentation help the maintainer to access the information about the whole
system as much as possible. Always the maintainer will not will be author of software
document and by having a good documentation will help the maintainer to understand
the component of the system and other issues that may be relevant for successful
maintenance.

9

2.3 Definition of Legacy Systems

Legacy system can be defined as software we dont know what to do with, but it
is still performing useful job [18]. Most of the legacy system faced problem in
maintaining the updates in the documentation, lack of standardization and openness,
difficult to understand and change. The change in industry model such as e-Business and
globalization, changing business models, integration in information technologies and
emerging new information architecture such as J 2EE, Web Services and reuse[19].
The implication is to introduce new solutions to discard the system and develop
new system. However this solution not applicable in all situations for example the
software built with complex requirements which is not represented elsewhere
(documentation), so discarding the system will lost also the knowledge accumulated in
the system. In addition, the large software system has many users who not document the
features and side effect of the software. It will be tedious work to ask the user to produce
the document again for new system. Therefore, maintaining the interface and the
functionality of the legacy system is very important. To achieve the objective, the
ultimate semantic representation of the system knowledge needed to understand the
legacy system.

2.4 Definitions of Software Documentation

According to Ambler [20], software documentation is abstraction of knowledge
about a system and only as long as document or model (or any other artifact) can
effectively communicate knowledge does it forms part of the projects software
documentation, even if only short lifespan. Common examples of such software
documentation include requirement, specification, architectural, detailed design
documents, business plan, test plans and bugs reports. These documents are geared to
10

individuals involved in the production of that software. Such individuals include
managers, project leaders, developers and customers[21]. Documentation attributes
describe information about a document beyond the content provided within. Example
attributes include the documents writing style, grammar, extent to which it is up to date,
type, format, visibility, etc. Documentation artifacts consist of whole documents, or
elements within a document such as tables, examples, diagrams, etc. An artifact is an
entity that communicates information about the software system.

Documentation playing an important role in aiding program understanding to
support software evolution[22]. Documentation also represents the observation and
arguments for certain quality of programs[23]. There are two aspects to documentation
quality: the process and the product[24]. Documentation process quality focuses on the
manner in which the documentation is produced (for example, whether or not an
organization follows a systematic approach to document development that is closely
aligned with the rest of the software process). Documentation product quality is
concerned with attributes of the final product (for example, whether or not it uses
standardized representations like the Unified Modeling Language (UML) [25] in
graphical documentation). This is important for the software maintainer in order to
understand the system for decision making process. Documentation quality can be
achieved by using the standard documentation to have better communication between
the stakeholders on the related software system. In addition, standard documentation
also helps to reduce the time and effort spends to develop new software system.

There are some of the documentation standards available in the software
development environment such as MIL-STD-498 and IEEE Standard 830-1998
documentation. Even though there are different standard documentation but the content
could be relatively same. MIL-STD-498 have offered almost twenty two types of
software documents to support all the phases in software development such as Software
Development Plan (SDP) for planning phase, Software Requirement Specification (SRS)
11

for requirement phase, Software Design Description (SDD) for design phase and
Software Testing Plan (STP) for software testing phase.

However, among the documents describe above SRS and SDD documents are
more relevant document can be used to understand the software requirement and design.
SRS used to explain the scope of the project, definition of specific terms used,
background information and essential part of the system. SDD used to design
information on the high level and low level designs. In general, the software
documentation should become aid and also a form of communication tool to have a
common understanding between team members on software system [20].

2.5 Definitions of Reverse Engineering

According to [1], reverse engineering is the process of analyzing a subject
system to identify systems component and their inter relationships and create
representations of the system in another form or at a higher level of abstraction.
The first part of the definition, identifying components and relationships is
narrowed to the problem of recovering some design-level views of the system,
representing its structure as a set of boxes and links, which are derived from the code
disregarding some implementation details. Several available reverse engineering
methods do no fit this definition. Consider, for example, slicing or feature location.
None of the two produces an output consisting of components and relationships.
The second part of the definition by Chikofsky and Cross has a wider
applicability and allows covering many more reverse engineering methods. However, it
is not completely satisfactory, in that either it is too vaguewhat are the mentioned
representations of the system in another form?or it falls into the first case, if we
interpret the higher level of abstraction as the design, assuming we are analyzing the
code. Moreover, it misses a few important characteristics of reverse engineering.
12

Namely, the context (i.e., the task in which reverse engineering is conducted), the role of
automation (automated or semi-automated methods are in the scope of reverse
engineering) and the knowledge acquisition process, which is an integral part of reverse
engineering. It does not involve changing the subject system or creating a new system
based on the reverse-engineered subject system. It is a process of examination, not a
process of change or replication.

2.5.1 Goal of Reverse Engineering

The IEEE-1219 [26] standard emphasis that reverse engineering is one of the
important technology for supporting systems that have the source code as the only
reliable representation. The goal of reverse engineering are generate alternate view,
understand the complex program, detect side effects, recovering information lost, and
synthesize higher abstractions, produce alternate views facilitating reuse. There are a lot
of areas covered by the reverse engineering technology. Examples of problem areas
where reverse engineering has been successfully applied include redocumenting
programs [7] [27] and relational databases [28], recovering design patterns [29], building
traceability between code and documentation [30].

2.5.2 Abstraction

According to [13], in reverse engineering, abstraction is achieved by highlighting
the important feature of the subject system and ignoring the irrelevant ones[13].
Abstraction is defined as a model that summarizes the detail of the subject it is
representing [1]. This abstraction level is understood within the context of the software
systems lifecycle which creates representations of the system in textual form or
graphical form. The classic waterfall development lifecycle calls for
requirements/specifications followed by design, code, testing, implement, and maintain.
Based on the development lifecycle, we can simplify the abstraction level to
requirements (specification of the problem being solved), design (specification of the
13

solution) and implementation (coding, testing, delivery of the operational system). Table
2-1 shows the sample document produce for different level of abstraction. However
using reverse engineering method the process will be start with source code, design and
requirement.

Table 2-1: Sample Documentation for Different Level of Abstraction

2.5.3 Levels of Reverse Engineering

There are many subareas of reverse engineering. If it is at the same level as the
original system, the operation is commonly known as redocumentation [13]. If on the
other hand, the resulting product is at a higher level of abstraction, the operation is
known as 'design recovery' or 'specification recovery' (Figure 2.1).Each abstraction
level, reverse engineering technique used to record the abstraction through the
redocumentation process.

Abstraction Level Sample Documentation
Analysis/Specification
Requirement
Software Requirement Specifications (SRS)
Design / Architecture Software Design Description (SDD)
Implementation Source Code Listing or Analyses Report
14

Figure 2.1 Level of abstraction in a software system[13]

2.5.4 Reverse Engineering Methods, Techniques and Tools

This section discusses in detail the reverse engineering methods, techniques and
tools available in the research context. There always confusion between the methods,
techniques and tools in reverse engineering. According to [31] a method or techniqueis
a solution to one or more known problems in reverse engineering, e.g., design recovery.
It might be argued that a method is usually more related to a process which involves
humans in the loop while a technique is usually more focused on the automated part of
such a process. However IEEE 729 (IEEE 1983), do not allow to distinguish between
method and technique. On the other hand, according to [31], a tool is used to
support one or more methods.

15

Figure 2.2 : Sample of reverse engineering methods and tools [31]

Figure 2.2 show the organization of a necessarily non-exhaustive sampling of
contributions available in reverse engineering, according to the separation between
methods and tools described above. Each method is associated with a specific reverse
engineering problem and can be further specialized according to the specific approach
chosen to address the target reverse engineering problem. For example, slicing can be
computed statically or dynamically, can be subjected to amorphous transformations, or
can be determined for inputs satisfying given conditions.

Reverse engineering has been traditionally viewed as a two step process:
information extraction and abstraction. Information extraction analyses the subject
system artifacts to gather raw data, whereas abstraction creates user-oriented documents
and views. For example, information extraction activities consist of extracting Control
Flow Graphs (CFGs), metrics or facts from the source code. Abstraction outputs can be
16

design artifacts, traceability links, or business objects. Accordingly, Chikofsky and
Cross outlined a basic structure for reverse engineering tools (Figure 2.3). The software
product to be reversed is analyzed, and the results of this analysis are stored into an
information base. Such information is then used by view composers to produce alternate
views of the software product, such as metrics, graphics, reports, etc[1]. In the following
subsection each component is described in more detail.

Most reverse engineering tools aim at obtaining abstractions, or different form

Most reverse engineering tools aim at obtaining abstractions, or different forms
of representations, from software system implementations, although this is not a strict
requirement: as a matter of fact, reverse engineering can be performed on any software
artifact: requirement, design, code, test case, manual pages, etc. Reverse engineering
approaches can have two broad objectives: redocumentation and design recovery.
Redocumentation aims at producing/revising alternate views of a given artifact, at the
same level of abstraction, e.g., pretty printing source code or visualizing CFGs. The
detail explanation regarding redocumentation discussed in the following section. As
defined by Biggerstaff [32], design recovery aims at recreating design abstractions from
Figure 2.3: Reverse Engineering Tools Architecture [1]
17

the source code, existing documentation, experts knowledge and any other source of
information.

2.6 Definitions of Redocumentation

Redocumentation is the creation or revision of a semantically equivalent
representation within the same relative abstraction level[1]. The resulting forms of
representation are usually considered alternate views (for example, dataflow, data
structure, and control flow) intended for a human audience.
Redocumentation is the simplest and oldest form of reverse engineering [2]. The
re- prefix implies that the intent is to recover documentation about the subject system
that existed or should have existed.
Some common tools used to perform redocumentation are pretty printers (which
display a code listing in an improved form), flowcharts (which create diagrams directly
from code, reflecting control flow or code structure), and cross-reference listing
generators. A key goal of these tools is to provide easier ways to visualize relationships
among program components so we can recognize and follow paths clearly.
Software redocumentation is part of software reengineering. While reengineering
may involve additional activities like restructuring the code, retargeting, etc.,
redocumentation only recovers the understanding of the software and records it, and
therefore makes future program comprehension easier. Program comprehension is the
most expensive part of software maintenance, and therefore program redocumentation is
the key to software maintainability[5].
The main goals of the redocumentation process are threefold[33]. Firstly, to
create alternative views of the system so as to enhance understanding, for example the
generation of a hierarchical data flow [34] or control flow diagram from source code.
18

Secondly, to improve current documentation. Ideally, such documentation should have
been produced during the development of the system and updated as the system
changed. This, unfortunately, is not usually the case. Thirdly, to generate documentation
for a newly modified program. This is aimed at facilitating future maintenance work on
the system preventive maintenance.
Sometimes, the output of reverse engineering is thought to be the same as
redocumentation. After all, when reverse engineering captures the design information
from the legacy source code, the resulting information usually includes data flow
diagrams, control flow charts, etc. The difference between redocumentation and reverse
engineering is that the redocumentation usually generates system documentation
according to a standard. For example, there are redocumentation tools that create
documentation for DoD-STD-2167A - a DoD software documentation standard. A
special case of redocumentation tools are reformatting tools. Otherwise known as
pretty printers, reformatters make source code indentation, bolding, capitalization, etc.
consistent thus making the source code more readable.

2.6.1 Redocumentation Process

Basically the redocumentation process using the reverse engineering
architecture[1] to produce the documentation. Redocumentation process viewed as a
knowledge rescue process[35] as shown in Figure 2.4. The process can be subdivided to
three phases namely input, automation specialization process and output.

19

i. Software work product (SWP)
SWP can be source code, configuration files, build scripts or auxiliary artifacts.
Auxiliary artifact can be a data gathering, manuals, job control and graphic user
interface which help to understand the source code. The author in [36], explain
the approach for redocumentation process and emphasis on the important of the
software work product to produce documentation for different type of
information. Even in the [19], the author using the source from manual, source
code and graphical user interface(GUI) to reverse engineer and remodel the
system.

ii. Parser
Parser used to extraction necessary information from SWP and store into the
repository or system knowledge base. The important of the parser is to return the
relevant information such as parser used in [37] to produce the documentaion.
Most of the parser used to extract the information from specific language source

Figure 2.4: Redocumentation Process
20

code. As in [38], using the Haskell parser to extract the API or software library to
show in HTML documentation. There are parsers also which can extract the
information different type of languages such as Universal Report. It can extract
from languages [39] such as Basic, C, C++, COBOL, Visual Basic .Net, Visual
J ++and more.

iii. System Knowledge Based
According to [40], knowledge based is a collection of simple fact and
general rules representing some universe of discourse. The purpose of this
component is to store extracted information from the SWP in order to describe
the context of information. This component becomes heart of the system to allow
the tools accessing the required information. In other words, all parts of the
model able to access and organizes the information from the knowledge based.
However it depends on whether the related information extracted by the
knowledge based. There are researches only focus on the knowledge based to
make sure the knowledge retrieved support the most for program understanding.
In [41], Intelligent Program Editor (IPE) applies artificial intelligence to
the task of manipulating and analyzing programs. IPE is a knowledge based tool
to support program maintenance and development. It represents a deep
understanding of program structure and typical tasks performed by programmers.
It explicitly represents in the form of tree or graph of frames. In[42],
Maintainers Assistant, a project from the Center for Software Maintenance at
the University of Durham, is a code analysis tool intended to help programmers
in understanding and restructuring programs. This is knowledge based system
which uses program plans, or clichs, as building blocks from which algorithms
are constructed. To comprehend a program, it is transformed into a composition
of recognized plans.

21

iv. Knowledge processing

Basically the knowledge processing can be defined as a collection,
acquisition, encoding, conversion, sharing, dissemination, application and
innovation[43]. According to Kai Mertis and Peter Heisig[44], knowledge
processes at least have the production, storage, transfer and application of the
knowledge. Based on the definition given, knowledge process can be divided
into four stages which are knowledge generation, knowledge storage, knowledge
transfer and knowledge application[43]. Knowledge processing also can be sub
divided into sub processes, which split each process into different stages. The
main objective of knowledge process used to create the knowledge management
for related activities. The output from the knowledge processing will be the
knowledge representations or knowledge system when the process model is
applied.
In the redocumentation, the knowledge processing process involves an
analysis of the source code so that it can be properly documented and various
relationships implicit within the source code are recovered and revealed[35]. The
processed knowledge represented in different type of form such as data modeling
and procedure or function. The basic of knowledge processing applied in many
reverse engineering tools such as Rigi using the RSF file as a repository and
present the knowledge presentation in procedural. [45].

Generally, most of the redocumentation approaches and tools exist,
emphasis on presenting the knowledge in the lexical basis which is more concern
on extracting structural component rather than the meaning of the component.
This issue will be discussed later in the section related to the knowledge
representation.
22

v. Output
Finally the processed knowledge presented in various form of
documenting to the user ( developer, maintainer , software engineer or end user)
such as directed graph, annotation, visualization, metrics or in documentation.
The artifacts produced including software components such as procedures,
modules, classes, subclasses, interface, dependencies among the component,
control flow, composition. The produced output used for various purposes such
as for designing[29], documenting[11],software evolution[7] or analyzing the
software system[46]. However, the software documentation can be categorized
into Textual and Graphics. Textual documentation ranges from inline prose
written in an informal manner, to personalized views dynamically composed
Textual documentation ranges from inline prose written in an informal manner,
to personalized views dynamically composed from a document database[47]. A
more flexible form of textual documentation is electronic, such as HTML or
XML files, which permit activities such as automated indexing and the creation
of hypertext links between document fragments [12, 39].
The least mature type of graphical documentation is a static image, which
may use non-standard representations of software artifacts and relationships[48].
The most advanced graphical documents are editable by the user, better enabling
them to create customized representations of the subject system. Graphical
documentation relies on a variety of software visualization techniques to make
complicated information easier for the maintainer to understand.

Unfortunately, most documentation is of low quality relative to such
attributes, making its use in program understanding problematic[11] . This is due
in part to the fact that software system documentation is usually generated in an
ad hoc manner. There is little objective guidance for judging documentation
quality or for improving the documentation process. The result is documentation
23

quality that is difficult to predict, challenging to assess, and that usually falls
short of its potential. Therefore, [22] introduce the Document Maturity
Model(DMM) to assess the document quality. The next section will describe the
DMM in detail.

2.7 Document Quality

The Documentation Maturity Model (DMM) is specifically targeted towards
assessing the quality of software system documentation used in aiding program
understanding[22]. Product developers and technical writers produce such
documentation during regular development lifecycles. It can also be recreated after the
fact via reverse engineering, a process of source code analysis and virtual subsystem
synthesis.

Document Quality assessed by number of criteria. The author in [22], specify the
Key Product Attributes(KPA) which is efficiency, format(textual and graphical) and
granularity.

i. Efficiency
Efficiency refers to the level of direct support the documentation provides to the
software engineer engaged in a program understanding task.

ii. Format
Format refers to the type of document produced either in textual or graphic.

Textual
Documentation from inline prose in an informal manner to personalized
view.

24

Graphic
Graphical form of the presentation on the software artifacts and
relationships.

iii. Granularity
Granularity refer to the level of abstraction describe the documentation. Each of
the attributes measured based on 5 levels. Each attribute and the level shown in
the Table 2-2:

Maturity
Level
Key Product Attributes (KPA)
Format
Granularity

Efficiency Text Graphic
Level 1 Explain in low
level
functionality.
No standard
format and
style
Highly depend
on the
experience on
the developer
Static graph as a
hardcopy-like
image can be in
format such as
GIF or PDF and
informal.
Read only
graphic
Documentation
at level of
source code
Comment on
algorithm and
source code
Document
generated
manually and
textual form
and maintain
along with the
system.
Level 2 Standard
documentation
and include also
the developers
Standard
representation
in the graphical
form.
One level
above design
patterns.
Help developer
Semi-automatic
using reverse
engineering.
Static and reflect
25

own format
Using a
standard
template
documentation
Using template
such as UML.
to understand
high level
rational.
the system
changes at the
time of
generation.
Level 3 Hyperlinked
add
indirection to
the text.
Can be text,
graphic or
multimedia
commentary
Animated
graphical
documentation
in visual
manner
User have little
interaction
High level
design
software
architecture.
Able to make
changes based
on
understanding
on system
architecture.
Dynamic and
semi-
automatically
reflect the
changes as long
the developer
direct the tools.
Level 4 Contextual
Documentation
using tools
support.
Enhance the
information on
the context.
Interactive and
permit the user
to navigate
from one node
to next level
node.
Can chase down
the artifacts and
relationships.
Better response
to user
Capture the
system
requirements
from the point
of view of the
user.
Multiple level
of abstraction

Automated and
static but no
need developer
involvement.
26

feedback.
Level 5 Personalized
document for
the reader
Multiple view
of the system
Editable
graphical
documentation.
Able to add new
nodes and can
be saved in the
repository (if
available).
Product line
documentation
Capture the
important
information
concerning the
commonalities
and variability
in the product.
Define the
domain
knowledge
Fully automatic
and completely
dynamic.
Can produce the
necessary
documentation
on demand.
Table 2-2: Key Product Attributes (KPA) and the description for each maturity level

2.8 Definition of Knowledge Representation

Knowledge representation can be defined as a field of study using formal symbol
to represent a collection of propositions believe by some putative agent. The author also
27

emphasis that not necessary the symbol must represent all the propositions believed by
agent. There may very well be an infinite number of propositions believed, only a finite
number of which are ever represented[49]. Knowledge can be represent in the form such
as XML, structured text file, document management system, meta object facility,
production rules , frame based, tagging, semantic network and ontology. The
fundamental goal of knowledge representation is to represent knowledge in a manner as
to facilitate inferencing (i.e. drawing conclusions) from knowledge[50]. The reasoning
tool used to extract the knowledge and present the required knowledge by the user.

2.9 Definition of Reasoning

Reasoning is the formal manipulation of the symbols representing a collection of
believed propositions to produce representations of new ones. It is here that we use the
fact that symbols are more accessible than the propositions they represent: They must be
concrete enough that we can manipulate them (move them around, take them apart, copy
them, string them together) in such a way as to construct representations of new
propositions[49]. In [51], reasoning is an ideal reasoning system would produce all-and-
only the correct answers to every possible query, produce answers that are as specific as
possible, be expressive enough to permit any possible fact to be stored and any possible
query to be asked, and be (time) efficient.

As example, in medical field the diagnostic system used as general fact about
disease and also the about the patient, to determine the patient might have and the
treatment can be provided to them. If transform this example in reasoning, the expert at
compile time provide general information about the medicine. At run-time a user
specified the situation and posed the questions. The reasoning will produce the
appropriate answer based on the knowledge based which include specifics of this
problem and the general background knowledge. These output not stored in knowledge
28

bases but compute whenever needed. In general reasoning can be identified as symbolic
knowledge that used to pose queries and receive answer[51].

2.10 Definition of Ontology

Originally, Tom Gruber in 1993 defines ontology as a formal explicit of a shared
conceptualization of a domain of interest. However, there are many other definition of
ontology has been introduced by ontology researchers like Fensel, Hepp [52], Guarino &
Giaretta and others.
Information which is not structured, understandable only by human not by the
computer. As example, there is no way to tell the computer this article about train unless
it contains word train explicitly. To give some kind of intelligent, it must know the
meaning of the document which is define as semantic. However only the structure not
enough because it doesnt reflect the relations. It should also reflect the real world or its
part which is called as domain model. Domain model can be achieved through
conceptualization which simplified view of the world. An ontology can represent this
domain effectively. According to [53], ontology are able to present the knowledge
semantically the description of entities and their properties, relationships, and
constraints.
Ontologies are able to interweave human and computer understanding of
symbols. There are three main contribution of ontology in knowledge representation.
Firstly ontology allows communication between human, application system and human
and application system. Secondly, it provides computational inference to represent,
manipulate, analyze and implement the knowledge. Finally, it facilitates reuse and
organization of the knowledge.

Ontology can be appears in graphical or formal visualization and machine
processable serialization form. The earlier form is used for visualization purposes and
29

the latter is used for storage and processing using ontology language. For an example, in
tourism industry, ontology can used to represent different domains of interest like
travelling infrastructures (rental car, train, flight), geographic knowledge (location,
destination, distance) and financial knowledge (currency, price, method of payment).
Figure 2.6 illustrates graphical visualization for one of the domain interest; the
geographic ontology.

Figure 2.5 : Geographic visualization for geographic ontology [54]

2.10.1 Generality of Ontologies

According to the literature in [55], generally there are three common layers of
knowledge. On the basis of their levels of generality, these three layers correspond to
three different types of ontologies, namely:

i. Generic (or top-level) ontologies, which capture general, domain independent
knowledge (e.g. space and time). Examples are WordNet [56] and CYC [57].

30

Generic ontologies are shared by large numbers of people across different
domains.

ii. Domain ontologies, which capture the knowledge in a specific domain. An
example is UNSPSC
3
, which is a product classification scheme for vendors.
Domain ontologies are shared by the stakeholder in a domain.

iii. Application ontologies, which capture the knowledge necessary for a specific
application. An example could be an ontology representing the structure of a
particular Web site. Arguably, application ontologies are not really ontologies,
because they are not really shared.

2.10.2 Methodology for constructing Ontologies

Ontology construction is really time consuming and complicated task. To guide the
process, so far there are no standard methods to be followed. However, there are many
approaches have been proposed and used. In particular, the author in [58], using the
following steps: definition of the ontology purpose, conceptualization ,validation[59],
and finally coding. The conceptualization considered is the longest step and requires the
definition of the scope of the ontology, definition of its concepts, and description of each
one (through a glossary, specification of attributes, domain values, and constraints)
which represents the knowledge modeling itself. In the former, a methodology using
skeletal has been proposed[60]. It identifies five steps: (1) identify purpose and scope,
(2) build the ontology, (3) evaluation, (4) documentation, and (5) guideline for each
phases. Step (2) is divided further into ontology capture, coding and integration of
existing ontologies.

31

2.10.3 Type of Ontology

Type of ontology can be categories in different type of category. Ontology can
be categorizing in term generality which model the generic form for knowledge
representation across various domains such as in [58]. The knowledge representation
involve the application domain,system, skill of the maintainer and organization
structure. There also task ontologies[61] that capture the knowledge task-related
knowledge of the domain that the task is defined. However, there also method
ontologies[62] which provide definitions of the relevant concepts and relations used to
specify the reasoning process to understand the particular task.
There also ontology categorize based on the knowledge representation such as
frame ontology[63] which contain number of properties and these properties inherited by
subclasses and instances.Most of the ontologies explained above, however can be placed
under domain ontology. These are designed to a specific domain and applications
defined within that domain.There also another type of ontology which is linguistic
ontologies such as WordNet [64]. These usually have collection of term which led to
another classification. These type classifications are called terminological ontologies.

2.10.4 Ontologies Languages

Ontology languages have started used since 1980s with knowledge
representations systems such as KL-ONE [65] and CLASSIC[66]. However, only
beginning 1990s, the system Ontolingua [67] which used for the development,
management exchange of ontologies. It used internal Knowledge Interchange
Format(KIF) but it able to work with other ontology languages such KL-ONE and
CLASSIC. Later, ontologies began to used in World Wide Web to annotate Web pages
using formal ontologies embedded in HTML documents. After that, the Ontobroker used
32

ontologies to formulate queries and derive answers. Later, Ontobroker used as a query
language to retrieve the document based on notation. These languages become very
important reference for current languages on the Semantic Web such as Resource
Description Framework (RDF) and Web Ontology Language (OWL). RDF is the first
language formed to represent information about resources in WWW.
RDF is based on two main functionalities of identifying resources using URIs
and describing resources in term of simple properties and property values [68]. RDF
represents data using subject-predicate-object triples which also called as RDF triples or
statements [68, 69]. The subject represents the resource, the predicate represents
property of this resource while the object indicate value of this property [54]. Each of
these subject, predicate and object is identified using URIs. RDF also incorporates graph
structure to simplify the modeling of data representation. Figure 2.7 below illustrates
RDF triples in graph model.

Figure 2.6 : RDF Triples in graph model

Using this triples, able to capture knowledge and metadata that available on the
web. However, RDF only provides simple description about the resource using the
properties and values [70]. There was a need for scheme which defines the vocabularies
for the resources. The need for more expressive description has invokes creation of
Resource Description Framework Scheme (RDFS). According to [54], RDFS is an
extension to RDF which facilitates the formulation of vocabularies for RDF meta data.
RDFS has introduces some basic (frame-based) ontological modeling primitives like
classes, properties and instances and their hierarchies. The combination of RDF and
RDFS is also referred as RDF(S). There are many tools have been introduced to support
Predicate
(Property of Resource)
Subject
(Resource)
Object (value
of property)
33

visual editing of RDF(S) descriptions such as Protg, WebODE, OntoEdit and KAON
OI-Modeller. In a technical whitepaper, Oracle has explained that Oracle Spatial 10g
supports RDFS description in database solution
1
. However RDF(S), is still consider as a
simple ontological language with basic inference capabilities. There is a need for more
vocabularies for describing properties to achieve interoperation between numerous
schemas which extends the Semantic Web layer to Web Ontology Language (OWL).
OWL has been declared as another Semantic Web Language by W3C in 2004.
OWL is more expressive language than RDF(S) by providing ontology layer to existing
RDF(S) [69]. It adds more vocabulary for describing properties and classes; for example
relationship between classes, cardinality, equality and many other richer characteristic of
properties. OWL is available in three varieties namely; OWL-Lite, OWL-DL and OWL-
Full. Each of these varieties reflects different degree of expressiveness.
Recently, Semantic Web Modeling Language (WSML) has been added as an
ontology language for web. WSML focuses on Semantic Web Service and covers major
aspects of different knowledge representation formalism [54]. A recent survey on
Semantic Web in [71] reveals that, most popular ontology editors for Semantic Web are
Protg, SWOOP, OntoStudio and Altova Semantic Works. Based on the same resource,
many of the ontologists prefer OWL compare to RDF(S).

2.10.5 Application of Ontologies

Ontology is increasingly becoming an essential tool for solving problems in many
research areas. In Semantic Web[55], the ontology used as to captured the knowledge on
global scale. Ontology provide an explicit representation of the semantics underlying
data, programs, pages, and other Web resources will enable a knowledge-based Web that
provides a qualitatively new level of service and overcome many of the problems in
knowledge sharing and reuse and in information integration. In database[72], ontology

1
http://www.oracle.com/technology/tech/semantic_technologies/pdf/semantic_tech_rdf_wp.pdf
34

repositories used to store and maintain the ontologies. The author believe that for agent
and ontology technology to gain widespread acceptance, it should be possible for agents
to directly access ontologies using a well-known protocol, and the information schema
underlying an ontology repository should be easily accessible in a declarative format. In
[73] , the author discussing the impact of ontology on information system which can be
divided to temporal dimension and structural dimension. Other area such as electronic
commerce, knowledge management, information retrieval, bioinformatics, software
engineering[74], intelligent system, ontology-based brokering and software
maintenance[58] also using ontology extensively.

2.11 Redocumentation Approaches and Tools
This section enumerates some of the reverse engineering approaches and tools available
in the literature.

2.11.1 XML Based Approach

XML based approach is one of the common redocumentation approaches used to
generate the documentation. It contains structured information that extracts the content
and the meaning of the documentation. XML reassemble from HTML to make it more
useful for program documentation. By using XML the technical writer or software
engineer can define their own format, such as <TASK>,<FILE>,<FUNCTION>,
<VARIABLE>and <CONSTRANT>[11]. This feature helps to identify the implicit
semantic to the document. The nature of XML shows the information in hierarchical
help to understand the program more easily. It also validates the data captured from the
program to make sure the data can be exchange between different software systems.
One of the researcher [11], used the XML based to redocument the program. In the
research, every level (inputs, automations specialized process, design models and
maintenance histories[36]) in the redocumentation framework required minor techniques
35

and integrated together to produce high quality documents[11] . In the first level, SWP
captured the data from source code and blended with other resources (manual,
programmer, and software documents) to have more data sources. Following that,
commercial or specific parser used to extract the structure in the extraction process. In
the level 2, captured structure or data from various SWP merge into one repository to
facilitate for knowledge processing. One of the important activities in the level 2 is to
uncovering important information hidden in the gathered data[11]. Finally, in the level 3,
generated documentation can be viewed in both textual and graphical representation.
The output produced can be used back in the next iteration of data gathering phase to
refine the information contained in the repository.

2.11.2 Model Oriented Redocumentation Approach

In [7], Model Oriented Redocumentation approach used to produce models from
existing systems and generate the documentation based on the models. The basic
concept used in this approach from the Model Driven Engineering (MDE). The main
objective of the MDE is to raise the level of abstraction in program specification and
increase the automation in program development. The MDE concept suited the
redocumentation process, specifically to produce higher level of abstraction in the final
documentation.
Basically, the MDE concept merged with the Model Driven Architecture (MDA)
in general and fastened with the Technological Spaces(TS) [75]. Figure 2.8 show the
model oriented framework for redocumentation within MDE context. Initially legacy
system is transformed into a stack of formal models. These formal models are written in
a formal language and can be transformed into other TSs. Model information in different
TSs is stored in Documentation Repository with a well defined meaning and can be used
to produce the documentation in a uniform way. To support the framework, the tool
36

called Maintainer Integration Platform(MIP) developed and supported by Wide
Spectrum Language (WSL) to present high and low level abstraction[7].

Figure 2.7: Model Oriented Framework [7]

The MIP tool used to collects information from different sources and uses the
Transformation Engine to translate the assembler files into WSL files and
transform/abstract into different models/views. The M-DOC generates the final
documentation, which is browser based and easy navigation. The final documentation
produced using different model and slice codes shown in Figure 2.9.

37

Figure 2.8: Final Documentation Produced using MIP tool [7]

2.11.3 Incremental Redocumentation Approach

One of the common issues in maintaining the system is to record the changes
request by customer or user occur in the source code. Because change requests are
unevenly distributed, teams having code ownership will overwork some programmers
while leaving others underutilized. In a small team, maintenance management will
suffer, wasting resources and draining profits. Incremental Redocumentation approach

38

used to rebuild the documentation incrementally once changes done by the programmer.
Because change requests are unevenly distributed, teams having code ownership will
overwork some programmers while leaving others underutilized. In a small team,
maintenance management will suffer, wasting resources and draining profits[8].

Steps involved in the change process are listed below:
i. Request for change
ii. Understanding the current system
iii. Localization of the change
iv. Implementation of the change
v. Ripple effects of the change
vi. Verification
vii. Redocumentation any changes recorded in the PAS tools

The change request is the first step in the change process. It is in plain English and
usually originates with the customers, passing to the project team through the
applications owners and distributors. During the planning phase, the team identifies the
parts of the software affected by the change and assigns a particular programmer to
implement the change. The programmer then implements the change and verifies its
correctness. Finally, during the redocumentation phase, the program comprehension
gained during the change is recorded in the appropriate partitions of PAS.

The partitioned annotations of software (PAS) serve as a notebook of the
maintenance programmer, in which the programmer can record all of his understanding,
whether it was arrived at in a top-down or bottom-up fashion, whether it is complete or
partial, or even whether it is a confirmed or tentative[5]. Since the PAS is in hypertext in
the style of World Wide Web, there is no need to limit the number of partitions or their
contents, and a partition in which it can be recorded can always be found. From the
global point of view, the PAS annotations are a matrix, where one coordinates is the
constructs of the code, and the other coordinate is the selected partitions. There is a wide
39

variety of partitions which a programmer may want to use, with various levels of
abstraction and various kinds of information. Hence, each entry in the matrix is the
annotation for a specific construct, containing information of a specific kind. Among the
partitions, domain partitions play a special role. Each program operates in a certain
domain of application, and each application domain has ontology, consisting of essences
and concepts of that domain. They have to be supported by the program, and they are
important for the understanding the program.

For example, if the application domain is that of library, then the concepts
supported are books, loans, customers, etc. These concepts in turn have to be supported
by specific constructs in the code. A domain partition makes these constructs
comprehensible. It is written in terms of the domain concepts only, and is
understandable to both a programmer and to a user of the program. PAS tools can be
divided to three separate categories: PAS browsers, PAS editors, and PAS generators.
PAS browser focus on the browsing the document generated through PAS
documentation. PAS editor support editing and updating PAS documentation. And the
last component is the PAS generator used to parsing the existing source code and
prepares the skeleton files for PAS documentation. One of the example tools used by
PAS is HMS.

HMS is a PAS generator for programs written in object oriented language C++. Its
input is a directory of C++source files and header files. The tool parses all files in the
directory and creates a skeleton documentation file for each class. Skeleton
documentation files are written in html and contain all information extractable from the
code. They also contain additional empty spaces for any documentation the programmer
may want to add. For each partition, there is section of the file with a header and a space
for the documentation comments.

40

2.11.4 Island Grammar Approach

An island grammar consists of
i. detailed productions for the language constructs we are specifically interested in
(island)[76]
ii. liberal productions catching all remaining constructs(water)[76]; and
iii. a minimal set of general definitions covering the overall structure of a program.
In simple terms, the full grammar of parts interested in, is the island. These surrounded
by constructs which is not interested in, the water. These approach used in the
redocumentation for extracting facts from a systems source code[37]. The simple
example how the extraction shown in Figure 2.10.

Figure 2.9 : Example on using Island Grammar Approach[37]

Grammar definition language SDF [76] used as a parser to define the island
grammar. Basically it will return parse tree in J ava object which encoded in aterm
format[77]. The analysis results that are of interest can be written to a repository, and
from there can be combined, queried and used in the rest of the documentation

41

generation process. The way of analyzing source code used common redocumentation
process, in the sense that there is a chain of analysis, filter, and presentation events. In
island grammar approach, filtering the data started during the first (analysis) phase,
because the approach deal with those language constructs defined in the island grammar.
The output can be abstracted in different layer depends on the documentation
requirement. The author in [37], using Cobol system to generate hierarchy associated
with documentation requirements. On the other hand, author in [78], explain in detail the
supporting tool for island grammar approach called Mangrove.

2.11.5 DocLike Modularized Graph Approach

According to [79], DocLike Modularized Graph(DMG) approach presents the
software architectures of a reverse engineered subject system graphically in a
modularized and standardized document like manner and to visualize the software
abstraction. There are many other approaches using structural documentation[46] and
assume current views and textual descriptions can be parts of structural re-
documentation without the need to be transformed into word processor. But according to
[79], DMG provides document-like software visualization and re-documentation impart
the template of software design documentation that enables users to directly update the
description of each section in Description Panel and its associated graph that is
generated automatically as shown in Figure 2.10. The Content Panel focus on the
representation of modules and associated graph is displayed accordingly in Graph Panel.
Description Panel used to describe the associated section manually.

42

Figure 2.10 DocLike Viewer Prototype Tool[6]

The DocLike Viewer Prototype Tool system can be expressed using the
redocumentation framework as show in Figure 2.10. As an input, they used C source
code and using the existing parser provided by Rigi. In the later section the author going
to discuss on this tool in details. DocLike Viewer using the existing storage provided by
Rigi and filters the data by selecting only required information to be visualized in
DocLike Viewer.

43

2.12 Redocumentation Tools

In this section we briefly describe some of the redocumentation tools that are designed
to assist the maintainer task.

2.12.1 Rigi

Rigi was designed to address three of the most difficult problems in the area of
programming-in-the-large: the mastery of the structural complexity of large software
systems, the effective presentation of development information, and the definition of
procedures for checking and maintaining the completeness, consistency, and traceability
of system descriptions. Thus, the major objective of Rigi is to effectively represent and
manipulate the building blocks of a software system and their myriad dependencies,
thereby aiding the development phases of the project[80]. Rigi is one of the tools
developed for source code abstraction in text file and not really for redocumentation.
But it is become fundamental tool for most of the researcher to used Rigi architecture to
generate documentation.

Rigi using a reverse engineering approach that models the system by extracting
artifacts from the information space, organizing them into higher level abstractions, and
presenting the model graphically[48]. There are three types of methodology used in
Rigi, which are rigireverse, rigiserver and rigiedit[46]. Rigireverse is a parsing system
that supports common imperative programming languages (C and Cobol) and a parse for
LaTex, to analyze documentation[46]. Rigiserver is a repository to store the information
extracted from the source code and rigiedit is an interactive, window-oriented graph
editor to manipulate program representations.

44

In Rigi, the first phase involves parsing the source code and storing the extracted
artifacts in the repository in Rigi Standard Format (RSF) file. There are two types of
Rigi Standard Format (RSF) files. First is an unstructured Rigi Standard Format (RSF)
file which may contain duplicate tuples. The other is structured Rigi Standard Format
(RSF) files used for displaying graphical architecture. Files contain information about
the nodes (e.g., functions, variables, data structures, etc.) and arcs (e.g., function to
function calls, or function to the variable calls). There is a tool called sortrsf which
converts unstructured file into structured file.
Once a C programming input file is being parse by the Rigi parser, a RSF file is
produce and it is send to Rigi editor for further processing. Rigi editor is a graph editor
and its user interface is based on windows, menus, color and mouse pointer. It is
programmable using a scripting language called Tcl [15]. Rigi editor used for
architecture display, traverse, and modify the graphical model. This produces a flat
resource-flow graph of the software representing the function calls and data accesses.
The second phase involves cluster functions into subsystems according to rules and
principles of software modularity to generate multiple views called Simple Hierarchical
MultiPerspective view[9], layered hierarchies for higher level abstractions. The Rigi
editor assists the designers, programmers, integrators, and maintainers in defining,
manipulating, exploring, and understanding, the structure of large, integrated, evolving
software systems[80]. Rigi also able to import the schema from Columbus tool [47]
and view the representation using rigi graph visualizer. This features support the
Columbus tool to view graph based on the Columbus Schema, a call-graph and UML
class diagram like graph.

2.12.2 Haddock Tool for Haskell Documentation

Haddock Tool is a tool for generating documentation from the source code
automatically. Haddock primarily focus on generating the library documentation from
45

the Haskell source code. According to [38], the major reason for generating library
documentation listed below:
i. Having interpreted the API from the source code, a documentation tool can
automatically cross-reference the documentation it produces.
ii. The code already has documentation in the form of comments and can be used in
document generation.
iii. There are minimum possibilities of non synchronization between the source code
and the documentation.
Currently the only fully supported output format is HTML, although there is a partial
implementation of a DocBook (SGML) back-end. The knowledge capture in HTML
consists of:
i. A root document, which lists all the modules in the documentation on (this may
be a subset of the modules actually processed, as some of the modules may be
hidden; see Section 5.3). If a hierarchical module structure is being used, then
indentation is used to show the module structure.
ii. An HTML page for each module, giving the definitions of each of the entities
exported by that module.
iii. A full index for the set of modules, which links each type, class, and function
name to each of the modules that exports it.
Haddock using general Haskell parser distributed by GHC and extend the parser by
adding abstract syntax to include documentation annotations. Persistent schemes or
interface used to store the libraries before extracted to the documentation in HTML
form. There are few other tools can be categorized under API redocumentation are
J avadoc [81], Doxygen[82] and Caml System [83]. J avadoc and Doxygen have been
using in large commercial system and the features of the system improved time to time
to full fill the current needs.

46

2.12.3 Scribble Tool

Scribble focus on generating library documentation, user guides and tutorials for
PLT scheme. It combines all of these threads producing a scribble language or tool that
spans and integrates document categories. Scribble was built using PLT Scheme
technology, which is an innovative programming language that builds on a rich
academic and practical tradition[84]. It is suitable for implementation tasks ranging from
scripting to application development, including GUIs, web services and supports the
creation of new programming languages through a rich, expressive syntax system. The
features in PLT schemes help to develop Scribble system more easily and Scribble just
an extension of the PLT schema. So, the main input and the parser in the documentation
process is the PLT Scheme itself. Central PLanet package repository used to store the
libraries. The final output produced in HTML form which consists of libraries with the
guides and tutorials. In fundamentals, the basic concept is builds on Scheme to construct
representations of documents using Scheme functions and macros, and it uses an
extension of Scheme syntax to make it more suitable for working with literal text.

2.12.4 Universal Report

Universal Report is a tool used to analysis the source code and
documents the software system. The main objective of this tool is to
analyse and generate the structured and well formatted document of a
given program. The biggest advantage of this tool compare to other
tools is it can generate documentation from various type of languages
such as C++. Visual Basic, Ada, Cobol, Fortran, Java, Assembler,Perl,
PHP, Python and many others (Figure 2.11).

47

Figure 2.11 Programming Language Supported by Universal Report

It using pattern matching algorithm and compilation techniques to extract the
information from the source code and generate the documentation in HTML, Latex and
plain text files[39]. The HTML output have a lot of features includes search the script
for text over the entire documentation, an online commenting and annotating system, a
dynamic flowchart, routine call graph, screenshots from form files , detailed analysis
and dynamic compositition of each routine. In addition, the Universal Report also can
read the database files and generate the detailed report of the structure and elements such
as table, fields and reports. However, the featured in Universal Report tool
emphasis redocumentation of source code in the implementation level
only. It doesnt focus on higher abstraction level such as design or
specification level.
48

2.13 Summary of Redocumentation approaches and tools
The Table 2-3 and Table 2-4 summarize the approaches and tools based on the
redocumentation approaches.

Approaches Input Parser Knowledge
Representation
Schema
Output
XML Based
Approach
Source Code,
Existing
documents,
Users
Commercial
Parsing
system
XML Textual and graphical
Documentation
(Cross reference, system
overview) for different type
of stakeholder.
Model Oriented
Approach
Source Code WSL (Wide
Spectrum
Language)
Meta Object Facility
(MOF)
Tree like view
Documentation
(Call Graph, Flow chart,
Class Diagram, Activity
Diagram)
Incremental
Redocumentation
Approach
c++source
files and
header
HMS Parser Database Document consist of
partition of skeleton
49

DocLike
Modularized Graph
C source code rigi(Parser), Library of Procedures Structural Documentation
Updateable Document,
Visualize through directed
graph
Island grammar
approach
Source
Code,Hand-
written
documentation
SDF (Syntax
Definition
Formalism)

Graphs representation
HTML environment using
structural documentation
Table 2-3 : Summary of Redocumentation Approaches

Tools Input Parser Knowledge
Representation
Output
Rigi C/C++and
cobol
rigi parser,rigi
editor
Directed and undirected
graph layout (nested
graph)
Structure Text File - RSF
files(rigi standard format)
Columbus C++ Extractor_CPP.d
ll(CAN.exe and
CANPP.exe)
using Columbus
parser
Columbus
schema(Abstract Syntax
Graph)
HTML,CPPML, GXL,
UML XMI and FAMIX
XMI
Haddock Haskell Source
Code
Haskell Parser Document Management
System
Structural Document by
Module in HTML form
Scribble
Tool
PLT Schema Scribble Parser Schema strings syntax HTML Documentation
consists of library with user
guide and tutorials
50

Table 2-4: Summary of Redocumentation Tools
Universal
Report
Multi
Programming
Source Code
Universal
Report Parser
Component Diagram,
Report, Routine Diagram
Latex, HTML and plain
text formats

CHAPTER 3

COMPARATIVE EVALUATION OF THE STATE OF ART APPROACHES

3 COMPARATIVE EVALUATION OF THE STATE OF ART
APPROACHES
This chapter compares the redocumentation approaches and tools that
described in chapter 2 with respect to the following benchmarks. The
results can be seen in

3.1 Analysis Criteria
In the next this section we will discuss the criteria used to evaluate the approaches and
the tools in detailed. Basically the approaches will be evaluated using two categories as
follow:
Document Quality Attributes
Knowledge Representation Criteria

3.1.1 Document Quality
The main objective to produce documentation is to enhance the understanding on the
source code. The DMM maturity model used to measure the approaches and tools to
52

achieve this objective. Each of the attributes measured based on 5 levels. Table 3-1 and
3-2 shows the summary of the evaluations for the redocumentation approaches and tools.

Approaches
Benchmarks
Format Granularity Efficiency
Text Graphic
XML Based
Approach
3 3 3 2
Model Oriented
Approach
2 3 3 2
Incremental
Approach
2 - 2 3
Island Grammar
Approach
3 1 3 2
DocLike
Modularized Graph
Approach
3 3 3 2
Table 3-1: Comparing redocumentation approaches

53

Tools
Benchmarks
Format Granularity Efficiency
Text Graphic
Rigi 2 4 2 2
Haddock Tool 3 - 2 4
Scribble Tool 3 1 2 4
Columbus 2 - 2 2
Universal Report 3 1 1 4
Table 3-2 : Comparing redocumentation tools

Table 3-3 and Table 3-4 shows the quality level achieved by the approaches and
tools to generate the documentation. The strength of existing approaches and tools
emphasis on different levels in redocumentation process. Example model oriented
approach shows the highest maturity level for the granularity criteria. The Rigi tools
criteria shows highest for the graphics and Scribble tool shows highest for efficiency.
However, XML approaches shows in moderate level and balance for each criteria except
for efficiency.
Based on the analysis above, it shows that the quality of the documents produced it
depends on the emphasis on redocumentation process level. Like Rigi the emphasis for
visualization (output), Island grammer approach for extracting syntatic structure of the
code (parser) and Model Oriented for software evolution (knowledge based).

54

3.1.2 Knowledge Representation Criteria
Knowledge presentation criteria used to evaluate the redocumentation approaches
and tools. These following criteria used to evaluate the storage management capability
which is the heart of the redocumentation process.
i. Effectiveness
Knowledge can be presented according to different type of users or different type
of projects. In other words, user able to search and choose the information
needed. As example, the presentation able to view from different group of users
such as developer, maintainer, software engineer and customer or different
projects such as procedural or object-oriented development.

ii. Efficiency
Able to search and find the related knowledge as the time needed. Especially for
the software developer, when searching for certain knowledge.

iii. Explicitness
The representation schema able to show more detailed description related to
knowledge selected. As example, if the representation schema shows the classes,
the user able to view in detail its properties, implementation and the subclasses.
The main reason for having this criteria is to able the user identify the risk if do
any changes in the system.

iv. Accessibility
Able to perform in different environment and can be accessible through
electronic and print out copies. These criteria appropriate for the tools used for
redocumentation. As example, object oriented can be shown in different
application such as in HTML or specific software ( Rational Rose or Visio).

55

v. Modifiability
Able to modify the current representation schema for the software improvement
and able to validate any changes happen.

vi. Understandability
Able to understand the representation schema. In other words, the main purpose
of these criteria is to make sure that the user understands the output of the
presentation. As example, the representation schema able to describe the
functions produced.

Table 3-3 : The Evaluation using Knowledge Representation Criteria for
Redocumentation Approaches

The Table 3-3 shows the evaluation on the redocumentation approaches based on the
knowledge representation criteria.

Approach
Criteria
XML
Based
Approach
Model
Oriented
Incremental
Island
Grammer
DocLike
Modularized
Graph
Effectiveness High High Medium Medium Medium
Efficiency Medium Low Low Medium Low
Explicitness Medium High Medium Medium Medium
Accessibility High Low Medium Medium Medium
Modifiability Medium High Low Low Low
Understandability High High Medium Medium Medium
56

Table 3-4 : The evaluation using Knowledge Representation Criteria for
Redocumentation Tools

The detail explanation on the evaluation above discuss in detail on the following section.

3.2 Discussion

This chapter and previous chapter have aimed to provide an overview and compare
recent progress in redocumentation. These approaches and tools evaluated using two
criteria. First, evaluation done on the redocumentation approaches and tools based on
the document quality suggested by [22] as shown in Table 3-1 and 3-2. The main
objective of the evaluation is to measure the quality of the document produced to
enhance the program understanding. Second the evaluation done based on knowledge
representation for redocumentation approaches and tools as shown in Table 3-3 and 3-4.
The objective is to analyze the knowledge representation level in order to generate the
software redocumentation.
Tools
Criteria
Rigi
Haddock
Tool
Scribble
Tool
Columbus
Universal
Report
Effectiveness Low Medium Low Low Medium
Efficiency Low High Medium Low Medium
Explicitness Medium Low Low Medium Low
Accessibility Low High High Low High
Modifiability Low Low Medium Low Medium
Understandabili
ty
Low High Low Low
High
57

For the first criteria the measurement based on the format, granularity and
efficiency contribute to identify the level of document produced for program
understanding. Among the approaches, XML based approach are very common
approach used nowadays. The advantage of this approach compare to other approaches
is it can create different type of view according to the user needs. However the
abstraction is in medium level and it will be difficult to show semantically related
knowledge of a same domain. Model Oriented approach useful for system evolutionary
because it able to show the exact characteristics as the original system or the domain
level. The maintainer has a better view and understanding to handle maintenance task.
Island grammar approach used at parser level and it contribute to speed up the extraction
process and concentrate on the data analysis for the documentation. Based on the tools
the granularity still considered low and medium level. Most of the tools develop only
until the level of view the software architecture but not on the requirement level of the
system. The text format documentation produced by the all the approaches and tools still
in medium level which is the most document created are in the hyperlink such as XML
based approach. On graphic, Rigi one of tool which helps to view the graphic form of
the software component and able to navigate to certain extends. However, other
approaches and tools emphasis only on viewing the graphics but not able to navigate
using graphic. In term of efficiency the haddock and scribble tool are able to automate
the redocumentation process compare to other approaches and tools. However the
automation process is easy because it involve the low level abstraction.
The overall knowledge representation for model oriented approach shows better
compare to other approaches because it able to use the knowledge based to create higher
level granularity. However the island grammar approach, Scribble tool and Haddock tool
more efficient compare to other approaches. Because the solutions more emphasis on
low level abstraction and useful for the programmer or developer to understand the
program to manage the maintenance work.
The main problem with most of these approaches and tools for redocumentation
are the granularity level are low and limit the understanding on the domain knowledge.
58

The model oriented approach try to solve, the efficiency level are low and not able to
search the information as needed. Furthermore, these approaches and tools limit the
representation switches using schema languages and no general redocumentation process
which can use for several scenarios.

3.3 Related Work

There are huge amount of reverse engineering approaches and tools available
which developed for research based[47] and as for commercial [85]. The research goals
are focus on small problems and emphasize on the low level abstraction that exactly
match the structure for recognition. The researches on producing high level abstraction
of the software component are less in reverse engineering. Most related work is the
development of document management and retrieval tool to provide semi-automatic
metadata generator and ontology based search engine for electronic documents. RDFS
or OWL used to describing the data[86]. This tool generally emphasis on generating
ontology for personal document and not for producing high granularity software
components. The system architecture only produces the result in Listview and not as a
standard documentation.
Other related work is transforming the legacy schema to ontology using meta-
framework approach. As describe by Manual Winner, this meta-framework can generate
the ontology to improved structures and semantics compare to the original legacy
schemas[87]. To produce semantic enrichment of the schemas, this method transforms
the legacy scheme using heuristic and refactoring to improve the design and precision of
the schemas. However, this method emphasis only on the knowledge base and output
level and did not emphasis on the important of the SWP that can help enrich the
metadata to generate ontology. In addition, the Meta framework method not emphasis
on producing standard documentation or ontology to describe the artifacts in meaningful
way.

CHAPTER 4

RESEARCH METHODOLOGY

4 CHAPTER 4 RESEARCH METHODOLOGY
In this chapter, the methodology of the proposed approach will be described in
principle. This includes the research procedure, the operational framework, the
assumptions and limitations, and the schedule of the research.

4.1 Research Design and Procedure

In [88],the authors propose, as hypothesis, a classification of the research problems
of the software engineering discipline in: Engineering problems, concerned with the
construction of new objects such as methodologies, approaches, techniques, etc;
Scientific problems, concerned with the study of the existing object such as algorithmic
complexity, software metrics, testing techniques, etc. In the case of Engineering, it
would be necessary to study existing methodologies, reflecting on them to determine
their advantages and disadvantages and proposing a new one, which, while retaining the
60

advantages of the methodologies studied, would, as far as possible, lack their
shortcomings.

This research aimed to develop a new redocumentation approach using ontology. Thus,
producing this approach is an engineering problem and this research will focus on
engineering design such as modeling, constructing, and evaluating the new object. In
Figure 4-1, the steps of the research procedure have been shown. Some existing
algorithms or approaches may need to be revised and enhanced in order to achieve the
objectives of the study. In Figure 4-2, the flow chart of the research activities has been
shown.

61

62

Figure 4.1 : Research Procedure

63

Figure 4.2 : Research Flow Chart
64

4.2 Operational Framework

Table 4-1 : Operational Framework
No Research Question Objective Activity Deliverable(s)
1. What is redocumentation process and
component in the process? How the
artifact presented in documentation?
To investigate knowledge representation
concept in redocumentation process
Literature Study Result of literature
review
2. Why the existing approaches for
redocumentation process are still not
able to satisfy all users
requirements?
To evaluate all of current
redocumentation approaches
Literature study
Comparative evaluation
of current approaches
Results of comparative
evaluation
3. How to show semantic
representation in software
documentation to understand
software system?
To present new approach based on
ontology

Building a model
for redocumentation
process

Designing an
algorithm to
improve existing
redocumentation
process
The redocumentation
process model

Source code of the
algorithm
To develop composer tool that
supports the proposed approach
based on ontology
Designing the
prototype tool
Coding
Integrating the
tool
Design
documentation
Source code
Executable tool
4. How to validate the usefulness
of the
proposed approach to support
redocumentation?
To evaluate effectiveness of the
redocumentation approach based on
proposed benchmark
Analyzing the
results
Analysis results
65

66

4.3 Assumptions and Limitation

A software prototype of the proposed approach as a reasoning tool will be developed to
support the implementation of the proposed redocumentation approach. Even though the
approach can be used for any type of system environment but for this research to prove
the concept it decided to develop an environment to which can support the knowledge
acquisition and present it in the form of document. It should create a system
environment that allows the user to toggle the application between system
documentation and tool.

4.4 Assumptions and Limitations

This research has following assumptions and limitations:

i. The study assumes that the repository consistently updated with current software
system otherwise the result would be less effective. To ensure this to occur, a
case study should be selected from recent software system.
ii. The system assume that the artifact extracted from source code and database are
extracted the entire component accordingly. Existing approach can be used to
make achieve this requirement.
iii. The system assume that the database used as source only deal with relational
database system.
iv. The system assumes that maintainer or system users have knowledge to perform
reasoning for enable them search and select related knowledge from the
ontology.
v. The artifacts extracted can be either object-oriented or structural program based
software.

67

4.5 Research Planning and Schedule
No. Activity Month
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
1. Literature Review

2. Problem Definition

3. Present the Research
Proposal

4. Develop the Proposed
Approach

5. Develop the Application

6. Develop the Prototype

7. Integrate the Application
Tools

8. Test and Modify the
Reasoning Tool

9. Evaluate the Proposed
Approach

10. Write up the Thesis
Report

68

4.6 Summary

This chapter explained the research methodology and procedure to
carry out for the research. The research work carried out based on
software disciplines and practice that relate software technology
including know-how and tools to support real world environment. Some
justification of planning research against it initial objectives and
issues lead to plan for some validation approach. Some assumptions of
this research were laid out to determine proper actions to take prior
to its implementation.

CHAPTER 5

PRELIMINARY FINDINGS AND EXPECTED RESULTS

5 CHAPTER 5 PRELIMINARY FINDINGS AND EXPECTED
RESULTS
5.1 Preliminary Findings

This section explains the proposed framework for redocumentation
and the components of the framework as the preliminary findings of the
research.

5.2 Definition of the Proposed Framework

The aim of this research is to redocument the source code with other auxiliary
artifacts using ontology and show the output using a standard documentation. Some
criteria from XML based approach and model approach combine in the proposed
framework which it has own strength in receiving input, viewing and create domain
application. Ontology uses the existing schemas in the previous approaches and
generates the high vocabulary for the domain application in the form of machine-
70

readable artifact. The figure 5.1 shows the proposed framework for redocumentation
process.

71

Figure 5.1: The Proposed of Redocumentation Process Framework

The proposed framework consists of three levels which are inputs, knowledge
automation process and outputs. Each level describe in detail in the following section.

5.2.1 Input

The input level consists of two components namely: source and the parser. In the
following each of them explained:
a. Identifying Source of Data and Knowledge
In the input process, the source is related to problem interest is available to process.
Some of the sources listed are:
Source code
Database
Documentation
Maintainer, Developers, Users
As a next stage the source available which is source code, database, user manual and
maintainer will describe in detail how to retrieve the knowledge from each of these
sources.

i. Source code

Source code is one of the most reliable components to retrieve the correct
understanding about the program. However changes in the program also need to
put in the consideration to make sure retrieved knowledge is the latest one. Even
though most of the medium or large companies practices by keep a record on
changes done of the source code, but the information not stored in the repository
or machine learning tool. Some technique needs to be used to keep a record on
the file being changed especially during the maintenance process. Incremental
72

redocumentation approach has been discussed in the Chapter 2 which helps to
redocument the changes in the source code. The steps in this approach can be
used to identify the changes during the mini maintenance process.

ii. Database

Database is another important source which helps to retrieve the related
knowledge for the application domain. There are 3 major resources in the
database can use for extraction which is the data dictionary, data and also the
relationship between relation. With the query on data and system table able to
find all the 3 resources above.

iii. Documentation

Documentation such as system documentation and user manual are the available
components which help to extract high level component. Software documents
follow the proper document structure which organizes in the chapter, sections
and sub sections standard format will be used to extract the knowledge. IEEE
and military standard documents are the sample document can used for these
purposes.

iv. Software Expert Domains

Software Expert Domains such as maintainer or developer able to give input for
higher level abstraction level such software specification and requirements. Most
of the organization the knowledge software experts keep their valuable
knowledge related to application domain in their mind. Survey and interview
technique can be used to retrieve this valuable knowledge. Collected data will be
transforming to repository for analysis.
73

b. Parser

The next component in the proposed framework is the parser. The parser will be
developed for the extraction of the component from the source code. The existing
parser Island grammar approach will be used to reduce the development time for
parser and return the relevant information. The Syntax Definition Formalism
(SDF) can be used to define the island grammar. Using SDF has a few
advantages such as allow for concise and natural definition of syntax and
promote the modularization of a syntax definition. The SDF extract the related
artifacts and relationship from the source code. To extract related data from the
database, manual analysis using the data dictionary provided by the relational
database will be used.

5.2.2 Knowledge automation process

There are three components in knowledge automation process which is metadata,
ontology storage and reasoning tool. Each of this component describe in detail in the
following section. The main concern on this level is to transform the knowledge stored
in the schema to ontology. Next to support the user through reasoning tool to generate a
better structure of semantic schema and provide efficient knowledge representation.

a. Metadata

The source retrieved can be either structured or unstructured data. However, in
order to change the metadata to ontology automatically, the data must be in
structured manner. Therefore if the unstructured data retrieved from the source, it
must be transformed to structured data. In the metadata there are 2 major
74

components namely: standard documentation template metadata and the source
metadata. Standard documentation template need to be defined by the user
manually according to the standard document available in software development
such as IEEE Standard 830-1998 and MIL-STD-498. Each sub section in the
standard document will be defined in the metadata. The user will be provide with
the tools to allow the user flexibly define the metadata.
The source metadata consist with data retrieve from different sources.
The database and source code will be handled by the parser to extract the
relevant component. However sources such as developer, user manual or
graphical user interface will be define manually by the user in the source
metadata using the interface module given. The main tasks of the source
metadata are to save and manage the extracted component. The most flexible
metadata will be used is XML schema to wrap all the source and standard
documentation in structured data.

b. Ontology

This is the heart of the framework to mapping the XML schema to ontology.
Based on the ontology methodologies, the purpose will be defined as a first step.
The purpose will define the knowledge which is relevant to the domain of
interest. The conceptualization step needs to identify basic attributes and
competency questions. It structures the domain knowledge as meaningful models
at the knowledge level either from scratch or by reusing existing models. The
competency questions identify the requirement that ontology must answer. This
steps leads to a preliminary ontology containing the description of kind, relations
and properties.

The next step will be the formalization which will transform the metadata
to the formal or semi-computable model. The formal description language will
be used to describe the ontology and will be integrated with the interface model.
75

The implementation step builds computable model in an ontology language.
Final step is to validate and refactor the created ontology to make sure the quality
and usefulness of the concept for particular domain achieved. Web Ontology
Language (OWL) will be used to define, describe and express the ontology. The
steps describe above will be implemented in creating the ontology for standard
documentation and the domain ontology.
Even though the ontology for the standard documentation and domain
ontology are different, it is possible to surpass from one to another by using the
artifacts as bridge. The related artifacts in domain ontology mapped to the
section of standard document. The maintainer will be given features to query and
create the intersection between both ontologies. Next section will describe this
feature in detail.

c. Ontology Reasoner

The maintainer will be given a tool with ontology inference mechanism
capability to search and query the conceptual and semantic level to access the
knowledge that is not presented explicitly. This is important features for the
proposed approach compare to keyword-based searching. The proposed approach
makes use the existing capability of ontology using inference rules to implement
the intelligent retrieval and improve the rationality and usability of the results.
Formalization of such query allows the maintainer integrate both ontology to
produce the standard documentation containing the simplified view of software
domain model.

5.2.3 Output

The output consists with the interface that used to refine raw or semantically
enriched standard documentation. Based on the domain ontology produced and
76

query from the reasoning tool, the standard documentation can be produced such
as MIL-Std-498 standard system documentation. The result from the ontology
can be mapped to predefine component in the standard document structure such
as component description, use case diagram and descriptions, possible
requirement specification, data flow graph for specific use case, relationship
among component using semantic relationship from the domain point of view.
The interface module will be equipped with the functions such as metadata
management module, ontology creator and browser, query editor and
documentation viewer. The table below shows the description for each interface
module.
Table 5-1 : User Interface Description

5.3 Expected Findings

No. Interface Description
1. Metadata Management
Module
Provide semi-automatic mechanism to create the
metadata summarizing the artifacts received from the
parser and the other source. However other source will
be manually input the metadata.
2. Ontology Creator and
Browser
Ontology creator accomplished with the generator
engine that read the metadata in order to generate the
ontology for application domain.
3. Query Editor Provides the ontology based search or inference
mechanism to allow the user to process the knowledge
source required.
4. Documentation
Management Module
Integrate with the query editor to extract the knowledge
source from the domain ontology and map with the
standard documentation.
77

This research is expected to present a new approach on redocumentation for ease
software maintenance. The expected finding includes:
i. A semi-automated redocumentation process approach based on ontology.
ii. A supporting prototype tool with capabilities of discovers and combines
the ontology component to produce documentation.
iii. An evaluation of the result of the proposed approach with other current
approaches to determine the effectiveness of the approach.

78

REFERENCES

1. Elliot, J .C. and H.C. J ames, II, Reverse Engineering and Design Recovery: A
Taxonomy. IEEE Softw., 1990. 7(1): p. 13-17.
2. Sergio Cozzetti, B.d.S., et al., A study of the documentation essential to software
maintenance, in Proceedings of the 23rd annual international conference on
Design of communication: documenting \& designing for pervasive
information. 2005, ACM: Coventry, United Kingdom.
3. Tilley, S., Three Challenges in Program Redocumentation for Distributed
Systems. 2008
4. Hausi, A.M., et al., Understanding software systems using reverse engineering
technology perspectives from the Rigi project, in Proceedings of the 1993
conference of the Centre for Advanced Studies on Collaborative research:
software engineering - Volume 1. 1993, IBM Press: Toronto, Ontario, Canada.
5. Vaclav, R., Incremental Redocumentation with Hypertext, in Proceedings of the
1st Euromicro Working Conference on Software Maintenance and
Reengineering (CSMR '97). 1997, IEEE Computer Society.
6. Shahida Sulaiman, N.B.I., Shamsul Sahibuddin,Sarina Sulaiman, Re-
documenting, Visualizing and Understanding Software System Using DocLike
Viewer, in Proceedings of the Tenth Asia-Pacific Software Engineering
Conference (APSEC03). 2003, IEEE.
7. Feng, C. and Y. Hongji. Model Oriented Evolutionary Redocumentation. in
Computer Software and Applications Conference, 2007. COMPSAC 2007. 31st
Annual International. 2007.
8. Rajlich, V., Incremental redocumentation using the Web. Software, IEEE, 2000.
17(5): p. 102-106.
9. Margaret-Anne, D.S., et al., Rigi: a visualization environment for reverse
engineering, in Proceedings of the 19th international conference on Software
engineering. 1997, ACM: Boston, Massachusetts, United States.
10. Matthew, F., B. Eli, and F. Robert Bruce, Scribble: closing the book on ad hoc
documentation tools, in Proceedings of the 14th ACM SIGPLAN international
conference on Functional programming. 2009, ACM: Edinburgh, Scotland.
11. J ochen, H., H. Shihong, and T. Scott, Documenting software systems with views
II: an integrated approach based on XML, in Proceedings of the 19th annual
international conference on Computer documentation. 2001, ACM Press: Sante
Fe, New Mexico, USA.
12. Scott, T. and H. Shihong, Documenting software systems with views III: towards
a task-oriented classification of program visualization techniques, in
Proceedings of the 20th annual international conference on Computer
documentation. 2002, ACM Press: Toronto, Ontario, Canada.
13. Takang, P.G.A.A., SOFTWARE MAINTENANCE CONCEPT AND PRACTICE.
2nd. Edition ed. 2003: World Scientific Publishing Co. Pte. Ltd.
79

14. Margaret-Anne, S., Theories, tools and research methods in program
comprehension: past, present and future. Software Quality Control, 2006. 14(3):
p. 187-208.
15. Edelstein, D.V., Report on the IEEE STD 1219-1993- Standard for software
maintenance. SIGSOFT Softw. Eng. Notes, 1993. 18(4): p. 94-95.
16. Keyes, J ., Software Engineering Handbook. 2005: A CRC Press Company.
17. McDermid, J ., Software engineer's reference book. illustrated, reprint ed, ed. J .
McDermid. 1991: CRC Press, 1993.
18. M. P. Ward , K.H.B., Formal Methods for Legacy Systems. J . Software
Maintenance: Research and Practice, 1995. 7: p. 203--219.
19. KHUZZAN, S.B.A.B.S., e-Fiber : Reengineering of a Legacy System. 2010,
UNIVERSITI TEKNOLOGI MALAYSIA.
20. Forward, A., Software Documentation Building and Maintaining Artefacts of
Communication, in Ottawa-Carleton Institute for Computer Science. 2002,
University of Ottawa: Canada. p. 167.
21. Andrew, F. and C.L. Timothy, The relevance of software documentation, tools
and technologies: a survey, in Proceedings of the 2002 ACM symposium on
Document engineering. 2002, ACM: McLean, Virginia, USA.
22. Shihong, H. and T. Scott, Towards a documentation maturity model, in
Proceedings of the 21st annual international conference on Documentation.
2003, ACM Press: San Francisco, CA, USA.
23. Thomas, V., N. Kurt, and rmark, Maintaining program understanding: issues,
tools, and future directions. Nordic J . of Computing, 2004. 11(3): p. 303-320.
24. Sommerville, I. Software Documentation. 7/11/01 7/11/01 [cited; Available
from: http://www.literateprogramming.com/.
25. Corp., I. UML Resource Center. [cited; Available from:
http://www.rational.com/uml.
26. IEEE Standard for Software Maintenance. IEEE Std 1219-1998, 1998: p. i.
27. Robert, M.F. and M. Malcolm, Redocumentation for the maintenance of
software, in Proceedings of the 30th annual Southeast regional conference.
1992, ACM Press: Raleigh, North Carolina.
28. William, J .P. and R.B. Michael, An approach for reverse engineering of
relational databases. Commun. ACM, 1994. 37(5): p. 42-ff.
29. Tim, T. and T. Scott, Documenting software systems with views V: towards
visual documentation of design patterns as an aid to program understanding, in
Proceedings of the 25th annual ACM international conference on Design of
communication. 2007, ACM: El Paso, Texas, USA.
30. Giuliano Antoniol, G.C., Gerardo Casazza,Andrea De Lucia,Ettore Merlo,
Recovering Traceability Links between Code and Documentation 2002.
31. Paolo Tonella, M.T., Bart Du Bois,Tarja Systa, ed. Empirical studies in reverse
engineering: state of the art and future trends. ed. D.M. Berry. 2007, Springer
Science.
32. Biggerstaff, T.J ., Design recovery for maintenance and reuse. Computer, 1989.
22(7): p. 36-49.
80

33. K.Lano, H.H., Reverse Engineering and Software Maintenance : A Practical
Approach. 1994: Mc-Graw Hill, London.
34. Benedusi, P., A. Cimitile, and U. De Carlini. A reverse engineering methodology
to reconstruct hierarchical data flow diagrams for software maintenance. in
Software Maintenance, 1989., Proceedings., Conference on. 1989.
35. Inc., B.S. Reverse Engineering. 2010 [cited; Available from: http://bus-
software.com/re.htm.
36. Shihong, H., et al. Adoption-Centric Software Maintenance Process
Improvement via Information Integration. in Software Technology and
Engineering Practice, 2005. 13th IEEE International Workshop on. 2005.
37. Arie van, D. and K. Tobias, Building Documentation Generators, in Proceedings
of the IEEE International Conference on Software Maintenance. 1999, IEEE
Computer Society.
38. Simon, M., Haddock, a Haskell documentation tool, in Proceedings of the 2002
ACM SIGPLAN workshop on Haskell. 2002, ACM: Pittsburgh, Pennsylvania.
39. Tadonki, C. Universal Report: a generic reverse engineering tool. in Program
Comprehension, 2004. Proceedings. 12th IEEE International Workshop on.
2004.
40. Shirabad, J .S., Supporting Software Maintenance by Mining Software Update
Records, in School of Information Technology and Engineering. 2003,
University of Ottawa: Canada. p. 258.
41. Daniel, G.S., S.D. J effrey, and P.M. Brian, A knowledge base for supporting an
intelligent program editor, in Proceedings of the 7th international conference on
Software engineering. 1984, IEEE Press: Orlando, Florida, United States.
42. Ward, F.W.C.a.M.K.a.M.M.a.M., Knowledge-Based System for Software
Maintenance, in in Proceedings for the Conference on Software Maintenance.
1988.
43. Zheng, X.-d., H.-h. Hu, and H.-r. Gu. Research on Core Business Process-Based
Knowledge Process Model. in Semantics, Knowledge and Grid, 2009. SKG 2009.
Fifth International Conference on. 2009.
44. Heisig, K.P., Business Process-oriented KM. 2007, Xian J iaotong University.
45. Holger M. Kienle, H.A.M.u., The Rigi Reverse Engineering Environment, in
Proceedings of the International Workshop on Advanced Software Development
Tools and Techniques 2008, ECOOP: Paphos, Cyprus.
46. Kenny, W., et al., Structural Redocumentation: A Case Study. 1995, IEEE
Computer Society Press. p. 46-54.
47. Ferenc, R., Columbus Reverse Engineering Tool and Schema for C++. 2002.
48. Rigi User's Manual. 1996, University of Victoria: Canada.
49. Ronald J .Brachman, H.J .L., Knowledge Representation and Reasoning. 2004,
San Francisco: Morgan Kaufmann.
50. Domain Knowledge in Planning Representation and Use.
51. Russell, G., D. Christian, and N.I. Santoso, Efficient reasoning. ACM Comput.
Surv., 2001. 33(1): p. 1-30.
52. Hepp, M., Ontologies: State of the Art, Business Potential, and Grand
Challenges, in Ontology Management. 2008. p. 3-22.
81

53. Mark S. Fox, M.G., An Organizational Ontology for Enterprise Modeling. 1998,
AAAI Press / MIT Press, Menlo Park. p. 131--152.
54. Stephan, G.s., H.s. Pascal, and A.s. Andreas, Knowledge Representation and
Ontologies, in Semantic Web Services. 2007. p. 51-105.
55. Fensel, D., Lausen, H., Polleres, A., Bruijn, J ., Stollberg, M., Roman, D.,
Domingue, J . , Enabling Semantic Web Services : The Web Service Modeling
Ontology. 2007: Springer.
56. Fellbaum, C., WordNet:An Electronic Lexical Database, G. Miller, Editor. 1998,
MIT Press.
57. Douglas, B.L., CYC: a large-scale investment in knowledge infrastructure.
Commun. ACM, 1995. 38(11): p. 33-38.
58. Mrcio Greyck Batista Dias, N.A., Kthia Maral de Oliveira, Organizing the
Knowledge Used in Software Maintenance. J ournal of Universal Computer
Science, 2003.
59. Leonid Kof, M.P., Validating Documentation with Domain Ontologies, in
Proceeding of the 2005 conference on New Trends in Software Methodologies,
Tools and Techniques: Proceedings of the fourth SoMeT_W05. 2005, IOS Press
Amsterdam, The Netherlands, The Netherlands
60. Mike Uschold, M.K., Towards a Methodology for Building Ontologies, in In
Workshop on Basic Ontological Issues in Knowledge Sharing, held in
conjunction with IJCAI-95. 1995: Canada.
61. Seremeti, L. and A. Kameas. A Task-Based Ontology Enginnering Approach for
Novice Ontology Developers. in Informatics, 2009. BCI '09. Fourth Balkan
Conference in. 2009.
62. B. Chandrasekaran, J .R.J .a.V.R.B., Ontology of Tasks and Methods, in AAAI
Spring Symposium and the1998 Banff Knowledge Acquisition Workshop. 1998:
Canada.
63. Thomas, R.G., A translation approach to portable ontology specifications.
Knowl. Acquis., 1993. 5(2): p. 199-220.
64. G.A.Miller, WorNet : An online lexical database. International journal of
Lexicography, 1990.
65. J ames G. Schmolze, T.A.L., Classification in the KL-ONE Knowledge
Representation System. 1985. p. Cognitive Science, 9(2):171216.
66. Alexander Borgida, R.J .B., Deborah L. McGuinness,Lori Alperin Resnick,
CLASSIC: A Structural Data Model for Objects, in In Proceedings of the 1989
ACM SIGMOD International Conference on Management of Data. 1989. p.
pages 5967.
67. Adam Farquhar, R.F., J ames Rice, The Ontolingua Server:a Tool for
Collaborative Ontology Construction, in In Proceedings of the 10th Knowledge
Acquisition for Knowledge-Based Systems Workshop, Banff. 1996: Canada.
68. Manola, F. and E. Miller. RDF Primer. 10 Feb 2004 [cited; Available from:
http://www.w3.org/TR/rdf-primer/.
69. The Semantic Web, in Enabling Semantic Web Services. 2007. p. 25-36.
70. From Web to Semantic Web, in Implementing Semantic Web Services. 2008. p. 3-
25.
82

71. Cardoso, J ., The Semantic Web: A mythical story or a solid reality?, in Metadata
and Semantics. 2009. p. 253-257.
72. J in, P., C. Stephen, and C. Daniel, A lightweight ontology repository, in
Proceedings of the second international joint conference on Autonomous agents
and multiagent systems. 2003, ACM: Melbourne, Australia.
73. Guarino, N., Formal Ontology and Information Systems, in 1st International
Conference on Formal Ontology in Information Systems (FOIS98),. 1998:
Trento,Italy.
74. Welty, C.A. and D.A. Ferrucci. A formal ontology for re-use of software
architecture documents. in Automated Software Engineering, 1999. 14th IEEE
International Conference on. 1999.
75. Aksit, I.K.J .B.M., Technological Spaces: an Initial Appraisal, in CoopIS,
DOA'2002 Federated Conferences, Industrial track 2002, CiteSeer Scientific
Literature Digital Library.
76. Verhoeven, E.-J ., Cobol Island Grammer in SDF, in Informatic Institute 2000,
University of Armsterdem. p. 73.
77. Mark G. J . van den Brand, P.K., Chris Verhoef Core Technologies for System
Renovation, in SOFSEM '96: Proceedings of the 23rd Seminar on Current
Trends in Theory and Practice of Informatics: Theory and Practice of
Informatics 1996, Springer-Verlag
78. Moonen, L. Generating robust parsers using island grammars. in Reverse
Engineering, 2001. Proceedings. Eighth Working Conference on. 2001.
79. Re-documenting, Visualizing and Understanding Software System Using
DocLike Viewer.
80. H. A, M., ller, and K. Klashinsky, Rigi-A system for programming-in-the-large,
in Proceedings of the 10th international conference on Software engineering.
1988, IEEE Computer Society Press: Singapore.
81. Javadoc Tool Home Page. [cited; Available from:
http://java.sun.com/j2se/javadoc/.
82. Heesch, D.v. Doxygen Source Code Documentation Generator Tool. 2007
[cited; Available from: http://www.stack.nl/~dimitri/doxygen/.
83. Leroy, X. The Objective Caml System Release 3.11. 2008 [cited; Available
from: http://caml.inria.fr/pub/docs/manual-ocaml/index.html.
84. PLT Scheme. [cited; Available from: http://www.plt-scheme.org/.
85. Robert, P. and T. Scott, Automatically connecting documentation to code with
rose, in Proceedings of the 20th annual international conference on Computer
documentation. 2002, ACM Press: Toronto, Ontario, Canada.
86. Hak-lae Kim , H.-G.K., Kyung-Mo Park Ontalk: Ontology-Based Personal
Document Management System. 2004, CiteSeer- Scientific Literature Digital
Library and Search Engine.
87. Wimmer, M. A Meta-Framework for Generating Ontologies from Legacy
Schemas. in Database and Expert Systems Application, 2009. DEXA '09. 20th
International Workshop on. 2009.
83

88. Marcos., M.L.a.E., Research in Software Engineering: Paradigms and Methods,
in In Proceedings of the 17th International Conference on Advanced Information
System (CAiSE05). J une 2005: Porto, Portugal.

PHD Proposal

Uploaded by

Copyright:

Available Formats

PHD Proposal

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

PHD Proposal

Uploaded by

Copyright:

Available Formats

What is software maintenance and why is it important?

What is the goal of reverse engineering?

Research Proposal

An Effective Semantic Knowledge Representation Using Ontology To

You might also like