This research proposal aims to develop an effective semantic knowledge representation using ontology to support the redocumentation process for legacy systems in software maintenance.
The proposal provides background on the problems of maintaining legacy software systems and redocumenting their documentation. It outlines the objectives to develop an ontology-based framework to automate the extraction of knowledge from legacy code and produce updated documentation.
The proposal will involve reviewing existing approaches to software redocumentation and knowledge representation using ontologies. It will also describe the proposed methodology, including defining the framework components and operations. Preliminary findings will be presented on how ontologies can support automated redocumentation and improve documentation quality.
The document discusses research on using ontologies to support the redocumentation process for legacy systems. It covers topics like software maintenance, reverse engineering, knowledge representation, and different redocumentation approaches and tools.
Software maintenance is the modification of a software product after delivery to correct faults, improve performance or other attributes, or adapt the product to a changed environment. It is an important part of the software lifecycle as most costs occur during the maintenance phase.
The goal of reverse engineering is to analyze a subject system to identify the system's components and their interrelationships and create representations of the system in another form or at a higher level of abstraction. It aims to understand the design of a system and extract information that can be used to modify or recreate the system.
Research Proposal
An Effective Semantic Knowledge Representation Using Ontology To
Support Redocumentation Process for Legacy System in Software Maintenance
Supervisor: Dr. Suhaimi Ibrahim
School of Graduate Studies Universiti Teknologi Malaysia 2010
1 INTRODUCTION
  1.1 Background of the problem
  1.2 Statement of the Problem
  1.3 Objectives
  1.4 Scope of Study
  1.5 Significance of Study
2 LITERATURE REVIEW
  2.1 Introduction
  2.2 Definition of Software Maintenance
  2.3 Definition of Legacy Systems
  2.4 Definitions of Software Documentation
  2.5 Definitions of Reverse Engineering
    2.5.1 Goal of Reverse Engineering
    2.5.2 Abstraction
    2.5.3 Levels of Reverse Engineering
    2.5.4 Reverse Engineering Methods, Techniques and Tools
  2.6 Definitions of Redocumentation
    2.6.1 Redocumentation Process
  2.7 Document Quality
  2.8 Definition of Knowledge Representation
  2.9 Definition of Reasoning
  2.10 Definition of Ontology
    2.10.1 Generality of Ontologies
    2.10.2 Methodology for Constructing Ontologies
    2.10.3 Type of Ontology
    2.10.4 Ontology Languages
    2.10.5 Application of Ontologies
  2.11 Redocumentation Approaches and Tools
    2.11.1 XML Based Approach
    2.11.2 Model Oriented Redocumentation Approach
    2.11.3 Incremental Redocumentation Approach
    2.11.4 Island Grammar Approach
    2.11.5 DocLike Modularized Graph Approach
  2.12 Redocumentation Tools
    2.12.1 Rigi
    2.12.2 Haddock Tool for Haskell Documentation
    2.12.3 Scribble Tool
    2.12.4 Universal Report
  2.13 Summary of Redocumentation Approaches and Tools
3 COMPARATIVE EVALUATION OF THE STATE OF ART APPROACHES
  3.1 Analysis Criteria
    3.1.1 Document Quality
    3.1.2 Knowledge Representation Criteria
  3.2 Discussion
  3.3 Related Work
4 RESEARCH METHODOLOGY
  4.1 Research Design and Procedure
  4.2 Operational Framework
  4.3 Assumptions and Limitation
  4.4 Assumptions and Limitations
  4.5 Research Planning and Schedule
  4.6 Summary
5 PRELIMINARY FINDINGS AND EXPECTED RESULTS
  5.1 Preliminary Findings
  5.2 Definition of the Proposed Framework
    5.2.1 Input
    5.2.2 Knowledge automation process
    5.2.3 Output
  5.3 Components of the Proposed Framework
LIST OF FIGURES
Figure 2.1: Level of abstraction in a software system [13]
Figure 2.2: Sample of reverse engineering methods and tools [31]
Figure 2.3: Reverse Engineering Tools Architecture [1]
Figure 2.4: Redocumentation Process
Figure 2.5: Geographic visualization for geographic ontology [54]
Figure 2.6: RDF Triples in graph model
Figure 2.7: Model Oriented Framework [7]
Figure 2.8: Final Documentation Produced using MIP tool [7]
Figure 2.9: Example on using Island Grammar Approach [37]
Figure 2.10: DocLike Viewer Prototype Tool [6]
Figure 2.11: Programming Language Supported by Universal Report
Figure 4.1: Research Procedure
Figure 4.2: Research Flow Chart
Figure 5.1: The Proposed Redocumentation Process Framework
LIST OF TABLES
Table 2-1: Sample Documentation for Different Level of Abstraction
Table 2-2: Key Product Attributes (KPA) and the description for each maturity level
Table 2-3: Summary of Redocumentation Approaches
Table 2-4: Summary of Redocumentation Tools
Table 3-1: Comparing redocumentation approaches
Table 3-2: Comparing redocumentation tools
Table 3-3: The Evaluation using Knowledge Representation Criteria for Redocumentation Approaches
Table 3-4: The evaluation using Knowledge Representation Criteria for Redocumentation Tools
Table 4-1: Operational Framework
Table 5-1: User Interface Description
1 INTRODUCTION
This chapter introduces the research proposal. It first describes the background of the problem to be solved, and then presents the problem statement, objectives, scope, and significance of the study.
1.1 Background of the problem
Software documentation is an important component of software development and of software engineering in general. It is one of the best resources for improving development and maintaining an understanding of a program, and it is one of the oldest practices still in use today [2]. However, such documentation suffers from the following problems: it is not up to date, it is of poor quality, it is not standardized, programmers lack interest in it, it provides only a single perspective, and it is produced in a format that is not suitable for maintenance [3]. Most legacy software systems face these problems when documentation is needed to understand the system for software evolution or maintenance. The situation gets worse if the people involved in maintaining the software system did not participate in its development. Replacing the legacy system with a new one is not a good idea in most situations, because the current system may have accumulated critical knowledge that is not recorded elsewhere.
Software redocumentation is one of the approaches used to aid program understanding in support of maintenance and evolution. It extracts artifacts from the sources through reverse engineering techniques and presents them in the form of documentation needed. Some research has been done on software redocumentation to improve the understanding of software systems. Most solutions emphasize linking source code and documentation, incremental redocumentation for program comprehension, flexible browsing of the system design, and transforming legacy systems into UML diagrams. Very few solutions focus on high-level abstraction. In other words, current solutions emphasize representation switches such as providing alternate views [4], annotations [5], structural documentation [6] and model-oriented documentation [7]. The tools developed for redocumentation are slow and unable to answer software maintenance questions such as "What is the underlying principle or purpose of this code?" and "What is the relation between the diagrams and the requirements?". It is also doubtful that these tools can produce standard documents. The cause is that the content and style of software documentation differ among programmers, which makes it time consuming for the maintainer to digest and understand the content of the documents.
The main issue to be addressed here is that the software documentation produced by the redocumentation process needs to emphasize the importance of explicitly documenting domain knowledge, in order to improve program comprehension in software maintenance, and needs to be presented as standard documentation. The maintainer needs a better understanding of the semantic relationships among the components from a real-world domain point of view, especially if the maintainer is new to the program domain. The benefits of generating the documents are realized not only by the maintainer but also by other personnel such as programmers, software engineers and technical managers, who can use them as a communication tool.
1.2 Statement of the Problem
This research is intended to deal with the problems related to software documentation produced by the redocumentation process on legacy systems for program understanding in software maintenance. The main question is: how can effective software documentation be produced, using a redocumentation model and approach, that shows domain knowledge of a legacy system for program understanding to improve software maintenance? The sub-questions of the main research question are as follows:
i. What are the current redocumentation approaches and tools?
ii. Why are the current software redocumentation approaches and tools still not able to show domain knowledge in the software documentation produced by the redocumentation process?
iii. What is the best way to show the domain knowledge in documentation produced using reverse engineering techniques?
iv. How can the software documentation produced by the software redocumentation process be measured?
v. How can the schema representation produced by the redocumentation process be measured?
vi. How can the schema representation produced be mapped to the standard documentation?
vii. How can the usefulness of software documentation for program understanding in software maintenance be validated?
1.3 Objectives
The research objectives, based on the problem statement, are as follows:
1. To study and investigate the current issues related to redocumentation.
2. To analyse the existing redocumentation approaches.
3. To formulate a new redocumentation approach that represents the higher level of abstraction and integrates with standard documentation.
4. To develop a prototype tool that supports the proposed approach.
5. To evaluate the proposed approach based on the evaluation criteria.
1.4 Scope of Study
The software redocumentation process is used to produce software documentation in different forms such as text [8] and graphics [9]. Text can also be delivered in electronic form such as HTML [10] or XML files [11]. A directed graph is one of the models used to represent artifacts graphically [12]. The techniques and approaches used may differ from one another because of different requirements and objectives. Some are geared towards understanding the software system, while others are used as an aid to decision making for system evolution.
Within the scope of this research, the redocumentation process needs to be explored in order to produce standard software documentation that describes the context of the software system, or describes the system in terms of domain-specific concepts. The models and techniques used should be able to produce documentation from various types of SWP. The artifacts need to be captured from the latest version of the software system by establishing a reverse engineering environment. This will involve developing a tool to extract the artifacts from the source code and translate them into components that can present the high-level abstraction of the system in standard documentation.
The new work should make it possible to view the documentation as the result of a reasoning tool that supports a search-and-select mechanism for extracting the needed knowledge. Software engineers and maintenance personnel should treat these results as an aid to understanding the program for the maintenance task.
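The search-and-select idea can be illustrated with a very small sketch. It assumes that extracted knowledge is stored as subject-predicate-object triples; the triples, the predicate names (`implements`, `part_of`, and so on) and the helper functions are all hypothetical and are not part of the proposed framework. A real system would more likely rely on an ontology language and a reasoner.

```python
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object). Every name below is a made-up
# example used only to illustrate the search-and-select mechanism.
Triple = Tuple[str, str, str]

KNOWLEDGE: List[Triple] = [
    ("calc_interest", "is_a", "Function"),
    ("calc_interest", "defined_in", "loan.c"),
    ("calc_interest", "implements", "InterestCalculation"),
    ("InterestCalculation", "is_a", "DomainConcept"),
    ("InterestCalculation", "part_of", "LoanManagement"),
    ("post_payment", "is_a", "Function"),
    ("post_payment", "implements", "PaymentPosting"),
    ("PaymentPosting", "part_of", "LoanManagement"),
]


def select(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that match every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def functions_for_domain(triples: List[Triple], domain: str) -> List[str]:
    """Find code functions whose implemented concept belongs to a domain area."""
    concepts = {s for (s, _, _) in select(triples, predicate="part_of", obj=domain)}
    return sorted(
        s for (s, _, o) in select(triples, predicate="implements") if o in concepts
    )


if __name__ == "__main__":
    print(functions_for_domain(KNOWLEDGE, "LoanManagement"))
```

A query such as `functions_for_domain(KNOWLEDGE, "LoanManagement")` links a domain-level term to the code-level artifacts that realize it, which is the kind of question a maintainer new to the domain would ask.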
1.5 Significance of Study
Software maintenance has been identified as the most expensive phase in the software lifecycle, estimated at 40-70% of the cost of the entire development of a software system [13]. Software maintenance has become a multi-dimensional problem that creates ongoing challenges for both the research community and tool developers. The challenges exist because of the different representations and interrelationships among software artifacts and the knowledge base. In addition, the software documentation describing the artifacts is presented as non-formal documentation, which makes it difficult for the maintenance team to reach a common understanding of the software system. A redocumentation process is needed to solve the problems mentioned above. Some redocumentation tools and methods exist to address them. However, those approaches focus only on representation switches and lack semantic representation. Thus, there is a need for further research in this field from the perspectives listed below:
a. There is a need to present the domain components and describe the relevant concepts in order to establish the semantic relationships among the components.
b. There is a need to integrate standard document templates with the software components for better communication between maintainers about the software system.
c. The maintainer should be able to find the relevant information within a reasonable time frame.
The importance of domain-specific tools to support program comprehension is described by [14]. Therefore, software redocumentation is an important process for improving program understanding in software maintenance.
2.1 Introduction
This chapter provides background on software maintenance, software documentation, reverse engineering, the redocumentation process and the repository used in the redocumentation process for knowledge representation. In addition, redocumentation approaches and tools are described. The purpose of this chapter is to describe the basic concepts and present current approaches to redocumentation.
2.2 Definition of Software Maintenance
Software maintenance is defined as the modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment [15]. According to [2], maintenance is traditionally defined more broadly as any modification made to a system after its delivery. Studies show that software maintenance is the predominant activity in software engineering, accounting for 75-80% of the total cost of a typical software system [16]. The importance of software maintenance is clear from the fact that most companies are eager to maintain their software systems even when the cost is high. To understand software maintenance we need to be clear about what is meant by 'software'. It is a common misconception to believe that software is programs. This can lead to a misunderstanding of terms that include the word 'software'. For instance, when thinking about software maintenance activities, there is a temptation to think of activities carried out exclusively on programs. This is because many software maintainers are more familiar with, or rather are more exposed to, programs than other components of a software system [13]. McDermid's definition [17] makes clear that software comprises not only programs (source and object code) but also documentation of any facet of the program, such as requirements analysis, specification, design, system and user manuals, and the procedures used to set up and operate the software system. The documentation helps the maintainer to access as much information about the whole system as possible. The maintainer will not always be the author of the software documents, so good documentation helps the maintainer to understand the components of the system and other issues that may be relevant for successful maintenance.
2.3 Definition of Legacy Systems
A legacy system can be defined as software that we do not know what to do with, but that is still performing a useful job [18]. Most legacy systems face problems such as documentation that is not kept up to date, a lack of standardization and openness, and code that is difficult to understand and change. Legacy systems are also affected by changes in the industry such as e-Business and globalization, changing business models, integration of information technologies, and emerging information architectures such as J2EE, Web Services and reuse [19]. One implication is the proposal to discard the legacy system and develop a new one. However, this solution is not applicable in all situations. For example, software built to complex requirements that are not recorded elsewhere (in documentation) embodies knowledge that would be lost if the system were discarded. In addition, a large software system has many users who have not documented the features and side effects of the software. It would be tedious to ask the users to produce that documentation again for a new system. Therefore, maintaining the interface and functionality of the legacy system is very important. To achieve this objective, a semantic representation of the system knowledge is ultimately needed to understand the legacy system.
2.4 Definitions of Software Documentation
According to Ambler [20], software documentation is an abstraction of knowledge about a system, and a document or model (or any other artifact) forms part of a project's software documentation only as long as it can effectively communicate knowledge, even if it has only a short lifespan. Common examples of such software documentation include requirements, specification, architectural and detailed design documents, business plans, test plans and bug reports. These documents are geared to individuals involved in the production of that software. Such individuals include managers, project leaders, developers and customers [21]. Documentation attributes describe information about a document beyond the content provided within. Example attributes include the document's writing style, grammar, extent to which it is up to date, type, format, visibility, etc. Documentation artifacts consist of whole documents, or elements within a document such as tables, examples, diagrams, etc. An artifact is an entity that communicates information about the software system.
Documentation plays an important role in aiding program understanding to support software evolution [22]. Documentation also represents the observations and arguments for certain qualities of programs [23]. There are two aspects to documentation quality: the process and the product [24]. Documentation process quality focuses on the manner in which the documentation is produced (for example, whether or not an organization follows a systematic approach to document development that is closely aligned with the rest of the software process). Documentation product quality is concerned with attributes of the final product (for example, whether or not it uses standardized representations like the Unified Modeling Language (UML) [25] in graphical documentation). Documentation quality is important for the software maintainer, who needs to understand the system in order to make decisions. It can be achieved by using standard documentation, which improves communication between the stakeholders of the software system. In addition, standard documentation helps to reduce the time and effort spent on developing a new software system.
Several documentation standards are available in the software development environment, such as MIL-STD-498 and IEEE Standard 830-1998. Even though the standards differ, their content is largely the same. MIL-STD-498 offers almost twenty-two types of software documents to support all phases of software development, such as the Software Development Plan (SDP) for the planning phase, the Software Requirements Specification (SRS) for the requirements phase, the Software Design Description (SDD) for the design phase and the Software Test Plan (STP) for the software testing phase.
However, among the documents described above, the SRS and SDD are the most relevant for understanding the software requirements and design. The SRS explains the scope of the project, definitions of the specific terms used, background information and the essential parts of the system. The SDD records design information at both high and low levels. In general, software documentation should serve as an aid and as a communication tool that gives team members a common understanding of the software system [20].
2.5 Definitions of Reverse Engineering
According to [1], reverse engineering is the process of analyzing a subject system to identify the system's components and their interrelationships and to create representations of the system in another form or at a higher level of abstraction. The first part of the definition, identifying components and relationships, is narrowed to the problem of recovering some design-level views of the system, representing its structure as a set of boxes and links, which are derived from the code disregarding some implementation details. Several available reverse engineering methods do not fit this definition. Consider, for example, slicing or feature location. Neither of the two produces an output consisting of components and relationships. The second part of the definition by Chikofsky and Cross has a wider applicability and covers many more reverse engineering methods. However, it is not completely satisfactory, in that either it is too vague (what are the mentioned representations of the system in another form?) or it falls into the first case, if we interpret the higher level of abstraction as the design, assuming we are analyzing the code. Moreover, it misses a few important characteristics of reverse engineering, namely the context (i.e., the task in which reverse engineering is conducted), the role of automation (automated or semi-automated methods are in the scope of reverse engineering) and the knowledge acquisition process, which is an integral part of reverse engineering. Reverse engineering does not involve changing the subject system or creating a new system based on the reverse-engineered subject system. It is a process of examination, not a process of change or replication.
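The first part of the definition, recovering components and their relationships as "boxes and links", can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python source and Python's standard `ast` module as a stand-in for a legacy language and its parser, and the sample program and the function name `extract_call_graph` are hypothetical.

```python
import ast
from typing import Dict, List

# Hypothetical subject system, kept as a string so the sketch is self-contained.
SAMPLE_SOURCE = '''
def read_record(path):
    return open(path).read()

def parse_record(text):
    return text.split(",")

def load(path):
    return parse_record(read_record(path))
'''


def extract_call_graph(source: str) -> Dict[str, List[str]]:
    """Map each top-level function (a box) to the local functions it calls (links)."""
    tree = ast.parse(source)
    functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
    defined = {func.name for func in functions}
    graph: Dict[str, List[str]] = {}
    for func in functions:
        callees = set()
        for node in ast.walk(func):
            # Only direct calls by name are recorded; method calls and
            # calls to functions defined elsewhere are ignored.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    callees.add(node.func.id)
        graph[func.name] = sorted(callees)
    return graph


if __name__ == "__main__":
    for caller, callees in extract_call_graph(SAMPLE_SOURCE).items():
        print(caller, "->", callees)
```

The resulting dictionary is a directed graph that deliberately disregards implementation details such as arguments and return values, in line with the design-level view described above. The system itself is examined, not changed.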
2.5.1 Goal of Reverse Engineering
The IEEE-1219 [26] standard emphasizes that reverse engineering is one of the important technologies for supporting systems that have the source code as the only reliable representation. The goals of reverse engineering are to generate alternate views, understand complex programs, detect side effects, recover lost information, synthesize higher abstractions, and facilitate reuse. Reverse engineering technology covers many areas. Examples of problem areas where reverse engineering has been successfully applied include redocumenting programs [7] [27] and relational databases [28], recovering design patterns [29], and building traceability between code and documentation [30].
2.5.2 Abstraction
According to [13], abstraction in reverse engineering is achieved by highlighting the important features of the subject system and ignoring the irrelevant ones. Abstraction is defined as a model that summarizes the detail of the subject it is representing [1]. The abstraction level is understood within the context of the software system's lifecycle, which creates representations of the system in textual or graphical form. The classic waterfall development lifecycle calls for requirements/specification followed by design, coding, testing, implementation, and maintenance. Based on the development lifecycle, we can simplify the abstraction levels to requirements (specification of the problem being solved), design (specification of the solution) and implementation (coding, testing, delivery of the operational system). Table 2-1 shows sample documents produced for the different levels of abstraction. With reverse engineering, however, the process runs in the opposite direction: it starts with the source code and proceeds to the design and then the requirements.
Table 2-1: Sample Documentation for Different Level of Abstraction
2.5.3 Levels of Reverse Engineering
Reverse engineering has many subareas. If the resulting product is at the same level of abstraction as the original system, the operation is commonly known as redocumentation [13]. If, on the other hand, the resulting product is at a higher level of abstraction, the operation is known as 'design recovery' or 'specification recovery' (Figure 2.1). At each abstraction level, reverse engineering techniques are used to record the abstraction through the redocumentation process.
Figure 2.1 Level of abstraction in a software system[13]
2.5.4 Reverse Engineering Methods, Techniques and Tools
This section discusses the reverse engineering methods, techniques and tools available in the research context. The terms method, technique and tool are often confused in reverse engineering. According to [31], a method or technique is a solution to one or more known problems in reverse engineering, e.g., design recovery. It might be argued that a method is usually more related to a process which involves humans in the loop, while a technique is usually more focused on the automated part of such a process. However, IEEE 729 (IEEE 1983) does not distinguish between method and technique. A tool, on the other hand, is used to support one or more methods [31].
Figure 2.2 : Sample of reverse engineering methods and tools [31]
Figure 2.2 shows the organization of a necessarily non-exhaustive sampling of contributions available in reverse engineering, according to the separation between methods and tools described above. Each method is associated with a specific reverse engineering problem and can be further specialized according to the specific approach chosen to address that problem. For example, slicing can be computed statically or dynamically, can be subjected to amorphous transformations, or can be determined for inputs satisfying given conditions.
Reverse engineering has traditionally been viewed as a two-step process: information extraction and abstraction. Information extraction analyses the subject system artifacts to gather raw data, whereas abstraction creates user-oriented documents and views. For example, information extraction activities consist of extracting Control Flow Graphs (CFGs), metrics or facts from the source code. Abstraction outputs can be design artifacts, traceability links, or business objects. Accordingly, Chikofsky and Cross outlined a basic structure for reverse engineering tools (Figure 2.3). The software product to be reverse engineered is analyzed, and the results of this analysis are stored in an information base. Such information is then used by view composers to produce alternate views of the software product, such as metrics, graphics and reports [1]. In the following subsection each component is described in more detail.
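The two-step structure described above (analysis into an information base, then view composers) can be illustrated with a minimal Python sketch. It is not the architecture of any tool cited here: the sample source, the function names `extract_facts`, `compose_call_view` and `compose_metrics_view`, and the choice of call relationships as the extracted facts are all assumptions made for illustration. Only Python's standard `ast` module is used.

```python
import ast
from collections import defaultdict

# Hypothetical "legacy" source used only for this illustration.
SAMPLE_SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''


def extract_facts(source):
    """Information extraction: record which plain-named functions each function calls."""
    information_base = defaultdict(set)
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            information_base[node.name]  # register the function even if it calls nothing
            for inner in ast.walk(node):
                # Only calls of the form name(...) are recorded; method calls are ignored.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    information_base[node.name].add(inner.func.id)
    return information_base


def compose_call_view(information_base):
    """View composer 1: a textual caller -> callees listing."""
    lines = []
    for caller in sorted(information_base):
        callees = sorted(information_base[caller])
        lines.append(f"{caller} -> {', '.join(callees) if callees else '(no calls)'}")
    return "\n".join(lines)


def compose_metrics_view(information_base):
    """View composer 2: a simple metric, the number of distinct callees per function."""
    return {name: len(callees) for name, callees in information_base.items()}


base = extract_facts(SAMPLE_SOURCE)
print(compose_call_view(base))
print(compose_metrics_view(base))
```

Both views are produced from the same information base, which is the point of separating extraction from view composition: a new view can be added without parsing the source again.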
Most reverse engineering tools aim at obtaining abstractions, or different forms of representation, from software system implementations, although this is not a strict requirement: reverse engineering can be performed on any software artifact, such as requirements, design, code, test cases or manual pages. Reverse engineering approaches can have two broad objectives: redocumentation and design recovery. Redocumentation aims at producing or revising alternate views of a given artifact at the same level of abstraction, e.g., pretty printing source code or visualizing CFGs. Redocumentation is explained in detail in the following section. As defined by Biggerstaff [32], design recovery aims at recreating design abstractions from the source code, existing documentation, experts' knowledge and any other source of information.
Figure 2.3: Reverse Engineering Tools Architecture [1]
2.6 Definitions of Redocumentation
Redocumentation is the creation or revision of a semantically equivalent representation within the same relative abstraction level [1]. The resulting forms of representation are usually considered alternate views (for example, dataflow, data structure, and control flow) intended for a human audience. Redocumentation is the simplest and oldest form of reverse engineering [2]. The re- prefix implies that the intent is to recover documentation about the subject system that existed or should have existed. Some common tools used to perform redocumentation are pretty printers (which display a code listing in an improved form), flowchart generators (which create diagrams directly from code, reflecting control flow or code structure), and cross-reference listing generators. A key goal of these tools is to provide easier ways to visualize relationships among program components so that paths can be recognized and followed clearly.
Software redocumentation is part of software reengineering. While reengineering may involve additional activities such as restructuring the code or retargeting, redocumentation only recovers the understanding of the software and records it, and therefore makes future program comprehension easier. Program comprehension is the most expensive part of software maintenance, and therefore program redocumentation is the key to software maintainability [5].
The main goals of the redocumentation process are threefold [33]. Firstly, to create alternative views of the system so as to enhance understanding, for example by generating a hierarchical data flow [34] or control flow diagram from source code. Secondly, to improve current documentation. Ideally, such documentation should have been produced during the development of the system and updated as the system changed. This, unfortunately, is not usually the case. Thirdly, to generate documentation for a newly modified program. This is aimed at facilitating future maintenance work on the system, that is, preventive maintenance.
Sometimes the output of reverse engineering is thought to be the same as redocumentation. After all, when reverse engineering captures the design information from the legacy source code, the resulting information usually includes data flow diagrams, control flow charts, etc. The difference is that redocumentation usually generates system documentation according to a standard. For example, there are redocumentation tools that create documentation for DoD-STD-2167A, a DoD software documentation standard. A special case of redocumentation tools is reformatting tools. Otherwise known as pretty printers, reformatters make source code indentation, bolding, capitalization, etc. consistent, thus making the source code more readable.
2.6.1 Redocumentation Process
The redocumentation process uses the reverse engineering architecture [1] to produce the documentation. It can be viewed as a knowledge rescue process [35], as shown in Figure 2.4. The process can be subdivided into three phases, namely input, automated specialized process and output.
i. Software work product (SWP)
An SWP can be source code, configuration files, build scripts or auxiliary artifacts. Auxiliary artifacts, such as gathered data, manuals, job control and the graphical user interface, help in understanding the source code. The author in [36] explains an approach to the redocumentation process and emphasizes the importance of the software work product in producing documentation for different types of information. Similarly, the author in [19] uses manuals, source code and the graphical user interface (GUI) as sources to reverse engineer and remodel the system.
ii. Parser
The parser extracts the necessary information from the SWP and stores it in the repository or system knowledge base. The parser matters because it must return the relevant information, as does the parser used in [37] to produce documentation. Most parsers extract information from source code in one specific language. In [38], for example, a Haskell parser extracts the API of a software library for presentation as HTML documentation. Other parsers, such as Universal Report, can extract information from several languages [39], including Basic, C, C++, COBOL, Visual Basic .NET, Visual J++ and more.
Figure 2.4: Redocumentation Process
iii. System Knowledge Base
According to [40], a knowledge base is a collection of simple facts and general rules representing some universe of discourse. The purpose of this component is to store the information extracted from the SWP in order to describe the context of that information. This component is the heart of the system, because it allows the tools to access the information they require. In other words, all parts of the model can access and organize information from the knowledge base, provided that the related information has actually been extracted into it. Some research focuses solely on the knowledge base, to make sure that the knowledge retrieved gives the most support to program understanding. In [41], the Intelligent Program Editor (IPE) applies artificial intelligence to the task of manipulating and analyzing programs. IPE is a knowledge-based tool to support program maintenance and development. It represents a deep understanding of program structure and of the typical tasks performed by programmers. This knowledge is explicitly represented as a tree or graph of frames. In [42], Maintainers Assistant, a project from the Center for Software Maintenance at the University of Durham, is a code analysis tool intended to help programmers understand and restructure programs. It is a knowledge-based system which uses program plans, or clichés, as building blocks from which algorithms are constructed. To comprehend a program, the tool transforms it into a composition of recognized plans.
iv. Knowledge processing
Knowledge processing can be defined as the collection, acquisition, encoding, conversion, sharing, dissemination, application and innovation of knowledge [43]. According to Kai Mertins and Peter Heisig [44], knowledge processes comprise at least the production, storage, transfer and application of knowledge. Based on these definitions, the knowledge process can be divided into four stages: knowledge generation, knowledge storage, knowledge transfer and knowledge application [43]. Knowledge processing can also be subdivided into sub-processes, which split each process into different stages. The main objective of the knowledge process is to establish knowledge management for the related activities. When the process model is applied, the output of knowledge processing is a knowledge representation or knowledge system. In redocumentation, knowledge processing involves an analysis of the source code so that it can be properly documented and so that various relationships implicit within the source code are recovered and revealed [35]. The processed knowledge is represented in different forms, such as data models and procedures or functions. The basics of knowledge processing are applied in many reverse engineering tools, such as Rigi, which uses RSF files as a repository and presents the knowledge in procedural form [45].
Generally, most existing redocumentation approaches and tools present knowledge on a lexical basis; that is, they are more concerned with extracting structural components than with the meaning of those components. This issue is discussed later in the section on knowledge representation.
v. Output
Finally, the processed knowledge is presented to the user (developer, maintainer, software engineer or end user) in various forms, such as directed graphs, annotations, visualizations, metrics or documentation. The artifacts produced include software components such as procedures, modules, classes, subclasses and interfaces, as well as dependencies among the components, control flow and composition. The output is used for various purposes, such as designing [29], documenting [11], software evolution [7] or analyzing the software system [46]. Software documentation can be categorized as textual or graphical. Textual documentation ranges from inline prose written in an informal manner to personalized views dynamically composed from a document database [47]. A more flexible form of textual documentation is electronic, such as HTML or XML files, which permit activities such as automated indexing and the creation of hypertext links between document fragments [12, 39]. The least mature type of graphical documentation is a static image, which may use non-standard representations of software artifacts and relationships [48]. The most advanced graphical documents are editable by the user, better enabling them to create customized representations of the subject system. Graphical documentation relies on a variety of software visualization techniques to make complicated information easier for the maintainer to understand.
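The knowledge base, knowledge processing and output components described in the list above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the facts, the file and function names, and the `depends_on` relation are invented for the example, and the sketch does not reproduce the repository format of Rigi or of any other tool cited here.

```python
# Hypothetical facts that a parser might have stored in the system knowledge base,
# written as (subject, relation, object) tuples.
FACTS = [
    ("billing.c", "defines", "compute_invoice"),
    ("billing.c", "defines", "apply_tax"),
    ("report.c", "defines", "print_report"),
    ("print_report", "calls", "compute_invoice"),
    ("compute_invoice", "calls", "apply_tax"),
]


def query(facts, relation):
    """Return all (subject, object) pairs stored under the given relation."""
    return [(s, o) for s, r, o in facts if r == relation]


def derive_file_dependencies(facts):
    """Knowledge processing: reveal file-level dependencies implicit in the call facts."""
    owner = {func: file for file, func in query(facts, "defines")}
    derived = set()
    for caller, callee in query(facts, "calls"):
        src, dst = owner.get(caller), owner.get(callee)
        if src and dst and src != dst:
            derived.add((src, "depends_on", dst))
    return sorted(derived)


def render_text(facts, derived):
    """Output: compose a small textual document from stored and derived knowledge."""
    definitions = query(facts, "defines")
    lines = []
    for file in sorted({f for f, _ in definitions}):
        funcs = sorted(func for f, func in definitions if f == file)
        lines.append(f"File {file}: defines {', '.join(funcs)}")
    for src, _, dst in derived:
        lines.append(f"File {src} depends on {dst}")
    return "\n".join(lines)


dependencies = derive_file_dependencies(FACTS)
print(render_text(FACTS, dependencies))
```

The file-level dependency is never stated in the stored facts; it is recovered by combining `defines` and `calls` facts, which is the kind of implicit relationship the knowledge processing step is meant to reveal.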
Unfortunately, most documentation is of low quality relative to such attributes, making its use in program understanding problematic [11]. This is due in part to the fact that software system documentation is usually generated in an ad hoc manner. There is little objective guidance for judging documentation quality or for improving the documentation process. The result is documentation quality that is difficult to predict, challenging to assess, and that usually falls short of its potential. Therefore, [22] introduces the Documentation Maturity Model (DMM) to assess document quality. The next section describes the DMM in detail.
2.7 Document Quality
The Documentation Maturity Model (DMM) is specifically targeted towards assessing the quality of software system documentation used in aiding program understanding[22]. Product developers and technical writers produce such documentation during regular development lifecycles. It can also be recreated after the fact via reverse engineering, a process of source code analysis and virtual subsystem synthesis.
Document quality is assessed against a number of criteria. The author in [22] specifies the Key Product Attributes (KPAs), which are efficiency, format (textual and graphical) and granularity.
i. Efficiency Efficiency refers to the level of direct support the documentation provides to the software engineer engaged in a program understanding task.
ii. Format Format refers to the type of document produced, either textual or graphical.
Textual: documentation ranging from inline prose written in an informal manner to personalized views.
Graphical: a graphical presentation of the software artifacts and relationships.
iii. Granularity Granularity refers to the level of abstraction described by the documentation.
Each attribute is measured on five levels. The attributes and their levels are shown in Table 2-2:
Maturity Level 1
Format (Text): Explain in low level functionality. No standard format and style. Highly depend on the experience of the developer.
Format (Graphic): Static graph as a hardcopy-like image, can be in format such as GIF or PDF and informal. Read only graphic.
Granularity: Documentation at level of source code. Comment on algorithm and source code.
Efficiency: Document generated manually and in textual form and maintained along with the system.

Maturity Level 2
Format (Text): Standard documentation and include also the developers own format. Using a standard template documentation.
Format (Graphic): Standard representation in the graphical form. Using template such as UML.
Granularity: One level above design patterns. Help developer to understand high level rational.
Efficiency: Semi-automatic using reverse engineering. Static and reflect the system changes at the time of generation.

Maturity Level 3
Format (Text): Hyperlinked, add indirection to the text. Can be text, graphic or multimedia commentary.
Format (Graphic): Animated graphical documentation in visual manner. User have little interaction.
Granularity: High level design software architecture. Able to make changes based on understanding of system architecture.
Efficiency: Dynamic and semi-automatically reflect the changes as long as the developer directs the tools.

Maturity Level 4
Format (Text): Contextual documentation using tools support. Enhance the information on the context.
Format (Graphic): Interactive and permit the user to navigate from one node to next level node. Can chase down the artifacts and relationships. Better response to user feedback.
Granularity: Capture the system requirements from the point of view of the user. Multiple level of abstraction.
Efficiency: Automated and static but no need for developer involvement.

Maturity Level 5
Format (Text): Personalized document for the reader. Multiple view of the system.
Format (Graphic): Editable graphical documentation. Able to add new nodes and can be saved in the repository (if available).
Granularity: Product line documentation. Capture the important information concerning the commonalities and variability in the product. Define the domain knowledge.
Efficiency: Fully automatic and completely dynamic. Can produce the necessary documentation on demand.

Table 2-2: Key Product Attributes (KPA) and the description for each maturity level
2.8 Definition of Knowledge Representation
Knowledge representation can be defined as a field of study that uses formal symbols to represent a collection of propositions believed by some putative agent. The author also emphasizes that the symbols need not represent all the propositions believed by the agent. There may very well be an infinite number of propositions believed, only a finite number of which are ever represented [49]. Knowledge can be represented in forms such as XML, structured text files, document management systems, the Meta Object Facility, production rules, frames, tagging, semantic networks and ontologies. The fundamental goal of knowledge representation is to represent knowledge in a manner that facilitates inferencing (i.e., drawing conclusions) from knowledge [50]. A reasoning tool is then used to extract the knowledge and present the knowledge required by the user.
2.9 Definition of Reasoning
Reasoning is the formal manipulation of the symbols representing a collection of believed propositions to produce representations of new ones. It is here that we use the fact that symbols are more accessible than the propositions they represent: they must be concrete enough that we can manipulate them (move them around, take them apart, copy them, string them together) in such a way as to construct representations of new propositions [49]. According to [51], an ideal reasoning system would produce all and only the correct answers to every possible query, produce answers that are as specific as possible, be expressive enough to permit any possible fact to be stored and any possible query to be asked, and be (time) efficient.
As an example, a diagnostic system in the medical field uses general facts about diseases together with facts about a patient to determine which disease the patient might have and what treatment can be provided. In reasoning terms, an expert provides the general medical information at compile time. At run time, a user specifies the situation and poses questions. The reasoner then produces the appropriate answer from the knowledge base, which includes both the specifics of the problem and the general background knowledge. These outputs are not stored in the knowledge base but are computed whenever they are needed. In general, reasoning can be characterized as the use of symbolic knowledge to pose queries and receive answers [51].
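The compile-time and run-time split in the diagnostic example above can be sketched as simple forward chaining over propositional rules. This is a minimal illustration, not the reasoning system of [51]: the rule set and the placeholder names `symptom_a`, `symptom_b`, `condition_x` and `treatment_t` are hypothetical and carry no medical meaning.

```python
# General background knowledge, supplied by an expert ahead of time.
# Each rule is (set of premises, conclusion). All names are placeholders.
RULES = [
    ({"symptom_a", "symptom_b"}, "condition_x"),
    ({"condition_x"}, "treatment_t"),
]


def forward_chain(facts, rules):
    """Repeatedly apply rules until no new proposition can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def ask(query, case_facts, rules=RULES):
    """Answer a query at run time; derived propositions are computed, not stored."""
    return query in forward_chain(case_facts, rules)


# Run time: the user describes a specific case and poses a question.
print(ask("treatment_t", {"symptom_a", "symptom_b"}))
print(ask("treatment_t", {"symptom_a"}))
```

Nothing derived is written back to `RULES` or to the case facts, which mirrors the point that answers are computed on demand rather than kept in the knowledge base.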
2.10 Definition of Ontology
Originally, in 1993, Tom Gruber defined an ontology as a formal, explicit specification of a shared conceptualization of a domain of interest. Many other definitions of ontology have since been introduced by ontology researchers such as Fensel, Hepp [52], Guarino & Giaretta and others. Information that is not structured can be understood only by humans, not by computers. For example, there is no way to tell a computer that an article is about trains unless it contains the word "train" explicitly. To give the computer some kind of intelligence, it must know the meaning of the document, which is defined as its semantics. Structure alone is not enough, however, because it does not reflect relations. The representation should also reflect the real world, or the relevant part of it, which is called the domain model. A domain model can be achieved through conceptualization, which is a simplified view of the world. An ontology can represent this domain effectively. According to [53], ontologies are able to present knowledge semantically through the description of entities and their properties, relationships, and constraints. Ontologies are able to interweave human and computer understanding of symbols. Ontologies make three main contributions to knowledge representation. Firstly, an ontology allows communication between humans, between application systems, and between humans and application systems. Secondly, it provides computational inference to represent, manipulate, analyze and implement the knowledge. Finally, it facilitates reuse and organization of the knowledge.
An ontology can appear in a graphical or formal visualization form and in a machine-processable serialization form. The former is used for visualization purposes and the latter is used for storage and processing using an ontology language. For example, in the tourism industry, ontologies can be used to represent different domains of interest such as travelling infrastructure (rental car, train, flight), geographic knowledge (location, destination, distance) and financial knowledge (currency, price, method of payment). Figure 2.5 illustrates a graphical visualization for one of these domains of interest, the geographic ontology.
Figure 2.5 : Graphical visualization for geographic ontology [54]
2.10.1 Generality of Ontologies
According to the literature in [55], generally there are three common layers of knowledge. On the basis of their levels of generality, these three layers correspond to three different types of ontologies, namely:
i. Generic (or top-level) ontologies, which capture general, domain independent knowledge (e.g. space and time). Examples are WordNet [56] and CYC [57].
Generic ontologies are shared by large numbers of people across different domains.
ii. Domain ontologies, which capture the knowledge in a specific domain. An example is UNSPSC 3 , which is a product classification scheme for vendors. Domain ontologies are shared by the stakeholders in a domain.
iii. Application ontologies, which capture the knowledge necessary for a specific application. An example could be an ontology representing the structure of a particular Web site. Arguably, application ontologies are not really ontologies, because they are not really shared.
2.10.2 Methodology for constructing Ontologies
Ontology construction is a time-consuming and complicated task. So far there is no standard method to guide the process, although many approaches have been proposed and used. In particular, the author in [58] uses the following steps: definition of the ontology purpose, conceptualization, validation [59], and finally coding. Conceptualization is considered the longest step and requires the definition of the scope of the ontology, the definition of its concepts, and a description of each one (through a glossary, specification of attributes, domain values, and constraints), which represents the knowledge modeling itself. Earlier, a skeletal methodology was proposed [60]. It identifies five steps: (1) identify purpose and scope, (2) build the ontology, (3) evaluation, (4) documentation, and (5) guidelines for each phase. Step (2) is divided further into ontology capture, coding and integration of existing ontologies.
2.10.3 Type of Ontology
Ontologies can be categorized in several ways. They can be categorized in terms of generality, where the ontology models a generic form of knowledge representation across various domains, as in [58]. There, the knowledge representation involves the application domain, the system, the skills of the maintainer and the organization structure. There are also task ontologies [61], which capture the task-related knowledge of the domain in which the task is defined. In addition, there are method ontologies [62], which provide definitions of the relevant concepts and relations used to specify the reasoning process for a particular task. Ontologies can also be categorized by their knowledge representation, such as the frame ontology [63], which contains a number of properties that are inherited by subclasses and instances. Most of the ontologies described above, however, can be placed under domain ontologies. These are designed for a specific domain and for the applications defined within that domain. Another type is the linguistic ontology, such as WordNet [64]. These usually contain collections of terms, which leads to a further classification known as terminological ontologies.
2.10.4 Ontologies Languages
Ontology languages have been in use since the 1980s, starting with knowledge representation systems such as KL-ONE [65] and CLASSIC [66]. It was only at the beginning of the 1990s, however, that the Ontolingua system [67] appeared, supporting the development, management and exchange of ontologies. It used the Knowledge Interchange Format (KIF) internally but was able to work with other ontology languages such as KL-ONE and CLASSIC. Later, ontologies began to be used on the World Wide Web to annotate Web pages using formal ontologies embedded in HTML documents. After that, Ontobroker used ontologies to formulate queries and derive answers. Later, Ontobroker was used as a query language to retrieve documents based on their annotations. These languages became very important references for current Semantic Web languages such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF was the first language formed to represent information about resources on the WWW. RDF is based on two main functionalities: identifying resources using URIs, and describing resources in terms of simple properties and property values [68]. RDF represents data using subject-predicate-object triples, which are also called RDF triples or statements [68, 69]. The subject represents the resource, the predicate represents a property of this resource, and the object indicates the value of this property [54]. The subject and predicate are identified using URIs, while the object may be either a URI or a literal value. RDF also incorporates a graph structure to simplify the modeling of data representation. Figure 2.6 below illustrates RDF triples in the graph model.
Figure 2.6 : RDF Triples in graph model
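The subject-predicate-object structure described above can be sketched in Python without any RDF library. This is a minimal illustration only: the `http://example.org/` namespace, the resource names and the `match` helper are hypothetical, and a real application would normally use an RDF toolkit and a standard serialization.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # a property of that resource
    obj: str        # the value of the property


EX = "http://example.org/"  # hypothetical namespace used only for this example

# A tiny graph: each statement is one edge from subject to object, labelled by predicate.
GRAPH = [
    Triple(EX + "Report1", EX + "hasAuthor", EX + "Alice"),
    Triple(EX + "Report1", EX + "describes", EX + "PayrollSystem"),
    Triple(EX + "Alice", EX + "worksFor", EX + "MaintenanceTeam"),
]


def match(graph, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that agree with every component that is not None."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# All statements about Report1, i.e. the outgoing edges of that node in the graph.
for t in match(GRAPH, subject=EX + "Report1"):
    print(t.predicate, "->", t.obj)
```

Leaving a component as `None` acts as a wildcard, so the same helper can answer "what is known about this resource" and "which resources have this property".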
Using these triples, it is possible to capture knowledge and metadata available on the web. However, RDF only provides a simple description of a resource using properties and values [70]. There was a need for a schema that defines the vocabularies for the resources. This need for more expressive description led to the creation of RDF Schema (RDFS). According to [54], RDFS is an extension to RDF which facilitates the formulation of vocabularies for RDF metadata. RDFS introduces some basic (frame-based) ontological modeling primitives such as classes, properties and instances, and their hierarchies. The combination of RDF and RDFS is also referred to as RDF(S). Many tools have been introduced to support visual editing of RDF(S) descriptions, such as Protégé, WebODE, OntoEdit and KAON OI-Modeller. In a technical whitepaper, Oracle has explained that Oracle Spatial 10g supports RDFS description in database solution 1 . However, RDF(S) is still considered a simple ontological language with basic inference capabilities. More vocabulary for describing properties is needed to achieve interoperation between numerous schemas, which extends the Semantic Web layer to the Web Ontology Language (OWL). OWL was declared another Semantic Web language by the W3C in 2004. OWL is a more expressive language than RDF(S), providing an ontology layer on top of the existing RDF(S) [69]. It adds more vocabulary for describing properties and classes, for example relationships between classes, cardinality, equality and many other richer characteristics of properties. OWL is available in three varieties, namely OWL-Lite, OWL-DL and OWL-Full. Each of these varieties reflects a different degree of expressiveness. Recently, the Web Service Modeling Language (WSML) has been added as an ontology language for the web. WSML focuses on Semantic Web Services and covers major aspects of different knowledge representation formalisms [54]. A recent survey on the Semantic Web in [71] reveals that the most popular ontology editors for the Semantic Web are Protégé, SWOOP, OntoStudio and Altova Semantic Works. According to the same source, many ontologists prefer OWL to RDF(S).
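The basic inference capability that RDFS class hierarchies provide can be sketched as follows. The sketch implements only two of the standard RDFS entailment patterns (transitivity of `rdfs:subClassOf`, and propagation of `rdf:type` along `rdfs:subClassOf`) over plain tuples. The `ex:` class and instance names are hypothetical, loosely following the tourism example given earlier in this chapter, and prefixed names are used as plain strings rather than full URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical vocabulary and one instance.
TRIPLES = {
    ("ex:Train", SUBCLASS_OF, "ex:Vehicle"),
    ("ex:Vehicle", SUBCLASS_OF, "ex:TravelInfrastructure"),
    ("ex:Express1", RDF_TYPE, "ex:Train"),
}


def rdfs_closure(triples):
    """Add subclass and type statements implied by the class hierarchy."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Look for a subclass statement that starts where (s, p, o) ends.
                if p2 != SUBCLASS_OF or s2 != o:
                    continue
                if p == SUBCLASS_OF:
                    derived.add((s, SUBCLASS_OF, o2))  # subclass transitivity
                elif p == RDF_TYPE:
                    derived.add((s, RDF_TYPE, o2))     # instances inherit superclasses
        if derived <= closure:
            return closure  # fixed point reached, nothing new to add
        closure |= derived


inferred = rdfs_closure(TRIPLES)
print(("ex:Express1", RDF_TYPE, "ex:TravelInfrastructure") in inferred)
```

Richer constructs such as cardinality or equality are outside what this kind of closure can express, which is the gap that OWL is described as filling.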
2.10.5 Application of Ontologies
Ontology is increasingly becoming an essential tool for solving problems in many research areas. In the Semantic Web [55], ontologies are used to capture knowledge on a global scale. By providing an explicit representation of the semantics underlying data, programs, pages, and other Web resources, ontologies will enable a knowledge-based Web that provides a qualitatively new level of service and overcomes many of the problems in knowledge sharing and reuse and in information integration. In databases [72], ontology repositories are used to store and maintain the ontologies. The author believes that for agent and ontology technology to gain widespread acceptance, it should be possible for agents to access ontologies directly using a well-known protocol, and the information schema underlying an ontology repository should be easily accessible in a declarative format. In [73], the author discusses the impact of ontology on information systems, which can be divided into a temporal dimension and a structural dimension. Other areas such as electronic commerce, knowledge management, information retrieval, bioinformatics, software engineering [74], intelligent systems, ontology-based brokering and software maintenance [58] also use ontologies extensively.
2.11 Redocumentation Approaches and Tools
This section enumerates some of the reverse engineering approaches and tools available in the literature.
2.11.1 XML Based Approach
The XML based approach is one of the common redocumentation approaches used to generate documentation. An XML document contains structured information that captures both the content and the meaning of the documentation. XML resembles HTML, but its extensibility makes it more useful for program documentation. Using XML, the technical writer or software engineer can define their own tags, such as <TASK>,<FILE>,<FUNCTION>, <VARIABLE>and <CONSTRANT>[11]. This feature helps to attach implicit semantics to the document. Because XML presents information hierarchically, it helps the reader understand the program more easily. It also allows the data captured from the program to be validated, so that the data can be exchanged between different software systems. One researcher [11] used the XML based approach to redocument a program. In that research, every level (inputs, automated specialized process, design models and maintenance histories [36]) in the redocumentation framework required minor techniques, which were integrated to produce high quality documents [11]. In the first level, the SWP data is captured from the source code and blended with other resources (manuals, programmers, and software documents) to obtain more data sources. Following that, a commercial or specific parser is used to extract the structure in the extraction process. In level 2, the structure or data captured from the various SWPs is merged into one repository to facilitate knowledge processing. One of the important activities in level 2 is uncovering important information hidden in the gathered data [11]. Finally, in level 3, the generated documentation can be viewed in both textual and graphical representations. The output produced can be fed back into the next iteration of the data gathering phase to refine the information contained in the repository.
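The idea of user-defined documentation tags can be sketched with Python's standard `xml.etree.ElementTree` module. The tag names TASK, FILE, FUNCTION and VARIABLE come from the description above, but the nesting, the `name` attribute, the sample file and the helper `build_task_document` are assumptions made for illustration and are not the schema used in [11].

```python
import xml.etree.ElementTree as ET


def build_task_document(task_name, files):
    """Build a TASK element from {file: {function: [variables]}} extracted data."""
    task = ET.Element("TASK", name=task_name)
    for file_name, functions in files.items():
        file_el = ET.SubElement(task, "FILE", name=file_name)
        for func_name, variables in functions.items():
            func_el = ET.SubElement(file_el, "FUNCTION", name=func_name)
            for var in variables:
                ET.SubElement(func_el, "VARIABLE").text = var
    return task


# Hypothetical data, standing in for what a parser might extract from source code.
extracted = {
    "payroll.c": {
        "compute_pay": ["hours", "rate"],
        "print_slip": ["employee_id"],
    }
}

root = build_task_document("Payroll maintenance", extracted)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Because the tags carry meaning, the document can be queried by concept.
print([f.get("name") for f in root.iter("FUNCTION")])
```

The same hierarchical document can be rendered as text or transformed into a diagram, which matches the textual and graphical views mentioned for level 3.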
2.11.2 Model Oriented Redocumentation Approach
In [7], the Model Oriented Redocumentation approach is used to produce models from existing systems and to generate the documentation from those models. The basic concept of this approach comes from Model Driven Engineering (MDE). The main objective of MDE is to raise the level of abstraction in program specification and to increase automation in program development. The MDE concept suits the redocumentation process, specifically for producing a higher level of abstraction in the final documentation. MDE is combined with the Model Driven Architecture (MDA) in general and tied to Technological Spaces (TS) [75]. Figure 2.7 shows the model oriented framework for redocumentation within the MDE context. Initially, the legacy system is transformed into a stack of formal models. These formal models are written in a formal language and can be transformed into other TSs. Model information in the different TSs is stored in the Documentation Repository with a well defined meaning and can be used to produce the documentation in a uniform way. To support the framework, a tool called the Maintainer Integration Platform (MIP) was developed; it is supported by the Wide Spectrum Language (WSL), which can present both high and low levels of abstraction [7].
Figure 2.7: Model Oriented Framework [7]
The MIP tool collects information from different sources and uses the Transformation Engine to translate the assembler files into WSL files, which are then transformed or abstracted into different models and views. M-DOC generates the final documentation, which is browser based and easy to navigate. The final documentation, produced using different models and code slices, is shown in Figure 2.8.
Figure 2.8: Final Documentation Produced using MIP tool [7]
2.11.3 Incremental Redocumentation Approach
One of the common issues in maintaining a system is recording the changes to the source code that are requested by customers or users. Because change requests are unevenly distributed, teams that practise code ownership will overwork some programmers while leaving others underutilized. In a small team, maintenance management will suffer, wasting resources and draining profits [8]. The Incremental Redocumentation approach rebuilds the documentation incrementally each time the programmer makes a change.
The steps involved in the change process are listed below:
i. Request for change
ii. Understanding the current system
iii. Localization of the change
iv. Implementation of the change
v. Ripple effects of the change
vi. Verification
vii. Redocumentation of any changes, recorded using the PAS tools
The change request is the first step in the change process. It is written in plain English and usually originates with the customers, passing to the project team through the application owners and distributors. During the planning phase, the team identifies the parts of the software affected by the change and assigns a particular programmer to implement the change. The programmer then implements the change and verifies its correctness. Finally, during the redocumentation phase, the program comprehension gained during the change is recorded in the appropriate partitions of PAS.
The partitioned annotations of software (PAS) serve as a notebook for the maintenance programmer, in which the programmer can record all of his or her understanding, whether it was arrived at in a top-down or bottom-up fashion, whether it is complete or partial, and whether it is confirmed or tentative [5]. Since the PAS is hypertext in the style of the World Wide Web, there is no need to limit the number of partitions or their contents, and a partition in which a given piece of understanding can be recorded can always be found. From the global point of view, the PAS annotations form a matrix in which one coordinate is the constructs of the code and the other coordinate is the selected partitions. There is a wide variety of partitions that a programmer may want to use, with various levels of abstraction and various kinds of information. Hence, each entry in the matrix is the annotation for a specific construct, containing information of a specific kind. Among the partitions, domain partitions play a special role. Each program operates in a certain application domain, and each application domain has an ontology consisting of the essences and concepts of that domain. These concepts have to be supported by the program, and they are important for understanding the program.
For example, if the application domain is a library, then the concepts supported are books, loans, customers, etc. These concepts in turn have to be supported by specific constructs in the code. A domain partition makes these constructs comprehensible. It is written in terms of the domain concepts only and is understandable to both a programmer and a user of the program. PAS tools can be divided into three categories: PAS browsers, PAS editors and PAS generators. A PAS browser focuses on browsing the PAS documentation. A PAS editor supports editing and updating the PAS documentation. A PAS generator parses the existing source code and prepares the skeleton files for the PAS documentation. One example of a PAS tool is HMS.
HMS is a PAS generator for programs written in the object oriented language C++. Its input is a directory of C++ source files and header files. The tool parses all files in the directory and creates a skeleton documentation file for each class. Skeleton documentation files are written in HTML and contain all information extractable from the code. They also contain additional empty spaces for any documentation the programmer may want to add. For each partition, there is a section of the file with a header and a space for the documentation comments.
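The PAS matrix and the skeleton files can be pictured with a small sketch. It is not the HMS implementation: the regular expression is a crude stand-in for a real C++ parser, and the function names, partition names and sample classes are all hypothetical.

```python
import re
from html import escape

# Assumption: a regex is enough for this illustration. It matches class
# definitions (with or without a base class) but not forward declarations.
CLASS_PATTERN = re.compile(
    r"^\s*class\s+([A-Za-z_]\w*)\s*(?::[^;{]*)?\{", re.MULTILINE
)


def empty_annotation_matrix(cpp_source, partitions):
    """Return the PAS matrix as {(construct, partition): annotation text}.

    Every cell starts empty, like the blank spaces in a skeleton file.
    """
    classes = CLASS_PATTERN.findall(cpp_source)
    return {(cls, part): "" for cls in classes for part in partitions}


def render_skeleton(matrix, construct):
    """Render one HTML skeleton: a header and a comment space per partition."""
    sections = []
    for (cls, part), text in matrix.items():
        if cls == construct:
            sections.append(f"<h2>{escape(part)}</h2>\n<p>{escape(text)}</p>")
    body = "\n".join(sections)
    return (
        f"<html><body>\n<h1>class {escape(construct)}</h1>\n"
        f"{body}\n</body></html>"
    )


# Hypothetical library-domain source code.
source = """\
class Member;
class Book {
};
class Loan : public Record {
};
"""

matrix = empty_annotation_matrix(source, ["domain", "algorithm"])

# The programmer later fills in one cell of the matrix (a domain partition).
matrix[("Loan", "domain")] = "A loan links one customer to one borrowed book."
loan_page = render_skeleton(matrix, "Loan")
print(loan_page)
```

Keying each annotation by a (construct, partition) pair mirrors the matrix view described above: reading along one construct gives a skeleton file, while reading along one partition gives, for example, the complete domain view of the program.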
2.11.4 Island Grammar Approach
An island grammar consists of:
i. detailed productions for the language constructs we are specifically interested in (the islands) [76];
ii. liberal productions catching all remaining constructs (the water) [76]; and
iii. a minimal set of general definitions covering the overall structure of a program.
In simple terms, the islands are the parts of the language for which a full grammar is given, and they are surrounded by water, the constructs that are not of interest. This approach is used in redocumentation to extract facts from a system's source code [37]. A simple example of the extraction is shown in Figure 2.9.
Figure 2.9 : Example on using Island Grammar Approach[37]
The grammar definition language SDF [76] is used to define the island grammar and to obtain a parser for it. The parser returns the parse tree as a Java object encoded in the ATerm format [77]. The analysis results that are of interest can be written to a repository, and from there they can be combined, queried and used in the rest of the documentation generation process. The source code is analysed following the common redocumentation process, in the sense that there is a chain of analysis, filtering and presentation events. In the island grammar approach, filtering of the data starts during the first (analysis) phase, because the approach deals only with those language constructs defined in the island grammar. The output can be abstracted into different layers depending on the documentation requirements. The author of [37] uses a Cobol system to generate a hierarchy associated with the documentation requirements, while the author of [78] explains in detail a supporting tool for the island grammar approach called Mangrove.
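The island and water idea can be approximated with a short sketch. It does not use SDF or produce ATerms; regular expressions stand in for the detailed island productions, and every line that matches no island is simply treated as water. The chosen islands (CALL and PERFORM statements) and the sample Cobol fragment are assumptions made for illustration.

```python
import re

# Islands: the only constructs described in detail.
# Everything else in the source is water and is skipped.
ISLANDS = [
    ("call", re.compile(r"\bCALL\s+'([A-Z0-9-]+)'", re.IGNORECASE)),
    ("perform", re.compile(r"\bPERFORM\s+([A-Z0-9-]+)", re.IGNORECASE)),
]


def extract_facts(source):
    """Return (kind, target, line number) facts for each island found."""
    facts = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in ISLANDS:
            for match in pattern.finditer(line):
                facts.append((kind, match.group(1).upper(), line_no))
    return facts


# Hypothetical Cobol fragment: only lines 2 and 3 contain islands.
cobol = """\
       MOVE A TO B.
       PERFORM INIT-TOTALS.
       CALL 'TAXCALC' USING GROSS.
       DISPLAY 'DONE'.
"""

facts = extract_facts(cobol)
print(facts)
```

Filtering happens during analysis itself: the MOVE and DISPLAY statements never reach the repository, which is why the approach can tolerate source code that a full grammar would reject.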
2.11.5 DocLike Modularized Graph Approach
According to [79], the DocLike Modularized Graph (DMG) approach presents the software architecture of a reverse engineered subject system graphically, in a modularized and standardized document-like manner, in order to visualize the software abstraction. Many other approaches use structural documentation [46] and assume that current views and textual descriptions can be part of structural redocumentation without needing to be transferred into a word processor. According to [79], however, DMG provides document-like software visualization and redocumentation based on a template of software design documentation, which enables users to directly update the description of each section in the Description Panel together with its associated graph, which is generated automatically, as shown in Figure 2.10. The Content Panel focuses on the representation of modules, and the associated graph is displayed accordingly in the Graph Panel. The Description Panel is used to describe the associated section manually.
Figure 2.10 DocLike Viewer Prototype Tool[6]
The DocLike Viewer prototype tool can be expressed in terms of the redocumentation framework, as shown in Figure 2.10. As input, it takes C source code and uses the existing parser provided by Rigi, a tool that is discussed in detail in a later section. DocLike Viewer uses the existing storage provided by Rigi and filters the data by selecting only the information required for visualization in DocLike Viewer.
2.12 Redocumentation Tools
In this section we briefly describe some of the redocumentation tools that are designed to assist the maintainer's task.
2.12.1 Rigi
Rigi was designed to address three of the most difficult problems in the area of programming-in-the-large: the mastery of the structural complexity of large software systems, the effective presentation of development information, and the definition of procedures for checking and maintaining the completeness, consistency, and traceability of system descriptions. Thus, the major objective of Rigi is to effectively represent and manipulate the building blocks of a software system and their myriad dependencies, thereby aiding the development phases of the project [80]. Rigi was developed for abstracting source code into text files rather than specifically for redocumentation. Nevertheless, it has become a fundamental tool, and many researchers use the Rigi architecture to generate documentation.
Rigi uses a reverse engineering approach that models the system by extracting artifacts from the information space, organizing them into higher level abstractions, and presenting the model graphically [48]. Rigi has three components: rigireverse, rigiserver and rigiedit [46]. Rigireverse is a parsing system that supports common imperative programming languages (C and Cobol) and includes a parser for LaTeX to analyze documentation [46]. Rigiserver is a repository that stores the information extracted from the source code, and rigiedit is an interactive, window-oriented graph editor for manipulating program representations.
In Rigi, the first phase involves parsing the source code and storing the extracted artifacts in the repository as a Rigi Standard Format (RSF) file. There are two types of RSF file. The first is an unstructured RSF file, which may contain duplicate tuples. The other is a structured RSF file, which is used for displaying the graphical architecture. The files contain information about the nodes (e.g., functions, variables and data structures) and the arcs (e.g., function-to-function calls or function-to-variable accesses). A tool called sortrsf converts an unstructured file into a structured file. Once a C input file has been parsed by the Rigi parser, an RSF file is produced and sent to the Rigi editor for further processing. The Rigi editor is a graph editor whose user interface is based on windows, menus, color and a mouse pointer. It is programmable using a scripting language called Tcl [15]. The Rigi editor is used to display, traverse and modify the graphical model of the architecture. This produces a flat resource-flow graph of the software representing the function calls and data accesses. The second phase involves clustering functions into subsystems according to the rules and principles of software modularity, in order to generate multiple views, called Simple Hierarchical MultiPerspective views [9], which are layered hierarchies for higher level abstractions. The Rigi editor assists designers, programmers, integrators and maintainers in defining, manipulating, exploring and understanding the structure of large, integrated, evolving software systems [80]. Rigi is also able to import the schema from the Columbus tool [47] and view the representation using the Rigi graph visualizer. This feature allows the Columbus tool to display graphs based on the Columbus schema, such as a call graph and a UML class diagram like graph.
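The step from an unstructured to a structured RSF file can be sketched as follows. The sketch assumes that each RSF line is a whitespace-separated triple of the form "relation source target"; it only removes duplicates and sorts the tuples, which is a simplification and not a description of what sortrsf actually does. The relation names and the sample tuples are hypothetical.

```python
def parse_rsf(text):
    """Parse RSF-like text into (relation, source, target) triples.

    Assumption: one whitespace-separated triple per line; other lines
    are ignored.
    """
    triples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples


def structure_rsf(triples):
    """Remove duplicate tuples and sort them (simplified structuring step)."""
    return sorted(set(triples))


def arcs_by_relation(triples, relation):
    """Build an adjacency list for one kind of arc, e.g. function calls."""
    graph = {}
    for rel, src, dst in triples:
        if rel == relation:
            graph.setdefault(src, []).append(dst)
    return graph


# Hypothetical unstructured output of a parser; note the duplicate call tuple.
raw = """\
type main Function
type total Variable
call main compute
call main compute
data main total
"""

structured = structure_rsf(parse_rsf(raw))
call_graph = arcs_by_relation(structured, "call")
print(structured)
print(call_graph)
```

The resulting adjacency list corresponds to the flat resource-flow graph: nodes are functions and variables, and arcs are calls and data accesses that a graph editor can then cluster into subsystems.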
2.12.2 Haddock Tool for Haskell Documentation
Haddock is a tool for automatically generating documentation from source code. Haddock primarily focuses on generating library documentation from Haskell source code. According to [38], the major reasons for generating library documentation from the source code are listed below:
i. Having interpreted the API from the source code, a documentation tool can automatically cross-reference the documentation it produces.
ii. The code already has documentation in the form of comments, which can be used in document generation.
iii. There is minimal risk of the source code and the documentation getting out of synchronization.
Currently the only fully supported output format is HTML, although there is a partial implementation of a DocBook (SGML) back-end. The knowledge captured in HTML consists of:
i. A root document, which lists all the modules in the documentation (this may be a subset of the modules actually processed, as some of the modules may be hidden; see Section 5.3). If a hierarchical module structure is being used, then indentation is used to show the module structure.
ii. An HTML page for each module, giving the definitions of each of the entities exported by that module.
iii. A full index for the set of modules, which links each type, class, and function name to each of the modules that exports it.
Haddock uses the general Haskell parser distributed with GHC and extends it by adding abstract syntax for documentation annotations. Persistent schemes or interfaces are used to store the libraries before they are extracted into documentation in HTML form. A few other tools can be categorized under API redocumentation, namely Javadoc [81], Doxygen [82] and the Caml system [83]. Javadoc and Doxygen have been used in large commercial systems, and their features have been improved over time to fulfil current needs.
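The full index described in item iii can be illustrated with a minimal sketch. It is written in Python rather than Haskell and is not Haddock's implementation; the function build_index and the module and entity names are hypothetical.

```python
def build_index(exports):
    """Map every exported name to the sorted list of modules exporting it.

    `exports` maps a module name to the names that module exports.
    """
    index = {}
    for module, names in exports.items():
        for name in names:
            index.setdefault(name, []).append(module)
    return {name: sorted(mods) for name, mods in sorted(index.items())}


# Hypothetical export lists for two modules.
exports = {
    "Data.Stack": ["Stack", "push", "pop"],
    "Data.Queue": ["Queue", "push"],
}

index = build_index(exports)
for name, modules in index.items():
    print(name, "->", ", ".join(modules))
```

An entry that lists more than one module, such as push here, is exactly the case where automatic cross-referencing helps the reader: the index shows every place a name is exported without any manual bookkeeping.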
2.12.3 Scribble Tool
Scribble focuses on generating library documentation, user guides and tutorials for PLT Scheme. It combines all of these threads, producing a language and tool that spans and integrates document categories. Scribble was built using PLT Scheme technology, an innovative programming language that builds on a rich academic and practical tradition [84]. PLT Scheme is suitable for implementation tasks ranging from scripting to application development, including GUIs and web services, and it supports the creation of new programming languages through a rich, expressive syntax system. The features of PLT Scheme make Scribble easier to develop, and Scribble is essentially an extension of PLT Scheme. The main input and the parser in the documentation process are therefore PLT Scheme itself. The central PLaneT package repository is used to store the libraries. The final output is produced in HTML form and consists of libraries with their guides and tutorials. Fundamentally, Scribble builds on Scheme to construct representations of documents using Scheme functions and macros, and it uses an extension of Scheme syntax to make it more suitable for working with literal text.
2.12.4 Universal Report
Universal Report is a tool used to analyse source code and document the software system. The main objective of this tool is to analyse a given program and generate a structured and well formatted document for it. The biggest advantage of this tool compared to other tools is that it can generate documentation from various languages such as C++, Visual Basic, Ada, Cobol, Fortran, Java, Assembler, Perl, PHP, Python and many others (Figure 2.11).
Figure 2.11 Programming Language Supported by Universal Report
It uses pattern matching algorithms and compilation techniques to extract information from the source code and generates the documentation as HTML, LaTeX and plain text files [39]. The HTML output has many features, including a search script for text over the entire documentation, an online commenting and annotation system, a dynamic flowchart, a routine call graph, screenshots from form files, and detailed analysis and dynamic composition of each routine. In addition, Universal Report can read database files and generate a detailed report of their structure and elements, such as tables, fields and reports. However, the features of Universal Report emphasise redocumentation of the source code at the implementation level only. It does not focus on higher abstraction levels such as the design or specification level.
2.13 Summary of Redocumentation Approaches and Tools
Table 2-3 and Table 2-4 summarize the redocumentation approaches and tools.
Approaches | Input | Parser | Knowledge Representation Schema | Output
XML Based Approach | Source code, existing documents, users | Commercial parsing system | XML | Textual and graphical documentation (cross reference, system overview) for different types of stakeholder
Model Oriented Approach | Source code | WSL (Wide Spectrum Language) | Meta Object Facility (MOF) | Tree like view documentation (call graph, flow chart, class diagram, activity diagram)
Incremental Redocumentation Approach | C++ source files and headers | HMS parser | Database | Document consisting of partitions of skeleton
DocLike Modularized Graph | C source code | Rigi (parser), library of procedures | Structural documentation | Updateable document, visualized through directed graph
Island Grammar Approach | Source code, hand-written documentation | SDF (Syntax Definition Formalism) | Graph representation | HTML environment using structural documentation
Table 2-3 : Summary of Redocumentation Approaches
Tools | Input | Parser | Knowledge Representation | Output
Rigi | C/C++ and Cobol | Rigi parser, Rigi editor | Directed and undirected graph layout (nested graph) | Structured text file: RSF files (Rigi Standard Format)
Columbus | C++ | Extractor_CPP.dll (CAN.exe and CANPP.exe) using Columbus parser | Columbus schema (Abstract Syntax Graph) | HTML, CPPML, GXL, UML XMI and FAMIX XMI
Haddock | Haskell source code | Haskell parser | Document management system | Structural document by module in HTML form
Scribble Tool | PLT Scheme | Scribble parser | Scheme strings syntax | HTML documentation consisting of library with user guide and tutorials
Universal Report | Source code in multiple programming languages | Universal Report parser | Component diagram, report, routine diagram | LaTeX, HTML and plain text formats
Table 2-4: Summary of Redocumentation Tools
3 COMPARATIVE EVALUATION OF THE STATE OF ART APPROACHES
This chapter compares the redocumentation approaches and tools that were described in Chapter 2 with respect to the following benchmarks. The results can be seen in
3.1 Analysis Criteria
This section discusses in detail the criteria used to evaluate the approaches and the tools. The approaches are evaluated using two categories, as follows:
i. Document Quality Attributes
ii. Knowledge Representation Criteria
3.1.1 Document Quality
The main objective of producing documentation is to enhance the understanding of the source code. The DMM maturity model is used to measure how well the approaches and tools achieve this objective. Each attribute is measured on five levels. Tables 3-1 and 3-2 show the summary of the evaluations for the redocumentation approaches and tools.
Approaches | Format: Text | Format: Graphic | Granularity | Efficiency
XML Based Approach | 3 | 3 | 3 | 2
Model Oriented Approach | 2 | 3 | 3 | 2
Incremental Approach | 2 | - | 2 | 3
Island Grammar Approach | 3 | 1 | 3 | 2
DocLike Modularized Graph Approach | 3 | 3 | 3 | 2
Table 3-1: Comparing redocumentation approaches
Table 3-1 and Table 3-2 show the quality level achieved by the approaches and tools in generating documentation. The strengths of the existing approaches and tools lie at different levels of the redocumentation process. For example, the model oriented approach shows the highest maturity level for the granularity criterion, the Rigi tool scores highest for graphics, and the Scribble tool scores highest for efficiency. The XML based approach, however, is at a moderate and balanced level for every criterion except efficiency. The analysis above shows that the quality of the documents produced depends on which level of the redocumentation process an approach emphasises: Rigi emphasises visualization (output), the island grammar approach emphasises extracting the syntactic structure of the code (parser), and the model oriented approach emphasises software evolution (knowledge base).
3.1.2 Knowledge Representation Criteria
Knowledge representation criteria are used to evaluate the redocumentation approaches and tools. The following criteria evaluate the storage management capability, which is the heart of the redocumentation process.
i. Effectiveness: Knowledge can be presented according to different types of users or different types of projects. In other words, the user is able to search for and choose the information needed. For example, the presentation can be viewed by different groups of users, such as developers, maintainers, software engineers and customers, or for different kinds of projects, such as procedural or object-oriented development.
ii. Efficiency: The user is able to search for and find the related knowledge at the time it is needed. This matters especially to software developers searching for specific knowledge.
iii. Explicitness: The representation schema is able to show a more detailed description of the selected knowledge. For example, if the representation schema shows the classes, the user is able to view in detail their properties, implementation and subclasses. The main reason for this criterion is to enable the user to identify the risks of making any change to the system.
iv. Accessibility: The representation works in different environments and is accessible through both electronic and printed copies. This criterion is appropriate for the tools used for redocumentation. For example, an object oriented model can be shown in different applications, such as HTML or specific software (Rational Rose or Visio).
v. Modifiability: The current representation schema can be modified as the software is improved, and any changes made can be validated.
vi. Understandability: The representation schema can be understood. In other words, the main purpose of this criterion is to make sure that the user understands the output of the presentation. For example, the representation schema is able to describe the functions produced.
Table 3-3 : The Evaluation using Knowledge Representation Criteria for Redocumentation Approaches
Table 3-3 shows the evaluation of the redocumentation approaches based on the knowledge representation criteria.
Criteria | XML Based Approach | Model Oriented | Incremental | Island Grammar | DocLike Modularized Graph
Effectiveness | High | High | Medium | Medium | Medium
Efficiency | Medium | Low | Low | Medium | Low
Explicitness | Medium | High | Medium | Medium | Medium
Accessibility | High | Low | Medium | Medium | Medium
Modifiability | Medium | High | Low | Low | Low
Understandability | High | High | Medium | Medium | Medium
Table 3-4 : The evaluation using Knowledge Representation Criteria for Redocumentation Tools
Criteria | Rigi | Haddock Tool | Scribble Tool | Columbus | Universal Report
Effectiveness | Low | Medium | Low | Low | Medium
Efficiency | Low | High | Medium | Low | Medium
Explicitness | Medium | Low | Low | Medium | Low
Accessibility | Low | High | High | Low | High
Modifiability | Low | Low | Medium | Low | Medium
Understandability | Low | High | Low | Low | High

The evaluation above is explained in detail in the following section.
3.2 Discussion
This chapter and the previous chapter have aimed to provide an overview of, and to compare, recent progress in redocumentation. The approaches and tools were evaluated using two sets of criteria. First, the redocumentation approaches and tools were evaluated based on the document quality attributes suggested by [22], as shown in Tables 3-1 and 3-2. The main objective of this evaluation is to measure the quality of the documents produced to enhance program understanding. Second, the redocumentation approaches and tools were evaluated based on knowledge representation, as shown in Tables 3-3 and 3-4. The objective is to analyze the level of knowledge representation used to generate the software redocumentation.
For the first set of criteria, the measurement is based on format, granularity and efficiency, which together identify the level of the documents produced for program understanding. Among the approaches, the XML based approach is very commonly used nowadays. Its advantage compared to the other approaches is that it can create different types of view according to the user's needs. However, its abstraction is at a medium level, and it is difficult to show semantically related knowledge from the same domain. The model oriented approach is useful for system evolution because it is able to show the exact characteristics of the original system or the domain level. The maintainer therefore has a better view and understanding when handling maintenance tasks. The island grammar approach is used at the parser level; it speeds up the extraction process and concentrates on analysing the data for the documentation. For the tools, granularity is still at a low or medium level. Most of the tools are developed only up to the level of viewing the software architecture, not the requirement level of the system. The text format documentation produced by all the approaches and tools is still at a medium level, since most documents are created as hyperlinked text, as in the XML based approach. For graphics, Rigi is one tool that helps to view the software components in graphic form and allows navigation to a certain extent. The other approaches and tools, however, only support viewing the graphics, not navigating through them. In terms of efficiency, the Haddock and Scribble tools are able to automate the redocumentation process, unlike the other approaches and tools. However, automation is easy in their case because it involves only low level abstraction. The overall knowledge representation of the model oriented approach is better than that of the other approaches because it is able to use the knowledge base to create higher level granularity.
However, the island grammar approach, the Scribble tool and the Haddock tool are more efficient than the other approaches, because these solutions emphasise low level abstraction and are useful for programmers or developers who need to understand the program in order to manage maintenance work. The main problem with most of these redocumentation approaches and tools is that their granularity level is low, which limits the understanding of the domain knowledge. The model oriented approach tries to solve this, but its efficiency level is low and it is not able to search for the information as needed. Furthermore, these approaches and tools limit switching between representations using schema languages, and there is no general redocumentation process that can be used for several scenarios.
3.3 Related Work
There is a huge number of reverse engineering approaches and tools available, developed both for research [47] and for commercial use [85]. The research goals focus on small problems and emphasise low level abstraction that exactly matches the structure for recognition. Less research in reverse engineering addresses producing high level abstractions of software components. The most closely related work is the development of a document management and retrieval tool that provides a semi-automatic metadata generator and an ontology based search engine for electronic documents. RDFS or OWL is used to describe the data [86]. This tool generally emphasises generating ontologies for personal documents, not producing high granularity software components. Its system architecture only produces results in a Listview and not as standard documentation. Other related work transforms legacy schemas into ontologies using a meta-framework approach. As described by Manuel Wimmer, this meta-framework can generate ontologies with improved structure and semantics compared to the original legacy schemas [87]. To achieve semantic enrichment of the schemas, this method transforms the legacy schemas using heuristics and refactoring to improve their design and precision. However, this method emphasises only the knowledge base and output levels and does not emphasise the importance of the SWP, which can help enrich the metadata used to generate the ontology. In addition, the meta-framework method does not emphasise producing standard documentation or an ontology that describes the artifacts in a meaningful way.
4 RESEARCH METHODOLOGY
In this chapter, the methodology of the proposed approach will be described in principle. This includes the research procedure, the operational framework, the assumptions and limitations, and the schedule of the research.
4.1 Research Design and Procedure
In [88], the authors propose, as a hypothesis, a classification of the research problems of the software engineering discipline into engineering problems, concerned with the construction of new objects such as methodologies, approaches and techniques, and scientific problems, concerned with the study of existing objects such as algorithmic complexity, software metrics and testing techniques. In the case of engineering problems, it is necessary to study existing methodologies, reflect on them to determine their advantages and disadvantages, and propose a new one which, while retaining the advantages of the methodologies studied, would as far as possible lack their shortcomings.
This research aims to develop a new redocumentation approach using ontology. Producing this approach is therefore an engineering problem, and this research will focus on engineering design activities such as modeling, constructing and evaluating the new object. Figure 4.1 shows the steps of the research procedure. Some existing algorithms or approaches may need to be revised and enhanced in order to achieve the objectives of the study. Figure 4.2 shows the flow chart of the research activities.
Figure 4.1 : Research Procedure
Figure 4.2 : Research Flow Chart
4.2 Operational Framework
Table 4-1 : Operational Framework
No | Research Question | Objective | Activity | Deliverable(s)
1 | What is the redocumentation process and what are the components in the process? How are the artifacts presented in documentation? | To investigate the knowledge representation concept in the redocumentation process | Literature study | Result of literature review
2 | Why are the existing approaches for the redocumentation process still not able to satisfy all users' requirements? | To evaluate all of the current redocumentation approaches | Literature study; comparative evaluation of current approaches | Results of comparative evaluation
3 | How to show semantic representation in software documentation to understand a software system? | To present a new approach based on ontology | Building a model for the redocumentation process; designing an algorithm to improve the existing redocumentation process | The redocumentation process model; source code of the algorithm
3 (continued) | | To develop a composer tool that supports the proposed approach based on ontology | Designing the prototype tool; coding; integrating the tool | Design documentation; source code; executable tool
4 | How to validate the usefulness of the proposed approach to support redocumentation? | To evaluate the effectiveness of the redocumentation approach based on the proposed benchmark | Analyzing the results | Analysis results
4.3 Assumptions and Limitation
A software prototype of the proposed approach will be developed as a reasoning tool to support the implementation of the proposed redocumentation approach. Although the approach can be used in any type of system environment, for this research it was decided, in order to prove the concept, to develop an environment that supports knowledge acquisition and presents the knowledge in the form of a document. The environment should allow the user to toggle the application between the system documentation and the tool.
4.4 Assumptions and Limitations
This research has the following assumptions and limitations:
i. The study assumes that the repository is consistently updated with the current software system; otherwise the results would be less effective. To ensure this, the case study should be selected from a recent software system.
ii. The system assumes that the artifacts extracted from the source code and the database cover every component. An existing approach can be used to meet this requirement.
iii. The system assumes that the database used as a source is a relational database system.
iv. The system assumes that maintainers or system users have the knowledge needed to perform reasoning, so that they can search for and select related knowledge from the ontology.
v. The extracted artifacts can come from either object-oriented or structured program-based software.
This chapter explained the research methodology and the procedure for carrying out the research. The research work is based on software disciplines and practices that relate software technology, including know-how and tools, to real-world environments. Justifying the research plan against its initial objectives and issues led to the planned validation approach. The assumptions of the research were laid out to determine the proper actions to take prior to its implementation.
This section explains the proposed framework for redocumentation and the components of the framework as the preliminary findings of the research.
5.2 Definition of the Proposed Framework
The aim of this research is to redocument the source code, together with other auxiliary artifacts, using ontology and to present the output as standard documentation. The proposed framework combines criteria from the XML-based approach and the model-based approach, each of which has its own strengths in receiving input, viewing, and creating the domain application. The ontology reuses the existing schemas from the previous approaches and generates a high-level vocabulary for the domain application in the form of a machine-readable artifact. Figure 5.1 shows the proposed framework for the redocumentation process.
Figure 5.1: The Proposed Redocumentation Process Framework
The proposed framework consists of three levels: inputs, the knowledge automation process, and outputs. Each level is described in detail in the following sections.
5.2.1 Input
The input level consists of two components: the source and the parser. Each is explained below.
a. Identifying Source of Data and Knowledge
In the input process, the sources related to the problem of interest are made available for processing. The sources are the source code, the database, the documentation, and people (maintainers, developers, and users). The following subsections describe how knowledge is retrieved from each of these sources.
i. Source code
Source code is one of the most reliable sources for gaining a correct understanding of the program. However, changes to the program must also be taken into account to ensure that the retrieved knowledge is current. Although most medium and large companies keep a record of changes made to the source code, this information is usually not stored in a repository or machine learning tool. Some technique is therefore needed to record which files are changed, especially during the maintenance process. The incremental redocumentation approach discussed in Chapter 2 helps to redocument changes in the source code. The steps in that approach can be used to identify the changes made during the mini maintenance process.
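One simple way to record which files changed between two maintenance tasks is to compare content hashes. The following is a minimal sketch of that idea, not the incremental redocumentation approach itself; the file names, file contents, and the helper names `fingerprint` and `diff_snapshots` are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Return a stable content hash for the text of one source file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_snapshots(old, new):
    """Compare two {path: hash} snapshots and classify every path."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "changed": sorted(p for p in new if p in old and new[p] != old[p]),
    }


# Hypothetical file contents before and after one maintenance task.
before = {"billing.cbl": "MOVE A TO B.", "report.cbl": "DISPLAY X."}
after = {
    "billing.cbl": "MOVE A TO C.",
    "report.cbl": "DISPLAY X.",
    "audit.cbl": "STOP RUN.",
}

old_snapshot = {path: fingerprint(src) for path, src in before.items()}
new_snapshot = {path: fingerprint(src) for path, src in after.items()}
changes = diff_snapshots(old_snapshot, new_snapshot)
print(changes)
```

Only the files listed under "added" or "changed" would need to be parsed again, which keeps the redocumentation effort proportional to the size of the maintenance task.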
ii. Database
The database is another important source from which knowledge related to the application domain can be retrieved. Three major resources in the database can be used for extraction: the data dictionary, the data, and the relationships between relations. All three can be found by querying the data and the system tables.
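As an illustration of querying system tables, the sketch below reads the data dictionary, the relationships, and a row count from a small in-memory SQLite database. SQLite is used here only because it ships with Python; the tables `customer` and `invoice` and the function `extract_schema` are hypothetical, and another relational system would expose its catalog through different queries.

```python
import sqlite3

# Hypothetical legacy schema with one foreign-key relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO invoice VALUES (10, 1, 99.5);
""")


def extract_schema(conn):
    """Collect columns, foreign keys, and row counts for every table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2])
                   for r in conn.execute(f"PRAGMA table_info({table})")]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(r[3], r[2], r[4])
                        for r in conn.execute(f"PRAGMA foreign_key_list({table})")]
        rows = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        schema[table] = {"columns": columns,
                         "foreign_keys": foreign_keys,
                         "rows": rows}
    return schema


schema = extract_schema(conn)
for table, info in schema.items():
    print(table, info)
```

Each foreign key is reported as (local column, referenced table, referenced column), which is the form needed later to state relationships between relations in the domain ontology.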
iii. Documentation
Documentation, such as system documentation and user manuals, is an available source that helps to extract high-level components. Software documents that follow a proper structure, organized into chapters, sections, and subsections in a standard format, will be used to extract the knowledge. IEEE and military standard documents are examples of documents that can be used for this purpose.
iv. Software Domain Experts
Software domain experts, such as maintainers or developers, are able to give input at higher levels of abstraction, such as software specifications and requirements. In most organizations, software experts keep their valuable knowledge of the application domain in their minds. Survey and interview techniques can be used to retrieve this knowledge. The collected data will be transferred to the repository for analysis.
b. Parser
The next component in the proposed framework is the parser, which will be developed to extract components from the source code. The existing island grammar approach will be used to reduce the development time of the parser and to return only the relevant information. The Syntax Definition Formalism (SDF) can be used to define the island grammar. SDF has several advantages: it allows a concise and natural definition of syntax and promotes the modularization of a syntax definition. The SDF-based parser extracts the related artifacts and relationships from the source code. To extract related data from the database, manual analysis using the data dictionary provided by the relational database will be used.
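The island grammar idea can be illustrated without SDF: only the constructs of interest (the islands) are described, and everything else in the source (the water) is skipped. The sketch below uses Python regular expressions as a stand-in for an SDF definition; the COBOL-like fragment, the three island patterns, and the relation names are assumptions made for illustration only.

```python
import re

# Hypothetical COBOL-like fragment; most of it is "water".
SOURCE = """
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PAYROLL.
       PROCEDURE DIVISION.
           DISPLAY "START".
           CALL "TAXCALC" USING WS-GROSS.
           PERFORM PRINT-SLIP.
           CALL 'AUDIT'.
"""

# Each alternative is one island; unmatched text is ignored as water.
ISLANDS = re.compile(r"""
      PROGRAM-ID\.\s+(?P<program>[A-Z0-9-]+)
    | CALL\s+["'](?P<call>[A-Z0-9-]+)["']
    | PERFORM\s+(?P<perform>[A-Z0-9-]+)
""", re.VERBOSE)


def extract_islands(source):
    """Return (kind, name) pairs for every island found in the source."""
    return [(m.lastgroup, m.group(m.lastgroup))
            for m in ISLANDS.finditer(source)]


artifacts = extract_islands(SOURCE)
program = next(name for kind, name in artifacts if kind == "program")
# Turn the islands into (subject, relation, object) relationships.
relations = [(program, kind, name)
             for kind, name in artifacts if kind != "program"]
print(artifacts)
print(relations)
```

Because the DISPLAY statement and the division headers are never described, they cost no development effort, which is the main reason the island approach shortens parser development.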
5.2.2 Knowledge automation process
There are three components in the knowledge automation process: the metadata, the ontology storage, and the reasoning tool. Each component is described in detail in the following sections. The main concern at this level is to transform the knowledge stored in the schema into an ontology, and then to support the user, through the reasoning tool, in generating a better-structured semantic schema and an efficient knowledge representation.
a. Metadata
The data retrieved from the sources can be either structured or unstructured. However, to convert the metadata to an ontology automatically, the data must be structured. Therefore, any unstructured data retrieved from a source must first be transformed into structured data. The metadata has two major components: the standard documentation template metadata and the source metadata. The standard documentation template must be defined manually by the user according to a standard document used in software development, such as IEEE Standard 830-1998 or MIL-STD-498. Each subsection of the standard document will be defined in the metadata. The user will be provided with tools to define the metadata flexibly. The source metadata consists of data retrieved from the different sources. The database and source code will be handled by the parser, which extracts the relevant components. Other sources, such as developers, user manuals, or the graphical user interface, will be defined manually by the user in the source metadata through the interface module provided. The main tasks of the source metadata are to save and manage the extracted components. XML Schema, as the most flexible metadata format, will be used to wrap all the source and standard documentation metadata as structured data.
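The two metadata components can be pictured as one XML document with a source part and a template part. The sketch below builds such a document with Python's standard `xml.etree.ElementTree` module; the element names (`metadata`, `sourceMetadata`, `templateMetadata`, `artifact`, `section`), the template name, and the section titles are hypothetical and are not taken from IEEE Standard 830-1998 or MIL-STD-498.

```python
import xml.etree.ElementTree as ET


def build_metadata(system, artifacts, standard, sections):
    """Wrap extracted artifacts and template sections in one XML tree."""
    root = ET.Element("metadata", {"system": system})

    # Source metadata: one element per artifact from the parser or the user.
    source = ET.SubElement(root, "sourceMetadata")
    for artifact in artifacts:
        ET.SubElement(source, "artifact", {
            "kind": artifact["kind"],
            "name": artifact["name"],
            "origin": artifact["origin"],
        })

    # Template metadata: the user-defined sections of the target document.
    template = ET.SubElement(root, "templateMetadata", {"standard": standard})
    for number, title in sections:
        ET.SubElement(template, "section", {"number": number}).text = title
    return root


# Hypothetical inputs: one parsed artifact and one manually entered artifact.
artifacts = [
    {"kind": "Program", "name": "PAYROLL", "origin": "source-code"},
    {"kind": "Requirement", "name": "MonthlyPayslip", "origin": "developer"},
]
sections = [("1", "Scope"), ("2", "Component description")]

root = build_metadata("payroll", artifacts, "example-template", sections)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping the `origin` attribute on every artifact preserves whether an entry came from the parser or was entered manually, which matters when the metadata is later checked or regenerated.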
b. Ontology
This is the heart of the framework: mapping the XML schema to an ontology. Following established ontology methodologies, the first step is to define the purpose, which determines the knowledge that is relevant to the domain of interest. The conceptualization step identifies the basic attributes and the competency questions. It structures the domain knowledge as meaningful models at the knowledge level, either from scratch or by reusing existing models. The competency questions identify the requirements that the ontology must answer. These steps lead to a preliminary ontology containing descriptions of kinds, relations, and properties.
The next step is formalization, which transforms the metadata into a formal or semi-computable model. A formal description language will be used to describe the ontology and will be integrated with the interface model.
The implementation step builds a computable model in an ontology language. The final step is to validate and refactor the created ontology to ensure the quality and usefulness of its concepts for the particular domain. The Web Ontology Language (OWL) will be used to define, describe, and express the ontology. The steps described above will be applied in creating both the ontology for the standard documentation and the domain ontology. Even though these two ontologies are different, it is possible to pass from one to the other by using the artifacts as a bridge: the related artifacts in the domain ontology are mapped to the sections of the standard document. The maintainer will be given features to query and create the intersection between the two ontologies. The next section describes this feature in detail.
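The bridge between the two ontologies can be sketched as a mapping from XML metadata to subject-predicate-object triples. The example below is a simplified illustration that uses plain Python tuples instead of an OWL library; the prefixes `dom:` and `doc:`, the property `doc:documentedIn`, and the `covers` attribute are assumptions introduced for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata: artifacts plus template sections that cover a kind.
METADATA_XML = """
<metadata system="payroll">
  <sourceMetadata>
    <artifact kind="Program" name="PAYROLL" origin="source-code"/>
    <artifact kind="Table" name="EMPLOYEE" origin="database"/>
  </sourceMetadata>
  <templateMetadata>
    <section number="2" title="ComponentDescription" covers="Program"/>
    <section number="3" title="DataDescription" covers="Table"/>
  </templateMetadata>
</metadata>
"""


def metadata_to_triples(xml_text):
    """Map metadata elements to triples for both ontologies and the bridge."""
    root = ET.fromstring(xml_text)
    triples = []
    covers = {}  # artifact kind -> section individual that documents it

    # Documentation ontology: one individual per template section.
    for section in root.iter("section"):
        section_id = "doc:" + section.get("title")
        triples.append((section_id, "rdf:type", "doc:Section"))
        covers[section.get("covers")] = section_id

    # Domain ontology: one individual per artifact, typed by its kind.
    for artifact in root.iter("artifact"):
        artifact_id = "dom:" + artifact.get("name")
        kind = artifact.get("kind")
        triples.append((artifact_id, "rdf:type", "dom:" + kind))
        # Bridge: link the artifact to the section that documents its kind.
        if kind in covers:
            triples.append((artifact_id, "doc:documentedIn", covers[kind]))
    return triples


triples = metadata_to_triples(METADATA_XML)
for subject, predicate, obj in triples:
    print(subject, predicate, obj, ".")
```

In a full implementation the same triples would be expressed in OWL and stored in the ontology storage; the tuple form is used here only to keep the mapping visible.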
c. Ontology Reasoner
The maintainer will be given a tool with an ontology inference mechanism to search and query at the conceptual and semantic level, giving access to knowledge that is not presented explicitly. This is an important feature of the proposed approach compared to keyword-based searching. The proposed approach makes use of the existing capability of ontologies, using inference rules to implement intelligent retrieval and to improve the rationality and usability of the results. Formalizing such queries allows the maintainer to integrate both ontologies to produce standard documentation containing a simplified view of the software domain model.
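The difference between keyword search and inference can be shown with a very small forward-chaining reasoner. The sketch below is an illustration only, not the reasoning tool of the framework; the facts, the property names `calls`, `reads`, and `dependsOn`, and the two rule types (sub-property and transitivity) are hypothetical.

```python
def infer(facts, subproperties, transitive):
    """Apply sub-property and transitivity rules until nothing new appears."""
    inferred = set(facts)
    changed = True
    while changed:
        new = set()
        # Rule 1: if p is a sub-property of q, then (s p o) implies (s q o).
        for s, p, o in inferred:
            if p in subproperties:
                new.add((s, subproperties[p], o))
        # Rule 2: for a transitive p, (s p o) and (o p x) imply (s p x).
        for s, p, o in inferred:
            if p in transitive:
                for s2, p2, o2 in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        changed = not new <= inferred
        inferred |= new
    return inferred


# Hypothetical explicit facts extracted from source code and database.
facts = {
    ("PAYROLL", "calls", "TAXCALC"),
    ("TAXCALC", "reads", "TAX_TABLE"),
}
subproperties = {"calls": "dependsOn", "reads": "dependsOn"}
transitive = {"dependsOn"}

closure = infer(facts, subproperties, transitive)
depends = sorted(o for s, p, o in closure
                 if s == "PAYROLL" and p == "dependsOn")
print(depends)  # ['TAXCALC', 'TAX_TABLE']
```

No explicit fact connects PAYROLL to TAX_TABLE, so a keyword search over the stored facts would miss that dependency, while the inferred closure exposes it.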
5.2.3 Output
The output level consists of the interface used to refine the raw or semantically enriched standard documentation. Based on the domain ontology and the queries from the reasoning tool, standard documentation such as MIL-STD-498 system documentation can be produced. The results from the ontology can be mapped to predefined components in the standard document structure, such as component descriptions, use case diagrams and descriptions, possible requirement specifications, data flow graphs for specific use cases, and relationships among components expressed as semantic relationships from the domain point of view. The interface module will be equipped with functions such as a metadata management module, an ontology creator and browser, a query editor, and a documentation viewer. Table 5-1 describes each interface module.

Table 5-1 : User Interface Description

No. 1
Interface: Metadata Management Module
Description: Provides a semi-automatic mechanism to create the metadata summarizing the artifacts received from the parser and the other sources. Metadata for the other sources is entered manually.

No. 2
Interface: Ontology Creator and Browser
Description: The ontology creator is equipped with a generator engine that reads the metadata in order to generate the ontology for the application domain.

No. 3
Interface: Query Editor
Description: Provides the ontology-based search or inference mechanism that allows the user to process the required knowledge source.

No. 4
Interface: Documentation Management Module
Description: Integrates with the query editor to extract the knowledge source from the domain ontology and map it to the standard documentation.

5.3 Expected Findings
This research is expected to present a new redocumentation approach that eases software maintenance. The expected findings include:
i. A semi-automated redocumentation process approach based on ontology.
ii. A supporting prototype tool capable of discovering and combining ontology components to produce documentation.
iii. An evaluation of the results of the proposed approach against other current approaches to determine its effectiveness.
1. Elliot, J .C. and H.C. J ames, II, Reverse Engineering and Design Recovery: A Taxonomy. IEEE Softw., 1990. 7(1): p. 13-17. 2. Sergio Cozzetti, B.d.S., et al., A study of the documentation essential to software maintenance, in Proceedings of the 23rd annual international conference on Design of communication: documenting \& designing for pervasive information. 2005, ACM: Coventry, United Kingdom. 3. Tilley, S., Three Challenges in Program Redocumentation for Distributed Systems. 2008 4. Hausi, A.M., et al., Understanding software systems using reverse engineering technology perspectives from the Rigi project, in Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: software engineering - Volume 1. 1993, IBM Press: Toronto, Ontario, Canada. 5. Vaclav, R., Incremental Redocumentation with Hypertext, in Proceedings of the 1st Euromicro Working Conference on Software Maintenance and Reengineering (CSMR '97). 1997, IEEE Computer Society. 6. Shahida Sulaiman, N.B.I., Shamsul Sahibuddin,Sarina Sulaiman, Re- documenting, Visualizing and Understanding Software System Using DocLike Viewer, in Proceedings of the Tenth Asia-Pacific Software Engineering Conference (APSEC03). 2003, IEEE. 7. Feng, C. and Y. Hongji. Model Oriented Evolutionary Redocumentation. in Computer Software and Applications Conference, 2007. COMPSAC 2007. 31st Annual International. 2007. 8. Rajlich, V., Incremental redocumentation using the Web. Software, IEEE, 2000. 17(5): p. 102-106. 9. Margaret-Anne, D.S., et al., Rigi: a visualization environment for reverse engineering, in Proceedings of the 19th international conference on Software engineering. 1997, ACM: Boston, Massachusetts, United States. 10. Matthew, F., B. Eli, and F. Robert Bruce, Scribble: closing the book on ad hoc documentation tools, in Proceedings of the 14th ACM SIGPLAN international conference on Functional programming. 2009, ACM: Edinburgh, Scotland. 11. J ochen, H., H. Shihong, and T. 
Scott, Documenting software systems with views II: an integrated approach based on XML, in Proceedings of the 19th annual international conference on Computer documentation. 2001, ACM Press: Sante Fe, New Mexico, USA. 12. Scott, T. and H. Shihong, Documenting software systems with views III: towards a task-oriented classification of program visualization techniques, in Proceedings of the 20th annual international conference on Computer documentation. 2002, ACM Press: Toronto, Ontario, Canada. 13. Takang, P.G.A.A., SOFTWARE MAINTENANCE CONCEPT AND PRACTICE. 2nd. Edition ed. 2003: World Scientific Publishing Co. Pte. Ltd. 79
14. Margaret-Anne, S., Theories, tools and research methods in program comprehension: past, present and future. Software Quality Control, 2006. 14(3): p. 187-208. 15. Edelstein, D.V., Report on the IEEE STD 1219-1993- Standard for software maintenance. SIGSOFT Softw. Eng. Notes, 1993. 18(4): p. 94-95. 16. Keyes, J ., Software Engineering Handbook. 2005: A CRC Press Company. 17. McDermid, J ., Software engineer's reference book. illustrated, reprint ed, ed. J . McDermid. 1991: CRC Press, 1993. 18. M. P. Ward , K.H.B., Formal Methods for Legacy Systems. J . Software Maintenance: Research and Practice, 1995. 7: p. 203--219. 19. KHUZZAN, S.B.A.B.S., e-Fiber : Reengineering of a Legacy System. 2010, UNIVERSITI TEKNOLOGI MALAYSIA. 20. Forward, A., Software Documentation Building and Maintaining Artefacts of Communication, in Ottawa-Carleton Institute for Computer Science. 2002, University of Ottawa: Canada. p. 167. 21. Andrew, F. and C.L. Timothy, The relevance of software documentation, tools and technologies: a survey, in Proceedings of the 2002 ACM symposium on Document engineering. 2002, ACM: McLean, Virginia, USA. 22. Shihong, H. and T. Scott, Towards a documentation maturity model, in Proceedings of the 21st annual international conference on Documentation. 2003, ACM Press: San Francisco, CA, USA. 23. Thomas, V., N. Kurt, and rmark, Maintaining program understanding: issues, tools, and future directions. Nordic J . of Computing, 2004. 11(3): p. 303-320. 24. Sommerville, I. Software Documentation. 7/11/01 7/11/01 [cited; Available from: 25. Corp., I. UML Resource Center. [cited; Available from: 26. IEEE Standard for Software Maintenance. IEEE Std 1219-1998, 1998: p. i. 27. Robert, M.F. and M. Malcolm, Redocumentation for the maintenance of software, in Proceedings of the 30th annual Southeast regional conference. 1992, ACM Press: Raleigh, North Carolina. 28. William, J .P. and R.B. Michael, An approach for reverse engineering of relational databases. Commun. 
ACM, 1994. 37(5): p. 42-ff. 29. Tim, T. and T. Scott, Documenting software systems with views V: towards visual documentation of design patterns as an aid to program understanding, in Proceedings of the 25th annual ACM international conference on Design of communication. 2007, ACM: El Paso, Texas, USA. 30. Giuliano Antoniol, G.C., Gerardo Casazza,Andrea De Lucia,Ettore Merlo, Recovering Traceability Links between Code and Documentation 2002. 31. Paolo Tonella, M.T., Bart Du Bois,Tarja Systa, ed. Empirical studies in reverse engineering: state of the art and future trends. ed. D.M. Berry. 2007, Springer Science. 32. Biggerstaff, T.J ., Design recovery for maintenance and reuse. Computer, 1989. 22(7): p. 36-49. 80
33. K.Lano, H.H., Reverse Engineering and Software Maintenance : A Practical Approach. 1994: Mc-Graw Hill, London. 34. Benedusi, P., A. Cimitile, and U. De Carlini. A reverse engineering methodology to reconstruct hierarchical data flow diagrams for software maintenance. in Software Maintenance, 1989., Proceedings., Conference on. 1989. 35. Inc., B.S. Reverse Engineering. 2010 [cited; Available from: http://bus- 36. Shihong, H., et al. Adoption-Centric Software Maintenance Process Improvement via Information Integration. in Software Technology and Engineering Practice, 2005. 13th IEEE International Workshop on. 2005. 37. Arie van, D. and K. Tobias, Building Documentation Generators, in Proceedings of the IEEE International Conference on Software Maintenance. 1999, IEEE Computer Society. 38. Simon, M., Haddock, a Haskell documentation tool, in Proceedings of the 2002 ACM SIGPLAN workshop on Haskell. 2002, ACM: Pittsburgh, Pennsylvania. 39. Tadonki, C. Universal Report: a generic reverse engineering tool. in Program Comprehension, 2004. Proceedings. 12th IEEE International Workshop on. 2004. 40. Shirabad, J .S., Supporting Software Maintenance by Mining Software Update Records, in School of Information Technology and Engineering. 2003, University of Ottawa: Canada. p. 258. 41. Daniel, G.S., S.D. J effrey, and P.M. Brian, A knowledge base for supporting an intelligent program editor, in Proceedings of the 7th international conference on Software engineering. 1984, IEEE Press: Orlando, Florida, United States. 42. Ward, F.W.C.a.M.K.a.M.M.a.M., Knowledge-Based System for Software Maintenance, in in Proceedings for the Conference on Software Maintenance. 1988. 43. Zheng, X.-d., H.-h. Hu, and H.-r. Gu. Research on Core Business Process-Based Knowledge Process Model. in Semantics, Knowledge and Grid, 2009. SKG 2009. Fifth International Conference on. 2009. 44. Heisig, K.P., Business Process-oriented KM. 2007, Xian J iaotong University. 45. Holger M. 
Kienle, H.A.M.u., The Rigi Reverse Engineering Environment, in Proceedings of the International Workshop on Advanced Software Development Tools and Techniques 2008, ECOOP: Paphos, Cyprus. 46. Kenny, W., et al., Structural Redocumentation: A Case Study. 1995, IEEE Computer Society Press. p. 46-54. 47. Ferenc, R., Columbus Reverse Engineering Tool and Schema for C++. 2002. 48. Rigi User's Manual. 1996, University of Victoria: Canada. 49. Ronald J .Brachman, H.J .L., Knowledge Representation and Reasoning. 2004, San Francisco: Morgan Kaufmann. 50. Domain Knowledge in Planning Representation and Use. 51. Russell, G., D. Christian, and N.I. Santoso, Efficient reasoning. ACM Comput. Surv., 2001. 33(1): p. 1-30. 52. Hepp, M., Ontologies: State of the Art, Business Potential, and Grand Challenges, in Ontology Management. 2008. p. 3-22. 81
53. Mark S. Fox, M.G., An Organizational Ontology for Enterprise Modeling. 1998, AAAI Press / MIT Press, Menlo Park. p. 131--152. 54. Stephan, G.s., H.s. Pascal, and A.s. Andreas, Knowledge Representation and Ontologies, in Semantic Web Services. 2007. p. 51-105. 55. Fensel, D., Lausen, H., Polleres, A., Bruijn, J ., Stollberg, M., Roman, D., Domingue, J . , Enabling Semantic Web Services : The Web Service Modeling Ontology. 2007: Springer. 56. Fellbaum, C., WordNet:An Electronic Lexical Database, G. Miller, Editor. 1998, MIT Press. 57. Douglas, B.L., CYC: a large-scale investment in knowledge infrastructure. Commun. ACM, 1995. 38(11): p. 33-38. 58. Mrcio Greyck Batista Dias, N.A., Kthia Maral de Oliveira, Organizing the Knowledge Used in Software Maintenance. J ournal of Universal Computer Science, 2003. 59. Leonid Kof, M.P., Validating Documentation with Domain Ontologies, in Proceeding of the 2005 conference on New Trends in Software Methodologies, Tools and Techniques: Proceedings of the fourth SoMeT_W05. 2005, IOS Press Amsterdam, The Netherlands, The Netherlands 60. Mike Uschold, M.K., Towards a Methodology for Building Ontologies, in In Workshop on Basic Ontological Issues in Knowledge Sharing, held in conjunction with IJCAI-95. 1995: Canada. 61. Seremeti, L. and A. Kameas. A Task-Based Ontology Enginnering Approach for Novice Ontology Developers. in Informatics, 2009. BCI '09. Fourth Balkan Conference in. 2009. 62. B. Chandrasekaran, J .R.J .a.V.R.B., Ontology of Tasks and Methods, in AAAI Spring Symposium and the1998 Banff Knowledge Acquisition Workshop. 1998: Canada. 63. Thomas, R.G., A translation approach to portable ontology specifications. Knowl. Acquis., 1993. 5(2): p. 199-220. 64. G.A.Miller, WorNet : An online lexical database. International journal of Lexicography, 1990. 65. J ames G. Schmolze, T.A.L., Classification in the KL-ONE Knowledge Representation System. 1985. p. Cognitive Science, 9(2):171216. 66. Alexander Borgida, R.J .B., Deborah L. 
McGuinness,Lori Alperin Resnick, CLASSIC: A Structural Data Model for Objects, in In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data. 1989. p. pages 5967. 67. Adam Farquhar, R.F., J ames Rice, The Ontolingua Server:a Tool for Collaborative Ontology Construction, in In Proceedings of the 10th Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff. 1996: Canada. 68. Manola, F. and E. Miller. RDF Primer. 10 Feb 2004 [cited; Available from: 69. The Semantic Web, in Enabling Semantic Web Services. 2007. p. 25-36. 70. From Web to Semantic Web, in Implementing Semantic Web Services. 2008. p. 3- 25. 82
71. Cardoso, J ., The Semantic Web: A mythical story or a solid reality?, in Metadata and Semantics. 2009. p. 253-257. 72. J in, P., C. Stephen, and C. Daniel, A lightweight ontology repository, in Proceedings of the second international joint conference on Autonomous agents and multiagent systems. 2003, ACM: Melbourne, Australia. 73. Guarino, N., Formal Ontology and Information Systems, in 1st International Conference on Formal Ontology in Information Systems (FOIS98),. 1998: Trento,Italy. 74. Welty, C.A. and D.A. Ferrucci. A formal ontology for re-use of software architecture documents. in Automated Software Engineering, 1999. 14th IEEE International Conference on. 1999. 75. Aksit, I.K.J .B.M., Technological Spaces: an Initial Appraisal, in CoopIS, DOA'2002 Federated Conferences, Industrial track 2002, CiteSeer Scientific Literature Digital Library. 76. Verhoeven, E.-J ., Cobol Island Grammer in SDF, in Informatic Institute 2000, University of Armsterdem. p. 73. 77. Mark G. J . van den Brand, P.K., Chris Verhoef Core Technologies for System Renovation, in SOFSEM '96: Proceedings of the 23rd Seminar on Current Trends in Theory and Practice of Informatics: Theory and Practice of Informatics 1996, Springer-Verlag 78. Moonen, L. Generating robust parsers using island grammars. in Reverse Engineering, 2001. Proceedings. Eighth Working Conference on. 2001. 79. Re-documenting, Visualizing and Understanding Software System Using DocLike Viewer. 80. H. A, M., ller, and K. Klashinsky, Rigi-A system for programming-in-the-large, in Proceedings of the 10th international conference on Software engineering. 1988, IEEE Computer Society Press: Singapore. 81. Javadoc Tool Home Page. [cited; Available from: 82. Heesch, D.v. Doxygen Source Code Documentation Generator Tool. 2007 [cited; Available from: 83. Leroy, X. The Objective Caml System Release 3.11. 2008 [cited; Available from: 84. PLT Scheme. [cited; Available from: 85. Robert, P. and T. 
Scott, Automatically connecting documentation to code with rose, in Proceedings of the 20th annual international conference on Computer documentation. 2002, ACM Press: Toronto, Ontario, Canada. 86. Hak-lae Kim , H.-G.K., Kyung-Mo Park Ontalk: Ontology-Based Personal Document Management System. 2004, CiteSeer- Scientific Literature Digital Library and Search Engine. 87. Wimmer, M. A Meta-Framework for Generating Ontologies from Legacy Schemas. in Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on. 2009. 83
88. Marcos., M.L.a.E., Research in Software Engineering: Paradigms and Methods, in In Proceedings of the 17th International Conference on Advanced Information System (CAiSE05). J une 2005: Porto, Portugal.