Abstract Duplication of code in software systems is considered to be a serious problem that can affect a system's maintainability and extendability. It is reported that 10-15% of code in a software system is involved in cloning. However, because of the difficulty of objectively measuring the number of false positives in a clone result set, the accuracy of these reports is difficult to evaluate. Although this is an important topic, little work has been done in the area of evaluating the accuracy of clone detection methods.
Analysis and detection of clones in software has recently become a popular area of research. Code cloning, generally referred to as the practice of duplicating code within a software system, is considered a serious problem by many sources [4,6,8,10,15,16,18]. Some problems associated with code clones are an unnecessary increase in code size, duplicated bugs and duplicated maintenance effort to fix them, the introduction of unused code, and increased code complexity. If code cloning is not managed, the costs associated with maintaining and extending the system will increase needlessly.
Typically, clone detection tools report that 10-15% of the lines of code in a software system contribute to clones. In extreme cases, the duplication can be as high as 50% of the software system [8]. However, not all of these reported clones are related to the duplication of source code [12,13]. In many cases false positives are also part of the result set. False positives are segments of code that are reported as clones but in fact are not. In many cases these matches are caused by segments of code with very simple and repetitive structure [13]. Many of these falsely reported clones can be removed from the result set using filters, but additional manual inspection of clones is required to refine the results further [13].
In addition to false positives, there is another type of clone not directly related to the duplication of source code that may be reported by clone detection tools. These clones, called "incidental clones", are segments of code that are similar in structure and function not because of explicit copy-and-paste activities but rather arise due to other factors such as programming idioms, API interactions, and the inherent structure of programs written in a programming language. These clones can be very difficult to filter, both manually and automatically, because their form and function may actually be related. For example, building a GUI is a highly repetitive task, and interactions with the API may result in many repeated calls to the same set of functions. In these cases, it is difficult to classify the cause of the clone as copy-and-paste or incidental.
It is important that we measure the proportion of clones in a result set that are false positives or incidental clones if we wish to properly evaluate the effectiveness and accuracy of clone detection tools, yet little work has been done on this topic. This is largely because this type of evaluation would have required human subjects to classify the clones, a task that was found to be highly subjective [19]. With the recent existence of large source code repositories such as [1], we can now take an objective approach to measuring the amount of incidental cloning and false positives, thereby giving us insight into the degree of true cloning within a software system. This paper proposes an experiment that will measure the commonality amongst a very large set of unrelated open source projects taken from that repository. Because these systems will be generally unrelated, we expect that the code will be equally unrelated, giving us a baseline of false positives that are detected in unrelated code. This will provide insight into how many clones are detected amongst unrelated code when inspecting a software system, and give us a way to more accurately estimate the amount of true cloning in a software system. In addition, we will also measure the effect of API protocols on clone detection results by measuring the amount of cloning that occurs between software that uses the same API or library. In this work we expect the commonalities amongst software systems to be low, providing further validation of the significance of cloning found within a software system.
The goal of this study is to estimate the number of false positives and incidental clones that exist in the results of a clone detection tool. We will do so by measuring the number of clones that are detected amongst unrelated code, under the assumption that most clones that are detected will be false positives. This assumption was derived from the results of our previous work comparing source code of similar open source projects [2], where we found that the open source projects in our study did not share code, even though they were related in functionality.
The experiment will consist of two phases. In the first phase we will detect clones amongst a random sample of projects, giving us an estimate of false positives and incidental clones detected amongst unrelated code. In the second phase, our study subjects will consist of source code that is related to GUI construction. This phase will provide us with an estimate of the amount of incidental cloning that is detected by clone detection tools. For each phase, we will carry out the following steps:
1. Randomly select study subjects.
2. Detect clones between each study subject pair.
3. Detect clones within each study subject.
4. Measure overlap of clones within software systems with clones occurring between them.
Each of these steps will be discussed in more detail below.
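The first two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project names are hypothetical placeholders for a real repository listing, the seed is arbitrary, and the sample size of 200 is the one the proposal uses for the unrelated-code phase.

```python
import random
from itertools import combinations


def select_subjects(projects, k, seed=0):
    """Step 1: draw k distinct study subjects uniformly at random.

    A fixed seed (an assumption, not part of the proposal) keeps the
    selection repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(projects, k)


def subject_pairs(subjects):
    """Step 2 operates on every unordered pair of distinct subjects."""
    return list(combinations(subjects, 2))


# Hypothetical project names standing in for the real repository listing.
all_projects = [f"project_{i}" for i in range(1000)]
subjects = select_subjects(all_projects, 200)
pairs = subject_pairs(subjects)
print(len(subjects), len(pairs))  # 200 subjects give 200*199/2 = 19900 pairs
```

The pair count grows quadratically with the sample size, which is worth keeping in mind when budgeting the cross-project detection runs.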
Unrelated Code
For the purpose of this experiment, we will use 200 randomly selected projects, selected from the list of downloaded projects published by the repository's author. The repository is an on-line searchable collection of a very large number of C projects. It allows the user to query the source code using a variety of mechanisms. To detect clones amongst files that include GUI libraries, we will use the "includes" search functionality.
There will be no restriction on the size of a project. However, the source language will be restricted to C, the only language currently in the repository. After selecting the projects, we will proceed to download the source and run clone detection tools on them. In this study we will use two clone detection techniques to gather our results: parameterized string matching as described by Kamiya et al. [10] and exact string matching as described by Ducasse et al. [8]. This will allow us to measure the impact of the detection technique on the results as well as provide us with a comparison of the amount and type of false positives that are detected by the two different approaches.
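To make the difference between the two families of techniques concrete, the following is a deliberately simplified, line-based sketch. It is not the algorithm of either cited tool: the function names, the small keyword set, and the `$id` / `$num` placeholders are all assumptions made for illustration. Exact matching compares lines after whitespace normalization only, while the parameterized variant also replaces identifiers and numeric literals with placeholders.

```python
import re

# A small, incomplete set of C keywords (assumption: enough for the demo).
C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+")


def exact_key(line):
    """Exact matching: only differences in whitespace are ignored."""
    return " ".join(line.split())


def parameterized_key(line):
    """Parameterized matching: identifiers and numbers become placeholders."""
    def repl(match):
        word = match.group(0)
        if word[0].isdigit():
            return "$num"
        return word if word in C_KEYWORDS else "$id"
    return " ".join(TOKEN_RE.sub(repl, line).split())


def matching_lines(src_a, src_b, key):
    """Return the non-blank lines of src_a whose key also occurs in src_b."""
    keys_b = {key(l) for l in src_b.splitlines() if l.strip()}
    return [l.strip() for l in src_a.splitlines()
            if l.strip() and key(l) in keys_b]


a = "int total = 0;\nfor (i = 0; i < n; i++)\n    total += a[i];\n"
b = "int sum = 0;\nfor (j = 0; j < len; j++)\n    sum += v[j];\n"
print(len(matching_lines(a, b, exact_key)))          # 0
print(len(matching_lines(a, b, parameterized_key)))  # 3
```

The two fragments differ only in variable names, so the exact comparison finds nothing while the parameterized one matches every line. The same looseness is what makes short, structurally simple code a likely source of false positives.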
In our first step of clone detection we will only detect clones that occur across each possible pair of software systems. Because we expect most of the source code to be unrelated, most clones should in fact be false positives or incidental clones. From this set of results, we will record the average percent of commonality between each pair of systems and the average size of the clones. This will provide us with a baseline of the amount of false positives that occur in a result set from each of the clone detection techniques.
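The two summary figures could be computed along the following lines. The text does not define "percent of commonality" precisely, so this sketch assumes one plausible definition: the cloned lines in both systems divided by their combined size. The `PairResult` record and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class PairResult:
    """Cross-project detection result for one pair of systems (hypothetical)."""
    loc_a: int       # total lines of code in system A
    loc_b: int       # total lines of code in system B
    cloned_a: int    # lines of A covered by clones shared with B
    cloned_b: int    # lines of B covered by clones shared with A
    clone_sizes: list = field(default_factory=list)  # size in lines per clone


def commonality_percent(r):
    """Assumed definition: cloned lines over combined size, as a percentage."""
    return 100.0 * (r.cloned_a + r.cloned_b) / (r.loc_a + r.loc_b)


def summarize(results):
    """Return (average percent commonality, average clone size in lines)."""
    avg_common = mean(commonality_percent(r) for r in results)
    sizes = [s for r in results for s in r.clone_sizes]
    avg_size = mean(sizes) if sizes else 0.0
    return avg_common, avg_size


# Made-up figures for three pairs, purely to exercise the functions.
results = [
    PairResult(1000, 1000, 30, 20, [10, 20]),
    PairResult(1000, 3000, 0, 0, []),
    PairResult(1000, 3000, 10, 10, [6]),
]
avg_common, avg_size = summarize(results)
print(avg_common, avg_size)
```

Pairs with no detected clones still count toward the average commonality, which matters when most pairs are expected to share nothing.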
Through manual inspection of the results in this step, we will try to analyze the types of clones that tend to occur in the results of both techniques, in an effort to profile the types of code that cause false positives in the clone detection techniques we are using.
In our next steps we will detect the clones that occur within each project and measure the amount of code that occurs in both the set of clones across projects and within the projects. This will give us a further indicator as to how much code that is likely to be part of a false positive contributes to the detected clones in a software system.
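One way to express this overlap measurement is to treat every reported clone fragment as a range of lines and compare the covered line sets. The fragment format `(file, start, end)` with an inclusive end, and the choice to normalize by the within-project lines, are assumptions of this sketch rather than something the proposal specifies.

```python
def covered_lines(fragments):
    """Expand (file, start, end) fragments, end inclusive, into (file, line) pairs."""
    return {(f, n) for f, start, end in fragments for n in range(start, end + 1)}


def overlap_fraction(within, across):
    """Fraction of within-project clone lines that also appear in cross-project clones."""
    w = covered_lines(within)
    a = covered_lines(across)
    return len(w & a) / len(w) if w else 0.0


# Hypothetical fragments for a single project.
within = [("main.c", 10, 19), ("util.c", 1, 10)]  # 20 cloned lines in total
across = [("main.c", 15, 24)]                      # shares lines 15-19 of main.c
print(overlap_fraction(within, across))            # 5 / 20 = 0.25
```

Working on line sets rather than clone pairs avoids double counting when several reported clones cover the same lines.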
GUI Code
In our next step, using the repository's search facility, we will search for any files in the repository that include header files from widget libraries such as GTK, GNOME Widgets, and Xlib. Partitioning the files by project, we will detect the clones occurring across projects using the same libraries. By detecting the clones that occur between code using specialized libraries such as GUI libraries, we can gain insight into the degree of "incidental cloning" that is detected by clone detection tools. These clones will in many cases represent strategies or protocols required for the use of the libraries, something that cannot be avoided. Perhaps the result of studying these clones can lead to further abstractions within the libraries themselves.
As in the first phase, we will detect clones within each of the projects as well. By measuring the overlap, we will gain insight into the contribution of "incidental clones" as part of a result set of detected clones.
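The header-based selection in this phase relies on the repository's own search, but the same idea can be approximated on already downloaded sources. The sketch below is an assumption-laden stand-in, not the repository's search: the header prefixes are illustrative guesses at how GTK, GNOME, and Xlib headers are commonly included, and the `(project, path, source)` input format is hypothetical.

```python
import re
from collections import defaultdict

# Matches both #include <...> and #include "..." forms, one per line.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

# Illustrative prefixes only (assumption); adjust to the libraries of interest.
GUI_HEADER_PREFIXES = ("gtk/", "gnome", "X11/")


def gui_headers(source):
    """Return the included headers that look like GUI library headers."""
    return [h for h in INCLUDE_RE.findall(source)
            if h.startswith(GUI_HEADER_PREFIXES)]


def partition_by_project(files):
    """Group GUI-related files by project.

    files is an iterable of (project, path, source) tuples.
    """
    by_project = defaultdict(list)
    for project, path, source in files:
        if gui_headers(source):
            by_project[project].append(path)
    return dict(by_project)


files = [
    ("editor", "ui.c", '#include <gtk/gtk.h>\nint main(void) { return 0; }\n'),
    ("editor", "io.c", '#include <stdio.h>\n'),
    ("viewer", "win.c", '#include <X11/Xlib.h>\n'),
]
print(partition_by_project(files))
```

Each resulting project bucket can then be fed to the cross-project and within-project detection steps exactly as in the first phase.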
Related Work
A wide variety of clone detection techniques have been developed. These methods range from string comparison to metrics comparison and program graph comparison strategies [4,6,8,10,16,18,9,5,17,14]. Currently we propose to use only two of these methods of clone detection as a pilot study. The study we propose could be expanded to other clone detection techniques. Several case studies have been performed on cloning within a software system [3,7,10,11,12,13], but none of these studies have considered measuring cloning across software systems.
There are very few studies that perform clone detection across software systems. Kamiya et al. [10] investigated the cloning across the source code of three different operating systems: Linux, FreeBSD, and NetBSD. Their analysis showed that there was about 20% cloning between FreeBSD and NetBSD, whereas less than 1% of the code was cloned between Linux and FreeBSD or NetBSD. Because FreeBSD and NetBSD have the same origin, the cloning between them was not surprising. Because Linux was developed independently from the BSD systems, very little cloning was detected. In [2] we found similar results, finding that very few clones are detected across software that was not related. However, in both of these cases, the study size is very small, making the results not generalizable. In addition, neither study considers the effects of using libraries such as GUI libraries on the clone detection results.
Previously it has been very difficult to objectively measure the number of false positives returned by a clone detection tool, yet this is important if we wish to confidently analyze the results of clone analysis and clone detection research. In this paper, we propose a study that will effectively find the lower limit of this value. In addition, we also aim to measure the impact of the protocols required to use APIs on the clone detection results, helping us measure the effects of incidental cloning in a software system.
The results of this work will provide not only more insight into the accuracy of clone detection tools, but also a platform from which we can investigate the weaknesses of tools and improve data filtering techniques. For example, from the resulting detected clones between software systems and within software systems, one may be able to train learning algorithms to classify true clones and false positives, something that we would like to research further.