An improved iterative receiver is developed for multiuser communications using fast frequency-hopping modulation. Each user employs a channel encoder to protect its information and facilitate interference cancellation at the receiver. At the destination, in order to reliably extract signals from all users, the detection algorithm employs a double iteration process: an outer iteration between the interference canceler and the soft-input soft-output (SISO) decoder, and an inner iteration between the soft demapper and the SISO decoder. The proposed detection algorithm works with direct as well as relay-aided transmissions. Two relay scenarios are investigated: amplify-and-forward and partial-decode-and-forward relaying. Under the same spectral efficiency, simulation results demonstrate the excellent performance of the proposed receiver compared to a single iterative receiver and other previously proposed interference cancellation schemes.
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013
Prior research has shown that source code also exhibits naturalness, i.e., it is written by humans and is likely to be repetitive. The researchers also showed that the n-gram language model is useful in predicting the next token in a source file given a large corpus of existing source code. In this paper, we investigate how well statistical machine translation (SMT) models for natural languages could help in migrating source code from one programming language to another. We treat source code as a sequence of lexical tokens and apply a phrase-based SMT model on the lexemes of those tokens. Our empirical evaluation on migrating two Java projects into C# showed that lexical, phrase-based SMT could achieve high lexical translation accuracy (BLEU of 81.3-82.6%). Users would have to manually edit only 11.9-15.8% of the total number of tokens in the resulting code to correct it. However, a high percentage of the translated methods (49.5-58.6%) are syntactically incorrect. Therefore, our result calls for a more program-oriented SMT model that is capable of better integrating the syntactic and semantic information of a program to support language migration.
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, 2010
Previous research confirms the existence of recurring bug fixes in software systems. Analyzing such fixes manually, we found that a large percentage of them occur in code peers, i.e., classes/methods having similar roles in the systems, such as providing similar functions and/or participating in similar object interactions. Based on a graph-based representation of object usages, we have developed several techniques to identify code peers, recognize recurring bug fixes, and recommend changes for code units from the bug ...
Bugs are prevalent in software systems, and improving time efficiency in bug fixing is desired. We performed an analysis on 11,115 bug records of Eclipse JDT and found that bug resolution time is log-normally distributed and varies across fixers, technical topics, and bug severity levels. We then propose FixTime, a novel method for bug assignment. The core of FixTime is a topic-based, log-normal regression model for predicting defect resolution time, on which it bases its fixing assignment recommendations. Preliminary results suggest that FixTime has higher prediction accuracy than existing approaches.
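To illustrate the log-normal observation above, here is a minimal sketch in Python. It is not FixTime's topic-based regression model; it only fits a log-normal distribution per fixer and picks the fixer with the lowest median predicted time. The sample data, fixer names, and function names are hypothetical.

```python
import math
import statistics


def fit_lognormal(times_in_days):
    """Estimate (mu, sigma) of a log-normal distribution from positive samples.

    If T is log-normal, then log(T) is normal, so we fit on the logs.
    """
    logs = [math.log(t) for t in times_in_days]
    return statistics.mean(logs), statistics.stdev(logs)


def median_resolution_time(mu):
    """Median of a log-normal distribution whose log-mean is mu."""
    return math.exp(mu)


# Hypothetical resolution times in days, grouped by fixer (not from the paper).
samples = {
    "alice": [1.0, 2.0, 4.0],
    "bob": [3.0, 9.0, 27.0],
}

fits = {fixer: fit_lognormal(ts) for fixer, ts in samples.items()}

# Recommend the fixer with the smallest median predicted resolution time.
best = min(fits, key=lambda fixer: median_resolution_time(fits[fixer][0]))
```

A fuller model in the spirit of the abstract would also condition on technical topic and severity; this sketch deliberately leaves those out.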
2012 34th International Conference on Software Engineering (ICSE), 2012
Fixing defects is an essential software development activity. For commercial software vendors, the time to repair defects in deployed business-critical software products or applications is a key quality metric for sustained customer satisfaction. In this paper, we report on the analysis of about 1,500 defect records from an IBM middleware product collected over a five-year period. The analysis includes a
2013 35th International Conference on Software Engineering (ICSE), 2013
PHP is a server-side language that is widely used for creating dynamic Web applications. However, as a dynamic language, PHP may induce certain programming errors that reveal themselves only at run time. A common type of error is dangling references, which occur if the referred program entities have not been declared in the current program execution. To prevent the run-time errors caused by such dangling references, we introduce Dangling Reference Checker (DRC), a novel tool to statically detect those references in the source code of PHP-based Web applications. DRC first identifies the path constraints of the program executions in which a program entity appears and then matches the path constraints of the entity's declarations and references to detect dangling ones. DRC is able to detect dangling reference errors in several real-world PHP systems with high accuracy. The video demonstration for DRC is available at http://www.youtube.com/watch?v=y_AKZYhLlU4.
2012 28th IEEE International Conference on Software Maintenance (ICSM), 2012
Build code in a Makefile represents the build rules with the dependencies among the files, and how they must be built together to produce a software system. As software evolves, its build code evolves as well to accommodate necessary changes in the build process. As part of software maintenance, it is crucial to understand how the build code is changed (e.g., changes in build rules or dependencies), and to verify and validate the correctness of the build process with different build configurations. Due to Make's dynamic ...
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013
Recent research has successfully applied the statistical n-gram language model to show that source code exhibits a good level of repetition. The n-gram model is shown to have good predictability in supporting code suggestion and completion. However, the state-of-the-art n-gram approach to capturing source code regularities/patterns is based only on the lexical information in a local context of the code units. To improve predictability, we introduce SLAMC, a novel statistical semantic language model for source code. It incorporates semantic information into code tokens and models the regularities/patterns of such semantic annotations, called sememes, rather than their lexemes. It combines the local context in semantic n-grams with the global technical concerns/functionality into an n-gram topic model, together with pairwise associations of program elements. Based on SLAMC, we developed a new code suggestion method, which is empirically evaluated on several projects and shown to achieve 18-68% higher relative accuracy than the state-of-the-art approach.
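As a rough illustration of the lexical n-gram baseline that the abstract contrasts with SLAMC, the following Python sketch trains a bigram model over token sequences and suggests the most frequent next tokens. It does not implement SLAMC's sememes or topic model; the toy corpus and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(token_sequences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest(counts, prev_token, k=3):
    """Return up to k candidate next tokens, most frequent first."""
    followers = counts.get(prev_token, Counter())
    return [tok for tok, _ in followers.most_common(k)]


# Hypothetical toy corpus of already-tokenized code lines.
corpus = [
    ["for", "(", "int", "i", "=", "0", ";"],
    ["for", "(", "int", "j", "=", "0", ";"],
    ["if", "(", "x", ")"],
]

model = train_bigrams(corpus)
top_after_paren = suggest(model, "(", k=1)
```

Because the model sees only the previous lexeme, it cannot tell a loop header from a condition; that purely local, lexical view is the limitation the abstract sets out to address.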
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, 2010
New software security vulnerabilities are discovered on an almost daily basis, and it is vital to be able to identify and resolve them as early as possible. Fortunately, many software vulnerabilities are recurring or very similar; thus, one could effectively detect and fix a vulnerability in a system by consulting similar vulnerabilities and fixes from other systems. In this paper, we propose SecureSync, an automatic approach to detect and provide suggested resolutions for recurring software vulnerabilities on multiple systems ...
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012
The links between the bug reports in an issue-tracking system and the corresponding fixing changes in a version repository are not often recorded by developers. Such linking information is crucial for research in mining software repositories in measuring software defects and maintenance efforts. However, the state-of-the-art bug-to-fix link recovery approaches still rely heavily on textual matching between bug reports and commit/change logs and cannot handle well the cases where their contents are not textually similar. This paper introduces MLink, a multi-layered approach that takes into account not only textual features but also source code features of the changed code corresponding to the commit logs. It is also capable of learning the association relations between the terms in bug reports and the names of entities/components in the changed source code of the commits from the established bug-to-fix links, and uses them for link recovery between the reports and commits that do not share much similar text. Our empirical evaluation on real-world projects shows that MLink can improve the state-of-the-art bug-to-fix link recovery methods by 11-18%, 13-17%, and 8-17% in F-score, recall, and precision, respectively.
2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2013
In this paper, we present a study of repetitiveness of code changes in software evolution. Repetitiveness is defined as the ratio of repeated changes over total changes. Focusing on fine-grained code changes, we model a change as a pair of old and new AST sub-trees within a method. A change is considered repeated within or cross-project if it matches another change having occurred in the history of the project or another project, respectively. We report the following important findings. First, repetitiveness of changes could be as high as 70-100% at small sizes and decreases exponentially as size increases. Second, repetitiveness is higher and more stable in the cross-project setting than in the within-project one. Third, fixing changes repeat similarly to general changes. Importantly, learning code changes and recommending them in software evolution is beneficial, with accuracy for top-1 recommendation of over 30% and top-3 of nearly 35%. Repeated fixing changes could also be useful for automatic program repair.
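The repetitiveness ratio defined above can be sketched in a few lines of Python. This is an assumption-laden simplification: a change is represented as a hashable (old, new) pair of strings standing in for the AST sub-trees, and a change counts as repeated if an identical pair occurred earlier in the history. The sample history is hypothetical.

```python
def repetitiveness(changes):
    """Ratio of repeated changes over total changes.

    `changes` is a chronological list of hashable (old, new) pairs; a change
    is repeated if the same pair has already occurred earlier in the list.
    """
    seen = set()
    repeated = 0
    for change in changes:
        if change in seen:
            repeated += 1
        seen.add(change)
    return repeated / len(changes) if changes else 0.0


# Hypothetical within-project history; strings stand in for AST sub-trees.
history = [
    ("a == b", "a.equals(b)"),
    ("i++", "i += 2"),
    ("a == b", "a.equals(b)"),
    ("x", "y"),
]

ratio = repetitiveness(history)
```

For the cross-project setting described in the abstract, the `seen` set would instead be seeded with changes from other projects' histories.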
Proceedings of the 33rd International Conference on Software Engineering, 2011
Cross-cutting concerns are unavoidable and create difficulties in the development and maintenance of large-scale systems. In this paper, we present a novel approach that identifies certain groups of code units that potentially share some cross-cutting concerns and recommends them for creating and updating aspects. Those code units, called concern peers, are detected based on their similar interactions (similar calling relations
2012 34th International Conference on Software Engineering (ICSE), 2012
Code completion helps improve developers' programming productivity. However, the current support for code completion is limited to context-free code templates or a single method call of the variable on focus. Using software libraries for development, developers often repeat API usages for certain tasks. Thus, a code completion tool could make use of API usage patterns. In this paper, we introduce GraPacc, a graph-based, pattern-oriented, context-sensitive code completion approach that is based on a database of such patterns. GraPacc represents and manages the API usage patterns of multiple variables, methods, and control structures via graph-based models. It extracts the context-sensitive features from the code under editing, e.g., the API elements on focus and their relations to other code elements. Those features are used to search and rank the patterns that best fit the current code. When a pattern is selected, the current code will be completed via a novel graph-based code completion algorithm. Empirical evaluation on several real-world systems shows that GraPacc has a high level of accuracy in code completion.
2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2013
PHP is a dynamic language widely used in Web development for writing server-side code to dynamically create multiple versions of client-side pages at run time for different configurations. A PHP program contains code to be executed or produced for multiple configurations/versions. That dynamism and multi-configuration nature lead to dangling references. Specifically, in the execution for a configuration, a reference to a variable or a call to a function is dangling if its corresponding declaration cannot be found. We conducted an exploratory study to confirm the existence of such dangling reference errors, including dangling cross-language and embedded references in the client-side HTML/JavaScript code and in data-accessing SQL code that are embedded in scattered PHP code. Dangling references have caused run-time fatal failures and security vulnerabilities. We developed DRC, a static analysis method to detect such dangling references. DRC uses symbolic execution to collect PHP declarations/references and to approximate all versions of the generated output, and then extracts embedded declarations/references. It associates each detected declaration/reference with a conditional constraint that represents the execution paths (i.e., configurations/versions) containing that declaration/reference. It then validates references against declarations via a novel dangling reference detection algorithm. Our empirical evaluation shows that DRC detects dangling references with high accuracy. It revealed 83 previously undiscovered defects caused by dangling references.
2013 IEEE International Conference on Software Maintenance, 2013
Localizing and fixing software faults is an important maintenance task. In a dynamic Web application, localizing the faults is challenging due to its dynamic nature and the interactions between the application and databases. The faults could occur in the statements in the host program or inside the queries that are sent from the application to be executed in the database engines. This paper presents SQLook, a novel database-aware fault localization method that is able to locate output faults in PHP statements of a dynamic Web application as well as in SQL queries. In SQLook, a PHP interpreter is instrumented to execute an SQL query and to monitor the evaluation of those SQL predicates to determine if they affect the output process of individual data records. It performs row-based slicing across PHP statements and SQL queries to record the entities that are involved in the output of each data row. Our empirical evaluation shows that SQLook can achieve higher accuracy than the state-of-the-art database-aware fault localization approach.
Papers by Tùng Nguyễn