Papers by Marcelo Ladeira
Proceedings of the Genetic and Evolutionary Computation Conference
The performance of multiobjective algorithms varies across problems, making it hard to develop new algorithms or apply existing ones to new problems. To simplify the development and application of new multiobjective algorithms, there has been increasing interest in their automatic design from component parts. These automatically designed metaheuristics can outperform their human-developed counterparts. However, it is still unclear which components contribute most to their performance improvement. This study introduces a new methodology to investigate the effects of the final configuration of an automatically designed algorithm. We apply this methodology to a well-performing Multiobjective Evolutionary Algorithm Based on Decomposition (MOEA/D) designed by the irace package on nine constrained problems. We then contrast the impact of the algorithm components in terms of their Search Trajectory Networks (STNs), the diversity of the population, and the hypervolume. Our results indicate that the most influential components were the restart and update strategies, with higher increments in performance and more distinct metric values. Moreover, their relative influence depends on problem difficulty: not using the restart strategy was more influential on problems where MOEA/D performs better, while the update strategy was more influential on problems where MOEA/D performs worst.
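The hypervolume mentioned above measures how much of the objective space a solution set dominates relative to a reference point. A minimal two-objective sketch (the points and reference point are illustrative, not from the paper):

```python
# Sketch: 2-D hypervolume indicator for a minimization problem.
# The front and reference point below are made-up illustrative values.
def hypervolume_2d(front, ref):
    """Area dominated by `front` (list of (f1, f2) minima) up to `ref`."""
    # Keep only points that strictly dominate the reference point.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:          # ascending f1 implies descending best f2
        if f2 < prev_f2:        # skip dominated points
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4)))  # → 6.0
```

A larger hypervolume means the set is both closer to the Pareto front and better spread, which is why it is a common single-number quality metric in this literature.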
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 2021
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)
This paper presents an analysis of data on the dropout of undergraduate engineering students at the University of Brasilia (UnB), Brazil. In Brazil, as in other countries, a considerable number of students enroll in engineering majors but never graduate from them. Information about the reasons for this phenomenon is important for action on the matter by university decision-makers. This paper aims to answer the research question: what are the main factors that lead engineering students to drop out of engineering majors at UnB? We collected social and performance data of engineering students from 2009 to 2019. Some of the data can be considered rare in similar studies, such as students' distance from home to campus and factors like students' leave-of-absence requests rather than performance factors. We used three data mining techniques: Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), and Random Forest (RF). The results show that international students deserve some attention from the university and that courses like Physics 1 can be challenging for engineering students.
This paper presents a case study, based on the CRISP-DM model, of using text mining tools and techniques to automate the Passive Transparency process at the Brazilian Ministry of Mines and Energy. A machine learning model is proposed to predict the class of the technical unit responsible for the data or information requested by citizens. Applying LDA and TF-IDF made it possible to map the topics of the subjects most relevant to society. The stability of the model was tested through a comparative analysis of six well-known classification algorithms (Random Forest, Multinomial NB, Linear SVC, Logistic Regression, XGBoost, and Gradient Boosting). XGBoost presented the best performance and precision in the multiclass learning outcomes.
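The TF-IDF weighting named above downweights terms common to the whole collection and promotes terms distinctive to one document. A pure-Python sketch (the toy corpus is hypothetical, not the ministry's data):

```python
import math
from collections import Counter

# Minimal TF-IDF sketch; the corpus below is an illustrative stand-in.
def tfidf(corpus):
    """Return one {term: weight} dict per document."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: (tf[t] / len(d)) * math.log(n / df[t]) for t in tf})
    return out

vecs = tfidf(["energy data request", "mining license request", "energy policy"])
# The term unique to the first document outweighs the shared ones.
print(max(vecs[0], key=vecs[0].get))  # → data
```

These weights are then typically fed to a classifier (e.g., the ones compared in the paper) as the document's feature vector.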
2015 18th International Conference on Information Fusion (Fusion), 2015
The probabilistic ontology language PR-OWL (Probabilistic OWL) uses Multi-Entity Bayesian Networks (MEBN), an extension of Bayesian networks with first-order logic, to add the ability to deal with uncertainty to OWL, the main language of the Semantic Web. A second version, PR-OWL 2, was proposed to allow the construction of hybrid ontologies containing deterministic and probabilistic parts. Existing PR-OWL implementations cannot deal with very large assertive databases. This limitation is a main obstacle to applying the language in real domains, such as Maritime Domain Awareness (MDA). This paper proposes a PR-OWL extension using RDF triplestores and the rule-based OWL 2 RL profile in order to deal with uncertainty in ontologies with millions of assertions. We illustrate our ideas with an MDA ontology built for the PROGNOS (PRobabilistic OntoloGies for Net-centric Operation Systems) project.
Evolutionary computation, 2022
The Resource Allocation (RA) approach improves the performance of MOEA/D by maintaining a big population and updating only a few solutions each generation. However, most studies on RA have focused on the properties of different Resource Allocation metrics, so it is still uncertain which factors lead to the performance gains of MOEA/D with RA. This study investigates the effects of MOEA/D with the Partial Update Strategy on an extensive set of MOPs to generate insights into the correspondences between MOEA/D with the Partial Update and MOEA/D with small and big population sizes. Our work undertakes an in-depth analysis of the population dynamics, considering the final approximation Pareto sets, anytime hypervolume performance, attained regions, and number of unique non-dominated solutions. Our results indicate that MOEA/D with Partial Update progresses through the search as fast as MOEA/D with a small population size and explores the search space...
2018 IEEE Congress on Evolutionary Computation (CEC)
This paper presents a comparative study of two data mining techniques: Genetic Programming (GP) and Deep Learning (DL). The comparison is based on the cart-pole balancing problem. We also compared the results with Q-Learning (QL), a classic algorithm that is also used in hybridizations with GP and DL for reinforcement learning problems. Our results show that GP can rival DL on this kind of problem.
The National Institute of Educational Research and Studies (INEP) provides ENADE data on Higher Education Institutions (IES) in Brazil. This data is a rich source of support for improving the quality of education offered by these IES, but it requires the application of data mining techniques to uncover patterns in the learning process and thus improve the academic performance of students across courses. This paper presents the steps of mining the data provided by INEP, which enable the identification of patterns for the IES analyzed and can serve as a guide for other IES that wish to follow a similar process.
Anais do Symposium on Knowledge Discovery, Mining and Learning (KDMiLe), Nov 18, 2019
The literature on computerized models that help detect, study and understand signs of mental health disorders from social media has been thriving since the mid-2000s for English speakers. In Brazil, this area of research shows promising results, in addition to a variety of niches that still need exploring. Thus, we construct a large corpus from 2941 users (1486 depressive, 1455 non-depressive), and induce machine learning models to identify signs of depression from our Twitter corpus. In order to achieve our goal, we extract features by measuring linguistic style, behavioral patterns, and affect from users' public tweets and metadata. Resulting models successfully distinguish between depressive and non-depressive classes with performance scores comparable to results in the literature. We hope that our findings can become stepping stones towards more methodologies being applied at the service of mental health.
2016 11th Iberian Conference on Information Systems and Technologies (CISTI), 2016
This article presents the partial results of a research project that used the Analytic Hierarchy Process (AHP), supported by the Expert Choice software, to identify the critical Information Technology (IT) components that impact an organization and their relationship with its strategic objectives. The study follows a qualitative and quantitative approach, with a literature review for its theoretical framework and a development phase detailing the Delphi method, AHP, and the Expert Choice software. During the research, a model was created for this work in accordance with ABNT standards (ISO 27005, 38500, and 31000).
Bayesian networks are graphical models appropriate for representing and analyzing uncertainty, knowledge, and beliefs contained implicitly in data. In this paper we propose the XPC algorithm for structural learning of Bayesian networks using decomposable...
Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems
Knowledge Discovery in Databases (KDD) is the process by which unknown and useful knowledge and information are extracted, by automatic or semi-automatic methods, from large amounts of data. With the evolution of Information Technology and the rapid growth in the number and size of databases, the development of methodologies, techniques, and tools for data mining has become a major concern for researchers and has led, in turn, to applications in a variety of areas of human activity. Around 1997, the processes and techniques associated with cluster analysis began to be researched with increasing intensity by the KDD community. Within the context of a model intended to support decisions based on cluster analysis, prior knowledge about the data structure and the application domain can be used as important constraints that lead to better cluster configurations. This paper presents an application of cluster analysis in the area of public safety, using a scheme that takes into account prior knowledge acquired from statistical analysis of the data. This information was used as a bias for the k-means algorithm, which was applied to identify the dactyloscopic (fingerprint) profile of criminals in the Brazilian capital, the Federal District. These results were then compared with a similar analysis that disregarded the prior knowledge. The analysis using prior knowledge generated clusters that are more coherent with expert knowledge.
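One common way to bias k-means with prior knowledge, as described above, is to seed the initial centroids from the prior analysis instead of choosing them randomly. A minimal sketch under that assumption (the paper's exact biasing scheme may differ; the 1-D data and seeds are illustrative):

```python
# Sketch: k-means seeded with prior-knowledge centroids instead of random ones.
# The data points and seed values below are hypothetical.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assign each point to its nearest current centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[j].append(p)
        # Recompute centroids; keep the old one if a cluster emptied.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

# Prior statistical analysis suggests two profiles near 2 and 9: seed there.
result = kmeans([1.0, 2.0, 3.0, 8.0, 9.0, 10.0], centroids=[2.0, 9.0])
print(result)  # → [2.0, 9.0]
```

Good seeds make the algorithm converge to partitions consistent with the domain expert's expectations, which matches the paper's observation that prior knowledge yields more coherent clusters.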
One of the main algorithms for solving Multi-Objective Optimization Problems is the Multi-Objective Evolutionary Algorithm Based on Decomposition (MOEA/D). It is characterized by decomposing the multiple objectives into a large number of single-objective subproblems and then solving these subproblems in parallel. Usually these subproblems are considered equivalent, but some works indicate that certain subproblems can be more difficult than others and that spending more computational resources on them can improve the performance of MOEA/D. One open question about this "Resource Allocation" strategy is: what should be the criteria for allocating more computational effort to one subproblem or another? In this work we investigate this question. We study four different ways to prioritize subproblems: Randomly, Relative Improvement, Diversity in Decision Space (proposed in this work), and inverted Diversity in Decision Space (also proposed in this work). We compare...
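A Relative Improvement criterion like the one listed above can be sketched as a weighted draw: subproblems that improved more recently receive more of the update budget. This is an illustrative selection rule, not the paper's exact procedure, and the improvement values are made up:

```python
import random

# Sketch: allocate the per-generation update budget across subproblems
# in proportion to their recent relative improvement (illustrative rule).
def select_subproblems(rel_improvement, budget, eps=1e-6):
    """Pick `budget` subproblem indices, weighted by relative improvement."""
    weights = [r + eps for r in rel_improvement]  # eps keeps stalled ones alive
    return random.choices(range(len(weights)), weights=weights, k=budget)

random.seed(0)
# Subproblem 2 improved the most recently, so it receives most of the budget.
picks = select_subproblems([0.01, 0.0, 0.30, 0.05], budget=1000)
print(picks.count(2) > picks.count(1))  # → True
```

The small `eps` floor matters: a subproblem that stagnates still gets an occasional trial, so it can escape a temporary plateau rather than being starved forever.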
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), 2019
Many of the criminal cases analysed by the Prosecution Office of the Federal District and Territories are repetitive, and processing them can be streamlined by providing similar previous cases as templates. We investigate the use of information retrieval techniques to enable automated identification of similar cases and evaluate whether semantic search performs better than lexical search in the task of assisting legal opinion writing. As a proof of concept, syntactic indexing (TF-IDF and BM25) and semantic indexing (Latent Semantic Indexing, LSI, and Latent Dirichlet Allocation, LDA) techniques were evaluated using document collections from two public prosecutor's offices. In addition, we evaluate model enrichment with recorded data about the cases, as well as with the legal norm citations observed in the documents. Baseline document collections sampled from the full collections of the two offices were used for model evaluation using Normalized Discounted Cumulative Gain (NDCG).
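NDCG, the evaluation metric named above, rewards rankings that place the most relevant documents first, with a logarithmic position discount. A compact sketch (the relevance judgements are made up, not from the prosecutors' collections):

```python
import math

# Sketch of NDCG@k; relevance grades here are illustrative (higher = better).
def dcg(relevances, k):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal else 0.0

print(ndcg([3, 2, 1, 0], k=4))  # ideal ordering scores → 1.0
print(ndcg([0, 1, 2, 3], k=4))  # reversed ordering scores strictly less
```

Because the discount is positional, swapping a relevant case from rank 1 to rank 10 costs far more than a swap deep in the list, which is why NDCG suits "find me the closest precedent" tasks.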
Although several languages have been proposed for dealing with uncertainty in the Semantic Web (SW), almost no support has been given to ontological engineers on how to create such probabilistic ontologies (POs). The task of modeling POs has proven to be extremely difficult and hard to replicate. This paper presents the first tool to implement a process that guides users in modeling POs, the Uncertainty Modeling Process for Semantic Technologies (UMP-ST). The tool addresses three main problems: the complexity of creating POs; the difficulty of maintaining and evolving existing POs; and the lack of a centralized tool for documenting POs. Besides presenting the tool, which is implemented as a plug-in for UnBBayes, this paper also shows how the UMP-ST plug-in could have been used to build the Probabilistic Ontology for Procurement Fraud Detection and Prevention in Brazil, a proof-of-concept use case created as part of a research project at the Brazilian Office of the Comptroller General.
Multi-objective Optimization Problems (MOPs) are minimization problems characterized by multiple, conflicting objective functions. They arise in real-world applications that require a compromise among multiple objectives. The set of optimal trade-off solutions in the decision space is the Pareto Set (PS), and the image of this set in the objective space is the Pareto Front (PF). Finding a good approximation of the Pareto Front is a hard problem for which multiple Evolutionary Algorithms have been proposed [1]. The Multi-Objective Evolutionary Algorithm Based on Decomposition (MOEA/D) [2] is an effective algorithm for solving MOPs. Its main characteristic is to decompose the multi-objective optimization problem into a set of single-objective subproblems. It has been observed that some subproblems require more attention than others and take more effort to converge to an optimal solution [3], wasting computational effort on trying to improve solutions that are not very promising...
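The decomposition described above is commonly done with a Tchebycheff scalarization: each subproblem is the same MOP seen through a different weight vector. A minimal sketch (the weight vectors, ideal point, and solutions are illustrative):

```python
# Sketch: Tchebycheff scalarisation, a standard way MOEA/D turns a MOP
# into single-objective subproblems. All numeric values are illustrative.
def tchebycheff(objectives, weights, ideal):
    """Scalar cost of a solution for one subproblem (lower is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))

ideal = (0.0, 0.0)                 # best value seen so far per objective
sol_a, sol_b = (0.2, 0.8), (0.8, 0.2)

# A subproblem that weights objective 1 heavily prefers the solution
# that is good on objective 1; another weighting prefers the other one.
print(tchebycheff(sol_a, (0.9, 0.1), ideal) < tchebycheff(sol_b, (0.9, 0.1), ideal))  # → True
print(tchebycheff(sol_b, (0.1, 0.9), ideal) < tchebycheff(sol_a, (0.1, 0.9), ideal))  # → True
```

Because each weight vector pulls its subproblem toward a different region of the Pareto Front, solving all the subproblems together yields a spread of trade-off solutions.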
2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020
Financial institutions handle hundreds of thousands of wire transactions per day and need to ensure security and quality for their customers. Searching for predefined patterns is insufficient to identify frauds, due to the continuous evolution of the fraudulent methods used by criminals. Systems used for this purpose are based on the application of Artificial Intelligence methods, neglect human process analysis, and make little use of Visual Analytics (VA) techniques. The fraud detection domain involves time-oriented and multivariate aspects of identifying anomalous transactions, making fraud detection a difficult task. We propose building a model for each customer based on his or her behavior, using outlier identification techniques and conducting analysis through VA, to reduce the false positive rate in the identification of fraudulent financial transactions. We apply this approach to a real Brazilian financial institution with a daily volume of more than 30 million...
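A per-customer behavioural model of the kind described above can be as simple as flagging amounts far from that customer's own history. This z-score sketch is an assumption for illustration, not the institution's actual detector, and the figures are made up:

```python
import statistics

# Sketch: per-customer outlier flag based on the z-score of a new amount
# against that customer's own transaction history (illustrative threshold).
def is_outlier(history, amount, threshold=3.0):
    mean = statistics.mean(history)
    std = statistics.pstdev(history) or 1.0  # avoid div-by-zero on flat history
    return abs(amount - mean) / std > threshold

history = [100, 120, 90, 110, 105, 95]   # this customer's usual transfers
print(is_outlier(history, 108))   # → False (within habit)
print(is_outlier(history, 5000))  # → True  (far outside habit; route to VA review)
```

The point of the per-customer baseline is that 5000 is anomalous for this customer but perfectly normal for another, which is exactly what a single global threshold cannot express.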
One of the major weaknesses of current research on the Semantic Web (SW) is the lack of proper means to represent and reason with uncertainty. A number of recent efforts from the SW community, the W3C, and others have emerged to address this gap. Such efforts have the positive side effect of bringing together two fields of research that have been apart for historical reasons: the artificial intelligence and SW communities. One example of the potential research gains of this convergence is the current development of Probabilistic OWL (PR-OWL), an extension of the OWL Web Ontology Language that provides a framework to build probabilistic ontologies, thus enabling proper representation of and reasoning with uncertainty within the SW context. PR-OWL is based on Multi-Entity Bayesian Networks (MEBN), a first-order probabilistic logic that combines the representational power of first-order logic (FOL) and Bayesian Networks (BNs). However, PR-OWL and MEBN are still in development...
Proceedings of the 12th International Conference on Management of Digital EcoSystems
Effective retrieval of jurisprudence (case law) is imperative to achieve consistency and predictability in any legal system. In this work, we propose and empirically evaluate a framework for jurisprudence retrieval at the Brazilian Superior Court of Justice, in order to ease the task of retrieving other decisions with the same legal opinion. The experimental results show that our approach based on text similarity performs better than the Court's legacy system based on Boolean queries. Building complex Boolean queries is a very specialized task, and we aim to offer a tool able to use free text as queries, without any operator. With the legacy system as baseline, we compare the traditional TF-IDF retrieval model, the BM25 probabilistic model, and the Word2Vec model. Our results indicate that the Word2Vec Skip-Gram model, trained on a specialized legal corpus, and BM25 yield similar performance and surpass the legacy system. Combining the BM25 model with embedding models improved performance by up to 19%.
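BM25, one of the models compared above, extends TF-IDF with term-frequency saturation (`k1`) and document-length normalisation (`b`). A self-contained sketch with the usual default parameters (the toy legal corpus is a hypothetical stand-in):

```python
import math
from collections import Counter

# Sketch of BM25 scoring with standard default parameters k1=1.5, b=0.75.
# Documents are pre-tokenised lists; the corpus below is illustrative.
def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n          # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(term in d for d in docs)          # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        num = tf[term] * (k1 + 1)
        den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * num / den
    return score

docs = [["habeas", "corpus", "denied"],
        ["tax", "appeal"],
        ["habeas", "corpus", "granted"]]
query = ["habeas", "corpus"]
# Documents containing the query terms outrank the unrelated one.
print(bm25_score(query, docs[0], docs) > bm25_score(query, docs[1], docs))  # → True
```

In a free-text interface like the one the paper proposes, the user's query is simply tokenised and every decision in the collection is ranked by this score, with no Boolean operators required.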