Papers by Mehmed Kantardzic
IEEE eBooks, 2011
This chapter contains sections titled: Introduction; Generating a Taxonomy; Viewing the Taxonomy; Editing the Taxonomy; Validation; Text Analysis Versus Structured Information; Usage Scenario: Interactive Text Mining on Discussion Forum Data; Conclusions; Acknowledgments; References
Explainability techniques have recently been postulated as effective for localizing medical regions of interest. However, there have been recent criticisms regarding the validity of these approaches in terms of robustness and accuracy. This study explores whether combining two saliency methods, namely Saliency Maps and Grad-CAM, improves the robustness and accuracy of localization. The experiments have three parts. First, the accuracy of localization is measured and repeatability experiments are performed to determine whether the results are repeatable across model architectures. Second, cascading randomization experiments are conducted to determine the robustness to changes in model parameters. Third, we measure the overlap between Saliency Maps and Grad-CAM and show there is a relationship between the overlap and accuracy of prediction.
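The overlap measurement mentioned in the abstract above could be sketched as follows. The abstract does not say how overlap is computed, so this minimal example assumes intersection-over-union (IoU) on thresholded heatmaps; the function names and the 0.5 threshold are hypothetical.

```python
from typing import List

Heatmap = List[List[float]]


def binarize(heatmap: Heatmap, threshold: float) -> List[List[bool]]:
    """Mark every pixel whose attribution score reaches the threshold."""
    return [[value >= threshold for value in row] for row in heatmap]


def overlap_iou(map_a: Heatmap, map_b: Heatmap, threshold: float = 0.5) -> float:
    """Intersection-over-union of two thresholded heatmaps (assumed overlap measure)."""
    mask_a = binarize(map_a, threshold)
    mask_b = binarize(map_b, threshold)
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for in_a, in_b in zip(row_a, row_b):
            intersection += in_a and in_b
            union += in_a or in_b
    # Two empty masks have no overlap to speak of; report 0.0 rather than divide by zero.
    return intersection / union if union else 0.0
```

Both maps are assumed to have the same shape and to be normalized to the range [0, 1] before thresholding.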
Computational Intelligence and Neuroscience, Oct 14, 2022
With the increase of real-time stream data, knowledge discovery from stream data becomes more and more important, which requires an efficient data structure to store transactions and scan sliding windows once to discover frequent itemsets. We present a new method named Linking Compact Tree (LCTree). First, we designed an algorithm that uses an improved data structure to create an objective tree, which can find frequent itemsets with linear complexity. Second, we can merge items in sliding windows in one scan with the Head Linking List data structure. Third, by implementing the Tail Linking List data structure, we can locate obsolete nodes and remove them easily. Finally, LCTree is able to find all exact frequent items in a data stream with reduced time and space complexity by using such a linear data structure. Experiments on datasets of different sizes and types were conducted to compare the proposed LCTree technique with well-known frequent item mining methods including Cantree, FP-tree, DSTree, CPSTree, and Gtree. The results of the experiments show that the presented algorithm performs better than the other methods, and also confirm that it is a promising solution for detecting frequent itemsets in real-time applications.
Techniques for preprocessing data for data mining are discussed. Issues include scaling numerical data, attribute transformation, dealing with missing values, representation of time-dependent data, and outlier detection.
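Two of the preprocessing issues listed above, missing values and scaling of numerical data, can be illustrated with a short sketch. The specific choices here (mean imputation and min-max scaling to [0, 1]) are assumptions made for illustration and are not necessarily the techniques the chapter recommends; the function names are hypothetical.

```python
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Linearly rescale values to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

A typical use would chain the two steps on one attribute, for example `min_max_scale(impute_mean(column))`.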
We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from words which might have been intended in its place. Using a novel correction algorithm and a massive database of training data, we demonstrate higher accuracy on correcting real-word errors than previous work, and very high accuracy at a new task of ranking corrections to non-word errors given by a standard spelling correction package.
The International review of retail, distribution and consumer research, Apr 29, 2021
Retailers are facing challenges in making sense of the significant amount of data for better understanding of their customers. While retail analytics plays an increasingly important role in successful retailing management, comprehensive store segmentation based on Data Mining-based Retail Analytics is still an under-researched area. This study seeks to address this gap by developing a novel approach to segment the stores of retail chains based on "purchasing behavior of customers" and applying it in a case study. The applicability and benefits of using Data Mining techniques to examine purchasing behavior and identify store segments are demonstrated in a case study of a global retail chain in Istanbul, Turkey. Over 600K transaction records of a global grocery retailer are analyzed and 175 stores in Istanbul are successfully segmented into five segments. The results suggest that the proposed new retail analytics approach enables the retail chain to identify clusters of stores in different regions using all transaction data and advances our understanding of store segmentation at the store level. The proposed approach will provide the retail chain the opportunity to manage store clusters by making data-driven decisions in marketing, customer relationship management, supply chain management, inventory management and demand forecasting.
JMIR infodemiology, Jan 23, 2023
Background: COVID-19 has introduced yet another opportunity to web-based sellers of loosely regulated substances, such as cannabidiol (CBD), to promote sales under false pretenses of curing the disease. Therefore, it has become necessary to innovate ways to identify such instances of misinformation. Objective: We sought to identify COVID-19 misinformation as it relates to the sales or promotion of CBD and used transformer-based language models to identify tweets semantically similar to quotes taken from known instances of misinformation. In this case, the known misinformation was the publicly available Warning Letters from the Food and Drug Administration (FDA). Methods: We collected tweets using CBD- and COVID-19-related terms. Using a previously trained model, we extracted the tweets indicating commercialization and sales of CBD and annotated those containing COVID-19 misinformation according to the FDA definitions. We encoded the collection of tweets and misinformation quotes into sentence vectors and then calculated the cosine similarity between each quote and each tweet. This allowed us to establish a threshold to identify tweets that were making false claims regarding CBD and COVID-19 while minimizing the instances of false positives. Results: We demonstrated that by using quotes taken from Warning Letters issued by the FDA to perpetrators of similar misinformation, we can identify semantically similar tweets that also contain misinformation. This was accomplished by identifying a cosine distance threshold between the sentence vectors of the Warning Letters and tweets. Conclusions: This research shows that commercial CBD or COVID-19 misinformation can potentially be identified and curbed using transformer-based language models and known prior instances of misinformation.
Our approach functions without the need for labeled data, potentially reducing the time needed to identify misinformation. Our approach shows promise in that it is easily adapted to identify other forms of misinformation related to loosely regulated substances.
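The similarity step described in the Methods above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the tweets and Warning Letter quotes have already been encoded into sentence vectors by some language model, and the function names and the threshold argument are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_similar(tweet_vecs: List[Vector], quote_vecs: List[Vector],
                 threshold: float) -> List[int]:
    """Return indices of tweets whose best match against any quote reaches the threshold.

    quote_vecs must be non-empty.
    """
    flagged = []
    for index, tweet in enumerate(tweet_vecs):
        best = max(cosine_similarity(tweet, quote) for quote in quote_vecs)
        if best >= threshold:
            flagged.append(index)
    return flagged
```

Raising the threshold trades recall for fewer false positives, which matches the goal stated in the abstract of minimizing false positives.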
In the United States, marijuana is considered illegal in several states, whereas some states have decriminalized certain amounts and others have legalized marijuana for medical and recreational uses. We have created a novel approach that examines whether state policy on marijuana impacts the amount and type of conversations regarding marijuana on Twitter, as well as the social networks of those who contribute to marijuana conversations. The findings in this research may be useful for research in policy reform, law enforcement, and public health.
Technometrics, Aug 1, 2003
[1991 Proceedings] 6th Mediterranean Electrotechnical Conference
An electronic mail system, built on a distributed knowledge base and providing a set of useful operations in a very convenient way, is described. All of the advanced mail operations provided by the system are based on a good and precise description of documents and users. Such operations simplify finding useful information about existing documents, sending the documents without specifying the user's address, and prioritizing, sorting, selecting, discharging, and forwarding mail. Electronic mail operations are also improved with the possibility of easier processing of external documents and exploiting the capabilities of a modern workstation equipped with image input-output devices.
Springer eBooks, 2015
A stock-out event in retail business represents a situation in which a demanded item cannot be found by the customer in the expected location or is not in a saleable condition. Frequent stock-outs remain one of the biggest issues in the retail business because they directly contribute to lost sales and reduced profits, and indirectly contribute to reduced loyalty and potential loss of customers. Although stock-outs can occur anywhere in the supply chain, the literature has confirmed that most stock-outs occur at the store level. A number of researchers have tried to reveal the product- and store-related drivers and the factors that contribute to lower product availability. Identification of stock-outs was usually performed using the point-of-sale (POS) estimation method or the manual audit method, so the results and conclusions were mostly based on a small number of stores and products observed over a shorter period of time. In this research, probit regression was used to examine the relationship between various store-related drivers and product availability. The data sample included 115 SKUs and 98 stores, and the data was provided by a large grocery retailer in Serbia. To identify stock-outs on a large data sample, a perpetual inventory (PI) aggregation method was selected. The store-related variables that were determined to be the drivers of stock-out performance include distance from the distribution center, average store sale, and stock-keeping-unit density as the most prominent driver. An especially high probability of stock-out can be expected when stock-keeping-unit density and average store sale are high at the same time. On the other hand, it was observed that the income level of the population living in the store area does not have a significant influence on stock-out performance at store level.
Social networking, 2015
In this article, we propose the hypothesis that improvements in short-term energy load forecasting may rely on the inclusion of data from new information sources generated outside the power grid and weather-related systems. Other relevant domains of data include scheduled activities on a grid, large events and conventions in the area, equipment duty cycle schedules, data from call centers, real-time traffic, Facebook, Twitter, and other social network feeds, and a variety of city or region websites. All these distributed data sources pose information collection, integration and analysis challenges. Our approach concentrates on the detection of complex non-cyclic events, where detected events have a human crowd magnitude that influences power requirements. The proposed methodology deals with computation, transformation, modeling, and pattern detection over large volumes of partially ordered, internet-based streaming multimedia signals or text messages. We claim that traditional approaches can be complemented and enhanced by the inclusion and analysis of new streaming data, where complex event detection combined with Web-based technologies improves short-term load forecasting. Some preliminary experimental results, using the Gowalla social network dataset, confirmed our hypothesis as a proof-of-concept, and they paved the way for further improvements by giving new dimensions to the short-term load forecasting process in a smart grid.
This chapter contains sections titled: Learning Machine; Statistical Learning Theory; Types of Learning Methods; Common Learning Tasks; Model Estimation; Review Questions and Problems; References for Further Study
Tehnicki Vjesnik-technical Gazette, Jun 1, 2018
Autonomous cooperative driving systems require the integration of research activities in the fields of embedded systems, robotics, communication, control and artificial intelligence in order to create secure and intelligent autonomous driver behaviour patterns in traffic. Besides autonomous vehicle management, an important research focus is on cooperation behaviour management. In this paper, we propose hybrid automaton modelling to emulate a flexible vehicle Platoon and vehicle cooperation interactions. We introduce a novel coding function for generating the Platoon cooperation behaviour profile in time, which depends on the number of vehicles in the Platoon and the behaviour types. Since behaviour prediction of transportation systems is one of the primary uses of artificial intelligence in Intelligent Transport Systems, we propose an approach towards NARX neural network prediction of the Platoon cooperation behaviour profile. By incorporating dynamic prediction of Platoon manoeuvres, which is capable of analysing traffic behaviour, this approach would be useful for the secure implementation of real autonomous vehicle cooperation.
Surgical laparoscopy, endoscopy & percutaneous techniques, Feb 1, 2006
Our objectives were to assess the safety and efficacy of different insufflation methods in women undergoing laparoscopy and to develop a model for selection of the appropriate insufflation technique based on the patient's characteristics and surgeon's experience. We performed a retrospective analysis of laparoscopic procedures on 3086 women over a 13-year period at the
John Wiley & Sons, Inc. eBooks, Oct 5, 2011
This chapter contains sections titled: Introduction; Data-Mining Roots; Data-Mining Process; Large Data Sets; Data Warehouses for Data Mining; Business Aspects of Data Mining: Why a Data-Mining Project Fails; Organization of This Book; Review Questions and Problems; References for Further Study
arXiv (Cornell University), Mar 30, 2017
With the popularity of massive open online courses (MOOCs), grading through crowdsourcing has become a prevalent approach for large-scale classes. However, for grading complex tasks, which require specific skills and effort, crowdsourcing is restricted by the insufficient knowledge of the workers from the crowd. Due to the knowledge limitations of the crowd graders, grading based on partial perspectives becomes a big challenge for evaluating complex tasks through crowdsourcing, especially for tasks which not only need specific knowledge for grading, but also should be graded as a whole instead of being decomposed into smaller and simpler sub-tasks. We propose a framework for grading complex tasks via multiple views, which are different grading perspectives defined by experts for the task, to provide uniformity. An aggregation algorithm based on graders' variances is used to combine the grades for each view. We also detect bias patterns of the graders and de-bias them with respect to each view of the task. A bias pattern determines how the behavior is biased among graders, and it is detected by a statistical technique. The proposed approach is analyzed on a synthetic data set. We show that our model gives more accurate results compared to grading approaches without different views and the de-biasing algorithm.
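One way to picture aggregation "based on graders' variances" is an inverse-variance weighted mean, in which more consistent graders count for more. The abstract does not specify the aggregation rule, so the sketch below is an assumption for illustration only; the function name is hypothetical and each grader's variance is taken as already estimated.

```python
from typing import Sequence


def inverse_variance_mean(grades: Sequence[float], variances: Sequence[float]) -> float:
    """Combine grades for one view, weighting each grader by 1 / variance.

    Assumes every variance is strictly positive.
    """
    if not grades or len(grades) != len(variances):
        raise ValueError("grades and variances must be non-empty and of equal length")
    weights = [1.0 / variance for variance in variances]
    weighted_sum = sum(w * g for w, g in zip(weights, grades))
    return weighted_sum / sum(weights)
```

With grades 80 and 90 from graders whose variances are 1 and 4, the combined grade lies closer to 80 because the first grader is weighted four times as heavily.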