Papers by Basabi Chakraborty
Advances in Intelligent Systems and Computing, 2017
A graph-theoretic approach is presented in this paper to visually represent feature association in data sets. This visual representation, named the Feature Association Map (FAM), is based on similarity between features measured using the pairwise Pearson product-moment correlation coefficient. Highly similar features appear as clusters in the graph visualization. Data sets in which a high number of features belong to feature clusters indicate the possibility of strong feature association. The efficacy of this method has been demonstrated on ten publicly available data sets. FAM can be applied effectively in the area of feature selection.
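The idea behind a feature association graph can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the 0.8 threshold, the use of the absolute correlation value, the reading of "clusters" as connected components, and all function names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant feature has no defined correlation; treat it as unassociated.
    return cov / (sx * sy) if sx and sy else 0.0


def association_edges(features, threshold=0.8):
    """Edges (a, b, r) for feature pairs with |r| >= threshold (assumed cutoff)."""
    edges = []
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


def feature_clusters(features, threshold=0.8):
    """Connected components of the association graph, as sorted name lists."""
    adjacency = {name: set() for name in features}
    for a, b, _ in association_edges(features, threshold):
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in features:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data: two near-duplicate features and one unrelated one.
toy = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "noise": [3, 1, 4, 1, 5],
}
print(feature_clusters(toy))
```

On the toy data, the two height features end up in one cluster and the unrelated feature stays isolated, which is the kind of grouping a feature selection step could then exploit.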
Proceedings of the IEEE International Symposium on Industrial Electronics ISIE-02, 2002
... Soft computing tools and their hybridization have become an active research area for solving real-world pattern recognition problems. In the present work, a genetic-fuzzy hybrid algorithm is proposed for feature subset selection. ...
Expert Systems with Applications, 2017
Highlights
• FCTFS works in both autonomous and user-guided modes.
• The defined taxonomy helps in arriving at an optimal number of good-quality clusters.
• Feature elimination due to irrelevance and redundancy is clearly isolated.
• It is faster than traditional search-based methods.
• Yields superior results compared to some state-of-the-art methods over 24 data sets.
Evolutionary Intelligence, 2020
Market prediction is important for well-organized portfolio management with wise selection of investments. As share market prices change dynamically depending on various factors, manual tracking is difficult. Machine learning tools are now becoming popular for automatic prediction and recommendation in stock trading. In this work, the objective is to apply popular machine learning techniques for time series clustering to real ETF (Exchange Traded Funds) data and to compare the performance of the results. Four clustering methods are used: 1) hierarchical clustering with Euclidean distance, 2) k-means clustering with a) Euclidean distance (ED) and b) Dynamic Time Warping (DTW) as the distance measures, 3) k-means with a shape-based distance measure (k-Shape), and 4) k-means with a newly developed shape-based transformation along with DTW as a similarity measure, LTAA (Log Time Axis Area). The clustering results on 20 ETF data from Tokyo Stock Exchange have been analyzed...
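For readers unfamiliar with the DTW measure mentioned above, the following is a minimal sketch of the textbook dynamic-programming formulation for one-dimensional series. It is not taken from the paper: the absolute-difference local cost, the absence of a warping window, and the function name are assumptions chosen for brevity.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses |a_i - b_j| as the local cost and no warping-window constraint
    (both are simplifying assumptions for this sketch).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with an earlier-aligned b[j-1]
                cost[i][j - 1],      # b[j-1] matched again with an earlier-aligned a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Two series with the same shape but shifted in time.
x = [1, 2, 3, 3]
y = [1, 1, 2, 3]
pointwise = sum(abs(p - q) for p, q in zip(x, y))
print(pointwise, dtw_distance(x, y))
```

The pointwise (lock-step) comparison penalizes the time shift, while DTW aligns the two shapes and reports no difference, which is why it is often preferred for clustering price series that move similarly but not simultaneously.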
Expert Systems with Applications, 2017
• The algorithm visually partitions the redundant and non-redundant features
• Visual partition enables right strategy adoption for right group of features
• Graph-theoretic principle (vertex cover, independent set) used for subset selection
• Algorithm applies for both supervised as well as unsupervised feature selection
• Better results (accuracy/purity) than benchmark supervised/unsupervised algorithms
International Journal on Smart Sensing and Intelligent Systems, 2020
Multisensor time series data is common in many applications in the process industry, medical and health care, biometrics, etc. Analysis of multisensor time series data requires analysis of multidimensional time series (MTS), which is challenging because they constitute a huge volume of data of a dynamic nature. Traditional machine learning algorithms for classification and clustering developed for static data cannot be applied directly to MTS data. Various techniques have been developed to represent MTS data in a manner suitable for analysis by popular machine learning algorithms. Though a plethora of different approaches have been developed so far, the 1NN classifier based on dynamic time warping (DTW) has been found to be the most popular due to its simplicity. In this work, an approach for time series classification is proposed based on a multidimensional delay vector representation of time series. A multivariate time series is considered here as a group of single time series and each time series is...
2022 International Electronics Symposium (IES)
TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON), 2019
With the increasing use of smartphones, many smartphone-based applications have been developed. Smartphones are used in personal health care and in monitoring the activities of elderly persons. These types of applications require continuous authentication of the user so that action can be taken if the smartphone is separated from the user through forgetfulness or theft. Continuous authentication on a smartphone requires an authentication process with low computational overhead. In this work, the objective is to develop a low-cost user authentication algorithm from time series data of user activities taken from sensors such as the accelerometer or gyroscope. Deep neural networks are used for user authentication. A two-step authentication process has been developed in which sensor data is first classified into different activities, and activity-dependent authentication is then performed. To lower the computational cost of the classifier, knowledge distillation is used to reduce the number of model parameters. Fine-tuning is used to cope with the limited amount of training data. As a result, the authentication accuracy has been improved by 5% to 10%, and an authentication time of 0.032 sec has been achieved, which is useful for real-time authentication. Simulation studies have been carried out on several benchmark data sets to evaluate the efficiency of the proposed approach.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2002
2021 IEEE 4th International Conference on Knowledge Innovation and Invention (ICKII), 2021
Human activity recognition (HAR) approaches play significant roles in understanding behavioral activities. These tools allow personalized services and support for users in varied circumstances. Hence, accurate prediction of user activity is valuable for several applications such as elderly care, healthcare, sports, and smart homes. In recent times, machine learning with deep learning models, particularly Convolutional Neural Networks (CNN), has become popular for human activity recognition. However, our earlier studies have shown that Recurrent Neural Networks (RNN) produce results superior to CNN. In the current study, a new approach combining a Long Short-Term Memory (LSTM) neural network and the Particle Swarm Optimization (PSO) algorithm is proposed. Though LSTM models produce good results in HAR, there are a few parameters to consider when searching for the optimal LSTM model. Therefore, the PSO algorithm is utilized for the optimization of the parameters. It possesses a quick convergence rate and the ability to enhance the prediction accuracy of LSTM in contrast to popular methods. After the training of the LSTM model, the weights are further optimized using the PSO algorithm and are substituted into the model during the verification stage. This study investigates the performance of the LSTM-PSO model compared with a few recurrent neural networks. For the implementation of the model, the benchmark WISDM dataset is utilized. The simulation results indicate that the proposed approach improves the recognition rate up to 97%, which exceeds that of the compared models.
2004 IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, 2004. CIMSA.
The process of knowledge discovery from vast real-life data encounters a variety of problems, such as the presence of noise and outliers in the data set, selection of a proper subset of attributes (features) from a large number of relevant and irrelevant attributes, ...
2004 IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, 2004. CIMSA.
The search for an optimal route from source to destination is a well-known optimization problem, and many good solutions such as Dijkstra's algorithm and the Bellman-Ford algorithm are available with practical applications. However, simultaneous search for multiple semi-optimal routes is difficult with these solutions, as they produce only the best route one at a time. Genetic algorithm (GA) based solutions are currently available for simultaneous search of multiple routes. The problem in finding multiple routes is that the selected routes resemble each other, i.e., they partly overlap. In this paper, a GA-based algorithm with a novel fitness function is proposed for simultaneous search of multiple routes that avoids overlapping. The proposed algorithm and other currently available algorithms are simulated using a portion of a real road map. The simulation results demonstrate the effectiveness of the proposed algorithm over the other algorithms.
International Journal of Computational Science and Engineering, 2012
Social media, which enable people to communicate easily and share information effectively through the web, have been spreading rapidly in recent years. In such media, effective topic extraction from messages has become important so that trending topics and their reputation can be recognised. However, since messages contain redundancy and topic boundaries are ambiguous, it is difficult to extract appropriate topics. As the first step for topic extraction, this paper proposes an effective measure for automatically determining the appropriate number of topics based on the intra-cluster distance and the inter-cluster distance among topic clusters. We present our experimental results to show the effectiveness of our proposed approach.
IEEE Transactions on Neural Networks, 2000
Unsupervised learning is used to categorize multidimensional data into a number of meaningful classes on the basis of the similarity or correlation between individual samples. In neural-network implementations of various unsupervised algorithms such as principal component analysis (PCA), competitive learning, or the self-organizing map (SOM), sample vectors are normalized to equal lengths so that similarity can be easily and efficiently obtained from their dot products. In general, sample vectors span the whole multidimensional feature space, and existing normalization methods distort the intrinsic patterns present in the sample set. In this work, a novel method of normalization by mapping the samples to a new space of one more dimension has been proposed. The original distribution of the samples in the feature space is shown to be almost preserved in the transformed space. Simple rules are given to map from the original space to the normalized space and vice versa.
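The abstract does not state the mapping rule, so the sketch below shows only one well-known way to give every sample the same length by adding a single coordinate; it should not be read as the paper's method. The radius parameter and both function names are assumptions for illustration.

```python
from math import sqrt


def lift_to_sphere(x, radius):
    """Append one coordinate so the lifted vector has length exactly `radius`.

    Assumed construction: the extra coordinate is sqrt(radius^2 - |x|^2),
    which requires |x| <= radius. The original coordinates are left unchanged.
    """
    squared_norm = sum(v * v for v in x)
    if squared_norm > radius * radius:
        raise ValueError("radius must be at least the norm of x")
    return list(x) + [sqrt(radius * radius - squared_norm)]


def drop_back(y):
    """Inverse map: discard the added coordinate to recover the original sample."""
    return list(y[:-1])


sample = [3.0, 4.0]
lifted = lift_to_sphere(sample, radius=13.0)
print(lifted)             # every lifted sample has the same length (13.0 here)
print(drop_back(lifted))  # the original sample is recovered unchanged
```

Because the original coordinates are kept as they are, the relative arrangement of samples is not rescaled sample by sample, unlike dividing each vector by its own norm, while all lifted vectors still share one length so that dot products can be compared directly.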
Proceedings of the Computer Graphics International Conference
Visualizing time series big data has become very important. In this paper, we discuss the design of visualizations for time series statistical data. As an example, we present an animation of the Gibbs sampling process to clarify changes over time. Gibbs sampling is a widely used MCMC algorithm in the deep learning field. We consider that an additional z-axis coordinate or a timeline is helpful for the visualization, and that these functions could be implemented automatically by some kind of chart wizard. We discuss the design rules and tips for visualization of time series data.
2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Social media offers a wealth of insight into how significant topics such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based social media analysis methods using Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To overcome the scalability problem, in this paper the high-performance Singular Value Decomposition (SVD) library redsvd has been used to identify topics over time from the huge data set of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. Beginning with word count vectors of authors and words for each time slot (in our case, every hour), authors' clusters are extracted from each slot by SVD and k-means. Then, the original fast feature selection algorithm named CWC has been used to extract discriminative words from each cluster. As a result, authors' clusters recognized as topics, as well as the issues of conventional social media analysis methods for big data, can be visualized while overcoming the scalability problem.
International Journal of Intelligent Transportation Systems Research
Distracted driving is one of the main causes of traffic accidents. Car manufacturers are now developing various driving support systems to ensure safe driving, because driving is an important activity for people as their major means of transportation. In this work, we have examined a method of detecting distracted driving from driving data collected from different sensors attached to a driving simulator while driving under various road conditions and cognitive loads. In our study, we used a driving simulator to collect data from drivers while driving in a normal state with concentration and in a distracted state, imposing cognitive load to simulate cognitive distraction. Based on the collected data, we developed driver-specific models of driving behaviour in several scenarios with increasing cognitive load and attempted to detect distracted driving in real time from the individual driving model in order to send an alert to the driver. We explored machine learning algorithms, including deep neural networks, for the proposed development of a real-time cognitive distraction detection method from driving data. It is found that different drivers have different driving behaviour and that the use of a personal driving model is important for the detection of distracted driving in real time. It is also found that the convolutional neural network (CNN) is a promising tool for the development of a personalized driving assistance system which can detect distracted driving and alert a driver in real time.
Symmetry, 2021
Accurate global horizontal irradiance (GHI) forecasting is crucial for efficient management and forecasting of the output power of photovoltaic power plants. However, developing a reliable GHI forecasting model is challenging because GHI varies over time, and its variation is affected by changes in weather patterns. Recently, the long short-term memory (LSTM) deep learning network has become a powerful tool for modeling complex time series problems. This work aims to develop and compare univariate and several multivariate LSTM models that can predict GHI in Guntur, India on a very short-term basis. To build the multivariate time series models, we considered all possible combinations of temperature, humidity, and wind direction variables along with GHI as inputs and developed seven multivariate models, while in the univariate model, we considered only GHI variability. We collected the meteorological data for Guntur from 1 January 2016 to 31 December 2016 and built 12 datasets, each c...