Natural disasters destroy valuable resources, and it is necessary to recognize them so that appropriate response strategies may be designed. In the recent past, social networks have proven to be a very good source for gathering event-specific information. This working-notes paper is based on the Disaster Image Retrieval from Social Media (DIRSM) task, part of MediaEval 2017. The dataset of images and their associated metadata was collected from various social networks, including Twitter and Flickr. An ensemble approach is adopted in this paper, in which different visual and metadata features are integrated. Kernel Discriminant Analysis using spectral regression is then applied as a dimensionality-reduction technique. Mean Average Precision (MAP) at various cutoffs is reported in this paper.
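A minimal sketch of the dimensionality-reduction step described above. scikit-learn ships no SRKDA implementation, so this approximates kernel discriminant analysis with an explicit kernel map followed by LDA; the feature matrix and labels are random placeholders, not the DIRSM features.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))      # stand-in for concatenated visual + metadata features
y = rng.integers(0, 2, size=200)     # relevant / not-relevant labels

kda = make_pipeline(
    Nystroem(kernel="rbf", n_components=100, random_state=0),  # explicit kernel map
    LinearDiscriminantAnalysis(n_components=1),                # discriminant projection
)
X_low = kda.fit_transform(X, y)      # reduced representation for ranking
print(X_low.shape)                   # (200, 1)
```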
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
The use of attention models for automated image captioning has enabled many systems to produce accurate and meaningful descriptions for images. Over the years, many novel approaches have been proposed to enhance the attention process using different feature representations. In this paper, we extend this approach by creating a guided attention network mechanism that exploits the relationship between the visual scene and text descriptions using spatial features from the image, high-level information from the topics, and temporal context from caption generation, which are embedded together in an ordered embedding space. A pairwise ranking objective is used for training this embedding space, which allows similar images, topics and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy and hence helps the model to produce more visually accurate captions. Experimental results on the MSCOCO dataset show the competitiveness of our approach with many state-of-the-art models on various evaluation metrics.
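A minimal sketch of a pairwise ranking objective in the style of max-margin visual-semantic embedding losses, as referenced in the abstract above. The similarity function, margin value and tensor names are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(img_emb, cap_emb, margin=0.2):
    """Hinge ranking loss over all in-batch negatives.

    img_emb, cap_emb: (batch, dim) L2-normalized embeddings where
    row i of each tensor is a matching image-caption pair.
    """
    scores = img_emb @ cap_emb.t()                        # cosine similarities
    pos = scores.diag().view(-1, 1)                       # matching pairs
    cost_cap = (margin + scores - pos).clamp(min=0)       # caption negatives
    cost_img = (margin + scores - pos.t()).clamp(min=0)   # image negatives
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return (cost_cap.masked_fill(mask, 0).sum()
            + cost_img.masked_fill(mask, 0).sum())

img = F.normalize(torch.randn(8, 256), dim=1)
cap = F.normalize(torch.randn(8, 256), dim=1)
print(pairwise_ranking_loss(img, cap).item())
```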
This paper illustrates our approach to the shared task on similar language translation at the Fifth Conference on Machine Translation (WMT-20). Our motivation comes from the latest state-of-the-art neural machine translation, in which Transformers and Recurrent Attention models are effectively used. A typical sequence-to-sequence architecture consists of an encoder and a decoder Recurrent Neural Network (RNN). The encoder recursively processes a source sequence and reduces it into a fixed-length vector (the context), and the decoder generates a target sequence, token by token, conditioned on that context. Transformers, in contrast, reduce training time by offering a higher degree of parallelism, at the cost of freedom in sequential order. Recurrent Attention allows the decoder to focus effectively on the order of the source sequence at different decoding steps. In our approach, we have combined the recurrence-based layered encoder-decod...
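A minimal sketch of the encoder-decoder RNN described above: the encoder compresses the source into a fixed-length context vector and the decoder generates the target token by token. Vocabulary sizes and dimensions are illustrative placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, context = self.encoder(self.src_emb(src))   # fixed-length context vector
        dec_out, _ = self.decoder(self.tgt_emb(tgt), context)
        return self.out(dec_out)                       # per-token logits

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))    # batch of source token ids
tgt = torch.randint(0, 1000, (2, 5))    # shifted target token ids
print(model(src, tgt).shape)            # torch.Size([2, 5, 1000])
```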
Disastrous situations can be better managed when timely and relevant information is available. Social media plays a very important role in providing information about disastrous events. This working-notes paper is based on the "Multimedia Satellite Task: Emergency Response for Flooding Events", part of MediaEval 2018. The dataset consists of tweets and their respective images, which may or may not contain evidence of roads and their passability status. An ensemble approach is followed in this paper, combining local and global features of the images. The text content of the tweets was processed into TF-IDF scores. Moreover, two-level classification is performed by applying Spectral Regression based Kernel Discriminant Analysis (SRKDA) on individual feature categories as well as on the ensemble of different feature types. It is observed that the F1 scores produced by visual, text, and the ensemble of both text and visual features for evidence of road remain 74.58%, 58.30% and 76.61% ...
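A minimal sketch of the TF-IDF step described above, using scikit-learn; the sample tweets are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "road flooded near the bridge, cars cannot pass",
    "sunny afternoon downtown, traffic is normal",
    "flood water rising, the highway is closed",
]
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X_text = vectorizer.fit_transform(tweets)   # sparse (n_tweets, n_terms) matrix
print(X_text.shape)

# X_text can then be classified on its own (via SRKDA in the paper)
# or combined with the visual features of the attached images.
```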
ing for each document. There are several possible extensions to this work. The proposed document clustering approach has many practical applications. One direction is to apply the technique to a specific application area, along with application-specific optimizations, and evaluate the outcome; for example, web search results can be clustered using this approach, with snippets generated for each cluster to judge their quality. In the proposed approach, each term, whether it comes from a lexical chain or from a topic map, has an equal effect on the similarity calculation for a pair of documents. One possible direction is to introduce discriminative feature weighting for the features in this approach; discriminative feature weighting has shown encouraging results for both text clustering and classification tasks.
Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of a large document collection. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-collections called clusters. Self-organizing maps (SOM) are a type of artificial neural network (ANN) that can be used to perform autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It is considered a good method for clustering, as both require unsupervised processing. In this paper, we propose a multi-layer, multi-feature SOM to cluster documents. The paper implements a SOM with four layers, containing lexical terms, phrases and sequences in the bottom layers respectively, and combining all at the top layer. The documents are processed to extract these features to feed the SOM. The internal weights and interconnections between these layer...
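A minimal sketch of clustering document vectors with a self-organizing map, using the third-party MiniSom package (pip install minisom). A single TF-IDF layer stands in for the multi-layer, multi-feature design described above; the documents are placeholders.

```python
from minisom import MiniSom
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["stock markets fell sharply", "rain floods the city",
        "markets rally on earnings", "heavy rain causes flooding"]
X = TfidfVectorizer().fit_transform(docs).toarray()

som = MiniSom(2, 2, X.shape[1], sigma=0.5, learning_rate=0.5, random_seed=0)
som.train_random(X, 500)                 # unsupervised training

# The winning map node of each document acts as its cluster label.
for doc, vec in zip(docs, X):
    print(som.winner(vec), "<-", doc)
```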
A flood is an overflow of water that swamps dry land. The gravest effects of flooding are the loss of human life and economic losses. An early warning of these events can be very effective in minimizing the losses. Social media websites such as Twitter and Facebook are quite effective in the efficient dissemination of information pertinent to any emergency. Users on these social networking sites share both text and rich content such as images and videos. The Multimedia Evaluation Benchmark (MediaEval) offers challenges in the form of shared tasks to develop and evaluate new algorithms, approaches and technologies for the exploration and exploitation of multimedia in decision making for real-time problems. Since 2015, MediaEval has been running a shared task on predicting several aspects of flooding, and through these shared tasks many improvements have been observed. In this paper, the classification framework VRBagged-Net is proposed and implemented for flood classification. The frame...
This paper presents the method proposed and implemented by team FAST-NU-DS in "The Flood-related Multimedia Task at MediaEval 2020". The task provides Italian-language tweets collected during floods between 2017 and 2019. The proposed method uses the text of each tweet and its associated image for binary classification, identifying whether or not a particular tweet is about a flood incident. An ensemble-based method is designed for classifying the tweets on the basis of textual data, visual data, and a combination of both. For visual data, the method uses data augmentation to oversample the minority class and applies stratified random sampling for the selection of input. Moreover, a Visual Geometry Group (VGG16) convolutional neural network, pretrained on ImageNet and Places365, is used. For the classification of textual data, the technique of Term Frequen...
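A minimal sketch of the visual branch described above: an ImageNet-pretrained VGG16 backbone with a new binary head and simple data augmentation. The image size, augmentation parameters and training pipeline are illustrative assumptions, not the team's exact setup.

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                       # keep pretrained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),        # data augmentation
    tf.keras.layers.RandomRotation(0.1),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # flood / not flood
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```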
This paper presents the contribution of the NUCES DSGP team to the Multimedia Satellite Task at MediaEval 2019. The essential tasks include News Image Topic Disambiguation (NITD) and Multimodal Flood Level Estimation (MFLE) from news images. An ensemble-based deep learning method is applied to the News Image Topic Disambiguation task, where data augmentation and transfer learning were used for binary classification of images. During training, the challenge of class imbalance is managed by using data augmentation and selecting an equal sample size from each class. For the Multimodal Flood Level Estimation task, keypoints of a person's lower body were detected, along with image flood-probability scores from two deep convolutional network architectures, ResNet50 and VGG19. The confidence scores of the detected keypoints and the convolutional networks' output probabilities were combined and passed to a Random Forest classifier for a final prediction score. The evaluation of t...
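A minimal sketch of the late-fusion step described above: keypoint confidence scores and CNN output probabilities are concatenated and passed to a Random Forest for the final prediction. The feature values here are random placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 100
keypoint_conf = rng.random((n, 6))   # e.g. hips, knees, ankles confidences
cnn_probs = rng.random((n, 2))       # ResNet50 and VGG19 flood probabilities
y = rng.integers(0, 2, size=n)       # flood-level label

X = np.hstack([keypoint_conf, cnn_probs])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.predict_proba(X[:3]))
```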
Document clustering has recently become a vital approach, as the number of documents on the web and in proprietary repositories has increased at an unprecedented rate. Documents written in human language generally carry context, and word usage depends largely on that context; researchers have therefore attempted to enrich document representations via external knowledge bases, which can bring contextual information into the clustering process. An enrichment process with explicit content analysis using Wikipedia as the knowledge base is proposed. The approach is distinct in that only the conceptual words of a document, together with their frequencies, are used to embed contextual information; hence, the approach does not over-enrich the documents. A vector-based representation, with cosine similarity and agglomerative hierarchical clustering, is used to perform the actual document clustering. The proposed method was compared with existing relevant approaches on NEW...
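A minimal sketch of the clustering step described above: a vector representation compared with cosine similarity and grouped by agglomerative hierarchical clustering. The toy documents and the number of clusters are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

docs = ["the bank approved the loan", "credit risk of the borrower",
        "the river bank flooded", "heavy rain along the river"]
X = TfidfVectorizer().fit_transform(docs).toarray()

clusterer = AgglomerativeClustering(n_clusters=2, metric="cosine",
                                    linkage="average")
print(clusterer.fit_predict(X))    # e.g. [0 0 1 1]
```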
The explosive growth of data in the banking sector is a common phenomenon, owing to the early adoption of information systems by banks. This vast volume of historical data on the financial position of individuals and organizations compels banks to evaluate the creditworthiness of clients when offering new services. Credit scoring can be defined as a technique that facilitates lenders in deciding whether to grant or reject credit to consumers. A credit score is the product of advanced analytical models that capture a snapshot of the consumer's credit history and translate it into a number signifying the amount of risk the consumer brings to a specific deal. Automated credit-scoring mechanisms have replaced the onerous, error-prone, labour-intensive manual reviews, which were less transparent and lacked statistical soundness, in almost all financial organizations. The credit-scoring functionality is a type of classification problem for the new customer. There are numerous data classificati...
Document clustering is an unsupervised machine learning technique that organizes a large collection of documents into smaller, topic-homogeneous, meaningful sub-collections (clusters). Traditional document clustering approaches use extracted features such as words (terms), phrases, sequences and topics from the documents as descriptors for the clustering process. These features do not consider the relationships among the different words that convey the contextual information within a document. Recently, the Graph-of-Word approach was introduced in information retrieval research; it addresses the independence assumption by building a graph from the words that appear in a document. Hence, the relationships among words are captured in the representation. It is an unweighted directed graph whose vertices represent unique terms and whose edges represent co-occurrences between the terms. The representation is simplified by using a sliding window of size = 3 with the text ...
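A minimal sketch of the Graph-of-Word construction described above: an unweighted directed graph whose vertices are unique terms and whose edges link terms co-occurring inside a sliding window of size 3. The sample sentence is a placeholder.

```python
import networkx as nx

def graph_of_word(tokens, window=3):
    g = nx.DiGraph()
    g.add_nodes_from(set(tokens))
    for i, term in enumerate(tokens):
        # connect the current term to its successors inside the window
        for other in tokens[i + 1:i + window]:
            if other != term:
                g.add_edge(term, other)
    return g

tokens = "document clustering groups similar documents into clusters".split()
g = graph_of_word(tokens)
print(sorted(g.edges()))
```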
The paper presents a text classification approach for classifying tweets into two classes, availability and need, based on the content of the tweets. The approach uses fixed-length word embeddings to capture the semantic relationships among words, and logistic regression for the actual classification. The logistic regression models the relationship between the categorical dependent variable (the tweet label) and the fixed-length embedding of the tweet content (its words), estimating the probability of a tweet's label from its embedded words; the regression function is fitted by maximum likelihood estimation. The approach produced 84% accurate classification for the two classes on the training set provided for the shared task on "Information Retrieval from Microblogs during Disasters (IRMiDis)", part of the 9th meeting of the Forum for Information Retrieval Evaluation (FIRE 2017).
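A minimal sketch of the approach described above: each tweet is represented by a fixed-length vector (here the average of its word vectors) and classified with logistic regression. The embedding table and sample tweets are tiny invented placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=50) for w in
         "need water food available shelter blankets offering".split()}

def embed(tweet):
    vecs = [vocab[w] for w in tweet.split() if w in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

tweets = ["need water food", "offering shelter blankets",
          "need shelter", "water available"]
labels = [1, 0, 1, 0]                 # 1 = need, 0 = availability

X = np.stack([embed(t) for t in tweets])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([embed("need blankets")]))
```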
Micro-blogging websites like Twitter are very popular among internet users; over 100 million tweets are posted every day. These websites are active in the efficient dissemination of information pertinent to any emergency, such as a flood or an earthquake. Recent research has shown that these platforms can effectively be used for monitoring, evaluating and coordinating relief operations in such situations. One very critical issue for such applications is automatically determining, during an emergency, whether these posts carry factual information or rumors. The idea is to verify a tweet against some other authentic news source. The FIRE 2018 edition of the Forum for Information Retrieval Evaluation included a shared task on Information Retrieval from Microblogs during Disasters (IRMiDis). Subtask 1 is to identify, from their content, tweets that are facts or fact-checkable. The main idea of this task is to assess the validity of tweets so that rumors or baseless situational...
Enterprise networks face a large number of threats that are managed and mitigated with a combination of proprietary and third-party security tools and services. However, the techniques and principles employed by these tools, processes, and services are quite conventional. They lack the rapid evolution required to protect against modern, state-of-the-art threats, specifically distributed denial-of-service (DDoS) attacks. A network's inefficiency is directly proportional to the number of applications and services it hosts, many of them deployed mainly to protect against external and internal threats. Moreover, the effectiveness of such security mechanisms relies on their independent and proactive approach, which works for known malware and their attack vectors but becomes obsolete when new malware appears or a zero-day vulnerability is exploited. This paper presents an intelligent, highly responsive, and scalable security framework for enterprise networks. The proposed framework incorporates the Apache Spark framework for security analytics. It accurately identifies anomalies related to DDoS attacks in real-time network traffic by using customized machine learning algorithms, meticulously trained on a selected feature set. Encouraging results are obtained when tested against different scenarios and benchmarked against the results achieved by related studies in similar scenarios.
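A minimal sketch of Spark-based traffic classification in the spirit of the framework described above; the feature columns and the labelled DataFrame are invented placeholders, not the paper's actual feature set or algorithm.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("ddos-sketch").getOrCreate()
rows = [(120.0, 3.2, 45.0, 1), (10.0, 0.4, 2.0, 0),
        (300.0, 7.9, 80.0, 1), (15.0, 0.6, 3.0, 0)]
df = spark.createDataFrame(rows, ["pkt_rate", "syn_ratio", "flows", "label"])

assembler = VectorAssembler(inputCols=["pkt_rate", "syn_ratio", "flows"],
                            outputCol="features")
model = RandomForestClassifier(labelCol="label").fit(assembler.transform(df))
model.transform(assembler.transform(df)).select("label", "prediction").show()
```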
Journal of Independent Studies and Research - Computing, 2015
Document clustering is usually performed as an unsupervised task. It attempts to separate a document collection into groups of documents (clusters) by implicitly identifying the common patterns present in the documents. A semi-supervised approach to this problem has recently reported promising results. In the semi-supervised approach, explicit background knowledge (for example, must-link or cannot-link information for a pair of documents) is used in the form of constraints to steer the clustering process in the right direction. In this paper, a semi-supervised approach to document clustering is proposed. The paper makes three main contributions: (i) a document is first transformed into a graph representation based on the Graph-of-Word approach, from which word sequences of size 3 are extracted and used as features for the semi-supervised clustering; (ii) a similarity function based on common word sequences is proposed; and (iii) a constraint-based algorithm is designed to perform the actual clustering through active learning. The proposed algorithm is implemented and extensively tested on three standard text mining datasets. The method clearly outperforms recently proposed algorithms for document clustering in terms of standard evaluation measures for the document clustering task.
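A minimal sketch of a common-word-sequence similarity in the spirit of contribution (ii) above: documents are reduced to their sets of 3-word sequences and compared with a Jaccard-style overlap. This is an illustrative stand-in, not the paper's exact function.

```python
def word_sequences(text, n=3):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def sequence_similarity(doc_a, doc_b, n=3):
    a, b = word_sequences(doc_a, n), word_sequences(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)      # Jaccard overlap of sequences

d1 = "document clustering groups similar documents together"
d2 = "clustering groups similar documents into clusters"
print(round(sequence_similarity(d1, d2), 3))   # 0.333
```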
Journal of Independent Studies and Research - Computing, 2015
Credit cards are now widely used by consumers for purchasing various goods and services, owing to the widespread use of the internet and the consequent growth of e-commerce over the past few decades. This increased use of credit cards has raised the associated risks, such as fraudulent use, which can cause financial loss to card holders as well as to financial institutions. It is an ethical issue with legal implications in various countries, where laws and regulations force financial institutions and credit card companies to employ various techniques to detect and prevent credit card fraud. Although changes in technological systems also change the nature of fraud, data mining techniques such as classification, regression and clustering are very useful and widely used to prevent and detect fraud associated with credit cards. Credit card fraud prevention and detection is a type of classification problem for new as well as existing customers. There are multiple data mining techniques that can be employed for the classification of customers, each with its own pros and cons. This study compares four classification techniques, namely Naïve Bayes, Bayesian networks, Artificial Neural Networks and Artificial Immune Systems, for classifying credit card transactions on a dataset obtained from a commercial bank in Pakistan. The major contribution of this study is the use of real data, on which extensive experiments were performed and the results analysed to conclude which technique performs best.
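A minimal sketch of comparing classifiers on transaction data. scikit-learn covers Naïve Bayes and a neural network; Bayesian networks and Artificial Immune Systems need specialised libraries, so only the first two are sketched here, on synthetic placeholder data rather than the bank's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, weights=[0.9],
                           random_state=0)   # imbalanced: fraud is rare

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Neural Network", MLPClassifier(max_iter=500,
                                                   random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```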
Journal of Independent Studies and Research - Computing, 2015
The Global Terrorism Database (GTD) is a vast collection of terrorist activities reported around the globe. The database incorporates more than 27,000 terrorism incidents from 1968 to 2014. Every record has spatial data, a time stamp, and several other fields (e.g. tactics, weapon types, targets and injuries). There have been few earlier studies mining interesting patterns from this large body of textual data. The author believes that the GTD holds numerous interesting patterns still hidden, and the full potential of this resource is yet to be revealed. In this independent study, the author investigates the GTD through co-clustering for pattern discovery, extracting textual data from the GTD with the aim of clustering the data in space and time simultaneously. Co-clustering has become an important and powerful tool for data mining: it analyses bilateral data by describing the connections between two different entities. Many real-world applications can benefit extensively from this approach, such as market basket analysis and recommendation systems. In this study, the effectiveness of the co-clustering model is demonstrated by experiments on the database of global terrorist events.
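A minimal sketch of co-clustering in the spirit described above, using scikit-learn's SpectralCoclustering on a synthetic region-by-year incident-count matrix; the matrix values are invented placeholders, not GTD figures.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
counts = rng.poisson(2, size=(8, 12)) + 1   # 8 regions x 12 years
counts[:4, :6] += 10                        # planted space-time block

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(counts)
print("row (region) clusters:", model.row_labels_)
print("column (year) clusters:", model.column_labels_)
```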