Named entity recognition (NER) plays a vital role in various applications of natural language processing. Although significant work has been done on NER in the general and biomedical domains, the agriculture domain has long been ignored. Agricultural entities include the names of crops, crop diseases, fertilizers, etc. Because the conventional features used for identifying general named entities are inapplicable, recognizing and extracting agricultural entities becomes a rigorous and challenging task. As NER in the agriculture domain has not yet been explored much, building an NER system for this domain is recent and vital work. This paper proposes a novel context-based approach to developing an NER system for the agriculture domain. The proposed approach employs context patterns to extract the required entity of interest. The experiment is carried out in two settings: 1) word context patterns and 2) POS context patterns. In the word context pattern, only the co-occurring word tokens corresponding to the required entity are considered, while in the part-of-speech (POS) context pattern, the POS structure of the co-occurring word tokens is used instead of the tokens themselves. We have proposed seven part-of-speech patterns that are most likely to cover all instances of the required entity of interest. Remarkably, the proposed POS patterns have not only identified the known agricultural entities but have also extracted 55 hidden entities from the data set. To boost the performance of the NER system, a semantic similarity module has also been used. The proposed approach attains an accuracy of 70.45% and a recall of 91.3%, which is appreciable for preparatory work.
International Journal of Engineering and Technology, 2017
Product reviews and blogs play a vital role in giving end users the insight needed to make a decision. The direct impact of reviews and ratings on product sales raises a strong possibility of fake reviews. E-commerce sites are often involved in writing fake reviews to promote or demote particular products and services. These fictitious opinions, written to sound authentic, are known as deceptive opinion spam or review spam. Review spam detection has received significant attention in both business and academia because of the potential impact fake reviews can have on consumer behaviour and purchasing decisions. To curb this issue, many e-commerce companies have even started to certify reviewers. But certification covers only a small fraction of reviewers, so it is not enough to deal with the problem of deceptive opinion spamming. It is difficult to detect these deceptive opinions manually. This work primarily focuses on enhancing the accuracy of existing deceptive opinion spam classifiers using psycholinguistic/sociolinguistic deceptive clues. We have formulated this problem in different ways and solved each formulation with several machine learning techniques. This work was carried out on the publicly available gold-standard corpus of deceptive opinion spam, and the final classifier achieved up to 92 percent cross-validation accuracy in the restaurant domain and around 94 percent in the hotel domain. A detailed comparative analysis of the results has been done for all the machine learning algorithms used.
Keywords: Opinion Spamming, Opinion Mining, Web Mining, Psycholinguistic Features, Machine Learning
I. INTRODUCTION
Opinion spamming can be defined as writing fake reviews that deliberately try to mislead human readers by giving undeserved positive opinions or false negative opinions in order to promote or demote target products, services or organizations. People with malicious intentions who post fake opinions without disclosing their true identity are known as opinion spammers. Opinion spam can be broadly classified as disruptive opinion spam and deceptive opinion spam. Most previous work has focused on disruptive opinion spam, which takes the form of advertisements and other irrelevant non-opinion text. Deceptive opinion spam, where people intentionally try to mislead others by writing fake reviews, has remained a less explored field. Disruptive opinion spam can easily be identified and ignored by the user, as it has quite distinguishable features that correspond to advertisements and other commercial interests. Deceptive opinions, on the other hand, are not identifiable by a human reader, nor can they easily be ignored, as they have a serious impact on revenue generation and reputation. A study of the impact of consumer reviews in the restaurant domain finds that a one-star increase in Yelp rating leads to a 5-9 percent increase in revenue [1]. Several high-profile cases have been reported in the news. The main motive behind spamming is monetary benefit. Opinion spam classification can be seen as a two-class text classification problem; however, it differs from general text classification in terms of features. Traditional text classifiers mainly use syntactic, semantic, statistical and similar features for classification. Such features may also be useful for classifying spam opinions. But for detecting deceptive opinions, we need to keep in mind that these opinions are intentional, so a link needs to be established between the use of regular words and deceptive behaviour in order to catch spammers. Linking the opinion and the behaviour of the opinion holder (reviewer) is not an easy task. Moreover, the task becomes more difficult because information about the opinion holder is absent in most cases. To build a spam review detection model, researchers may use the characteristics of reviews or of reviewers. But in most cases and domains, they have to rely on the review text because reviewer details are unavailable. Most opinions are found in the form of reviews, so "opinion" and "review" are used interchangeably.
International Journal of Hybrid Information Technology, 2017
Online product reviews have become the major source of information for end users making purchasing decisions. Companies and individuals often hire people to write fake reviews to increase the sales of their products. These individuals are known as opinion spammers and their activities are known as opinion spamming. It is difficult for a human being to detect these deceptive reviews manually. Features play a major role in building effective deceptive review detection classifiers. We have observed human behavior through review and blog datasets and transferred these observations into features. Finally, we have built automated deceptive review classifiers using document-level and aspect-level domain-independent features. We have performed our experiments in the hotel domain. We achieved around 93 percent accuracy on Myle Ott's gold standard dataset [1] and up to 86 percent accuracy on the self-crawled Yelp dataset.
Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the performance of information retrieval systems. Not all the terms of the PRF documents are important for expanding the user query. Therefore, selecting proper expansion terms is very important for improving system performance. Individual methods for selecting query expansion terms have been widely investigated. Every individual term selection method has its own weaknesses and strengths. To overcome the weaknesses and to utilize the strengths of the individual methods, we used multiple term selection methods together. In this paper, first, the possibility of improving the overall performance using individual query expansion term selection methods has been explored. Second, the Borda count rank aggregation approach is used for combining multiple query expansion term selection methods. Third, the semantic similarity approach is used to select semantically similar term...
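The Borda count aggregation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `borda_aggregate`, the scoring convention (a term at position p in a list earns n - p points, where n is the size of the combined candidate pool, and earns nothing from a list that omits it) and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several ranked term lists into one list using a Borda count.

    `rankings` is a list of lists; each inner list holds candidate
    expansion terms ordered from best to worst by one selection method.
    """
    # Candidate pool: every term proposed by at least one method.
    pool = set().union(*rankings)
    n = len(pool)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, term in enumerate(ranking):
            # The best term in a list earns n points, the next n - 1, ...
            scores[term] += n - position
    # Highest total first; ties are broken alphabetically (an assumption).
    return sorted(pool, key=lambda term: (-scores[term], term))


# Three hypothetical term selection methods ranking the same candidates.
merged = borda_aggregate([
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["b", "c", "a"],
])
print(merged)
```

Because each method contributes only ranks, not raw scores, methods whose scores live on different scales can be combined without normalization.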
International Journal of Information Retrieval Research, 2015
Pseudo-relevance feedback (PRF) is a relevance feedback approach to query expansion that treats the top-ranked retrieved documents as relevance feedback. In this paper, the authors' focus is to capture the limitations of co-occurrence and PRF-based query expansion, and they propose a hybrid method to improve the performance of PRF-based query expansion by combining query term co-occurrence with the contextual information of query terms, both based on the corpus of top feedback documents retrieved in the first pass. First, the paper suggests a query term co-occurrence approach, based on the top retrieved feedback documents, to select an optimal combination of query terms from a pool of terms obtained using PRF-based query expansion. Second, a contextual window based approach is used to select terms related to the query context from the top feedback documents. Third, comparisons were made among baseline, co-occurrence and contextual window based approaches using different performance evaluating metri...
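A contextual window approach of the kind described above can be sketched as follows. The abstract does not give the window size, tokenization or scoring, so everything here is an assumption for illustration: whitespace tokenization, a symmetric window of `window` tokens on each side of a query term, and raw counts as the score. The function name `context_window_terms` is hypothetical.

```python
from collections import Counter


def context_window_terms(feedback_docs, query_terms, window=2):
    """Count non-query terms that occur near query terms in feedback docs."""
    query = {term.lower() for term in query_terms}
    counts = Counter()
    for doc in feedback_docs:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        for i, token in enumerate(tokens):
            if token not in query:
                continue
            # Symmetric window around the query term, clipped to the document.
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for neighbour in tokens[start:end]:
                if neighbour not in query:
                    counts[neighbour] += 1
    return counts


docs = [
    "crop disease affects wheat yield",
    "wheat rust is a crop disease",
]
print(context_window_terms(docs, ["wheat"], window=1).most_common())
```

Terms with the highest counts would then be candidates for expansion; a real system would also remove stop words and weight by document rank.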
International Journal of Computational Vision and Robotics, 2010
Semantic similarity is becoming a generic issue in a variety of applications in the area of information retrieval (IR). Most researchers use ontologies as a tool for finding semantic similarities. The use of an ontology allows terms in documents to be replaced by concepts. The concepts are generally selected by identifying semantically related terms and finding a suitable term (concept) to replace them. Several approaches have been proposed for finding concepts by selecting semantically related terms; however, no attempt has been made to automate the process. The motivation of this paper is to suggest an automatic method for identifying concepts in documents using the hypernym relationship in ontologies, and to propose an algorithm for doing so. The WordNet ontology has been used for implementing the algorithm. The algorithm can be used for finding document concepts and clustering documents based on these concepts.
International journal of computer science & …, 2009
International Journal of Computer Science & Information Technology (IJCSIT), Vol 1, No 2, November 2009 ... Hazra Imran 1 and Aditi Sharan 2 ... 1 Department of Computer Science, Jamia Hamdard, New Delhi, India 2 School of Computers and System ...
2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), 2015
Ontology has recently become one of the hot issues in the research community. Domain-specific ontologies are being used in web search with the objective of making searching substantially more efficient, especially when finding the right web page is more important than searching with the usual keywords. Ontologies can play a very important role in the process of creating as well as managing knowledge. This paper addresses the important issues in developing a domain-specific ontology for the agriculture domain. We propose a generic approach for an agriculture domain ontology representing entities and their relationships. We have developed a small ontology using the suggested approach. Our work is significant because we have not found any substantial prior work targeting ontology development in the agriculture domain.
system for focused information retrieval. This paper is an attempt to improve personalized web search. The User Profile provides an important input for performing personalized web search. This paper proposes a framework for constructing an Enhanced User Profile by using the user's browsing history and enriching it with domain knowledge. This Enhanced User Profile can be used for improving the performance of personalized web search. In this paper we have used the Enhanced User Profile specifically for suggesting relevant pages to the user. The experimental results show that the suggestions provided to the user using the Enhanced User Profile are better than those obtained using a plain User Profile.
In this paper, we present a survey of important work done on automatic query expansion. Automatic query expansion is the process of automatically adding terms or phrases to the original query and is considered an extremely promising technique for improving retrieval effectiveness. In this survey, we discuss a large number of recent approaches to automatic query expansion, including linguistic, corpus-specific, query-specific and search-log-based approaches. Some of them use lexical resources such as WordNet, and others use search logs and web data for query expansion. The following questions are also addressed in this work. Why is query expansion important for information retrieval? What are the main steps of automatic query expansion? What approaches to automatic query expansion are available and how do they compare? What are the critical issues and research directions of automatic query expansion?
One hundred users, one hundred needs. Current web search engines are built to serve all users, independent of the needs of any individual user. Personalized web search is considered a promising solution in this direction and is an area of immense research potential. The main objective of this paper is to establish the significance of evolutionary techniques for providing personalized web search. The paper provides a framework for an interactive personalized query expansion model that yields a thematically rich query expansion. The model uses a knowledge base at the back end. The knowledge is derived from knowledge-rich open directories and the user's browsing history. The user's browsing history helps in identifying the needs of the user, while open directories provide topic-specific knowledge that acts as a source for selecting the terms for expanding the query. Further, a genetic algorithm has been used to select thematically rich terms for expansion. The model offers the personalization at...
Advances in Intelligent Systems and Computing, 2018
Product reviews and blogs play a vital role in giving end users insight for making purchasing decisions. Studies show a direct link between product reviews/ratings and product revenue. As a result, review hosting sites are often targeted to promote or demote products through fake reviews. These fictitious opinions, written to sound authentic, are known as deceptive opinion spam. Feature engineering plays an important role in building an automatic classifier for opinion spam detection. Deceptive cues need to be transformed into features. We have extracted various psychological, linguistic, and other textual features from text reviews. We have used Multi-view Ensemble Learning (MEL) to build the classifier. A Rough Set Based Optimal Feature Set Partitioning (RS-OFSP) algorithm is proposed to construct views for MEL. The proposed algorithm shows promising results when compared to random feature set partitioning (Bryll Pattern Recognit 36(6):1291–1302, 2003) [1] and optimal feature s...
Sentiment analysis aims to determine the sentiment strength of a textual source to support good decision making. This work focuses on the application of sentiment analysis to financial news. The semantic orientation of documents is first calculated by tuning an existing technique for the financial domain. The existing technique is found to have limitations in identifying representative phrases that effectively capture the sentiment of the text. Two alternative techniques, one using noun-verb combinations and the other a hybrid, are evaluated. The noun-verb approach yields the best results in the experiment conducted.
Aspect category detection (ACD) is an important subtask of aspect-based sentiment analysis (ABSA). It is a challenging problem because of the subjectivity involved in categorization, as well as the existence of overlapping classes. The approaches that have been applied to ACD include rule-based approaches along with other machine learning approaches, and most of them are statistical in nature. In this article, we have used an association rule-based approach. To deal with the statistical limitation of association rules, we propose a hybridized rule-based approach that combines association rules with semantic associations. For semantic associations, we have used word embeddings. Experiments were performed on the SemEval dataset, a standard benchmark dataset for aspect categorization in the restaurant domain. We observed that semantic associations can complement statistical associations and improve the accuracy of classification. The proposed method performs better than several state-of-the-art methods.
In this paper, our focus is to capture the limitations of Pseudo-Relevance Feedback (PRF) based query expansion (QE) and to propose a hybrid method that improves the performance of PRF-based QE by combining corpus-based term co-occurrence information, the context window of query terms and the semantic information of terms. Firstly, the paper suggests the use of various corpus-based term co-occurrence approaches to select an optimal combination of query terms from a pool of terms obtained using PRF-based QE. Third, we use a semantic similarity approach to rank the QE terms obtained from the top feedback documents. Fourth, we combine the co-occurrence, context window and semantic similarity based approaches to select the best expansion terms for query reformulation. The experiments were performed on the FIRE ad-hoc and TREC-3 benchmark datasets for the information retrieval task. The results show significant improvement in terms of precision, recall and mean average precision (MAP). This experiment shows that combining various techniques in an intelligent way yields the benefits of all of them.
International Journal of Information Retrieval Research, 2018
This article proposes a new concept, the Lexical Network, for automatic text document summarization. Instead of a number of chains, the authors obtain a network of sentences, called a Lexical Network and termed LexNetwork. This network is created between sentences based on different lexical and semantic relations. In this network, nodes represent sentences and edges represent the strength between two sentences, where strength means the number of relations present between the two sentences. The importance of sentences is decided based on different centrality measures, and the most important sentences are extracted for the summary. Word sense disambiguation (WSD) is done with the Simple Lesk technique, and a cosine similarity threshold (Ɵ, TH) is used as a post-processing step. In this article, the authors suggest that a cosine similarity threshold of 10% is better than 5%, and that an eigenvalue-based centrality measure is better for the summarization process. Finally, for comparison, they use the Semantrica-Lexalytics System.
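The idea of a sentence network scored by centrality can be sketched in a few lines. This is only an illustration under strong simplifying assumptions, not the LexNetwork method itself: the edge strength here is the number of shared word tokens (the article counts lexical and semantic relations instead), and importance is the weighted degree of a node rather than the eigenvalue-based centrality the authors prefer. The function names `build_network`, `degree_centrality` and `top_sentences` are hypothetical.

```python
from itertools import combinations


def build_network(sentences):
    """Return a symmetric weight matrix; weights[i][j] is the edge strength."""
    token_sets = [set(sentence.lower().split()) for sentence in sentences]
    n = len(sentences)
    weights = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Stand-in for "number of relations": count of shared tokens.
        shared = len(token_sets[i] & token_sets[j])
        weights[i][j] = weights[j][i] = shared
    return weights


def degree_centrality(weights):
    """Weighted degree of each node: the sum of its edge strengths."""
    return [sum(row) for row in weights]


def top_sentences(sentences, k=1):
    """Pick the k most central sentences, returned in document order."""
    scores = degree_centrality(build_network(sentences))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]


text = ["rice needs water", "water helps rice grow", "tractors are loud"]
print(degree_centrality(build_network(text)))
print(top_sentences(text, k=1))
```

Swapping `degree_centrality` for another centrality measure changes only the scoring step; the network construction and extraction stay the same.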
International Journal on Semantic Web and Information Systems, 2018
Automatic text document summarization is an active research area in the text mining field. In this article, the authors propose two new approaches (three models) for sentence selection and a new entropy-based summary evaluation criterion. The first approach is based on an algebraic model, Singular Value Decomposition (SVD), i.e. Latent Semantic Analysis (LSA), and the model is termed proposed_model-1. The second approach is based on entropy and is further divided into proposed_model-2 and proposed_model-3. In the first proposed model, the authors use the right singular matrix, while the second and third proposed models are based on Shannon entropy. The advantages of these models are that they are not length-dominated, give better results, and have low redundancy. Along with these three new models, an entropy-based summary evaluation criterion is proposed and tested. They also show that their entropy-based models are statistically closer to DUC-2002's standard/gold summary. ...
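For readers unfamiliar with Shannon entropy, the quantity the second and third models build on is H = sum over words of p(w) * log2(1 / p(w)). The abstract does not say how the authors turn entropy into a sentence score, so the sketch below only computes the entropy of the word distribution of a piece of text; the whitespace tokenization and the function name `shannon_entropy` are assumptions.

```python
from collections import Counter
from math import log2


def shannon_entropy(text):
    """Shannon entropy (in bits) of the word distribution of `text`."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    # H = sum of p * log2(1 / p) over the distinct words.
    return sum((c / total) * log2(total / c) for c in counts.values())


print(shannon_entropy("wheat wheat rice rice"))
print(shannon_entropy("wheat wheat wheat wheat"))
```

A text that repeats one word has zero entropy, while a text whose words are all distinct reaches the maximum of log2(number of tokens).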
Named entity recognition (NER) play a vital role in various application of Natural Language proce... more Named entity recognition (NER) play a vital role in various application of Natural Language processing. Although a significant work has been done in general and biomedical domain NER, but agriculture domain has been ignored for a long time. Agriculture entity includes name of crops, crop diseases, fertilizers etc. Due to the inapplicability of conventional features which has been used for identifying general named entities, recognizing and extracting the agricultural entities become a rigorous and challenging task. As NER in agriculture domain has not been yet explored a lot, thus building up a NER system for agriculture domain is very recent and vital work. This paper proposes a novel context-based approach to develop a NER system for agriculture domain. The proposed approach employs the context pattern for extracting the required entity of interest. The experiment is carried out in two different genres 1) Word Context Pattern 2) POS context pattern. In word context pattern, merely the cooccurring word tokens corresponding to the required entity is considered. While in Part of Speech (POS) rather than considering the co-occurring word tokens, their POS structure is plied. We have proposed seven part of speech patterns which are most likely to comprise all the instances of required entity of interest. The remarkable point is that the proposed POS patterns have not only device the known agricultural entities but have also extracted out 55 hidden entities from the data set. To boost up the performance of the NER system semantic similarity module has also been exercised. The proposed approach attains an accuracy of 70.45 % and recall of 91.3% which is appreciable as the preparatory work.
International Journal of Engineering and Technology, 2017
The product reviews and the blogs play a vital role in giving the insight to end user for making ... more The product reviews and the blogs play a vital role in giving the insight to end user for making a decision. Direct impact of reviews and ratings on the sale of the product raises a strong possibility of fake reviews. E-commerce sites are often indulged in writing fake reviews to promote/demote particular products and services. These fictitious opinions that are written to sound authentic are known as deceptive opinion/review spam. Review spam detection has received significant attention in both business and academia due to the potential impact fake reviews can have on consumer behaviour and purchasing decisions. To curb this issue many e-commerce companies have even started to certify the reviewers. But it covers an only small chunk of reviewers, so this technique couldn't be enough to deal with the problem of deceptive opinion spamming. Manually, it is difficult to detect these deceptive opinions. This work primarily focuses on enhancing the accuracy of existing deceptive opinion spam classifiers using psycholinguistic/sociolinguistic deceptive clues. We have formulated this problem in different ways and solve them with many machine learning techniques. This work carried out up on the publicly available gold standard corpus of deceptive opinion spam and achieved up to 92 percent cross-validation accuracy in restaurants and around 94 percent in hotels domain by the final classifier. A detail comparative results analysis has been done for all used machine learning algorithms. Keyword-Opinion Spamming, Opinion Mining, Web Mining, Psycholinguistic Features, and Machine Learning I. INTRODUCTION Opinion spamming can be defined as writing fake reviews that try to mislead human readers deliberately by giving undeserving positive opinions or false negative opinions to promote or demote some target products, services or organizations. 
People with malicious intentions post fake opinions without disclosing their true identity, also known as opinion spammer. Opinion spam can be broadly classified as disruptive opinion and deceptive opinion spam. Most of the previous work has focused on disruptive opinion spam, which is in the form of advertisement and other irrelevant non-opinion text. But deceptive opinion where people intentionally try to mislead others by writing fake reviews, remained a less explored field. Disruptive opinion spam can easily be identified and ignored by the user as they have quite distinguishable features that correspond to the advertisement and other commercial interests. On the other hand deceptive opinions are neither identifiable by a human reader nor even easily ignored as they have a serious impact on revenue generation and reputation. A study conducted on the impact of consumer reviews in restaurant domain finds that one-star increase in Yelp rating leads to a 5-9 percent increase in revenue [1]. Several high-profile cases have been reported in the news. The main motive behind the spamming is the monetary benefits. Opinion spam classifier can be seen as a two class text classification problem , however it is different from general text classification in terms of features. Traditional text classifiers mainly use syntactic, semantic, statistical etc. feature for classification purpose. Such features may be useful for classifying spam opinions also. But for detecting deceptive opinions, we need to keep in mind that these opinions are intentional, so a link needs to be established between use of regular words and deceptive behaviour to catch spammers. The problem of linking the opinion and opinion holder (reviewer) behaviour is not an easy task. Moreover, the task becomes more difficult in absence of information regarding opinion holder in most of the cases. To build a spam review detection model, researchers may use reviews or reviewer's characteristics. 
But in most of the cases and domains, they have to rely on review text due of unavailability of reviewer details. Most opinions are found in form of reviews so opinion and review is used interchangeably.
International Journal of Hybrid Information Technology, 2017
Online product reviews have become the major source of information for the end users to make purc... more Online product reviews have become the major source of information for the end users to make purchasing decisions. Companies/individuals often hire people for writing fake reviews to increase the sale of their products. These individuals are known as opinion spammers and their activities are known as opinion spamming. Manually it is difficult for a human being to detect these deceptive reviews. Features play a major role to build effective deceptive reviews detection classifiers. We have observed human behavior through reviews, blogs datasets, and transferred these observations into features.Towards the end, we have built automated deceptive reviews classifiers using document level and aspect level domain independent features. We have performed our experiments in hotels domain. We achieved around 93 percent accuracy on Myle Ott's gold standard dataset [1] and up to 86 percent accuracy on the self-crawled Yelp 1 dataset.
Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the perfo... more Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the performance of information retrieval systems. All the terms of PRF documents are not important for expanding the user query. Therefore selection of proper expansion term is very important for improving system performance. Individual query expansion terms selection methods have been widely investigated for improving its performance. Every individual expansion term selection method has its own weaknesses and strengths. To overcome the weaknesses and to utilize the strengths of the individual method, we used multiple terms selection methods together. In this paper, first the possibility of improving the overall performance using individual query expansion terms selection methods has been explored. Second, Borda count rank aggregation approach is used for combining multiple query expansion terms selection methods. Third, the semantic similarity approach is used to select semantically similar term...
International Journal of Information Retrieval Research, 2015
Pseudo-relevance feedback (PRF) is a type of relevance feedback approach of query expansion that ... more Pseudo-relevance feedback (PRF) is a type of relevance feedback approach of query expansion that considers the top ranked retrieved documents as relevance feedback. In this paper the authors focus is to capture the limitation of co-occurrence and PRF based query expansion approach and the authors proposed a hybrid method to improve the performance of PRF based query expansion by combining query term co-occurrence and query terms contextual information based on corpus of top retrieved feedback documents in first pass. Firstly, the paper suggests top retrieved feedback documents based query term co-occurrence approach to select an optimal combination of query terms from a pool of terms obtained using PRF based query expansion. Second, contextual window based approach is used to select the query context related terms from top feedback documents. Third, comparisons were made among baseline, co-occurrence and contextual window based approaches using different performance evaluating metri...
International Journal of Computational Vision and Robotics, 2010
Semantic similarity is becoming a generic issue in a variety of applications in area of informati... more Semantic similarity is becoming a generic issue in a variety of applications in area of information retrieval (IR). Most of the researchers are using ontology as a tool for finding semantic similarities. Use of ontology allows terms in documents to be replaced by the concepts. The concepts are generally selected by identifying semantically related terms and finding a suitable term (concept) to replace them. Several approaches have been proposed for finding concepts by selecting semantically related terms, however no attempt has been made to automatise the process. The motivation of this paper is to suggest an automatic method of identifying the concepts from documents using hypernym relationship in ontologies and propose an algorithm for the same. WordNet ontology has been used for implementing the algorithm. The algorithm can be used for finding document concepts and clustering the documents based on these concepts.
International journal of computer science & …, 2009
International Journal of Computer Science & Information Technology (IJCSIT), Vol 1, No 2, November 2009 ... Hazra Imran 1 and Aditi Sharan 2 ... 1 Department of Computer Science, Jamia Hamdard, New Delhi, India 2 School of Computers and System ...
2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), 2015
Ontology has recently become one of the hot topics in the research community. Domain-specific ontologies are being used to support web search, with the objective of making search substantially more efficient, especially when finding the right web page is more important than searching with the usual keywords. An ontology can play a very important role in the process of creating as well as managing knowledge. This paper addresses the important issues in developing a domain-specific ontology for the agriculture domain. We propose a generic approach for an agriculture domain ontology representing entities and their relationships. We have developed a small ontology using the suggested approach. Our work is significant because we have not found any significant work targeting ontology development in the agriculture domain.
system for focused information retrieval. This paper is an attempt to improve personalized web search. The user's profile provides an important input for performing personalized web search. This paper proposes a framework for constructing an Enhanced User Profile by using the user's browsing history and enriching it with domain knowledge. This Enhanced User Profile can be used to improve the performance of personalized web search. In this paper, we have used the Enhanced User Profile specifically for suggesting relevant pages to the user. The experimental results show that the suggestions provided to the user using the Enhanced User Profile are better than those obtained by using a User Profile.
In this paper, we present a survey of important work done on automatic query expansion. Automatic query expansion is the process of automatically adding terms or phrases to the original query and is considered an extremely promising technique for improving retrieval effectiveness. In this survey, we discuss a large number of recent approaches to automatic query expansion, including linguistic, corpus-specific, query-specific, and search-log-based approaches. Some of them use lexical resources such as WordNet, and others use search logs and web data for query expansion. The following questions are also addressed in this work. Why is query expansion important for information retrieval? What are the main steps of automatic query expansion? What approaches to automatic query expansion are available and how do they compare? What are the critical issues and research directions of automatic query expansion?
One hundred users, one hundred needs. Current web search engines are built to serve all users, independent of the needs of any individual user. Personalized web search is considered a promising solution in this direction and is an area of immense research potential. The main objective of this paper is to establish the significance of evolutionary techniques for providing personalized web search. The paper provides a framework for an interactive personalized query expansion model that yields thematically rich query expansion. The model uses a knowledge base at the back end. The knowledge is derived from knowledge-rich open directories and the user's browsing history. The user's browsing history helps in identifying the needs of the user, while open directories provide topic-specific knowledge that acts as a source for selecting the terms for expanding the query. Further, a genetic algorithm has been used to select thematically rich terms for expansion. The model offers the personalization at...
Advances in Intelligent Systems and Computing, 2018
Product reviews and blogs play a vital role in giving end users insight for making purchasing decisions. Studies show a direct link between product reviews/ratings and the revenue of a product. Review hosting sites are therefore often targeted to promote or demote products by writing fake reviews. These fictitious opinions, which are written to sound authentic, are known as deceptive opinion spam. Feature engineering plays an important role in building an automatic classifier for opinion spam detection: deceptive cues need to be transformed into features. We have extracted various psychological, linguistic, and other textual features from text reviews. We have used Multi-view Ensemble Learning (MEL) to build the classifier. A Rough Set Based Optimal Feature Set Partitioning (RS-OFSP) algorithm is proposed to construct views for MEL. The proposed algorithm shows promising results when compared to random feature set partitioning (Bryll Pattern Recognit 36(6):1291–1302, 2003) [1] and optimal feature s...
Sentiment analysis aims to determine the sentiment strength of a textual source to support good decision making. This work focuses on the application of sentiment analysis to financial news. The semantic orientation of documents is first calculated by tuning the existing technique for the financial domain. The existing technique is found to have limitations in identifying representative phrases that effectively capture the sentiment of the text. Two alternative techniques, one using noun-verb combinations and the other a hybrid, are evaluated. The noun-verb approach yields the best results in the experiment conducted.
Aspect category detection (ACD) is an important subtask of aspect-based sentiment analysis (ABSA). It is a challenging problem due to the subjectivity involved in categorization, as well as the existence of overlapping classes. The various approaches that have been applied to ACD include rule-based approaches along with other machine learning approaches, and most of them are statistical in nature. In this article, we have used an association rule-based approach. To deal with the statistical limitation of association rules, we propose a hybridized rule-based approach that combines association rules with semantic association. For semantic associations, we have used the notion of word embeddings. Experiments were performed on the SemEval dataset, a standard benchmark dataset for aspect categorization in the restaurant domain. We observed that semantic associations can complement statistical associations and improve the accuracy of classification. The proposed method performs better than several state-of-the-art methods.
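The statistical half of such an approach, mining word-to-category association rules by support and confidence, can be sketched as below. The thresholds, the function name `mine_rules`, and the toy reviews are assumptions for illustration; the embedding-based semantic association described in the abstract is not covered by this sketch.

```python
from collections import Counter


def mine_rules(reviews, min_support=2, min_confidence=0.6):
    """Mine word -> category rules from (tokens, category) pairs.

    support(word -> category)    = number of reviews containing the word with that category
    confidence(word -> category) = support / number of reviews containing the word
    """
    word_count = Counter()
    pair_count = Counter()
    for tokens, category in reviews:
        for word in set(tokens):  # count each word once per review
            word_count[word] += 1
            pair_count[(word, category)] += 1

    rules = {}
    for (word, category), support in pair_count.items():
        confidence = support / word_count[word]
        if support >= min_support and confidence >= min_confidence:
            # Keep only the most confident category for each word.
            if word not in rules or confidence > rules[word][1]:
                rules[word] = (category, confidence)
    return rules


# Hypothetical restaurant reviews, already tokenised and labelled.
reviews = [
    (["the", "pizza", "was", "great"], "food"),
    (["pizza", "tasted", "bland"], "food"),
    (["the", "waiter", "was", "rude"], "service"),
    (["waiter", "ignored", "us"], "service"),
]
```

On this toy data only "pizza" and "waiter" pass both thresholds. Words such as "the" occur equally often with both categories, and rare words such as "bland" fail the support threshold, which is the kind of statistical gap the abstract says semantic association is meant to cover.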
In this paper, our focus is to capture the limitations of Pseudo-Relevance Feedback (PRF) based query expansion (QE) and to propose a hybrid method that improves the performance of PRF-based QE by combining corpus-based term co-occurrence information, the context window of query terms, and the semantic information of terms. Firstly, the paper suggests the use of various corpus-based term co-occurrence approaches to select an optimal combination of query terms from a pool of terms obtained using PRF-based QE. Third, we use a semantic similarity approach to rank the QE terms obtained from the top feedback documents. Fourth, we combine the co-occurrence, context window, and semantic similarity based approaches to select the best expansion terms for query reformulation. The experiments were performed on the FIRE ad-hoc and TREC-3 benchmark datasets for the information retrieval task. The results show significant improvement in terms of precision, recall, and mean average precision (MAP). This experiment shows that combining the various techniques in an intelligent way yields the benefits of each of them.
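The abstract does not state how the three kinds of evidence are combined. One simple possibility, shown here purely as an assumption, is to min-max normalise each score table and sum the results; the function names and the example scores are hypothetical.

```python
def normalise(scores):
    """Min-max normalise a {term: score} table to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {term: 1.0 for term in scores}  # all terms are equally good
    return {term: (value - lo) / (hi - lo) for term, value in scores.items()}


def combine(score_tables, top_k=2):
    """Sum the normalised scores of each term and return the top_k terms.

    Assumption: every source of evidence is weighted equally.
    """
    combined = {}
    for table in score_tables:
        for term, value in normalise(table).items():
            combined[term] = combined.get(term, 0.0) + value
    ranked = sorted(combined, key=lambda t: (-combined[t], t))
    return ranked[:top_k]


# Hypothetical scores for three candidate expansion terms.
cooccurrence = {"retrieval": 4, "ranking": 2, "image": 0}
context_window = {"retrieval": 2, "ranking": 3, "image": 1}
semantic = {"retrieval": 0.9, "ranking": 0.5, "image": 0.1}
```

Here `combine([cooccurrence, context_window, semantic])` prefers "retrieval" and then "ranking", since both score well on all three tables while "image" scores lowest on each.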
International Journal of Information Retrieval Research, 2018
This article proposes a new concept, the Lexical Network, for automatic text document summarization. Instead of a number of chains, the authors obtain a network of sentences, which they call a Lexical Network (LexNetwork). This network is created between sentences based on different lexical and semantic relations. In this network, a node represents a sentence and an edge represents the strength between two sentences, where strength means the number of relations present between the two sentences. The importance of the sentences is decided based on different centrality measures, and the most important sentences are extracted for the summary. WSD is done with the Simple Lesk technique, and a Cosine-Similarity threshold (Ɵ, TH) is used as a post-processing task. In this article, the authors suggest that a cosine similarity threshold of 10% works better than 5%, and that an Eigen-Value based centrality measure is better for the summarization process. Finally, for comparison, they use the Semantrica-Lexalytics System.
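A sentence network of this kind can be sketched in a few lines of Python. Two simplifications are assumed here and are not taken from the article: shared word tokens stand in for lexical and semantic relations, and weighted degree stands in for the centrality measures (the article reports that an eigenvalue-based measure works better).

```python
def build_network(sentences):
    """Return a symmetric weight matrix; weights[i][j] is the strength between sentences i and j.

    Assumption: strength is the number of word tokens the two sentences share,
    a crude stand-in for counting lexical and semantic relations.
    """
    token_sets = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            strength = len(token_sets[i] & token_sets[j])
            weights[i][j] = weights[j][i] = strength
    return weights


def rank_by_degree(weights):
    """Rank sentence indices by weighted degree (sum of edge strengths), highest first."""
    degree = [sum(row) for row in weights]
    return sorted(range(len(weights)), key=lambda i: (-degree[i], i))


# Hypothetical four-sentence document.
sentences = [
    "cheese is tasty",
    "mice eat cheese",
    "cats chase mice",
    "dogs chase cats",
]
network = build_network(sentences)
```

For this toy document, `rank_by_degree(network)` places "cats chase mice" first because it is linked to both of its neighbours, so a one-sentence extractive summary would select it.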
International Journal on Semantic Web and Information Systems, 2018
Automatic text document summarization is an active research area in the text mining field. In this article, the authors propose two new approaches (three models) for sentence selection, and a new entropy-based summary evaluation criterion. The first approach is based on the algebraic model Singular Value Decomposition (SVD), i.e. Latent Semantic Analysis (LSA), and the model is termed proposed_model-1. The second approach is based on entropy and is further divided into proposed_model-2 and proposed_model-3. In the first proposed model, the authors use the right singular matrix, and the second and third proposed models are based on Shannon entropy. The advantage of these models is that they are not length-dominated, give better results, and have low redundancy. Along with these three new models, an entropy-based summary evaluation criterion is proposed and tested. They also show that their entropy-based proposed models are statistically closer to DUC-2002's standard/gold summary. ...
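The Shannon entropy underlying the second approach is H = -Σ p(w) log2 p(w), where p(w) is the relative frequency of word w. The sketch below only computes this quantity for a list of tokens; how the authors turn it into sentence scores or an evaluation criterion is not stated in the abstract, so the function name and the way it is applied are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Return the Shannon entropy, in bits, of the word distribution of `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty text carries no information
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical token lists: four distinct words versus one repeated word.
varied = ["crop", "disease", "fertilizer", "yield"]
repetitive = ["crop", "crop", "crop", "crop"]
```

Four equally frequent words give the maximum entropy for a vocabulary of that size, while a text that repeats a single word has zero entropy, which is why entropy can serve as a signal of information content and redundancy.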
Papers by Aditi Sharan