Document Classification Research Papers

- by Helene Fowkes
  Learning, Document Classification, Chi Square Test

Interest in the area of pattern recognition has been renewed recently due to emerging applications which are not only challenging but also computationally more demanding. These applications include data mining (identifying a "pattern",... more

- by Dr. D. H. Rao
  Computer Science, Data Mining, Pattern Recognition, Neural Network

Legal texts play an essential role in any organisation, public or private, where each actor must be aware of and comply with regulations. However, because of the difficulties of the legal domain, the actors prefer to rely on the... more

- by Nasria BOUHYAOUI and Fatima Laallam
  Annotation, Arabic Natural Language Processing, Document Classification, Legal text

In recent years, XML has been established as a major means for information management, and has been broadly utilized for complex data representation (e.g. multimedia objects). Owing to an unparalleled increase in the use of the XML standard,... more

To help meet the growing qualitative and quantitative demands for information from the WWW, efficient automatic Web page classifiers are urgently needed. However, a classifier applied to the WWW faces a huge-scale dimensionality problem since... more

The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features,... more

- by Kushal Dave
  Information Retrieval, Machine Learning, Web search, Feature Extraction

The widespread use of information technologies for construction is considerably increasing the number of electronic text documents stored in construction management information systems. Consequently, automated methods for organizing and... more

Document image classification is an important step in Office Automation, Digital Libraries, and other document image analysis applications. There is great diversity in document image classifiers: they differ in the problems they solve, in... more

- by dengel washington
  Artificial Intelligence, Image Processing, Fuzzy Logic, Modeling

Frequent itemset mining (FIM) is a core operation for several data mining applications, such as association rule computation, correlations, document classification, and many others, and it has been extensively studied over the last decades.... more

With the increasing availability of electronic documents and the rapid growth of the World Wide Web, the task of automatic categorization of documents became the key method for organizing the information and knowledge discovery. Proper... more

"This manual arose from the need for standardization and for more detailed normalization instructions for entering indexing terms. The aim is uniform and appropriate retrieval of information in information systems... more

TWLT is an acronym of Twente Workshop(s) on Language Technology. These workshops on natural language theory and technology are organised by the Parlevink Project, a language theory and technology project of the . For each workshop... more

Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain. A mis... more

Background: Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to the development of effective clinical document classification systems. Clinical natural-language-processing systems annotate the syntax... more

- by Julie Womack and Amy Justice
  Engineering, Natural Language Processing, Machine Learning, Data Mining

Integrating Different Strategies for Cross-Language Information Retrieval in the MIETTA Project. Paul Buitelaar, Klaus Netter, Feiyu Xu. DFKI Language Technology Lab, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany. {paulb, netter, feiyu}@... more

Automatic classification has become an important research area owing to the rapid increase in digital information. Manual classification of documents is evidently difficult because of vocabulary ambiguities of classification... more

The use of ontologies as a mechanism to enable machine reasoning has increased steadily over the last few years. This paper suggests an automated method for document classification using an ontology, which expresses... more

- by Peng Zhou
  Ontology, Machine Learning, Data Mining, Document Classification

An increasing and overwhelming amount of biomedical information is available in the research literature, mainly in the form of free text. Biologists need tools that automate their information search and deal with the high volume and... more

Because of its various applications in data mining and information technology, automatic document classification is one of the important topics in computer science. Classification plays a vital role in many information management and retrieval... more

A method of document comparison based on a hierarchical dictionary of topics (concepts) is described. The hierarchical links in the dictionary are assigned weights that are used for detecting the main topics of a document and for... more

It is well known that links are an important source of information when dealing with Web collections. However, the question remains whether the same techniques that are used on the Web can be applied to collections of documents... more

The automatic classification of legal case documents has become very important, owing to the denials of justice, delays and failures observed in judicial case management systems. Our hybrid text classification model employed extensive... more

- by Chinedu Obasi and Dr Ugwu Chidiebere
  Machine Learning, Text Mining, Support Vector Machines, Statistical machine learning

Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

- by Alfonso E. Romero
  Text Classification, Thesaurus, Document Classification, Thesauri

Automated document classification is a fundamental machine learning task that refers to automatically assigning categories to scanned images of documents. It has reached a state-of-the-art stage, but it remains necessary to verify the performance and... more

- by Suleiman M. A. Gargaare and Faizur Rashid
  Algorithms, Artificial Intelligence, Natural Language Processing, Machine Learning

The combination of multiple features or views when representing documents or other kinds of objects usually leads to improved results in classification (and retrieval) tasks. Most systems assume that those views will be available both at... more

Pattern classification has been successfully applied in many problem domains, such as biometric recognition, document classification or medical diagnosis. Missing or unknown data are a common drawback that pattern recognition techniques... more

The amount of narrative clinical text documents stored in Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision... more

With the increasing availability of electronic documents and the rapid growth of the World Wide Web, the task of automatic categorization of documents became the key method for organizing the information and knowledge discovery. Proper... more

- by Baharum Baharudin and Khairullah Khan
  Information Systems, Information Retrieval, Information Technology, Machine Learning

With the increased use of the Internet, a large number of consumers first consult online resources for their healthcare decisions. The problem with the existing information structure lies primarily in the fact that the vocabulary used in... more

- by Luis de Campos
  Text Classification, Thesaurus, Document Classification, Thesauri

In this work, we jointly apply several text mining methods to a corpus of legal documents in order to compare the separation quality of two inherently different document classification schemes. The classification schemes are compared with... more

The goal of the reported research is the development of a computational approach that could help a cognitive scientist to interactively represent a learner's mental models, and to automatically validate their coherence with respect to the... more

Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, document classifier and two-phase search... more

- by Jan-Ming Ho
  Information Retrieval, Machine Learning, Data Mining, Search Engines

0-7803-7868-7/03/$17.00 © 2003 IEEE.

- by Debashis Ghosh
  Data Mining, Image Analysis, Writing, Image Classification

Quantifying the concept of co-occurrence and iterated co-occurrence yields indices of similarity between words or between documents. These similarities are associated with a reversible Markov transition matrix, the formal properties of... more

We propose a simple Bayesian network-based text classifier, which may be considered as a discriminative counterpart of the generative multinomial naive Bayes classifier. The method relies on the use of a fixed network topology with the... more

We propose a method which, given a document to be classified, automatically generates an ordered set of appropriate descriptors extracted from a thesaurus. The method creates a Bayesian network to model the thesaurus and uses... more

This paper uses Systemic Functional Linguistic (SFL) theory as a basis for extracting semantic features of documents. We focus on the pronominal and determination system and the role it plays in constructing interpersonal distance. By... more

- by Jon Patrick
  Discourse Analysis, Psychology, Geography, Computer Science

Email has become an important means of electronic communication, but its viability is marred by Unsolicited Bulk Email (UBE) messages. UBE poses technical and socio-economic challenges to the use of email. Besides, the... more

A new algorithm based on the learning vector quantisation classifier is presented. It uses a modified proximity measure, which enforces a predetermined correct classification level in training while using a sliding-mode approach for stable... more

ABSTRACT: Improvements in hardware, communication technology and databases have led to the explosion of multimedia information repositories. In order to provide the quality of information retrieval and the quality of services, it is... more

Feature selection is of paramount concern in the document classification process, as it improves the efficiency and accuracy of the text classifier. The Vector Space Model is used to represent the "bag of words" (BOW) of the documents with term weighting... more

- by Aurangzeb Khan
  Ontology, Machine Learning, Data Mining, Feature Selection

In this paper we propose a matching algorithm for measuring the structural similarity between an XML document and a DTD. The matching algorithm, by comparing the document structure against the one the DTD requires, is able to identify... more

We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through a data mining technique which generates rules associating... more

- by Marco Cristo
  Data Mining, Digital Library, Classification, Quality Criteria

This paper reports the results of an experiment in which an attempt is made to determine whether word length and sentence length can be considered as the two indispensable parameters in the identification of Bangla medical text documents,... more

- by Prof. Niladri Sekhar Dash, Kaushik Roy and Ankita Dhar
  Text Categorization, Document Classification

In this paper we present the Dual Support Apriori for Temporal data (DSAT) algorithm. This is a novel technique for discovering Jumping Emerging Patterns (JEPs) from time series data using a sliding window technique. Our approach is... more

- by Frans Coenen
  Time Series, Temporal Data Mining, Text Classification, Graph Mining
