Topic Detection and Tracking
Recent papers in Topic Detection and Tracking
The Linguistic Data Consortium at the University of Pennsylvania has recently been engaged in the creation of large-scale annotated corpora of broadcast news materials in support of the ongoing Topic Detection and Tracking (TDT) research... more
In this paper, two clustering algorithms called dynamic hierarchical compact and dynamic hierarchical star are presented. Both methods aim to construct a cluster hierarchy, dealing with dynamic data sets. The first creates disjoint... more
This work addresses thematic visualization in information retrieval. In a context increasingly dominated by the circulation of information, and in the face of large data flows, it is necessary to synthesize... more
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present... more
Topic detection and tracking (TDT) applications aim to organize the temporally ordered stories of a news stream according to the events. Two major problems in TDT are new event detection (NED) and topic tracking (TT). These problems focus... more
Web mining is the application of data mining techniques to discover patterns from the Web. Topic tracking is one of the technologies that have been developed and can be used in the text mining process. The main purpose of topic tracking... more
A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank nonnegative matrix factorization algorithm to retain natural... more
Abstract: In recent years, information visualization and data mining technologies have established themselves as key tools for analyzing large digital repositories of scientific documentation. One example of these... more
This paper presents a keyword extraction technique that can be used for tracking topics over time. In our work, keywords are a set of significant words in an article that give a high-level description of its contents to readers.... more
Near-duplicate keyframes (NDK) play a unique role in large-scale video search, news topic detection and tracking. In this paper, we propose a novel NDK retrieval approach by exploring both visual and textual cues from the visual... more
How to cope with information overload is becoming an increasingly important problem even for scientists. Search engines such as Scholar, CiteSeer, SmealSearch, Google, MSN and Yahoo try to solve this problem by indexing (variable size)... more
In this paper, we propose a topic tracking and visualization method using Independent Topic Analysis. Independent Topic Analysis is a method for extracting mutually independent topics from document data by using the Independent Component... more
In recent years, the rapid growth of the Internet has changed the way people interact globally. Internet usage is quite diverse; one use is as a medium for collecting user-generated content, including online reviews. Public sentiment... more
This paper presents the TNO tracking system which was evaluated at the 2000 Topic Detection and Tracking evaluation project (TDT2000). The objective of the TDT tracking task is to track events of interest over time. We built a baseline... more
In this paper, we carry out a study about the main themes treated by the International Journal of Information Technology & Decision Making during its first 10 years (2002–2011). The themes are detected, quantified and visualized using an... more
Microblog is a social network service which is able to aggregate messages to explore new knowledge. Nowadays, more and more users contribute what they found and what they thought by posting short messages. This phenomenon makes people... more
In this work, we present a new semantic language modeling approach to model news stories in the Topic Detection and Tracking (TDT) task. In the new approach, we build a unigram language model for each semantic class in a news story. We... more
Rapid proliferation of the World Wide Web led to an enormous increase in the availability of textual corpora. In this paper, the problem of topic detection and tracking is considered with application to news items. The proposed approach... more
Web content clustering is a very important part of the topic detection and tracking problem. In our paper we focus on the pre-processing phase of web content clustering. We focus on blog articles published in the Slovak language. We evaluate the impact... more
The Center for Intelligent Information Retrieval at UMass Amherst submitted runs for all four tasks, namely, Hierarchical Topic Detection, Topic Tracking, New Event Detection and Link Detection. In this paper, we describe our models,... more
This paper introduces Topic Tracking for the Punjabi language. Text mining is a field that automatically extracts previously unknown and useful information from unstructured textual data. It has strong connections with natural language... more
Information Retrieval (IR) aims at modelling, designing and implementing systems able to provide fast and effective content-based access to a large amount of information. Information can be of any kind: textual, visual, or auditory. The... more
We describe a new probabilistic Sentence Tree Language Modeling approach that captures term dependency patterns in Topic Detection and Tracking's (TDT) Story Link Detection task. New features of the approach include modeling the... more
Story clustering is a critical step for news retrieval, topic mining, and summarization. Nonetheless, the task remains highly challenging owing to the fact that news topics exhibit clusters of varying densities, shapes, and sizes.... more
Topic Detection and Tracking (TDT) tasks are evaluated using a cost function. The standard TDT cost function assumes a constant probability of relevance P(rel) across all topics. In practice, P(rel) varies widely across topics. We argue... more
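As a rough illustration of the cost function this abstract refers to, the sketch below follows the form conventionally used in TDT evaluations: a weighted sum of the miss and false-alarm rates, with a single prior P(rel) shared by all topics. The function names, the default cost weights, and the normalization step are assumptions made for illustration; they are not taken from the truncated abstract and do not represent the alternative the authors go on to argue for.

```python
def detection_cost(p_miss, p_fa, p_rel, c_miss=1.0, c_fa=0.1):
    """Weighted detection cost for one operating point.

    p_miss : fraction of on-topic stories the system missed
    p_fa   : fraction of off-topic stories the system falsely accepted
    p_rel  : assumed prior probability that a story is on topic
    c_miss, c_fa : illustrative cost weights (assumed defaults)
    """
    if not 0.0 < p_rel < 1.0:
        raise ValueError("p_rel must lie strictly between 0 and 1")
    return c_miss * p_miss * p_rel + c_fa * p_fa * (1.0 - p_rel)


def normalized_cost(p_miss, p_fa, p_rel, c_miss=1.0, c_fa=0.1):
    """Cost divided by that of the better trivial system.

    A system that rejects everything costs c_miss * p_rel; one that
    accepts everything costs c_fa * (1 - p_rel). Values below 1.0 mean
    the system beats both trivial baselines.
    """
    raw = detection_cost(p_miss, p_fa, p_rel, c_miss, c_fa)
    baseline = min(c_miss * p_rel, c_fa * (1.0 - p_rel))
    return raw / baseline


if __name__ == "__main__":
    # The same error rates scored under two different priors.
    for prior in (0.02, 0.20):
        print(prior, normalized_cost(p_miss=0.10, p_fa=0.01, p_rel=prior))
```

Running the loop with two different priors shows how strongly the score for identical error rates depends on the assumed P(rel), which is the sensitivity the abstract points to when it notes that P(rel) varies across topics.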
We present an algorithm that allows for indexing music by topic. The application scenario is an information retrieval system into which any song with known lyrics can be inserted and indexed so as to make a music collection browseable by... more
Topic detection and tracking and topic segmentation play an important role in capturing the local and sequential information of documents. Previous work in this area usually focuses on single documents, although similar multiple documents... more
In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot... more
Tracking topics on social media streams is non-trivial as the number of topics mentioned grows without bound. This complexity is compounded when we want to track such topics against other fast moving streams. We go beyond traditional... more
Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework called joint... more
The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They refer to the extraction of important sentences from... more
Over the last few decades, social network platforms such as Twitter and other microblogging systems, which contain huge amounts of timely generated data, have come into widespread use. Twitter is the fastest means of information sharing where user... more
This paper presents several methods for topic detection on newspaper articles, using either a general vocabulary or topic-specific vocabularies. Specific vocabularies are determined manually or statistically. In both cases, we aim at... more
Continuing progress in the automatic transcription of broadcast speech via speech recognition has raised the possibility of applying information retrieval techniques to the resulting (errorful) text. In this paper we describe a general... more
Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took... more
Information retrieval is moving beyond the stage where users simply type one or more keywords and retrieve a ranked list of documents. In such a scenario users have to go through the returned documents in order to find what they are... more
Topics in situated and task-oriented communication depend heavily on the given, often changing environment, making the detection of predetermined topics in many cases useless. Detection of non-predefined topics can enhance... more
Twitter is a user-generated content system that allows its users to share short text messages, called tweets, for a variety of purposes, including daily conversations, URL sharing and information news. Considering its world-wide... more
The TDT-3 Text and Speech Corpus expands on previous phases of Topic Detection and Tracking data collections, by increasing the number of news sources being sampled, by including Mandarin Chinese as well as English news data, and by... more
The LDC began its first Broadcast News (BN) speech collection in the spring of 1996, facing a host of challenges including IPR negotiations with broadcasters, establishment of new transcription conventions and tools, and a compressed... more
First Story Detection is hard because the most accurate systems become progressively slower with each document processed. We present a novel approach to FSD, which operates in constant time/space and scales to very high volume streams. We... more
In this article we present a sub-sentential translation memory that is sensitive to the translation domain, a first step toward integrating context. This system is able to recycle translations already "seen" by the... more
As part of MITRE's work under the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) program, we are preparing a series of demonstrations to showcase the TIDES Integrated Feasibility Experiment on Bio-Security... more
We introduce the relative rank differential statistic which is a non-parametric approach to document and dialog analysis based on word frequency rank-statistics. We also present a simple method to establish semantic saliency in dialog,... more