The PPM data compression scheme has set the performance standard in lossless compression of text throughout the past decade. PPM is a finite-context statistical modelling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. This paper gives a brief introduction to PPM, and describes a variant of the algorithm, called PPM*, which exploits contexts of unbounded length. Although requiring considerably greater computational resources (in both time and space), PPM* reliably achieves compression superior to the benchmark PPMC version. Its major contribution is to show that the full information available from all substrings of the input can be used effectively to generate high-quality predictions. Hence, it provides a useful tool for exploring the bounds of compression.
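To make the blending concrete, here is a minimal sketch of PPM-style prediction with PPMC ("method C") escape estimation: prediction falls back from the longest matching context to shorter ones, multiplying in an escape probability at each step. This is an illustration of the general technique under stated simplifications (no exclusions, bounded order), not the paper's implementation.

```python
from collections import defaultdict

class SimplePPM:
    """Minimal sketch of PPM-style blended prediction with PPMC
    ("method C") escapes. Hypothetical illustration: exclusions and
    the PPM* unbounded-context strategy are omitted for brevity."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # counts[context][symbol] = frequency of symbol after context
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # Record the symbol under every context suffix up to max_order.
        for k in range(min(self.max_order, len(history)) + 1):
            self.counts[history[len(history) - k:]][symbol] += 1

    def probability(self, history, symbol, alphabet):
        # Fall back from the longest matching context; each escape
        # multiplies in the escape probability of the level above.
        escape_mass = 1.0
        for k in range(min(self.max_order, len(history)), -1, -1):
            syms = self.counts.get(history[len(history) - k:])
            if not syms:
                continue
            total, distinct = sum(syms.values()), len(syms)
            # PPMC: escape mass = distinct / (total + distinct)
            if symbol in syms:
                return escape_mass * syms[symbol] / (total + distinct)
            escape_mass *= distinct / (total + distinct)
        return escape_mass / len(alphabet)  # order -1: uniform fallback
```

Training mirrors decoding: at each position the coder assesses text[i] in context text[:i] and then calls update(text[:i], text[i]), so the encoder's and decoder's models stay synchronized.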
The state of the art in lossless text compression is the PPM data compression scheme. Two approaches to the problem of selecting the context models used in the scheme are described. One uses an a priori upper bound on the lengths of the contexts, while the other is unbounded. Several techniques that improve the probability estimation are described, including four new methods: partial update exclusions for the unbounded approach, deterministic scaling, recency scaling, and multiple probability estimators. Each of these methods improves performance for both the bounded and unbounded approaches. In addition, further savings are possible by combining the two approaches.
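Where the bounded and unbounded approaches differ most visibly is in which context prediction starts from. The hedged sketch below, reusing the counts structure from the sketch above, contrasts them: PPM*, the unbounded variant, starts at the shortest deterministic matching context and otherwise falls back to the longest match, whereas bounded PPM always starts at its fixed maximum order. The helper name and interface are hypothetical.

```python
def select_context(counts, history):
    """Sketch of PPM*-style context selection (hypothetical helper;
    `counts` maps context strings to per-symbol counts, as above).

    PPM* starts prediction at the shortest *deterministic* matching
    context, one in which only a single distinct symbol has been seen;
    if none exists, it falls back to the longest matching context.
    Bounded PPM instead always starts at the fixed maximum order."""
    matching = [history[i:] for i in range(len(history) + 1)
                if history[i:] in counts]
    deterministic = [c for c in matching if len(counts[c]) == 1]
    if deterministic:
        return min(deterministic, key=len)   # shortest deterministic
    return max(matching, key=len) if matching else ""
```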
This paper describes the participation of the School of Informatics, University of Wales, Bangor in the 2004 Text Retrieval Conference. We present additions and modifications to the QITEKAT system, initially developed as an entry for the 2003 QA evaluation, including automated regular expression induction, improved question matching, and application of our knowledge framework to the modified question types presented in the 2004 track. Results are presented which show improvements on last year's performance, and we discuss future directions for the system.
Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, 2004
We present ParCop, a decentralized peer-to-peer (P2P) computing system. In ParCop, the data and tasks are mobilized and flow freely between the computational resources (peers). ParCop allows each peer to utilize as well as to offer computing resources. ParCop uses the P2P model to guard against common problems that other systems suffer from, such as server failure and connection bottlenecks.
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03, 2003
We suggest a way of locating duplicates and plagiarism in a text collection using an R-measure, which is the normalized sum of the lengths of all suffixes of the text repeated in other documents of the collection. The R-measure can be effectively computed using ...
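A direct, quadratic-time reading of that definition is sketched below, before any of the suffix-structure speedups the truncated sentence presumably names. The normalization constant, taken here as the maximum possible sum n(n+1)/2, is an assumption.

```python
def r_measure(doc, others):
    """Naive quadratic sketch of the R-measure; the paper computes it
    efficiently with suffix structures. Normalizing by n*(n+1)/2, the
    maximum possible sum, is an assumption here."""
    if not doc:
        return 0.0
    corpus = "\x00".join(others)  # separator assumed absent from texts
    n, total = len(doc), 0
    for i in range(n):
        # Longest prefix of the suffix doc[i:] occurring elsewhere;
        # binary search works because occurrence is monotone in length.
        lo, hi = 0, n - i
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if doc[i:i + mid] in corpus:
                lo = mid
            else:
                hi = mid - 1
        total += lo
    return 2 * total / (n * (n + 1))
```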
We describe the background and motivation for a logic-based framework, based on the theory of “Knowing-Aboutness”, and its specific application to Question-Answering. We present the salient features of our system, and outline the benefits of our framework in terms of a more integrated architecture that is more easily evaluated. Favourable results are presented in the TREC 2004 Question-Answering evaluation.
Chinese is written without using spaces or other word delimiters. Although a text may be thought of as a corresponding sequence of words, there is considerable ambiguity in the placement of boundaries. Interpreting a text as a sequence of words is beneficial for some information retrieval and storage tasks: for example, full-text search, word-based compression, and keyphrase extraction. We describe a scheme that infers appropriate positions for word boundaries using an adaptive language model that is standard in text compression. It is trained on a corpus of presegmented text, and when applied to new text, interpolates word boundaries so as to maximize the compression obtained. This simple and general method performs well with respect to specialized schemes for Chinese language segmentation.
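One concrete way to realise "interpolates word boundaries so as to maximize the compression obtained" is a beam search over space insertions, scored by a character model trained on segmented text. The logprob(context, ch) interface below is an assumption, not the paper's API; a minimal sketch:

```python
import heapq

def segment(text, logprob, order=2, beam=20):
    """Beam-search sketch of compression-based segmentation.

    `logprob(context, ch)` is an assumed interface returning the
    log-probability of `ch` after `context` under a character model
    trained on space-delimited text. At each input character we either
    emit it as-is or emit a space first, and keep the `beam` hypotheses
    whose emitted stream (spaces included) compresses best."""
    hyps = [(0.0, "", "")]  # (cost, recent_context, output_so_far)
    for ch in text:
        candidates = []
        for cost, ctx, out in hyps:
            for emit in (ch, " " + ch):    # without / with a boundary
                c, k = cost, ctx
                for e in emit:
                    c -= logprob(k, e)     # accumulate negative log p
                    k = (k + e)[-order:]   # slide the context window
                candidates.append((c, k, out + emit))
        hyps = heapq.nsmallest(beam, candidates, key=lambda h: h[0])
    return min(hyps, key=lambda h: h[0])[2].strip()
```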
A number of powerful modelling techniques have been developed in recent years to compress natural language text. The best of these are adaptive models operating at the character and word levels, which perform almost as well as humans at predicting text. We show how to apply character-based methods to five areas where language modelling is critical, providing novel solutions to each of these problems.
Text categorization is the problem of assigning text to any of a set of pre-specified categories. It is useful in indexing documents for later retrieval, as a stage in natural language processing systems, for content analysis, and in many other roles. We wish to use language models developed for text compression as the basis of a text categorization scheme and ...
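The decision rule such a scheme typically uses is minimum cross-entropy: code the document under a model trained for each category and assign it to the category whose model needs the fewest bits. A minimal sketch, again assuming a hypothetical logprob(context, ch) model interface:

```python
import math

def categorize(doc, models):
    """Sketch of compression-based categorization (minimum cross-entropy).

    `models` maps each category name to a model exposing an assumed
    `logprob(context, ch)` method, trained on that category's text.
    The document goes to the category whose model codes it in the
    fewest bits, i.e. compresses it best."""
    def bits(model):
        total, ctx = 0.0, ""
        for ch in doc:
            total -= model.logprob(ctx, ch) / math.log(2)  # nats -> bits
            ctx = (ctx + ch)[-8:]   # bounded context for the sketch
        return total
    return min(models, key=lambda cat: bits(models[cat]))
```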
Proceedings of the Fifteenth Annual Conference Companion on Genetic and Evolutionary Computation, 2013
This paper describes a novel approach to multi-agent simulation where agents evolve freely within their environment. We present Template Based Evolution (TBE), a genetic evolution algorithm that evolves behaviour for embodied, situated agents whose fitness is tested implicitly through repeated trials in an environment. All agents that survive in the environment breed freely, creating new agents based on the average genome of two parents. This paper describes the design of the algorithm and applies it to a model in which virtual migratory creatures are evolved to survive in the simulated environment. Comparisons between the evolutionary responses of the artificial creatures and observations of natural systems support the strength of the methodology for species simulation.
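The breeding step the abstract describes, a child built from the average genome of two parents, reduces to an element-wise mean. In the sketch below the Gaussian mutation term is an added assumption, since the abstract specifies no mutation operator:

```python
import random

def breed(parent_a, parent_b, mutation_sigma=0.05):
    """Sketch of the averaging crossover described in the abstract:
    the child's genome is the element-wise mean of its parents'.
    The Gaussian mutation term is an assumption for illustration;
    the abstract does not specify a mutation operator."""
    return [
        (a + b) / 2 + random.gauss(0.0, mutation_sigma)
        for a, b in zip(parent_a, parent_b)
    ]
```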
Thinning is one of the critical processes for many applications in image analysis, in particular for Optical Character Recognition (OCR). The accuracy of OCR relies on the effectiveness of thinning algorithms. However, little previous attention has been paid to thinning algorithms for Arabic script, and there is a lack of quantitative performance measures of thinning techniques for Arabic script; consequently, it is unclear which thinning algorithms are most appropriate for it. In this paper, a new thinning algorithm for Arabic script is proposed, along with several new performance metrics. An experiment is conducted to evaluate the proposed algorithm against two well-established thinning algorithms with respect to the proposed objective performance metrics. The experimental results show that the new algorithm outperforms the other two thinning algorithms.
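The abstract does not detail the proposed algorithm, so for concreteness here is the classic Zhang-Suen (1984) two-subiteration thinning pass, the kind of well-established baseline such evaluations compare against. It is an illustration only, not the paper's Arabic-specific algorithm:

```python
def zhang_suen(image):
    """Classic Zhang-Suen thinning pass, shown as a generic baseline;
    not the Arabic-specific algorithm the paper proposes.
    `image` is a mutable grid (list of lists) of 0/1 pixels."""
    def neighbours(y, x):
        i = image  # P2..P9, clockwise from the north neighbour
        return [i[y-1][x], i[y-1][x+1], i[y][x+1], i[y+1][x+1],
                i[y+1][x], i[y+1][x-1], i[y][x-1], i[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, len(image) - 1):
                for x in range(1, len(image[0]) - 1):
                    if image[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of black neighbours
                    # 0 -> 1 transitions around the neighbourhood
                    a = sum(p[k] == 0 and p[(k + 1) % 8] == 1
                            for k in range(8))
                    cond = (p[0] * p[2] * p[4] == 0 and
                            p[2] * p[4] * p[6] == 0) if step == 0 else (
                            p[0] * p[2] * p[6] == 0 and
                            p[0] * p[4] * p[6] == 0)
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            for y, x in to_clear:
                image[y][x] = 0
            changed = changed or bool(to_clear)
    return image
```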