Bill Teahan's Publications


D. V. Khmelev and W. J. Teahan.
Verification of text collections for text categorization and natural language processing.
Technical Report AIIA 03.1, School of Informatics, University of Wales, Bangor, 2003.

W. J. Teahan and D. J. Harper.
Using compression based language models for text categorization.
In J. Callan, B. Croft and J. Lafferty, editors, Workshop on Language Modeling and Information Retrieval, pages 83-88. ARDA, Carnegie Mellon University, 2001.

W. J. Teahan and D. J. Harper.
Combining PPM models using a text mining approach.
In J. A. Storer and M. Cohn, editors, Proceedings of the IEEE Data Compression Conference, pages 153-162. IEEE Computer Society, Snowbird, UT, 2001.
ISBN 0-7695-0594-5.

W. J. Teahan, Y. Wen*, R. McNab*, and I. H. Witten*.
A Compression-based Algorithm for Chinese Word Segmentation.
Computational Linguistics, 26(3):375-393, 2000.
ISSN 0891-2017.

W. J. Teahan.
Text Classification and Segmentation Using Minimum Cross-Entropy.
In Proceedings of the International Conference on Content-based Multimedia Information Access (RIAO 2000), pages 943-961. C.I.D.-C.A.S.I.S, Paris, France, 2000.
ISBN 2-905450-07-X.

W. J. Teahan.
An improved interface for probabilistic models of text.
Technical Report, School of Computer and Mathematical Sciences, The Robert Gordon University, 2000.

J. G. Cleary* and W. J. Teahan.
An open interface for probabilistic models of text.
In Proceedings of the IEEE Data Compression Conference. IEEE Computer Society Press, 1999.

I. H. Witten*, Z. Bray*, M. Mahoui*, and W. J. Teahan.
Text mining: A new frontier for lossless compression.
In IEEE Data Compression Conference. IEEE Computer Society Press, 1999.

I. H. Witten*, Z. Bray*, M. Mahoui*, and W. J. Teahan.
Using language models for generic entity extraction.
In ICML-99 Workshop on Machine Learning in Text Data Analysis. Stockholm, Sweden, 1999.

William John Teahan.
Modelling English Text.
PhD thesis, University of Waikato, 1998.

W. J. Teahan, S. Inglis*, J. G. Cleary*, and G. Holmes*.
Correcting English text using PPM models.
In IEEE Data Compression Conference, pages 43-52. IEEE Computer Society, Snowbird, UT, 1998.
ISBN 0-8186-8406-2.

W. J. Teahan and J. G. Cleary*.
Tag-based models of English text.
In IEEE Data Compression Conference, pages 43-52. IEEE Computer Society, Snowbird, UT, 1998.
ISBN 0-8186-8406-2.

J. G. Cleary* and W. J. Teahan.
Unbounded length contexts for PPM.
Computer Journal, 40(2/3):67-75, 1997.
ISSN 1460-2067.
Invited Paper.

W. J. Teahan and J. G. Cleary*.
Models of English text.
In IEEE Data Compression Conference. IEEE Computer Society Press, 1997.

W. J. Teahan and J. G. Cleary*.
The entropy of English using PPM-based models.
In IEEE Data Compression Conference. IEEE Computer Society Press, 1996.

J. G. Cleary* and W. J. Teahan.
Unbounded length contexts for PPM.
In IEEE Data Compression Conference. IEEE Computer Society Press, 1995.

J. G. Cleary* and W. J. Teahan.
Experiments on the zero frequency problem.
In IEEE Data Compression Conference. IEEE Computer Society Press, 1995.

W. J. Teahan.
Probability estimation for PPM.
In Proceedings of the New Zealand Computer Science Research Students' Conference. University of Waikato, Hamilton, New Zealand, 1995.



Bill Teahan (wjt@informatics.bangor.ac.uk)