Macquarie University

Named entity extraction in historical Australian newspaper text

thesis
posted on 2022-03-28, 16:37 authored by A. H. M. Quamruzzaman
The objective of Named Entity Recognition (NER) is to extract and classify atomic entities in text, such as proper names (personal names and locations), temporal expressions and other specific notations. In this project, we will apply NER methods to historical newspaper text taken from the Trove archive of the National Library of Australia. We will present an evaluation of various available NER systems on a hand-annotated sample of newspaper text. We will then present the result of applying the system to the whole corpus of text. Even when the occurrences of a given name are known across a large data set, there may be many individuals who share that name; this is particularly evident in the Trove corpus, since it spans a long time period (1803-1959). In the second part of this project, we will develop methods to distinguish between different individuals who share the same name. In particular, we will classify names as Politician, Entertainer or Other based on the documents in which they occur.

Table of Contents

1. Introduction -- 2. Literature review -- 3. Name entity recognition on Trove newspaper text -- 4. Document classification on Trove newspaper texts -- 5. Discussion -- 6. Conclusion -- Bibliography.

Notes

Theoretical thesis. Bibliography: pages 58-62

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

MRes, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2016

Principal Supervisor

Steve Cassidy

Rights

Copyright A.H.M. Quamruzzaman 2015. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Jurisdiction

Tien Shan

Extent

1 online resource (ix, 62 pages) graphs, tables

Former Identifiers

mq:69467 http://hdl.handle.net/1959.14/1254697