Macquarie University

A Syntactically Controlled Sentence Corpus for the Evaluation of Computational Reading Models

thesis
posted on 2025-05-28, 04:02 authored by Andrew Tien Phat Tran

The development of computational models that simulate how humans read text is a longstanding endeavour in psycholinguistic research. Central to these efforts is the establishment of corpora of sentences with theoretically interesting linguistic properties that can be used to test these models. However, with the recent advent of highly sophisticated models that attempt to simulate reading in its entirety (Reichle, 2021), there is a need for a suitable corpus of more complicated sentences to evaluate their performance. To this end, we developed a large corpus of sentences in which we manipulated either (1) the real-world plausibility of events and the ambiguity of the main verb, or (2) whether the sentences included a lexically ambiguous homograph that was disambiguated before or after it was read. Participants read these sentences while their eye movements were tracked, and we also measured their working memory spans and spelling ability. For the first group of sentences, we found evidence for early syntactic processing, and the plausibility of the events affected only later measures of syntactic parsing. For the second group of sentences, we found that homographs with equally frequent meanings were read more slowly because those meanings competed for activation, and that participants were largely unable to use the preceding context to rapidly disambiguate the meaning of the homographs. Finally, our analyses of the individual difference measures suggested that better spellers generally have an easier time encoding incoming linguistic input, whereas readers with higher working memory capacity tend to have an easier time integrating sentence input with their knowledge structures, as required for its subsequent evaluation. The data and sentences from this corpus should therefore provide important new benchmarks for evaluating current and future computational models of reading by allowing us to test their theoretical assumptions.

History

Table of Contents

1. Introduction -- 2. Method -- 3. Results -- 4. General Discussion -- References -- Appendix A -- Appendix B -- Appendix C

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

Master of Research

Department, Centre or School

School of Psychological Sciences

Year of Award

2025

Principal Supervisor

Erik Reichle

Additional Supervisor 1

Sachiko Kinoshita

Rights

Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

108 pages

Former Identifiers

AMIS ID: 456579
