Variation and innovation in modern English: corpus-based studies in the grammaticalization of multiword units
Thesis posted on 2022-03-28, 09:50, authored by Adam Michael Richard Smith
This dissertation is an empirical inquiry into the lexicalization and grammaticalization of several types of multiword units whose status as fixed lexical units has not been established, and whose grammatical structure and roles are still open to question. They remain on the fringes of codification and classification in current dictionaries and grammars. The four published papers embodied in this dissertation investigate light verbs (e.g. have a look), non-numerical quantifiers (e.g. a lot of), complex prepositions (e.g. in spite of) and complex subordinators (e.g. the moment). In their structure, each of these includes a noun phrase, but as units they serve different grammatical functions: those of the verb, determiner, preposition and subordinator respectively. These four types of multiword unit have been examined to assess how well they meet the standard criteria for grammaticalization, such as fixity, decategorialization and syntactic reanalysis. A range of standard corpora were used for this study, allowing investigation into the synchronic variation of the items under discussion across different English language regions and registers, along with some research into recent diachronic developments. Corpora of different sizes were selected to provide sufficient data on high- and low-frequency items. For higher-frequency items, the Australian, British and New Zealand components of the 1-million-word International Corpus of English (ICE) were used, as well as ICE-US (written only), complemented by the spoken Santa Barbara Corpus. These smaller corpora also allowed the individual linguistic contexts of examples to be examined more closely. For lower-frequency items, the British National Corpus and Corpus of Contemporary American English were used, as well as the Corpus of Historical American English, which provided some diachronic data.
Selected examples of linguistic contexts were elicited from these larger corpora (100 million words and over), and non-relevant usages were excluded from the frequency counts by the use of search strings adapted to each item. For each data set, the frequencies of fixed and variable forms of the multiword units were compared, and the wider context was also examined to find examples of indeterminate grammatical use, manifested by factors such as clause position and inconsistent patterns of concord. Data was also gathered from comprehensive and learner grammars, and from dictionaries for first- and second-language users, to gauge the degree of recognition of these marginal or emergent items. The body of research finds that, while each of the multiword units investigated is lexicalized to some extent, there is also syntagmatic evidence of grammaticalization in two cases. The grammatical status of the unit was indicated in the case of non-numerical quantifiers by whether the singular or plural quantifying noun agrees with the following verb; and for complex subordinators by the absence of a preceding preposition and following relative pronoun, and especially by its position at the start of a clause. The thesis shows that several criteria are needed to establish the grammatical status of a multiword unit, and that some criteria, such as decategorialization, may be less indicative than others. The study proposes a systematic, corpus-based approach to identifying and classifying emerging multiword units, so as to improve coverage of their contemporary lexicogrammatical functions within grammars and dictionaries.