Natural language processing methods for attitudinal near-synonymy

Gardiner, Mary E.

doi:10.25949/19440689.v1

thesis

posted on 2022-03-28, 21:58 authored by Mary E. Gardiner

When either a human author or a computer natural language generation system tries to express an idea, there is usually more than one way to say it. This is a problem both for systems that process language, such as systems that recognise textual entailment, which must detect when two surface forms express the same idea, and for systems that generate language, which must choose the most appropriate way to express an idea from a potentially large number of surface forms. For a natural language generation system, a given meaning may be expressible by multiple words or by multiple phrases. However, it has also been argued that there are no true synonyms: even words with very similar meanings cannot be substituted for each other in all circumstances. Automatic natural language generation systems therefore have a use for modules that make effective word and phrase choices among closely related alternatives.

In this thesis we consider the specific problem of choosing an appropriate word or phrase where the alternatives are closely related in meaning but differ in sentiment or attitude. One example is the pair stingy and frugal: stingy is critical of what it describes, while frugal is complimentary. The thesis addresses three aspects of the problem.

The first question is whether existing methods to predict word choice among closely related words are sufficient for choosing between words that differ in sentiment. There are several methods in the literature for this, relying on statistical models of words in context. The early, relatively poor performance of these methods had been used to argue that statistical methods are not suitable for this task, but later successes with statistical approaches suggest that sufficient amounts of data make it tractable. Using a comprehensive data set, we show that sets of words that differ in sentiment behave in a distinct fashion, suggesting that they are particularly amenable to statistical approaches.

The second aspect is whether including global information about the entire text is useful in predicting the choice between related words or phrases that differ in sentiment. We hypothesise that information about the sentiment of a document as a whole (for example, whether a movie review is favourable or not) will assist in choosing between closely related words that differ in sentiment. We demonstrate several models that incorporate information from the entire document and improve prediction of the correct word in context; the most successful of these use metrics that account for distance from the target word.

The third aspect is an investigation into human perceptions of word choice in a particular generation task, valence shifting, whose goal is to change an existing text so that it is similar in meaning but more negative in tone. Existing work, including approaches that use hand-crafted vocabularies annotated with sentiment data and approaches that use corpus-derived cues, has found this to be a difficult problem. This work asks human judges to evaluate both how successfully a more negative tone is established and how fluent the resulting text is, and then explores possible metrics that can predict negativity for use in valence shifting.
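The idea of using document-level sentiment weighted by distance from the target word can be illustrated with a minimal sketch. This is not the model from the thesis: the toy polarity lexicon, the 1/distance weighting, and the helper names `weighted_context_sentiment` and `choose_near_synonym` are hypothetical assumptions chosen only for illustration. The sketch averages the polarity of known sentiment words in the document, letting words closer to the gap count more, and then picks the near-synonym whose prior polarity lies nearest to that score.

```python
from typing import Dict, List

# Hypothetical toy polarity lexicon (+1 positive, -1 negative); not from the thesis.
LEXICON: Dict[str, float] = {
    "generous": 1.0, "wonderful": 1.0, "kind": 1.0, "sensible": 1.0,
    "mean": -1.0, "awful": -1.0, "greedy": -1.0, "rude": -1.0,
}


def weighted_context_sentiment(tokens: List[str], target: int) -> float:
    """Weighted average polarity of lexicon words in the document.

    Each sentiment word is weighted by 1 / (distance to the target position),
    so words near the gap influence the score more than distant ones.
    Returns 0.0 when no lexicon word occurs.
    """
    total, weight_sum = 0.0, 0.0
    for i, tok in enumerate(tokens):
        if i == target or tok not in LEXICON:
            continue
        weight = 1.0 / abs(i - target)
        total += weight * LEXICON[tok]
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def choose_near_synonym(tokens: List[str], target: int,
                        candidates: Dict[str, float]) -> str:
    """Pick the candidate whose prior polarity is closest to the context score."""
    score = weighted_context_sentiment(tokens, target)
    return min(candidates, key=lambda word: abs(candidates[word] - score))


# Hypothetical near-synonym set with assumed prior polarities.
PAIR = {"frugal": 1.0, "stingy": -1.0}

if __name__ == "__main__":
    doc = "my awful rude landlord is so _ with repairs".split()
    print(choose_near_synonym(doc, doc.index("_"), PAIR))
```

Run on the negative example, the script selects stingy; a context such as "my kind sensible aunt is so _ with money" would select frugal instead. A real system would replace the toy lexicon with learned or annotated sentiment scores and combine this document-level signal with local context models.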

Notes

"March 2013. This dissertation is presented for the degree of Doctor of Philosophy." Includes bibliographical references.

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

PhD, Macquarie University, Faculty of Science, Centre for Language Technology

Department, Centre or School

Centre for Language Technology

Year of Award

2013

Principal Supervisor

Mark Dras

Rights

Copyright disclaimer: http://www.copyright.mq.edu.au Copyright Mary E. Gardiner 2013.

Language

English

Extent

1 online resource (xvii, 217 pages)

Former Identifiers

mq:30561; http://hdl.handle.net/1959.14/285529; 2135167

Keywords

Synonyms; Written communication; Written communication -- Data processing; Natural language processing; Word choice

Licence

In Copyright
