Realisation ranking and word ordering in the context of lesser resourced languages

Motazedi, Yasaman

doi:10.25949/19435181.v1

thesis

posted on 2022-03-28, 15:13 authored by Yasaman Motazedi

Natural Language Generation (NLG) is a subfield of Language Technology that concentrates on generating human-understandable sentences from machine-oriented input such as databases, knowledge bases or logical forms. This involves making many decisions on the surface representation of sentences, including lexical selection, use of referring expressions, and word order, relying on manually constructed grammars.

In this thesis we focus on two subtasks of NLG: realisation ranking and word ordering. We are motivated by situations where there are few computational resources, and where we need to identify alternative resources and algorithms that guide the text generation task. We investigate two scenarios: one where a manual grammar is involved in the language generation component, and one where there may not be.

In the first scenario, systems often contain a component that ranks the output of the grammar based on system-internal representations; here we look for alternative resources from which to extract ranking features. As the first step, we look into several supervised statistical parsers to see to what extent trees other than those from the NLG grammar are useful. The next step is moving to unsupervised parsing, a statistical method that generates a parse tree for any given sentence without involving any manual grammar or treebank training. Our findings show that features from supervised parsers, and also selected features from unsupervised parses, can improve generation ranking.

The second aspect is to take the text generator's internal structures as ranking features and to study how effective they are in ranking with respect to grammar size.

Finally, we examine whether features from external resources such as an unsupervised parser can be coupled with the text generator's internal structures to rank generated text.

Our experiments indicate that using internal features leads to better ranking than the baseline.

The second scenario is performing partial-tree linearisation in the absence of manually crafted grammars. We introduce a novel Integer Linear Programming (ILP) framework on top of an existing graph-based lineariser as a declarative way of introducing linguistically motivated features into the generation process. This framework accommodates various constraints independently and incrementally.

As part of this, we present a novel application of machine learning methods for adjusting ILP parameters in order to improve the quality of generated strings.

Compared with the baseline systems in terms of coverage and quality of the generated text, our results show the superiority of the ILP-based realiser when it applies linguistically motivated constraints over the learnt ILP parameters.

History

Notes

Theoretical thesis. Bibliography pages: 191-204

Awarding Institution

Macquarie University

Degree Type

Thesis PhD

Degree

PhD, Macquarie University, Faculty of Science and Engineering, Department of Computing

Department, Centre or School

Department of Computing

Year of Award

2017

Principal Supervisor

Mark Dras

Rights

Copyright Yasaman Motazedi 2017. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (204 pages)

Former Identifiers

mq:71766 http://hdl.handle.net/1959.14/1277885

Keywords

Natural language processing (Computer science); Parsing (Computer grammar); dependency spanning tree; string regeneration; Margin Infused Relaxed Algorithm (MIRA); realisation ranking; Computational linguistics; unsupervised parsing

Licence

In Copyright
