Realisation ranking and word ordering in the context of lesser resourced languages
Thesis posted on 2022-03-28, 15:13, authored by Yasaman Motazedi
Natural Language Generation (NLG) is a subfield of Language Technology that concentrates on generating human-understandable sentences from machine-oriented input such as databases, knowledge bases or logical forms. This involves making many decisions about the surface representation of sentences, including lexical selection, use of referring expressions, and word order, relying on manually constructed grammars.
In this thesis we focus on two subtasks of NLG: realisation ranking and word ordering. We are motivated by situations where there are few computational resources, and where we need to identify alternative resources and algorithms to guide the text generation task. We investigate two scenarios: one where a manually constructed grammar is involved in the language generation component, and one where there may not be.
In the first scenario, systems often contain a component that ranks the output of the grammar based on system-internal representations, and we look at finding alternative resources from which to extract ranking features. As the first step, we look into several supervised statistical parsers to see to what extent trees other than those from the NLG grammar are useful. The next step is moving to unsupervised parsing, a statistical method that generates a parse tree for any given sentence without involving any manual grammar or treebank training.
Our findings show that features from supervised parsers, and also selected features from unsupervised parses, can improve generation ranking. The second aspect is taking text generator internal structures as ranking features and studying how effective they are in ranking with respect to the grammar size. Finally, we investigate whether features from external resources such as an unsupervised parser can be coupled with text generator internal structures to rank generated text. Our experiments show that using internal features leads to better ranking than the baseline.
The second scenario is performing partial-tree linearisation in the absence of manually crafted grammars. We introduce a novel Integer Linear Programming (ILP) framework on top of an existing graph-based lineariser as a declarative way of introducing linguistically motivated features into the generation process. This framework accommodates various constraints independently and incrementally. As part of this, we present a novel application of machine learning methods for adjusting ILP parameters in order to improve the quality of generated strings. In comparison with the baseline systems in terms of coverage and the quality of the generated text, our results show the superiority of the ILP-based realiser when it applies linguistically motivated constraints over the learnt ILP parameters.