Learning with joint inference and latent linguistic structure in graphical models
Thesis posted on 29.03.2022, 00:14 by Jason Naradowsky
A human listener, charged with the difficult task of mapping language to meaning, must infer a rich hierarchy of linguistic structures, beginning with an utterance and culminating in an understanding of what was spoken. In much the same manner, developing complete natural language processing (NLP) systems requires processing many different layers of linguistic information in order to solve complex tasks, like answering a query or translating a document. Historically, the community has largely adopted a “divide and conquer” strategy, splitting such complex tasks into smaller fragments that can be tackled independently, in the hope that these smaller contributions will also benefit NLP systems as a whole. These individual components can be laid out in a pipeline and processed in turn, one system’s output becoming the input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of “telephone”, combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each other in genre and style, and may not match the intended application.
In this dissertation we pursue novel extensions of joint inference techniques for NLP. We present a framework that offers a general method for constructing and performing inference using graphical model formulations of typical NLP problems. Models are composed using weighted Boolean logic constraints, and inference is performed using belief propagation. The systems we develop are composed of two parts: one a representation of syntax, the other a model of the desired end task (part-of-speech tagging, semantic role labeling, named entity recognition, or relation extraction).
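As a rough illustration of inference over Boolean constraint factors, the following minimal sketch computes sum-product marginals for three binary variables tied together by a single hard "exactly one" constraint. Everything here is an assumption made for illustration and is not taken from the dissertation: the constraint itself, the names `exactly_one`, `factor_to_variable_messages`, and `marginals`, and the unary scores. Because the toy graph contains only one factor, it is tree-structured and belief propagation yields exact marginals.

```python
from itertools import product


def exactly_one(assignment):
    """Hypothetical hard Boolean constraint: exactly one variable is on."""
    return 1.0 if sum(assignment) == 1 else 0.0


def factor_to_variable_messages(unary, constraint):
    """Sum-product messages from one constraint factor to each binary variable.

    unary[i] is a pair (score for x_i = 0, score for x_i = 1). With a single
    factor, the variable-to-factor message is just the unary score.
    """
    n = len(unary)
    messages = [[0.0, 0.0] for _ in range(n)]
    for assignment in product((0, 1), repeat=n):
        weight = constraint(assignment)
        if weight == 0.0:
            continue  # assignments that violate the constraint contribute nothing
        for i in range(n):
            # Multiply the incoming messages from every variable except i.
            incoming = 1.0
            for j in range(n):
                if j != i:
                    incoming *= unary[j][assignment[j]]
            messages[i][assignment[i]] += weight * incoming
    return messages


def marginals(unary, constraint):
    """Belief that each variable is on: unary score times factor message, normalized."""
    messages = factor_to_variable_messages(unary, constraint)
    beliefs = []
    for (u0, u1), (m0, m1) in zip(unary, messages):
        b0, b1 = u0 * m0, u1 * m1
        beliefs.append(b1 / (b0 + b1))
    return beliefs


# Illustrative unary scores (assumed values).
unary = [(1.0, 2.0), (1.0, 1.0), (1.0, 0.5)]
result = marginals(unary, exactly_one)
print(result)
```

The factor messages here are computed by enumerating every assignment, which is only feasible for tiny examples; constraints over larger combinatorial structures need specialized message computations.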
By modeling these problems jointly, both models are trained in a single, integrated process, with uncertainty propagated between them. This mitigates the accumulation of errors typical of pipelined approaches. We further advance previous methods for performing efficient inference on graphical model representations of combinatorial structure, like dependency syntax, extending them to various forms of phrase structure parsing. Finding appropriate training data is a crucial problem for joint inference models. We observe that in many circumstances the output of earlier components of the pipeline is irrelevant: only the end task output is important. Yet we often have strong a priori assumptions regarding what this structure might look like: for phrase structure syntax the model should represent a valid tree, for dependency syntax it should represent a directed graph. We propose a novel marginalization-based training method in which the error signal from end task annotations is used to guide the induction of a constrained latent syntactic representation. This allows training in the absence of syntactic training data, where the latent syntactic structure is instead optimized to best support the end task predictions. We find that across many NLP tasks this training method offers performance comparable to fully supervised training of each individual component, and in some instances improves upon it by learning latent structures that are more appropriate for the task.
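The idea of training through a marginalized latent structure can be sketched on a toy log-linear model. This is a minimal illustration under assumed names and values, not the dissertation's method: `Z` stands in for a small set of valid latent structures, `Y` for end-task labels, and `theta`, `score`, and the step size are hypothetical. Only the end-task label is observed; the gradient of the log marginal likelihood is the difference between expected features with the label clamped and expected features under the unconstrained model.

```python
import math

# Hypothetical toy setup: 3 valid latent structures z and 2 end-task labels y.
Z, Y = range(3), range(2)


def score(theta, z, y):
    # One weight per latent structure plus one weight per (structure, label) pair.
    return theta["z"][z] + theta["zy"][z][y]


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_marginal_likelihood(theta, y_obs):
    """log p(y_obs), summing the latent structure z out of the joint model."""
    clamped = log_sum_exp([score(theta, z, y_obs) for z in Z])
    free = log_sum_exp([score(theta, z, y) for z in Z for y in Y])
    return clamped - free


def gradient(theta, y_obs):
    """Expected features with y clamped to y_obs minus expected features under the model."""
    log_clamped = log_sum_exp([score(theta, z, y_obs) for z in Z])
    log_free = log_sum_exp([score(theta, z, y) for z in Z for y in Y])
    grad = {"z": [0.0] * len(Z), "zy": [[0.0] * len(Y) for _ in Z]}
    for z in Z:
        for y in Y:
            p_free = math.exp(score(theta, z, y) - log_free)
            p_clamped = math.exp(score(theta, z, y) - log_clamped) if y == y_obs else 0.0
            diff = p_clamped - p_free
            grad["z"][z] += diff
            grad["zy"][z][y] += diff
    return grad


# Assumed initial weights; no annotation for z is ever used.
theta = {"z": [0.0, 0.0, 0.0], "zy": [[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]]}
before = log_marginal_likelihood(theta, y_obs=1)
g = gradient(theta, y_obs=1)
step = 0.1  # illustrative step size for one gradient ascent update
for z in Z:
    theta["z"][z] += step * g["z"][z]
    for y in Y:
        theta["zy"][z][y] += step * g["zy"][z][y]
after = log_marginal_likelihood(theta, y_obs=1)
print(before, after)
```

In this sketch the expectations are computed by enumeration; in a structured model they would instead come from the marginals produced by an inference procedure such as belief propagation.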