Macquarie University

Systemic-functional modeling of text complexity in Brazilian Portuguese

Posted on 2024-03-20, 01:02; authored by Rodrigo Araújo e Castro

Investigating text complexity is a significant step towards modeling text simplification tasks. Over the last two decades, studies in Natural Language Processing (NLP) have sought efficient simplification strategies. Although some computer models built on language theories have provided potentially valuable insights, they remain insufficient to deal with the task effectively. To help fill this gap, this thesis draws on a comprehensive theory of language, Systemic Functional Linguistics (SFL) (Halliday & Matthiessen, 2014), and explores text complexity with a view to gathering findings that may inform text simplification tasks aimed at producing more accessible texts in Brazilian Portuguese. To that end, SIM-Pt (Simplified Brazilian Portuguese), a monolingual parallel corpus of aligned text segments in the physics, biology, and psychology domains, was compiled. Text segments were organized into two paired datasets: (1) two sets of naturally occurring segments extracted from science texts found on the Web, one made up of simpler segments and the other of more complex segments; and (2) two sets of segments manually constructed from the naturally occurring segments so as to ensure distinct complexity levels. Each set contains approximately 200 text segments. Clauses in the segments were manually analyzed in terms of Ideational, Interpersonal, and Textual meanings, and lexicogrammatical patterns were derived from systemic and structural frequencies that could yield variables closely related to different levels of grammatical metaphor. By examining text complexity within the strata of Lexicogrammar, Semantics, and Context, the thesis proposes a relationship between text complexity and experiential grammatical metaphor. The results show that, from the experiential viewpoint, a higher degree of experiential grammatical metaphor correlates, on average, with higher text complexity.
From the perspective of lexicogrammar, the main evidence supporting this claim was the higher frequency of relational and existential clauses in combination with middle voice and embedded clauses, together with the higher frequency of class shifts (especially nominalizations) and rank shifts. The findings of this thesis are expected to contribute to accounts of text simplification for Brazilian Portuguese in both applied linguistics and NLP.


Table of Contents

Chapter 1. Introduction -- Chapter 2. Theoretical framework -- Chapter 3. Methodology -- Chapter 4. Findings -- Chapter 5. Discussion -- Chapter 6. Conclusions -- References -- Appendices


Cotutelle thesis in conjunction with Federal University of Minas Gerais

Awarding Institution

Macquarie University

Degree Type

Thesis PhD


Doctor of Philosophy

Department, Centre or School

Department of Linguistics

Year of Award


Principal Supervisor

David Butt

Additional Supervisor 1

Adriana Pagano

Additional Supervisor 2

Ilka Afonso Reis


Copyright: The Author




375 pages
