Identifying causal directions from text: unsupervised learning using Bayesian framework
Causality has become increasingly important as advances in artificial intelligence and machine learning demand robust interpretability and accountability. This significance is further underscored in the era of Large Language Models, such as ChatGPT, where we witness achievements surpassing human performance in contextual understanding and in the execution of text-related tasks. Even with these technological advancements, the importance of comprehending the data generation process that underlies causal directions becomes more evident, and such comprehension can have valuable implications for various downstream tasks. In this project, we show empirically that word occurrences resemble the characteristics of causal directions. To achieve this, we determine the causal direction whenever a causal relation exists in a sentence. Identifying causal directions can greatly benefit the understanding of a document's semantics. Firstly, knowing which entity is the cause and which is the effect helps people understand complex phenomena more clearly. Secondly, being able to predict the likely outcome of certain events helps people make better decisions. Finally, many phenomena are difficult to comprehend in text. If a Question Answering system can extract causal relations and summarise them as causal directions, it will help people understand concepts more easily. Nevertheless, there are two main challenges when identifying causal directions. Firstly, causal relations are few and far between in a document. Secondly, implicit causal relations make the task difficult. Hence, we propose a two-phase method: (1) a Bayesian framework, which generates data from posteriors by incorporating word occurrences from Internet domains; and (2) Bidirectional Encoder Representations from Transformers (BERT), which utilises the semantics of words in context to perform classification. We consider two scenarios: (1) data augmentation, where word occurrences are integrated to expand the training data of the SemEval-2010 (Task 8) dataset; and (2) unsupervised learning, where no training data is provided and the proposed method learns from word occurrences. In data augmentation, with BERT as the baseline, the proposed method improves the F1 score only slightly, 94.34% versus 94.08%, and the difference is not statistically significant. In unsupervised learning, where neither BERT nor any other supervised method can serve as a baseline, random guessing is used as the baseline for comparison; the proposed method performs significantly better, achieving an F1 score of 49.10% versus 44.90%. This study serves as a basis for future work, in which we extend the proposed method to construct a network that captures multiple causal relations.
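To make the idea of generating data from posteriors over word occurrences more concrete, the following is a minimal sketch, not the method used in this work. It assumes a simple Beta-Binomial model in which occurrence counts of a word pair in each order update a prior over the causal direction, and pseudo-labels are then sampled from the posterior. The function names, the uniform prior, and the counts are all hypothetical.

```python
import random

def direction_posterior(forward_count, backward_count, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for P(direction is e1 -> e2).

    Assumption: forward_count and backward_count are hypothetical counts of
    how often the pair occurs in the order e1..e2 versus e2..e1.
    """
    return prior_a + forward_count, prior_b + backward_count

def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def sample_direction_labels(a, b, n_samples, seed=0):
    """Draw pseudo-labels: sample theta from the posterior, then a direction."""
    rng = random.Random(seed)
    labels = []
    for _ in range(n_samples):
        theta = rng.betavariate(a, b)  # one plausible P(e1 -> e2)
        labels.append("e1->e2" if rng.random() < theta else "e2->e1")
    return labels

# Hypothetical occurrence counts for one entity pair.
a, b = direction_posterior(forward_count=30, backward_count=10)
mean = posterior_mean(a, b)
labels = sample_direction_labels(a, b, n_samples=5)
print(a, b, round(mean, 3), labels)
```

In a pipeline of this shape, the sampled labels could serve as additional training examples for a downstream classifier such as BERT, or as the only supervision signal when no labelled data is available.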