Understanding predictive coding during language comprehension by comparing a computational linguistic model with MEG data from adult humans
Language comprehension is an important task in the daily activities of human beings. It relies on the ability of the human brain to correctly process the meanings of words and phrases, grammar, and text structure. Predictive coding is believed to be a fundamental mechanism of language comprehension in humans. Predictive coding is a theory of cognition in which the brain constantly generates and updates a model of its sensory input based on surprisal. Surprisal, a concept from psycholinguistics, refers to the degree of mismatch between what a listener expected to be said next and what is actually spoken. In this work, I aim to study prediction during language comprehension in humans by developing an n-gram-based computational linguistic model of prediction from the text of naturalistic speech stimuli and testing how well this model correlates with prediction-related magnetoencephalography (MEG) signals in the human brain. I used an MEG dataset of 16 adults who were presented with audio stimuli (ABC podcasts). To train the n-gram model, I built a domain-specific corpus of science- and environment-related articles from various web sources and online newspapers. I tested the n-gram model on the transcripts of the audio stimuli and computed per-word surprisal. I then performed simple linear regression to relate the per-word surprisal values to the MEG data. Second-level group statistics performed on the predictor parameter of the regression model showed a significant correlation between surprisal and MEG signals from 140 to 320 milliseconds after word onset. These results are consistent with the contention that the brain responds to violations of its predictions in a graded manner during the comprehension of speech stimuli. Future work will include the development of a larger domain-specific corpus and the use of computational linguistic models based on deep neural networks.
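The abstract's pipeline (n-gram surprisal estimation followed by a per-word linear regression against an MEG measure) can be illustrated with a minimal sketch. The sketch below is not the study's code: the trigram model, add-one smoothing, toy corpus, transcript, and simulated MEG amplitudes are all hypothetical stand-ins chosen only to show the general technique.

```python
"""
Illustrative sketch: per-word surprisal from a simple trigram model with
add-one smoothing, then a linear regression of an MEG measure onto surprisal.
All data and names here are hypothetical placeholders.
"""
import math
from collections import Counter

import numpy as np

def train_trigram(tokenized_sentences):
    """Count trigrams and their bigram contexts over the training corpus."""
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for sent in tokenized_sentences:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            trigrams[(toks[i - 2], toks[i - 1], toks[i])] += 1
            contexts[(toks[i - 2], toks[i - 1])] += 1
    return trigrams, contexts, len(vocab)

def surprisal(word, context, trigrams, contexts, vocab_size):
    """Per-word surprisal in bits: -log2 P(word | two preceding words),
    with add-one (Laplace) smoothing so unseen trigrams remain finite."""
    num = trigrams[(context[0], context[1], word)] + 1
    den = contexts[(context[0], context[1])] + vocab_size
    return -math.log2(num / den)

# Hypothetical training corpus (stands in for the science/environment articles).
corpus = [
    "the climate is changing rapidly".split(),
    "scientists study the changing climate".split(),
]
trigrams, contexts, V = train_trigram(corpus)

# Per-word surprisal for a hypothetical stimulus transcript.
transcript = "the climate is changing".split()
padded = ["<s>", "<s>"] + transcript
surprisals = np.array([
    surprisal(padded[i], (padded[i - 2], padded[i - 1]), trigrams, contexts, V)
    for i in range(2, len(padded))
])

# Simple linear regression of a per-word MEG measure (e.g. sensor amplitude
# in a post-word-onset window) onto surprisal; the per-participant beta for
# the surprisal predictor would then enter a second-level group test.
meg_amplitude = np.random.randn(len(transcript))  # placeholder MEG values
X = np.column_stack([np.ones_like(surprisals), surprisals])
beta, *_ = np.linalg.lstsq(X, meg_amplitude, rcond=None)
print("intercept, surprisal beta:", beta)
```

In practice the regression would be run separately at each sensor and time point relative to word onset, which is what allows effects to be localized to a window such as the 140-320 ms interval reported above.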