Posted on 2022-03-28, 02:30. Authored by Azin Ramezani.
An important public health issue is the spread of diseases that could be prevented by changing individual beliefs and opinions about vaccination. Monitoring how opinions spread through online social communities may help address it. This thesis builds on work that identifies negative sentiment about human papillomavirus (HPV) vaccines on Twitter using word n-grams and direct social connection information as features in a Support Vector Machine (SVM) classifier, and it examines four extensions to that approach. First, biological models have suggested that negative opinion is transmitted contagiously; we incorporate this by adding indirect social connection information via label propagation. Second, topic models are used to infer the topics associated with tweets, reducing the dimensionality of the feature space for classification. Third, the content of web pages referenced in tweets is used as a new set of features for classification. Finally, label propagation is extended with features beyond social connection information, such as n-grams, topics, and the content of linked web pages. All four extensions improve classification results to some extent; label propagation is particularly effective for tweets sent in the same time period, and topic models are particularly effective across longer time periods.
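
To illustrate how label propagation can bring in indirect social connections, here is a minimal sketch. It is not the thesis's implementation: the function name `propagate_labels`, the toy follower graph, the user names, and the seed scores (+1 for negative vaccine sentiment, -1 for other sentiment) are all hypothetical, and plain neighbour averaging stands in for whatever propagation scheme the thesis actually uses.

```python
def propagate_labels(edges, seeds, iterations=50):
    """Spread seed scores over an undirected graph by repeated neighbour averaging.

    edges: iterable of (user_a, user_b) pairs (hypothetical social connections).
    seeds: dict mapping a user to a known score; seeds stay fixed (clamped).
    Returns a dict mapping every user in the graph to a propagated score.
    """
    # Build an adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Unlabelled users start at 0.0 (no evidence either way).
    scores = {user: float(seeds.get(user, 0.0)) for user in neighbours}

    for _ in range(iterations):
        updated = {}
        for user, adjacent in neighbours.items():
            if user in seeds:
                # Known labels are never overwritten.
                updated[user] = float(seeds[user])
            else:
                # An unlabelled user takes the mean score of its neighbours.
                updated[user] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Toy chain of users: a - b - c - d - e (assumed data, for illustration only).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
seeds = {"a": 1.0, "e": -1.0}
scores = propagate_labels(edges, seeds)
```

In this toy graph, user `b` has no direct label but ends up with a positive score because of its connection to `a`, and `c` is influenced only indirectly through `b` and `d`. Scores like these could then be supplied to a classifier as additional features alongside n-grams.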