Real-time event detection from the Twitter datastream
Thesis posted on 28.03.2022, 16:50 by Mahmud Hasan
Detecting events in real time from the Twitter data stream has gained substantial attention in recent years from researchers around the world. We surveyed existing event detection systems and found that one of the major challenges in designing them is the high volume of tweets in the Twitter stream, which can make real-time event detection computationally prohibitive. To address this problem, we designed an end-to-end event detection framework, TwitterNews+, which incorporates a novel variant of an incremental clustering approach to provide a low-computational-cost, scalable solution for detecting newsworthy events in real time from the Twitter data stream. We conducted a parameter sensitivity analysis to fine-tune the parameters used in TwitterNews+ and improve its performance in detecting newsworthy events. We then experimentally evaluated the effectiveness of TwitterNews+ against five state-of-the-art baselines that cover a wide range of event detection techniques. The results of the evaluation, performed on a publicly available tweet corpus, show that TwitterNews+ outperforms the baselines, achieving the highest recall and precision in detecting newsworthy events. Our experiments also showed that the throughput of TwitterNews+, measured in tweets processed per second, is high enough for the system to detect events in real time and to scale. Finally, we incorporated a novel component in TwitterNews+ that provides a set of context tweets for an event by extracting relevant additional information using the Twitter Search API. A probabilistic-feedback-based approach is used to maximize the relevance of the context tweets associated with an event.
The modular nature of the context-providing component in TwitterNews+ allows it to be used by any event detection system to supplement the limited information that a detected event often contains.