Characterisation and detections of third-party content loading in the web
Thesis posted on 28.03.2022, 19:24, authored by Hasina Rahman
Over the last two decades, the Web has evolved into a tangled mass of interconnected services in which websites import resources, i.e. data or content, from third-party domains. These domains serve several purposes, including analytics, tracking and advertising. Websites trust their third parties when loading content or data into their web pages. The dependency on resources sometimes extends further, from third-party domains to other domains, thus forming a chain of dependency. In this resource dependency chain, first-party websites implicitly trust resources that their direct third parties obtain through requests to other domains. First-party websites cannot rigidly control the chain of dependency, as they have little or no information about where the loaded content originated. As a result, websites may unknowingly end up trusting compromised websites for content and become prone to a variety of attacks. We characterize the implicit trust in the chain of dependency for Alexa's top 30k websites and estimate the level of risk that first-party websites may be exposed to while loading resources from third-party domains. We found that 10.55% of the resources of the top-1000 Alexa websites are obtained implicitly, and that they constitute 4.1% of malicious resources in the overall count of external resources.
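The distinction between explicit and implicit trust in a dependency chain can be illustrated with a minimal sketch. It is not the method used in the thesis: the domain names, the `loads` mapping and the `classify_trust` helper are all hypothetical, and the sketch counts domains rather than individual resources. It assumes that each domain's outgoing loads are known and uses a breadth-first traversal to assign each domain its depth in the chain, treating depth 1 as explicit trust and depth 2 or more as implicit trust.

```python
from collections import deque


def classify_trust(first_party, loads):
    """Map every domain reachable from first_party to its chain depth.

    Depth 1 means the first party loads the domain directly (explicit trust);
    depth >= 2 means it is reached only via another domain (implicit trust).
    """
    depth = {first_party: 0}
    queue = deque([first_party])
    while queue:
        parent = queue.popleft()
        for child in loads.get(parent, []):
            if child not in depth:  # keep the shortest chain to each domain
                depth[child] = depth[parent] + 1
                queue.append(child)
    del depth[first_party]  # the first party is not a dependency of itself
    return depth


# Hypothetical example data: which domain loads content from which.
loads = {
    "news.example": ["cdn.example", "ads.example"],
    "ads.example": ["tracker.example"],
    "tracker.example": ["unknown.example"],
}

depths = classify_trust("news.example", loads)
implicit = sorted(d for d, k in depths.items() if k >= 2)
implicit_share = len(implicit) / len(depths)
```

In this toy chain, `news.example` has no direct relationship with `tracker.example` or `unknown.example`, yet content from both ends up on its pages, which is the kind of implicit trust the abstract describes.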