Return to search

Towards a Holistic and Comparative Analysis of the Free Content Web: Security, Privacy, and Performance

Free content websites that provide free books, music, games, movies, etc., have existed on the Internet for many years. While it is a common belief that such websites might be different from premium websites providing the same content types in terms of their security, a rigorous analysis that supports this belief is lacking from the literature. In particular, it is unclear if those websites are as safe as their premium counterparts. In this dissertation, we set out to investigate the similarities and differences between free content and premium websites, including their risk profiles. Moreover, we analyze and quantify through measurements the potential vulnerability of free content websites. For this purpose, we compiled a dataset of free content websites offering books, games, movies, music, and software. For comparison purposes, we also sampled a dataset of premium content websites, where users need to pay for using the service for the same type of content. For our modality of analysis, we use the SSL certificate's public information, HTTP header information, reported privacy and data sharing practices, top-level domain information, and website files and loaded scripts. The analysis is not straightforward, and en route, we address various challenges, including labeling and annotation, privacy policy understanding through a highly accurate pre-trained language model using advanced ensemble-based classification technique at the sentence and paragraph level, and data augmentation through various sources. This dissertation delivers various significant findings and conclusions concerning the security of free content websites. Our findings raise several concerns, including that the reported privacy policies may not reflect the data collection practices used by service providers, and pronounced biases across privacy policy categories. Overall, our study highlights that while there are no explicit costs associated with those websites, the cost is often implicit, in the form of compromised security and privacy.
Date01 January 2023
CreatorsAlabduljabbar, Abdulrahman
Source SetsUniversity of Central Florida
Detected LanguageEnglish
SourceElectronic Theses and Dissertations, 2020-

Page generated in 0.0019 seconds