abstract: Data breaches have been on the rise, and the financial sector is among the most heavily targeted. It can take anywhere from a few months to a few years to detect that a data breach has occurred. A major motivation behind data breaches is financial gain, so most of the stolen data ends up for sale on darkweb websites. It is important to identify the sale of such stolen information in a timely and relevant manner. In this research, we present a system for the timely identification of the sale of stolen data on darkweb websites. We frame identifying the sale of stolen data as a multi-label classification problem and leverage several machine learning approaches based on the thread content (textual) and on social network analysis of the user communication seen on darkweb websites. The system generates alerts about trends based on their popularity among the users of such websites. We evaluate our system using K-fold cross-validation as well as manual evaluation on blind (unseen) data. The method combining social network and textual features outperforms the baseline method, i.e., using only textual features, improving precision by 15 to 20%. The alerts provide good insight, and we illustrate our findings with case studies of the results. / Dissertation/Thesis / Masters Thesis Computer Science 2018
Identifier | oai:union.ndltd.org:asu.edu/item:49147 |
Date | January 2018 |
Contributors | Dharaiya, Krishna Tushar (Author), Shakarian, Paulo (Advisor), Doupe, Adam (Committee member), Shoshitaishvili, Yan (Committee member), Arizona State University (Publisher) |
Source Sets | Arizona State University |
Language | English |
Detected Language | English |
Type | Masters Thesis |
Format | 46 pages |
Rights | http://rightsstatements.org/vocab/InC/1.0/, All Rights Reserved |