Global ETD Search

1	Detecting Bots using Stream-based System with Data Synthesis. Hu, Tianrui. 28 May 2020. Machine learning has shown great success in building security applications, including bot detection. However, many machine learning models are difficult to deploy because model training requires a continuous supply of representative labeled data, which is expensive and time-consuming to obtain in practice. To address this problem, this thesis builds a bot detection system that uses a data synthesis method to detect bots with limited labeled data. We collected network traffic from three online services in three different months within a year (23 million network requests). We develop a novel stream-based feature encoding scheme that enables our model to perform real-time bot detection on anonymized network data. We propose a data synthesis method that synthesizes unseen (or future) bot behavior distributions, enabling our system to detect bots with extremely limited labeled data. The synthesis method is distribution-aware: it uses two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions of the feature space. Our evaluation shows that our method can train a model that outperforms existing methods using only 1% of the labeled data. We also show that data synthesis improves the model's sustainability over time and speeds up retraining. Finally, we compare data synthesis with adversarial retraining and show that the two complement each other to improve model generalizability. / Master of Science / An internet bot is a software program that performs simple, automated tasks over the internet. Although some bots are legitimate, many are operated for malicious purposes, causing severe security and privacy issues.
To address this problem, machine learning (ML) models, which have shown great success in building security applications, are widely used to detect bots because they can learn hidden patterns from data. However, many ML-based approaches are difficult to deploy because model training requires labeled data, which is expensive and time-consuming to obtain in practice, especially for security tasks. Meanwhile, because malicious bots change constantly, bot detection models need a continuous supply of representative labeled data to stay up to date, which makes bot detection even more challenging. In this thesis, we build an ML-based bot detection system that detects advanced malicious bots in real time by processing network traffic data. To address the problem of limited and unrepresentative labeled data, we explore a data synthesis method that synthesizes unseen (or future) bot behavior distributions, enabling our system to detect bots with extremely limited labeled data. We evaluate our approach on real-world datasets we collected and show that our model outperforms existing methods using only 1% of the labeled data. We also show that data synthesis improves the model's sustainability over time and makes it easier to keep the model up to date. Finally, we show that our method complements adversarial retraining to improve model generalizability. Keywords: Bot Detection; Security; Machine learning
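The abstract does not say how the feature space is divided into clustered and outlier regions before the two generators are trained. The sketch below shows one possible partition, assuming a k-nearest-neighbor distance rule with a quantile cutoff; the function names, `k`, and `quantile` are illustrative and not taken from the thesis. It covers only the partitioning step, not the GAN generators.

```python
import math

def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest other point (brute force, O(n^2))."""
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return result

def split_regions(points, k=3, quantile=0.9):
    """Hypothetical rule: points whose k-NN distance lies above the given quantile
    of all k-NN distances form the outlier region; the rest are 'clustered'."""
    dists = kth_neighbor_distance(points, k)
    cutoff = sorted(dists)[int(quantile * (len(dists) - 1))]
    clustered = [p for p, d in zip(points, dists) if d <= cutoff]
    outliers = [p for p, d in zip(points, dists) if d > cutoff]
    return clustered, outliers

# Toy 2-D feature vectors: a tight group of eight plus two isolated points.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (0.02, 0.08), (0.08, 0.02), (0.06, 0.09), (5.0, 5.0), (-4.0, 6.0)]
clustered, outliers = split_regions(samples, k=3, quantile=0.8)
print(len(clustered), len(outliers))  # 8 2
```

Each region would then be handed to its own generator; the quantile controls how much of the data is treated as outliers.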
2	Robustifying Machine Learning based Security Applications. Jan, Steve T. K. 27 August 2020. In recent years, machine learning (ML) has been explored and employed in many fields. However, there are growing concerns about the robustness of machine learning models. These concerns are further amplified in security-critical applications: attackers can manipulate the inputs (i.e., adversarial examples) to cause machine learning models to make mistakes, and it is very challenging to obtain a large amount of attackers' data. This makes applying machine learning in security-critical applications difficult. In this dissertation, we present several approaches to robustifying three machine learning based security applications. First, we start from adversarial examples in image recognition. We develop a method to generate robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process when generating robust adversarial examples. We further show that these robust adversarial examples can improve the robustness of machine learning models through adversarial retraining. The second application is bot detection. We show that existing machine learning models are not effective when only limited attackers' data is available, and we develop a data synthesis method to address this problem. The key novelty is that our synthesis is distribution-aware: it uses two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions of the feature space. We show that the detection performance achieved with 1% of the attackers' data is close to that of existing methods trained with 100% of the attackers' data. The third component of this dissertation is phishing detection.
By designing a novel measurement system, we search for and detect phishing websites that adopt evasion techniques not only at the page content level but also at the web domain level. The key novelty is that our system is built on observations of how phishing pages evade detection in practice. We also study how well existing browsers defend against phishing websites that impersonate trusted entities at the web domain level. Our results show that existing browsers are not yet effective at detecting them. / Doctor of Philosophy / Machine learning (ML) refers to computer algorithms that aim to identify hidden patterns in data. In recent years, machine learning has been widely used in many fields, ranging from natural language processing to autonomous driving. However, there are growing concerns about the robustness of machine learning models, and these concerns are further amplified in security-critical applications: attackers can manipulate their inputs (i.e., adversarial examples) to cause machine learning models to make wrong predictions, and it is highly expensive and difficult to obtain a large amount of attackers' data because attackers are rare compared to normal users. This makes applying machine learning in security-critical applications a concern. In this dissertation, we seek to build better defenses in three types of machine learning based security applications. The first is image recognition: by developing a method to generate realistic adversarial examples, we make machine learning models more robust against adversarial examples through adversarial retraining. The second is bot detection: we develop a data synthesis method to detect malicious bots when only limited malicious bot data is available. The third is phishing detection: we implement a tool to detect domain name impersonation and to detect phishing pages using dynamic and static analysis. Keywords: Machine learning; Security; Bot Detection; Phishing Attacks
3	Types of Bots: Categorization of Accounts Using Unsupervised Machine Learning. January 2019. Abstract: Social media bot detection has been a signature challenge for online social networks in recent years. Many scholars agree that bot detection has become an "arms race" between malicious actors, who create bots to influence opinion on these networks, and the social media platforms that try to remove these accounts. Despite this acknowledged issue, bots remain present on social media networks. It has therefore become necessary to monitor different bots over time to identify changes in their activities or domains. Because monitoring individual accounts is not feasible (the bots may be suspended or deleted), bots should instead be observed in smaller groups, as types, based on their characteristics. Yet most existing research on social media bot detection focuses on labeling bot accounts only by distinguishing them from human accounts, and may ignore differences between individual bot accounts. Considering bot types may be the best solution for researchers and social media companies alike, as it is in both of their best interests to study these types separately. However, until now, bot categorization has only been theorized or done manually. Thus, the goal of this research is to automate the process of grouping bots by type. To accomplish this goal, the author creates an aggregated dataset, determines that the accounts within it are bots, and, utilizing an existing typology for bots, experimentally demonstrates that unsupervised machine learning can categorize these bots into types based on the proposed typology. The ability to differentiate between bot types automatically will allow social media experts to analyze bot activity from a new perspective and at a more granular level.
This way, researchers can identify patterns related to a given bot type's behaviors over time and determine whether certain detection methods are more viable for that type. / Dissertation/Thesis / Presentation Materials for Thesis Defense / Masters Thesis Computer Science 2019. Keywords: Computer science; Bot Detection; Bot Typology; Categorization of Bots; Twitter Bots; Types of Bots; Unsupervised Bot Detection
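The abstract does not name the unsupervised algorithm used to group bots into types. As an illustration only, the sketch below applies k-means (Lloyd's algorithm with deterministic farthest-point seeding) to hypothetical per-account features; the two features, the sample values, and the choice of `k` are assumptions, not details of the thesis.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical bot accounts: (normalized posting rate, retweet fraction).
accounts = [(0.9, 0.95), (0.85, 0.9), (0.95, 0.85),
            (0.05, 0.15), (0.1, 0.2), (0.15, 0.1),
            (0.5, 0.9), (0.55, 0.85)]
labels, centroids = kmeans(accounts, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

Each resulting cluster would correspond to a candidate bot type, which an analyst could then map onto a typology.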
4	A method to identify Record and Replay bots on mobile applications using Behaviometrics. Kolluru, Katyayani Kiranmayee. January 2017. Many banking and commerce mobile applications use two-factor authentication, combining password-based and behavior-based authentication systems. These behavior-based systems use parameters related to the user's typing behavior and the way the user handles the phone while typing. They distinguish users from impostors by applying machine learning techniques (mostly supervised learning techniques) to these behavioral data. Both password-based and behavior-based systems work well at detecting impostors on mobile applications, but they can be vulnerable to record and replay attacks, in which the touch information of the user's actions is recorded and replayed programmatically. Such bots are called Record & Replay (R & R) bots. The effectiveness of behavioral authentication systems in identifying such attacks is unexplored. This thesis addresses the problem by developing a method to identify R & R bots on mobile applications. Behavioral data from users and the corresponding R & R bots are collected, and it is observed that the bot replays the touch information (location of the touch on the screen, touch pressure, and area of the finger in contact with the screen) exactly. However, the sensor information differs between a user and the corresponding R & R bot, because the physical touch action is missing when the bot replays user actions on the mobile application. Based on this observation, a feature set that can differentiate users from bots is extracted from the sensor data, and a dataset is formed containing these features for both users and bots. Two machine learning techniques, support vector machines (SVM) and logistic regression (LR), are applied to the training dataset (80% of the dataset) to build classifiers.
The two classifiers built from the training dataset accurately classify user and bot sessions in the test dataset (20% of the dataset) based on the feature set derived from the sensor data. Keywords: Behaviometrics; Machine learning; bot detection; Engineering and Technology
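The sketch below illustrates the classification step under stated assumptions: a single hypothetical sensor feature (variance of the accelerometer magnitude around touch events) on synthetic sessions, the 80%/20% train/test split described in the abstract, and a from-scratch logistic regression trained by batch gradient descent. It shows only the LR classifier, not the SVM, and the feature is not the thesis's actual feature set.

```python
import math
import random

def make_session(is_bot, rng):
    """Synthetic session with one hypothetical sensor feature: variance of the
    accelerometer magnitude around touch events. A replayed session lacks the
    physical touch, so the variance stays near zero."""
    mean, std = (0.02, 0.01) if is_bot else (0.6, 0.1)
    return [max(rng.gauss(mean, std), 0.0)], int(is_bot)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def train_logistic_regression(xs, ys, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """1 = bot session, 0 = user session."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)

rng = random.Random(0)
sessions = [make_session(i % 2 == 0, rng) for i in range(200)]
rng.shuffle(sessions)
cut = int(0.8 * len(sessions))  # 80% training / 20% test
train, test = sessions[:cut], sessions[cut:]

# Standardize with statistics from the training split only.
train_x = [x for x, _ in train]
columns = list(zip(*train_x))
means = [sum(col) / len(col) for col in columns]
stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)]

w, b = train_logistic_regression(standardize(train_x, means, stds),
                                 [y for _, y in train])
test_x = standardize([x for x, _ in test], means, stds)
accuracy = sum(predict(w, b, x) == y for x, (_, y) in zip(test_x, test)) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Standardizing the feature before gradient descent keeps the bias and weight on similar scales, so a fixed learning rate converges quickly.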
5	Botnet Detection Using Graph Based Feature Clustering. Akula, Ravi Kiran. 04 May 2018. Detecting botnets in a network is crucial because bot activities affect numerous areas such as security, finance, health care, and law enforcement. Most existing rule-based and flow-based detection methods may not be capable of detecting bot activities efficiently. Hence, designing a robust botnet detection method is of high significance. In this study, we propose a botnet detection methodology based on graph-based features. A Self-Organizing Map is applied to cluster the nodes in the network based on these features. Our method isolates bots in small clusters while keeping most normal nodes in the big clusters. A filtering procedure is also developed to further improve the algorithm's efficiency by excluding inactive nodes from bot detection. The methodology is verified on the real-world CTU-13 and ISCX botnet datasets and benchmarked against classification-based detection methods. The results show that our proposed method can efficiently detect bots despite their varying behaviors. Keywords: Cyber security; graph based features; bot detection; Clustering
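A minimal sketch of Self-Organizing Map clustering follows, assuming two hypothetical normalized graph features per network host, a 1-D chain of four map nodes, and linearly decaying learning rate and neighborhood radius (all parameter values are illustrative). It shows how a small group of distant hosts can end up isolated in its own small cluster; it does not reproduce the thesis's feature set or its filtering procedure.

```python
import math
import random
from collections import Counter

def best_matching_unit(nodes, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda j: math.dist(nodes[j], x))

def train_som(data, n_nodes=4, epochs=30, lr0=0.5, sigma0=0.5, seed=0):
    """Online training of a 1-D Self-Organizing Map (a chain of nodes).
    Nodes start evenly spaced on the diagonal of the unit square."""
    dim = len(data[0])
    nodes = [[i / (n_nodes - 1)] * dim for i in range(n_nodes)]
    rng = random.Random(seed)
    order = list(data)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying step size
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # shrinking neighborhood radius
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(nodes, x)
            for j, node in enumerate(nodes):
                # Gaussian neighborhood on the chain: the BMU moves most,
                # its neighbors less, distant nodes barely at all.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    node[d] += lr * h * (x[d] - node[d])
    return nodes

# Hypothetical normalized graph features per host, e.g. (degree, fan-out).
rng = random.Random(1)
normal = [(rng.gauss(0.1, 0.03), rng.gauss(0.2, 0.03)) for _ in range(95)]
bots = [(rng.gauss(0.9, 0.03), rng.gauss(0.8, 0.03)) for _ in range(5)]

nodes = train_som(normal + bots)
cluster_sizes = Counter(best_matching_unit(nodes, x) for x in normal + bots)
bot_units = {best_matching_unit(nodes, x) for x in bots}
print(cluster_sizes, bot_units)
```

In this toy setup the five bot hosts share one map node that no normal host maps to, which mirrors the "bots in small clusters" idea from the abstract.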
6	Detection and simulation of generic botnet from real-life large netflow dataset. Harun, Sarah. 09 August 2019. Botnets are networks formed by a number of machines infected with malware called bots. Detecting these malicious networks is a major concern because they pose a serious threat to network security. Most research on botnet detection relies on the characteristics of particular botnets and therefore fails to detect other types of botnets. Several generic botnet detection methods exist that can detect a variety of botnets, but they perform very poorly on real-life datasets because they were not developed using real-life botnet data. A crucial reason is the scarcity of large-scale real-life botnet datasets: due to security and privacy concerns, organizations do not publish their real-life botnet data. Therefore, there is a dire need for a simulation methodology that generates a large-scale botnet dataset similar to the original real-life dataset while preserving the security and privacy of the network. In this dissertation, we develop a generic bot detection methodology that can detect a variety of bots, and we evaluate it on a real-life, large, highly class-imbalanced dataset. Numerical results show that our methodology detects bots more accurately than existing methods. Recognizing the need for large-scale real-life botnet datasets, we develop a methodology to simulate a large-scale botnet dataset from a real-life botnet dataset. Our simulation methodology is based on a Markov chain and a role-mining process that can simulate the degree distributions along with triangles (community structures). To scale up the original graph to a large-scale graph, we also propose a scaling-up algorithm, the Enterprise connection algorithm.
We evaluate our simulated graph by comparing it with the original graph and with the graph generated by the Preferential attachment algorithm. Comparisons are made in three major categories: botnet subgraphs, overall graphs, and scaled-up graphs. The results demonstrate that our methodology outperforms the Preferential attachment algorithm in simulating the triangle distributions and the botnet structure. Keywords: Botnet detection; Bot detection; graph simulation; large-scale
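The Preferential attachment baseline mentioned in this abstract can be sketched as Barabási-Albert-style growth, together with a per-node triangle count of the kind used when comparing triangle distributions. This is a generic sketch of the baseline only (the seed clique and parameter `m` are assumptions); it is not the author's Markov chain, role-mining, or Enterprise connection method.

```python
import random
from itertools import combinations

def preferential_attachment(n, m, seed=0):
    """Grow a graph where each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Seed graph: a clique on the first m + 1 nodes, so every node has degree m.
    for u, v in combinations(range(m + 1), 2):
        adj[u].add(v)
        adj[v].add(u)
    # A node appears in `ends` once per incident edge, so sampling uniformly
    # from `ends` samples nodes proportionally to degree.
    ends = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj

def triangles_per_node(adj):
    """Number of triangles each node participates in."""
    return {v: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
            for v, nbrs in adj.items()}

graph = preferential_attachment(n=50, m=2, seed=0)
tri = triangles_per_node(graph)
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2  # 3 + 2 * 47 = 97
print(n_edges, sum(tri.values()) // 3)  # edge count, total triangles
```

Comparing the histogram of `tri` values for a simulated graph against the original graph is one way to check how well a generator reproduces community structure.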
7	Hiding Behind Cards: Identifying Bots and Humans in Online Poker. Altman, Benjamin. 07 May 2013. As online gaming becomes more popular, it has also become increasingly important to identify and remove players who use automated player systems (bots). Manual bot detection depends on the ability of game administrators to differentiate between bots and normal players. The objective of this thesis was to determine whether expert poker players can differentiate between bot and human players in Texas Hold 'Em Poker. Participants watched gameplay videos, were deceived into believing that a mix of bots and humans was playing, and were asked to rate each player's botness and skill. Results showed that participants made similar observations about player behaviour, yet used these observations to reach differing conclusions about whether a given player was a bot or a human. These results cast doubt on the reliability of manual bot detection systems for online poker, but also show that experts agree on what constitutes skilled play in such an environment. Keywords: poker bots; bot detection; bot vs human; computer player; game bots; bot gameplay
8	Language Independent Detector for Auto Generated Tweets. Valipour, Saeideh. January 2020. The cross-disciplinary Nordic Tweet Stream (NTS) project aims to create a multilingual text corpus of tweets published in the five Nordic countries. The NTS linguists are specifically interested in tweets whose text was formulated by a human, each tweet being a personal statement, and not in tweets generated by bots or other programs or apps, since these might skew the results. NTS consists of multiple parts; the part we are responsible for is a language-independent approach that uses supervised machine learning to classify every tweet as auto-generated (AGT) or human-generated (HGT). The objective of this study is to increase data accuracy in sociolinguistic studies that use Twitter by reducing skewed sampling and inaccuracies in linguistic data. We define an AGT as a tweet in which all or part of the natural language content is generated automatically by a bot or another type of program. In other words, AGT/HGT refers to an individual message, whereas the term bot refers to nonpersonal, automated accounts that post content to online social networks. Our approach classifies a tweet using only the metadata that comes with every tweet, and we use only those metadata parameters that are independent of both language and country. The empirical results show poor success rates on unseen data. Using a bilingual training set containing tweets in two languages, we correctly classified only about 60-70% of the tweets in a test set in a third, new language. This is still better than nothing, but probably not good enough to be used as-is in a real-world scenario to identify AGTs in a given set of multilingual tweets. Keywords: Twitter; Machine Learning; Classification; Bot Detection; Social networks; Computer Sciences
9	Bezpečnostní systém pro eliminaci útoků na webové aplikace / Security System for Web Application Attacks Elimination. Vašek, Dominik. January 2021. Nowadays, botnet attacks that aim to overwhelm the network layer with malformed packets and other means are usually mitigated by hardware intrusion detection systems. Application-layer botnet attacks, on the other hand, are still a problem. In the case of web applications, these attacks consist of legitimate traffic that needs to be processed. If enough bots take part in such an attack, the services provided can become inaccessible, among other problems, which in turn can lead to financial loss. In this thesis, we propose a detection and mitigation system that detects botnet attacks in real time using a statistical approach. The system is divided into several modules that cooperate on detection and mitigation, and these modules can be extended further. During testing, the system captured approximately 60% of botnet attacks, which often focused on spam, login attacks, and DDoS. The proportion of falsely flagged addresses is below 5%.
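The abstract does not specify the statistical test. As a hypothetical sketch only, the detector below counts requests per client address in a sliding time window and flags addresses whose count exceeds the mean plus `k` standard deviations across active clients; the class name, window length, and `k` are assumptions, and the addresses come from documentation ranges.

```python
import statistics
from collections import defaultdict, deque

class RateAnomalyDetector:
    """Hypothetical sliding-window rate detector for application-layer traffic."""

    def __init__(self, window_seconds=10.0, k=3.0):
        self.window = window_seconds
        self.k = k
        self.requests = defaultdict(deque)  # address -> timestamps inside the window

    def observe(self, address, timestamp):
        """Record one request and drop timestamps that fell out of the window."""
        self.requests[address].append(timestamp)
        self._expire(timestamp)

    def _expire(self, now):
        for addr in list(self.requests):
            q = self.requests[addr]
            while q and q[0] <= now - self.window:
                q.popleft()
            if not q:
                del self.requests[addr]

    def flagged(self):
        """Addresses whose in-window request count is unusually high."""
        counts = {a: len(q) for a, q in self.requests.items()}
        if len(counts) < 2:
            return set()
        values = list(counts.values())
        threshold = statistics.mean(values) + self.k * statistics.pstdev(values)
        return {a for a, c in counts.items() if c > threshold}

detector = RateAnomalyDetector(window_seconds=10.0, k=3.0)
for t in (1.0, 2.0):                      # 20 ordinary clients, 2 requests each
    for i in range(20):
        detector.observe(f"10.0.0.{i}", t)
for j in range(50):                       # one client sending a burst of 50
    detector.observe("203.0.113.7", 3.0 + 0.1 * j)
print(detector.flagged())
```

A mean/standard-deviation rule needs enough ordinary clients to be meaningful, because a single heavy sender also inflates the standard deviation; a median-based rule is a common, more robust alternative.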
10	Some Improvements to Social Authentication and Bot Detection and Their Applications in IoT. Krzciok, Jacob James. 19 April 2023. No description available. Keywords: Computer Science; IoT; Social Authentication; Bot Detection; Trustee-based social authentication
