1 |
Effective Phishing Detection Using Machine Learning Approach. Yaokai, Yang, 01 February 2019
No description available.
|
2 |
Deriving classifiers with single and multi-label rules using new Associative Classification methods. Abdelhamid, Neda, January 2013
Associative Classification (AC) is a rule-based data mining approach that uses association rule techniques to construct accurate classification systems (classifiers). Most existing AC algorithms extract one class per rule and ignore other class labels, even when those labels are well represented in the data. Extending current AC algorithms to find and extract multi-label rules is therefore a promising research direction, since it reveals new hidden knowledge to decision makers. This thesis also investigates the exponential growth of rules in AC, aiming to minimise the number of candidate rules and thereby reduce classifier size so that end users can easily exploit and maintain the classifier. In addition, both the rule ranking and the test data classification steps are investigated in order to improve the predictive accuracy of AC algorithms. Overall, this thesis investigates several problems related to AC, not limited to those listed above; the outcome is a set of new AC algorithms that derive single and multi-label rules from data sets in different application domains, together with comprehensive experimental results. The first proposed algorithm, Multi-class Associative Classifier (MAC), derives classifiers in which each rule is associated with a single class from a training data set. MAC improves the rule discovery, rule ranking, rule filtering and test data classification steps of AC. The second proposed algorithm, Multi-label Classifier based Associative Classification (MCAC), extends MAC with a novel rule discovery method that derives multi-label rules from single-label data without learning from parts of the training data set. These rules capture vital information that most current AC algorithms ignore, which benefits both the end user and the classifier's predictive accuracy.
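As a rough illustration of the generic AC pipeline described above (rule discovery with support and confidence thresholds, rule ranking, and classification of a test case by the first matching rule), the Python sketch below mines single-label rules from a toy table. It is not MAC: candidate antecedents are enumerated exhaustively up to length two instead of being generated level by level, the confidence, support, antecedent-length ranking is a common CBA-style ordering assumed here, and the attribute names `has_ip` and `https` and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, labels, min_support, min_confidence, max_len=2):
    """Return (antecedent, cls, support, confidence) rules, best-ranked first."""
    n = len(rows)
    ante_counts = Counter()  # how often each antecedent occurs
    pair_counts = Counter()  # how often each (antecedent, class) pair occurs
    for row, cls in zip(rows, labels):
        items = sorted(row.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                ante = frozenset(combo)
                ante_counts[ante] += 1
                pair_counts[(ante, cls)] += 1
    rules = []
    for (ante, cls), count in pair_counts.items():
        support = count / n
        confidence = count / ante_counts[ante]
        if support >= min_support and confidence >= min_confidence:
            rules.append((ante, cls, support, confidence))
    # Rank by confidence, then support, then shorter (more general) antecedent
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, row, default):
    """Predict with the first (highest-ranked) rule whose antecedent matches."""
    items = set(row.items())
    for ante, cls, _support, _confidence in rules:
        if ante <= items:
            return cls
    return default


rows = [
    {"has_ip": "yes", "https": "no"},
    {"has_ip": "yes", "https": "no"},
    {"has_ip": "no", "https": "yes"},
    {"has_ip": "no", "https": "yes"},
    {"has_ip": "no", "https": "no"},
]
labels = ["phish", "phish", "legit", "legit", "legit"]
rules = mine_rules(rows, labels, min_support=0.2, min_confidence=0.6)
print(classify(rules, {"has_ip": "yes", "https": "no"}, default="legit"))  # prints: phish
```

Ties that survive all three ranking criteria keep their discovery order; real AC algorithms add further tie-breakers and prune redundant rules, which this sketch omits.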
Lastly, the critical web-security problem of website phishing detection is investigated in depth, and a technical solution based on AC is introduced in Chapter 6. In particular, by applying the proposed algorithms to a large collected phishing data set, we were able to detect a new type of knowledge and to improve detection performance in terms of error rate. Thorough experimental tests using a large number of University of California Irvine (UCI) data sets and a variety of real application data collections related to website classification and trainer timetabling problems reveal that MAC and MCAC generate better-quality classifiers than other AC and rule-based algorithms with respect to various evaluation measures, e.g. error rate, Label-Weight, Any-Label and number of rules. This is mainly due to the improvements in rule discovery, rule filtering, rule sorting and the classification step, and, more importantly, to the new type of knowledge associated with the proposed algorithms. Most chapters of this thesis have been published, or are under review, in journals and refereed conference proceedings.
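The abstract does not spell out how MCAC derives multi-label rules, so the sketch below assumes one simple criterion: for a single-item antecedent, every class whose confidence ties the best class (within a `tolerance`) is kept in the rule's label set, instead of discarding all but one. It also computes Any-Label accuracy, read here as counting a test case as correct when its true class is any of the predicted labels. Label-Weight is omitted because its weighting scheme is not given in the abstract. All names and data are illustrative.

```python
from collections import Counter, defaultdict


def multi_label_rules(rows, labels, min_support, min_confidence, tolerance=0.0):
    """Map each (attribute, value) item to every class that ties for its best confidence."""
    n = len(rows)
    class_counts = defaultdict(Counter)  # item -> Counter of classes seen with it
    for row, cls in zip(rows, labels):
        for item in row.items():
            class_counts[item][cls] += 1
    rules = {}
    for item, counts in class_counts.items():
        total = sum(counts.values())
        best = max(counts.values()) / total
        kept = sorted(
            cls
            for cls, c in counts.items()
            if c / n >= min_support
            and c / total >= min_confidence
            and best - c / total <= tolerance
        )
        if kept:
            rules[item] = kept
    return rules


def any_label_accuracy(predicted_label_sets, true_labels):
    """Any-Label: a test case is correct if its true class is among the predicted labels."""
    hits = sum(1 for preds, y in zip(predicted_label_sets, true_labels) if y in preds)
    return hits / len(true_labels)


rows = [{"form": "login"} for _ in range(4)] + [{"form": "none"} for _ in range(2)]
labels = ["phish", "legit", "phish", "legit", "legit", "legit"]
rules = multi_label_rules(rows, labels, min_support=0.2, min_confidence=0.5)
print(rules)  # prints: {('form', 'login'): ['legit', 'phish'], ('form', 'none'): ['legit']}
predictions = [rules[("form", "login")], rules[("form", "none")]]
print(any_label_accuracy(predictions, ["phish", "phish"]))  # prints: 0.5
```

With a single-label rule, the `('form', 'login')` item would have to pick one of two equally confident classes and drop the other; the multi-label rule keeps both.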
|
3 |
Detecção de Phishing no Twitter Baseada em Algoritmos de Aprendizagem Online / Phishing Detection on Twitter Based on Online Learning Algorithms. Barbosa, Haline Pereira de Oliveira, 03 April 2018
Twitter is one of the most used social networks in the world, with about 328 million users sharing images, videos, texts and links. Because message length is restricted, tweets commonly share shortened links to websites, which makes it impossible to see the destination URL before the page is displayed. As a result, Twitter has become one of the main vehicles for spreading phishing attacks through malicious links. Phishing is an attack that seeks to obtain personal information such as names, CPF numbers (the Brazilian individual taxpayer ID), passwords, bank account numbers and credit card numbers. Phishing detection systems for Twitter are usually built with offline supervised machine learning, in which a large amount of data is examined once to induce a single static prediction model. In these systems, incorporating new data requires rebuilding the prediction model by reprocessing the entire database, which makes the process slow and inefficient. In this work we propose a framework for detecting phishing on Twitter. The framework uses supervised online learning: the classifier is updated with each processed tweet, and when it makes a wrong prediction, the model is updated so that it adapts quickly to changes at low computational and time cost while maintaining its classification performance. We evaluated the online learning algorithms Adaptive Random Forest, Hoeffding Tree, Naive Bayes, Perceptron and Stochastic Gradient Descent. The online Adaptive Random Forest classifier achieved a prequential accuracy of 99.8% in classifying phishing tweets.
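Prequential (test-then-train) evaluation is what makes the 99.8% figure an online measure: each tweet is first classified by the current model and only then used for training. The sketch below shows that loop with the simplest of the evaluated learners, a perceptron that updates only after a wrong prediction. It is a minimal pure-Python illustration under assumed inputs (two hypothetical binary tweet features, and labels +1 for phishing and -1 for legitimate), not the framework from the thesis; Adaptive Random Forest and Hoeffding Tree would normally come from a streaming library such as MOA.

```python
def prequential_perceptron(stream, n_features, lr=1.0):
    """Test-then-train evaluation of an online perceptron.

    Each example is first used to test the current model and, on a mistake,
    to update it, so the accuracy reflects the model as it evolves.
    Labels are +1 (phishing) or -1 (legitimate).
    """
    w = [0.0] * n_features
    b = 0.0
    correct = 0
    seen = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        y_hat = 1 if score > 0 else -1
        if y_hat == y:
            correct += 1
        else:  # perceptron rule: update only after a wrong prediction
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
        seen += 1
    return correct / seen, w, b


# Hypothetical binary tweet features: [has_shortened_link, asks_to_verify_account]
pattern = [([1, 1], 1), ([0, 0], -1), ([1, 0], 1), ([0, 1], -1)]
accuracy, w, b = prequential_perceptron(pattern * 3, n_features=2)
print(round(accuracy, 3), w, b)  # prints: 0.583 [2.0, 0.0] -1.0
```

The early mistakes count against the prequential accuracy even though the model makes no mistakes on the third pass; this is why prequential accuracy is usually reported over long streams.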
|
4 |
Characterization of phishing website characteristics / Karakterisering av egenskaper hos phishing domäner. Karlström, Axel; Kihlberg Gawell, Elsa, January 2022
The number of phishing domains is increasing continuously, as attackers can use toolkits that create the phishing websites for them. Since knowledge of web development is no longer needed, anyone can perform a phishing attack, and existing detection methods cannot seem to keep up. Finding new techniques to identify these malicious domains is crucial to protect potential victims who visit the websites. Many existing methods focus on the visual appearance of the websites; this thesis focuses instead on their underlying structure. By collecting data on style sheets and certificates from both verified phishing domains and benign domains, we created a dataset for each type of domain. Applying a token-based similarity algorithm to the collected style sheet data, we created subsets based on style sheet similarity. Our analysis focused on three main parts of the results: the characteristics of phishing domains compared with benign domains, the subsets created from style sheet similarities, and the matching style sheets in two of the subsets. The characteristics of the phishing domains were for the most part rather different from those of the benign domains, except for similarities found in the style sheet data. The subsets created from style sheet similarities were grouped into three datasets based on the number of matching style sheets. Despite originating from the same dataset, the three datasets proved to have distinct differences in characteristics. Of the two chosen subsets, one contained style sheets indicating that its domains were created by a phishing kit. We conclude that it is possible to implement a method based on structural similarities to identify both phishing kits and phishing domains. Our methodology shows the potential of this method, but further development and research are required to make it reliable.
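The abstract does not name the token-based similarity algorithm. As one plausible reading, the sketch below tokenizes each style sheet into selectors, properties and values, compares sheets with Jaccard similarity (shared tokens divided by all distinct tokens), and groups sheets whose similarity reaches a threshold using a union-find structure. The regular expression, the 0.5 threshold and the function names are assumptions for illustration.

```python
import re
from itertools import combinations


def tokens(css):
    """Split a style sheet into selector, property and value tokens."""
    return set(re.findall(r"[\w#.-]+", css.lower()))


def jaccard(a, b):
    """Shared tokens divided by all distinct tokens."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(sheets, threshold):
    """Single-link grouping: sheets with similarity >= threshold share a subset."""
    parent = list(range(len(sheets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(s) for s in sheets]
    for i, j in combinations(range(len(sheets)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sheets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


sheets = [
    ".login-box { color: #333; } .btn { background: blue; }",
    ".login-box { color: #333; } .btn { background: red; }",
    "body { margin: 0; font-family: serif; }",
]
print(group_by_similarity(sheets, threshold=0.5))  # prints: [[0, 1], [2]]
```

Because grouping is single-link, two sheets can end up in the same subset through a chain of similar sheets even if they are not directly similar; a stricter complete-link grouping would be a reasonable alternative.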
|
5 |
Employing and improving machine learning of detection of Phishing URLs. Yaitskyi, Andrii, January 2022
Background: Phishing is a social engineering technique that fools users by impersonating a trusted party in order to steal their personal data. Phishing frequently spreads through email services, and browsers are not always able to block phishing URLs. The problem persists and is not decreasing, so open issues remain to be addressed.
Objectives: The object of research is the method of processing and detecting phishing URLs. This study aims to identify the prerequisites for automating the processing and detection of phishing URLs, to determine how the efficiency and accuracy of phishing URL detection can be improved, and to establish which machine learning algorithms are best suited to detecting phishing URLs.
Methods: An initial study concluded that the available data alone were insufficient and that better results could be achieved with more effective methods, so machine learning was adopted. A quantitative study was then carried out to determine which machine learning algorithm to use in further work. The subject of research is the methods and means of processing and detecting phishing URLs; the research methods are analysis, observation, modeling and experimental research.
Results: The selected approach achieved a higher accuracy than the other algorithms in the comparison. The detection procedure was automated, and the accuracy of phishing URL detection improved substantially, reaching 98.417%. This is a better result than manual analysis of phishing URLs and the other algorithms compared.
Conclusions: Challenges remain in handling phishing URLs efficiently and detecting them more accurately. Further research is needed to determine how the detection of phishing URLs can be improved further.
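The abstract does not list the features behind the 98.417% result. The sketch below extracts a few lexical URL features that are commonly used in phishing URL detection (URL length, dots and hyphens in the host, an `@` sign, a raw IP address as host, HTTPS, and path depth) using only the Python standard library. The feature set and names are assumptions; a dictionary like this would then be fed to whichever classifier the algorithm comparison selects.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Extract simple lexical features from a URL (illustrative feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        is_ip = 1
    except ValueError:
        is_ip = 0
    return {
        "length": len(url),
        "host_dots": host.count("."),
        "host_hyphens": host.count("-"),
        "has_at": int("@" in url),
        "is_ip_host": is_ip,
        "is_https": int(parsed.scheme == "https"),
        "path_depth": len([p for p in parsed.path.split("/") if p]),
    }


print(url_features("http://192.168.10.5/paypal/login.php"))
print(url_features("https://secure-login.example.com.verify-account.net/"))
```

For example, the first URL is reported with an IP-address host (`is_ip_host` of 1) and a path depth of 2.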
|
6 |
Analyse du DNS et analyse sémantique pour la détection de l'hameçonnage / DNS and semantic analysis for phishing detection. Marchal, Samuel, 22 June 2015
Phishing is a modern form of fraud that targets users of electronic communications and aims to persuade them to perform actions for the benefit of another individual, the phisher. Phishing attacks rely mostly on social engineering, and most phishing vectors use links represented by domain names and URLs. Building on this observation, we introduce new solutions to counter phishing.
These solutions rely on the lexical and semantic analysis of the composition of domain names and URLs. Phishers create and obfuscate both of these resource pointers to trap their victims. We demonstrate in this document that phishing domain names and URLs share similarities in their lexical and semantic composition that differ from the composition of legitimate domain names and URLs. We use this characteristic to build models of the composition of phishing URLs and domain names using machine learning techniques and natural language processing models. The resulting models are used for several applications, such as identifying phishing domain names and phishing URLs, rating phishing URLs, and predicting domain names used in phishing attacks. All the proposed techniques are assessed on ground-truth data and prove effective, meeting speed, coverage and reliability requirements. This document shows that lexical and semantic analysis can be applied to domain names and URLs, and that this application is relevant for detecting phishing attacks.
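The thesis builds machine learning and NLP models of URL composition; the sketch below isolates just one lexical signal in that spirit, under loudly stated simplifications: brand names that appear in the subdomains, path or query but not in the registered domain. The `BRANDS` vocabulary is hypothetical, and the registered domain is naively taken as the second-to-last host label, which is wrong for multi-part suffixes such as `co.uk` (a Public Suffix List lookup would be needed there).

```python
from urllib.parse import urlparse

# Hypothetical brand vocabulary, for illustration only
BRANDS = {"paypal", "apple", "amazon", "microsoft"}


def split_url(url):
    """Return (registered label, remainder), naively assuming a one-label suffix."""
    parsed = urlparse(url)
    labels = (parsed.hostname or "").split(".")
    main_label = labels[-2] if len(labels) >= 2 else labels[0]
    rest = ".".join(labels[:-2]) + parsed.path + "?" + parsed.query
    return main_label, rest.lower()


def misplaced_brands(url, brands=BRANDS):
    """Brands named in subdomains, path or query but absent from the registered domain."""
    main_label, rest = split_url(url)
    return sorted(b for b in brands if b in rest and b not in main_label)


print(misplaced_brands("http://paypal.com.secure-update.info/login"))  # prints: ['paypal']
print(misplaced_brands("https://www.paypal.com/signin"))  # prints: []
```

A non-empty result is only a hint: in a fuller model it would be one feature among many lexical and semantic ones rather than a verdict on its own.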
|
7 |
Malicious Intent Detection Framework for Social Networks. Fausak, Andrew Raymond, 05 1900
Many, if not all, people have online social accounts (OSAs) on an online community (OC) such as Facebook (Meta), Twitter (X), Instagram (Meta), Mastodon or Nostr. OCs enable quick and easy interaction with friends, family and even wider online communities to share information. OCs also have a dark side: users with malicious intent join OC platforms to carry out criminal activities such as spreading fake news and misinformation, cyberbullying, propaganda, phishing, theft and unjust enrichment. These criminal activities are especially concerning when they harm minors. Detection and mitigation are needed to protect OCs and stop these criminals from harming others. Many solutions exist; however, they typically focus on a single category of malicious intent rather than providing an all-encompassing solution. To address this challenge, we propose the first steps of a framework for analyzing and identifying malicious intent in OCs, which we call the Malicious Intent Detection Framework (MIDF). MIDF is an extensible proof of concept that uses machine learning techniques to enable detection and mitigation. The framework will first be used to detect malicious users solely from their relationships, and can then be leveraged to create a suite of malicious intent vector detection models, covering phishing, propaganda, scams, cyberbullying, racism, spam and bots, for open-source online social networks such as Mastodon and Nostr.
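MIDF's first step detects malicious users solely from their relationships, but the abstract does not describe the model. As an illustrative stand-in, the sketch below scores accounts by label propagation over an undirected follow graph: known-malicious seed accounts are fixed at 1.0, and every other account repeatedly takes a damped average (`alpha`) of its neighbours' scores, so accounts closer to seeds score higher. The graph, seed set, `alpha` and iteration count are hypothetical.

```python
def propagate_scores(edges, seeds, iterations=10, alpha=0.5):
    """Score users by how closely they are connected to known-malicious seed accounts."""
    neighbors = {}
    for a, b in edges:  # treat relationships as undirected
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {u: (1.0 if u in seeds else 0.0) for u in neighbors}
    for _ in range(iterations):
        new = {}
        for u, nbrs in neighbors.items():
            if u in seeds:
                new[u] = 1.0  # seeds stay clamped at 1.0
            else:
                new[u] = alpha * sum(scores[v] for v in nbrs) / len(nbrs)
        scores = new
    return scores


# Hypothetical follow graph: a chain leading away from one known bot
edges = [("bot1", "bot2"), ("bot2", "u1"), ("u1", "u2"), ("u2", "u3")]
scores = propagate_scores(edges, seeds={"bot1"})
print({u: round(s, 3) for u, s in sorted(scores.items())})
```

Because `alpha` is below 1, each iteration multiplies the remaining error by at most `alpha`, so the scores converge quickly; in practice such propagated scores would be one input to a trained classifier rather than a verdict on their own.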
|