Spelling suggestions: "subject:"span filter"" "subject:"spa filter""
1 |
Comparing the relative efficacy of phishing emails / Jämförelse av phishing emails relativa effektivitetLingaas Türk, Jakob January 2020 (has links)
This study aimed to examine if there was a difference in how likely a victim is to click on a phishing email’s links based on the content of the email, the tone and language used and the structure of the code. This likelihood also includes the email’s ability to bypass spam filters. Method: The method used to examine this was a simulated phishing attack. Six different phishing templates were created and sent out via the Gophish framework to target groups of students (from Halmstad University), from a randomized pool of 20.000 users. The phishing emails contained a link to a landing page (hosted via a virtual machine) which tracked user status. The templates were: Covid19 Pre-Attempt, Spotify Friendly CSS, Spotify Friendly Button, Spotify Aggressive CSS, Spotify Aggressive Button, Student Union. Results: Covid19 Pre-Attempt: 72.6% initial spam filter evasion, 45.8% spam filter evasion, 4% emails opened and 100% links clicked. Spotify Friendly CSS: 50% initial spam filter evasion, 38% spam filter evasion, 26.3% emails opened and 0% links clicked. Spotify Friendly Button: 59% initial spam filter evasion, 28.8% spam filter evasion, 5.8% emails opened and 0 %links clicked. Spotify Aggressive CSS: 50% initial spam filter evasion, 38% spam filter evasion, 10.5% emails opened, and 100% links clicked. Spotify Aggressive Button: 16% initial spam filter evasion, 25% spam filter evasion, 0% emails opened and 0% emails clicked. Student Union: 40% initial spam filter evasion, 75% spam filter evasion, 33.3% emails opened and 100% links clicked. Conclusion: Differently structured emails have different capabilities for bypassing spam filters and for deceiving users. Language and tone appears to affect phishing email efficacy; the results suggest that an aggressive and authoritative tone heightens a phishing email’s ability to deceive users, but seems to not affect its ability to bypass spam filters to a similar degree. Authenticity appears to affect email efficacy; the results showed a difference in deception efficacy if an email was structured like that of a genuine sender. Appealing to emotions such as stress and fear appears to increase the phishing email’s efficacy in deceiving a user. / Syftet med denna studie var att undersöka om det fanns en skillnad i hur troligt det är att ett offer klickar på länkarna till ett phishing-e-postmeddelande, baserat på innehållet i e-postmeddelandet, tonen och språket som används och kodens struktur. Denna sannolikhet inkluderar även e-postens förmåga att kringgå skräppostfilter. Metod: Metoden som användes var en simulerad phishing-attack. Sex olika phishing-mallar skapades och skickades ut via Gophish-ramverket till målgruppen bestående av studenter (från Halmstads universitet), från en slumpmässig pool med 20 000 användare. Phishing-e-postmeddelandena innehöll en länk till en målsida (hostad via en virtuell maskin) som spårade användarstatus. Mallarna var: Covid19 Pre-Attempt, Spotify Friendly CSS, Spotify Friendly Button, Spotify Aggressive CSS, Spotify Aggressive Button, Student Union. Resultat: Covid19 förförsök: 72,6% kringgick det primära spamfiltret, 45,8% kringgick det sekundära spamfiltret, 4% e-postmeddelanden öppnade och 100% länkar klickade Spotify Friendly CSS: 50% kringgick det primära spamfiltret, 38% kringgick det sekundära spamfiltret, 26,3% e-postmeddelanden öppnade och 0% länkar klickade. Spotify Friendly Button: 59% kringgick det primära spamfiltret, 28,8% kringgick det sekundära spamfiltret, 5.8% e-postmeddelanden öppnade och 0% länkar klickade. Spotify Aggressive CSS: 50% kringgick det primära spamfiltret, 38% kringgick det sekundära spamfiltret, 10,5% e-post öppnade och 100% länkar klickade. Spotify Aggressive Button: 16% kringgick det primära spamfiltret, 25% kringgick det sekundära spamfiltret, 0% e-postmeddelanden öppnade och 0% e-postmeddelanden klickade. Studentkåren: 40% kringgick det primära spamfiltret, 75% kringgick det sekundära spamfiltret, 33,3% e-postmeddelanden öppnade och 100% länkar klickade. Slutsats: Olika strukturerade e-postmeddelanden har olika funktioner för att kringgå skräppostfilter och för att lura användare. Språk och ton tycks påverka effektiviteten för epost-phishing. Resultaten tyder på att en aggressiv och auktoritär ton ökar phishing-epostmeddelandets förmåga att lura användare, men verkar inte påverka dess förmåga att kringgå skräppostfilter i motsvarande grad. Autenticitet verkar påverka e-postens effektivitet, då resultaten visade en skillnad i effektivitet om ett e-postmeddelande var strukturerat som en äkta avsändare. Att adressera känslor som stress och rädsla verkar öka phishing-e-postens effektivitet när det gäller att lura en användare.
|
2 |
Scavenger: A Junk Mail Classification ProgramMalkhare, Rohan V 20 January 2003 (has links)
The problem of junk mail, also called spam, has reached epic proportions and various efforts are underway to fight spam. Junk mail classification using machine learning techniques is a key method to fight spam. We have devised a machine learning algorithm where features are created from individual sentences in the subject and body of a message by forming all possible word-pairings from a sentence. Weights are assigned to the features based on the strength of their predictive capabilities for spam/legitimate determination. The predictive capabilities are estimated by the frequency of occurrence of the feature in spam/legitimate collections as well as by application of heuristic rules. During classification, total spam and legitimate evidence in the message is obtained by summing up the weights of extracted features of each class and the message is classified into whichever class accumulates the greater sum.
We compared the algorithm against the popular naïve-bayes algorithm (in [8]) and found it's performance exceeded that of naïve-bayes algorithm both in terms of catching spam and for reducing false positives.
|
3 |
Analysis and Simulation of Threats in an Open, Decentralized, Distributed Spam Filtering SystemJägenstedt, Gabriel January 2012 (has links)
The existance of spam email has gone from a fairly small amounts of afew hundred in the late 1970’s to several billions per day in 2010. Thiscontinually growing problem is of great concern to both businesses andusers alike.One attempt to combat this problem comes with a spam filtering toolcalled TRAP. The primary design goal of TRAP is to enable tracking ofthe reputation of mail senders in a decentralized and distributed fashion.In order for the tool to be useful, it is important that it does not haveany security issues that will let a spammer bypass the protocol or gain areputation that it should not have.As a piece of this puzzle, this thesis makes an analysis of TRAP’s protocoland design in order to find threats and vulnerabilies capable of bypassingthe protocol safeguards. Based on these threats we also evaluate possiblemitigations both by analysis and simulation. We have found that althoughthe protocol was not designed with regards to certain attacks on the systemitself most of the attacks can be fairly easily stopped.The analysis shows that by adding cryptographic defenses to the protocola lot of the threats would be mitigated. In those cases where cryptographywould not suffice it is generally down to sane design choices in the implementationas well as not always trusting that a node is being truthful andfollowing protocol.
|
4 |
Realizace spamového filtru na bázi umělého imunitního systému / Spam Filter Implementation on the Basis of Artificial Immune SystemsNeuwirth, David January 2009 (has links)
Unsolicited e-mails generally present a major problem within the e-mail communication nowadays. There exist several methods that can detect spam and distinguish it from the requested messages. The theoretical part of the masters thesis introduces the ways of detecting unsolicited messages by using artificial immune systems. It presents and subsequently analyses several methods of the artificial immune systems that can assist in the fight against spam. The practical part of the masters thesis deals with the implementation of a spam filter on the basis of the artificial immune systems. The project ends with comparison of effectiveness of the newly designed spam filter and the one which uses common methods for spam detection.
|
5 |
Naive Bayesian Spam Filters for Log File AnalysisHavens, Russel William 13 July 2011 (has links) (PDF)
As computer system usage grows in our world, system administrators need better visibility into the workings of computer systems, especially when those systems have problems or go down. Most system components, from hardware, through OS, to application server and application, write log files of some sort, be it system-standardized logs such syslog or application specific logs. These logs very often contain valuable clues to the nature of system problems and outages, but their verbosity can make them difficult to utilize. Statistical data mining methods could help in filtering and classifying log entries, but these tools are often out of the reach of administrators. This research tests the effectiveness of three off-the-shelf Bayesian spam email filters (SpamAssassin, SpamBayes and Bogofilter) for effectiveness as log entry classifiers. A simple scoring system, the Filter Effectiveness Scale (FES), is proposed and used to compare these filters. These filters are tested in three stages: 1) the filters were tested with the SpamAssassin corpus, with various manipulations made to the messages, 2) the filters were tested for their ability to differentiate two types of log entries taken from actual production systems, and 3) the filters were trained on log entries from actual system outages and then tested on effectiveness for finding similar outages via the log files. For stage 1, messages were tested with normalized bodies, normalized headers and with each sentence from each message body as a separate message with a standardized message. The impact of each manipulation is presented. For stages 2 and 3, log entries were tested with digits normalized to zeros, with words chained together to various lengths and one or all levels of word chains used together. The impacts of these manipulations are presented. In each of these stages, it was found that these widely available Bayesian content filters were effective in differentiating log entries. Tables of correct match percentages or score graphs, according to the nature of tests and numbers of entries are presented, are presented, and FES scores are assigned to the filters according to the attributes impacting their effectiveness. This research leads to the suggestion that simple, off-the-shelf Bayesian content filters can be used to assist system administrators and log mining systems in sifting log entries to find entries related to known conditions (for which there are example log entries), and to exclude outages which are not related to specific known entry sets.
|
6 |
Prediction games : machine learning in the presence of an adversaryBrückner, Michael January 2012 (has links)
In many applications one is faced with the problem of inferring some functional relation between input and output variables from given data. Consider, for instance, the task of email spam filtering where one seeks to find a model which automatically assigns new, previously unseen emails to class spam or non-spam. Building such a predictive model based on observed training inputs (e.g., emails) with corresponding outputs (e.g., spam labels) is a major goal of machine learning.
Many learning methods assume that these training data are governed by the same distribution as the test data which the predictive model will be exposed to at application time. That assumption is violated when the test data are generated in response to the presence of a predictive model. This becomes apparent, for instance, in the above example of email spam filtering. Here, email service providers employ spam filters and spam senders engineer campaign templates such as to achieve a high rate of successful deliveries despite any filters.
Most of the existing work casts such situations as learning robust models which are unsusceptible against small changes of the data generation process. The models are constructed under the worst-case assumption that these changes are performed such to produce the highest possible adverse effect on the performance of the predictive model. However, this approach is not capable to realistically model the true dependency between the model-building process and the process of generating future data. We therefore establish the concept of prediction games: We model the interaction between a learner, who builds the predictive model, and a data generator, who controls the process of data generation, as an one-shot game. The game-theoretic framework enables us to explicitly model the players' interests, their possible actions, their level of knowledge about each other, and the order at which they decide for an action.
We model the players' interests as minimizing their own cost function which both depend on both players' actions. The learner's action is to choose the model parameters and the data generator's action is to perturbate the training data which reflects the modification of the data generation process with respect to the past data.
We extensively study three instances of prediction games which differ regarding the order in which the players decide for their action. We first assume that both player choose their actions simultaneously, that is, without the knowledge of their opponent's decision. We identify conditions under which this Nash prediction game has a meaningful solution, that is, a unique Nash equilibrium, and derive algorithms that find the equilibrial prediction model. As a second case, we consider a data generator who is potentially fully informed about the move of the learner. This setting establishes a Stackelberg competition. We derive a relaxed optimization criterion to determine the solution of this game and show that this Stackelberg prediction game generalizes existing prediction models. Finally, we study the setting where the learner observes the data generator's action, that is, the (unlabeled) test data, before building the predictive model. As the test data and the training data may be governed by differing probability distributions, this scenario reduces to learning under covariate shift. We derive a new integrated as well as a two-stage method to account for this data set shift.
In case studies on email spam filtering we empirically explore properties of all derived models as well as several existing baseline methods. We show that spam filters resulting from the Nash prediction game as well as the Stackelberg prediction game in the majority of cases outperform other existing baseline methods. / Eine der Aufgabenstellungen des Maschinellen Lernens ist die Konstruktion von Vorhersagemodellen basierend auf gegebenen Trainingsdaten. Ein solches Modell beschreibt den Zusammenhang zwischen einem Eingabedatum, wie beispielsweise einer E-Mail, und einer Zielgröße; zum Beispiel, ob die E-Mail durch den Empfänger als erwünscht oder unerwünscht empfunden wird. Dabei ist entscheidend, dass ein gelerntes Vorhersagemodell auch die Zielgrößen zuvor unbeobachteter Testdaten korrekt vorhersagt.
Die Mehrzahl existierender Lernverfahren wurde unter der Annahme entwickelt, dass Trainings- und Testdaten derselben Wahrscheinlichkeitsverteilung unterliegen. Insbesondere in Fällen in welchen zukünftige Daten von der Wahl des Vorhersagemodells abhängen, ist diese Annahme jedoch verletzt. Ein Beispiel hierfür ist das automatische Filtern von Spam-E-Mails durch E-Mail-Anbieter. Diese konstruieren Spam-Filter basierend auf zuvor empfangenen E-Mails. Die Spam-Sender verändern daraufhin den Inhalt und die Gestaltung der zukünftigen Spam-E-Mails mit dem Ziel, dass diese durch die Filter möglichst nicht erkannt werden.
Bisherige Arbeiten zu diesem Thema beschränken sich auf das Lernen robuster Vorhersagemodelle welche unempfindlich gegenüber geringen Veränderungen des datengenerierenden Prozesses sind. Die Modelle werden dabei unter der Worst-Case-Annahme konstruiert, dass diese Veränderungen einen maximal negativen Effekt auf die Vorhersagequalität des Modells haben. Diese Modellierung beschreibt die tatsächliche Wechselwirkung zwischen der Modellbildung und der Generierung zukünftiger Daten nur ungenügend. Aus diesem Grund führen wir in dieser Arbeit das Konzept der Prädiktionsspiele ein. Die Modellbildung wird dabei als mathematisches Spiel zwischen einer lernenden und einer datengenerierenden Instanz beschrieben. Die spieltheoretische Modellierung ermöglicht es uns, die Interaktion der beiden Parteien exakt zu beschreiben. Dies umfasst die jeweils verfolgten Ziele, ihre Handlungsmöglichkeiten, ihr Wissen übereinander und die zeitliche Reihenfolge, in der sie agieren.
Insbesondere die Reihenfolge der Spielzüge hat einen entscheidenden Einfluss auf die spieltheoretisch optimale Lösung. Wir betrachten zunächst den Fall gleichzeitig agierender Spieler, in welchem sowohl der Lerner als auch der Datengenerierer keine Kenntnis über die Aktion des jeweils anderen Spielers haben. Wir leiten hinreichende Bedingungen her, unter welchen dieses Spiel eine Lösung in Form eines eindeutigen Nash-Gleichgewichts besitzt. Im Anschluss diskutieren wir zwei verschiedene Verfahren zur effizienten Berechnung dieses Gleichgewichts. Als zweites betrachten wir den Fall eines Stackelberg-Duopols. In diesem Prädiktionsspiel wählt der Lerner zunächst das Vorhersagemodell, woraufhin der Datengenerierer in voller Kenntnis des Modells reagiert. Wir leiten ein relaxiertes Optimierungsproblem zur Bestimmung des Stackelberg-Gleichgewichts her und stellen ein mögliches Lösungsverfahren vor. Darüber hinaus diskutieren wir, inwieweit das Stackelberg-Modell bestehende robuste Lernverfahren verallgemeinert. Abschließend untersuchen wir einen Lerner, der auf die Aktion des Datengenerierers, d.h. der Wahl der Testdaten, reagiert. In diesem Fall sind die Testdaten dem Lerner zum Zeitpunkt der Modellbildung bekannt und können in den Lernprozess einfließen. Allerdings unterliegen die Trainings- und Testdaten nicht notwendigerweise der gleichen Verteilung. Wir leiten daher ein neues integriertes sowie ein zweistufiges Lernverfahren her, welche diese Verteilungsverschiebung bei der Modellbildung berücksichtigen.
In mehreren Fallstudien zur Klassifikation von Spam-E-Mails untersuchen wir alle hergeleiteten, sowie existierende Verfahren empirisch. Wir zeigen, dass die hergeleiteten spieltheoretisch-motivierten Lernverfahren in Summe signifikant bessere Spam-Filter erzeugen als alle betrachteten Referenzverfahren.
|
Page generated in 0.0732 seconds