441 |
Near Real-time Detection of Masquerade Attacks in Web Applications: Catching Imposters Using Their Browsing Behavior. Panopoulos, Vasileios. January 2016.
This thesis details research on machine learning techniques that are central to anomaly and masquerade attack detection. The main focus is on web applications because of their immense popularity and ubiquity. This popularity has led to an increase in attacks, making them the most targeted entry point for violating a system. Specifically, a group of attacks that range from identity theft using social engineering to cross-site scripting aim at exploiting and masquerading as users. Masquerade attacks are even harder to detect due to their resemblance to normal sessions, posing an additional burden. Concerning prevention, the diversity and complexity of those systems make it harder to define reliable protection mechanisms. Additionally, new and emerging attack patterns make manually configured and signature-based systems less effective, since they must be continuously updated with new rules and signatures; left unmanaged, they eventually become obsolete. Finally, the huge amount of traffic makes manual inspection of attacks and false alarms an impossible task. To tackle those issues, anomaly detection systems built on powerful and proven machine learning algorithms are proposed. Gravitating around anomaly detection and machine learning, this thesis initially establishes several basic definitions, such as user behavior, normality, and normal and anomalous behavior. Those definitions set the context in which the proposed method operates and define its theoretical premises. To ease the transition into the implementation phase, the underlying methodology is also explained in detail. The implementation is then presented: starting from server logs, a method is described for pre-processing the data into a form suitable for classification. This preprocessing phase combines several statistical analyses and normalization methods (univariate selection, ANOVA) to clean and transform the given logs and perform feature selection. Furthermore, given that the proposed detection method is based on the source and request URLs, a method of aggregation is proposed to limit user-privacy and classifier over-fitting issues. Subsequently, two popular classification algorithms (Multinomial Naive Bayes and Support Vector Machines) are tested and compared to determine which performs better in the given situations. Each of the implementation steps (pre-processing and classification) requires a number of parameters to be set, and thus a method called hyper-parameter optimization is employed, which searches for the parameter values that improve the classification results. Moreover, the training and testing methodology is outlined alongside the experimental setup. Hyper-parameter optimization and training are the most computationally intensive steps, especially given a large number of samples/users; to overcome this obstacle, a scaling methodology is defined and evaluated to demonstrate its ability to handle larger data sets. To complete the framework, several other options are evaluated and compared to challenge the method and implementation decisions, for example the "Transitions-vs-Pages" dilemma, the block restriction effect, the DR usefulness, and the optimization of the classification parameters.
Moreover, a survivability analysis is performed to demonstrate how the produced alarms could be correlated, affecting the resulting detection rates and interval times. The implementation of the proposed detection method and the outlined experimental setup lead to interesting results. Furthermore, the data set used to produce this evaluation is provided online to promote further investigation and research in this field.
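The preprocessing-plus-classification pipeline described above maps naturally onto standard tooling. Below is a minimal, hypothetical sketch in Python with scikit-learn: univariate feature selection via the ANOVA F-test feeding either Multinomial Naive Bayes or a linear SVM, with a grid search standing in for the hyper-parameter optimization step. The Poisson-distributed count matrix is an invented stand-in for per-user URL-transition counts; none of this is the thesis's actual code.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Stand-in for per-user URL-transition count vectors derived from server logs.
X = rng.poisson(lam=2.0, size=(500, 100)).astype(float)
y = rng.integers(0, 2, size=500)  # 0 = legitimate user, 1 = masquerader
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = [
    ("MultinomialNB", MultinomialNB(), {"clf__alpha": [0.1, 1.0, 10.0]}),
    ("LinearSVC", LinearSVC(), {"clf__C": [0.01, 0.1, 1.0, 10.0]}),
]
for name, clf, grid in candidates:
    # ANOVA-based univariate selection, then the classifier under test.
    pipe = Pipeline([("select", SelectKBest(f_classif, k=50)), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=5)  # hyper-parameter optimization
    search.fit(X_train, y_train)
    print(name, search.best_params_,
          f"test accuracy: {search.score(X_test, y_test):.3f}")
```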
|
442 |
Comparison of Automatic Classifiers’ Performances using Word-based Feature Extraction Techniques in an E-government Setting. Marin Rodenas, Alfonso. January 2011.
Nowadays, email is commonly used by citizens to establish communication with their government. Many of the received emails concern common queries and subjects that handling officers have to answer manually. Automatic classification of incoming emails increases communication efficiency by decreasing the delay between a query and its response. This thesis is part of the IMAIL project, which aims to provide an automatic answering solution for the Swedish Social Insurance Agency (SSIA) (“Försäkringskassan” in Swedish). The goal of this thesis is to analyze and compare the classification performance of different sets of features extracted from SSIA emails on different automatic classifiers. The features extracted from the emails also depend on the preprocessing carried out beforehand: compound splitting, lemmatization, stop-word removal, part-of-speech tagging, and n-grams are the processes applied to the data set. Classification is performed using Support Vector Machines, k-Nearest Neighbors, and Naive Bayes, and the results are analyzed and compared using precision, recall, and F-measure. SVM provides the best overall classification, with an F-measure of 0.787; however, Naive Bayes classifies most of the email categories better than SVM, so it cannot be concluded that SVM categorically classifies better than Naive Bayes. Furthermore, a comparison to Dalianis et al. (2011) is made. The results obtained here outperform those reported earlier: SVM achieves an F-measure of 0.858 when using PoS tagging on the original emails, improving by almost 3% on the 0.83 obtained in Dalianis et al. (2011). In this case, SVM was clearly better than Naive Bayes.
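As a hedged illustration only, the comparison above could be reproduced in miniature with scikit-learn: word-based features (here TF-IDF over unigrams and bigrams with stop-word removal) feeding SVM, k-NN, and Naive Bayes, scored by precision, recall, and F-measure. The toy emails and categories below are invented, not SSIA data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Invented two-category toy corpus standing in for incoming citizen emails.
emails = [
    "when will my parental benefit be paid out",
    "question about payment date for parental benefit",
    "how do I report parental leave days",
    "parental benefit payment is delayed",
    "how do I register a new address",
    "update my contact details and address",
    "change of address form request",
    "where can I update my personal details",
]
labels = ["benefit"] * 4 + ["address"] * 4

X_tr, X_te, y_tr, y_te = train_test_split(
    emails, labels, test_size=0.25, stratify=labels, random_state=0)

# Word-based features: TF-IDF over unigrams and bigrams, stop words removed.
vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
Xtr, Xte = vec.fit_transform(X_tr), vec.transform(X_te)

for name, clf in [("SVM", LinearSVC()),
                  ("kNN", KNeighborsClassifier(n_neighbors=3)),
                  ("Naive Bayes", MultinomialNB())]:
    y_pred = clf.fit(Xtr, y_tr).predict(Xte)
    p, r, f, _ = precision_recall_fscore_support(
        y_te, y_pred, average="macro", zero_division=0)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```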
|
443 |
Comparing Anomaly-Based Network Intrusion Detection Approaches Under Practical Aspects. Helmrich, Daniel. 07 July 2021.
While many of the currently used network intrusion detection systems (NIDS) employ signature-based approaches, there is an increasing research interest in the examination of anomaly-based detection methods, which seem to be more suited for recognizing zero-day attacks. Nevertheless, requirements for their practical deployment, as well as objective and reproducible evaluation methods, are often neglected. The following thesis defines aspects that are crucial for a practical evaluation of anomaly-based NIDS, such as the focus on modern attack types, the restriction to one-class classification methods, the exclusion of known attacks from the training phase, a low false detection rate, and consideration of the runtime efficiency. Based on those principles, a framework dedicated to developing, testing and evaluating models for the detection of network anomalies is proposed. It is applied to two datasets featuring modern traffic, namely the UNSW-NB15 and the CIC-IDS-2017 datasets, in order to compare and evaluate commonly used network intrusion detection methods. The implemented approaches include, among others, a highly configurable network flow generator, a payload analyser, a one-hot encoder, a one-class support vector machine, and an autoencoder. The results show a significant difference between the two chosen datasets: while for the UNSW-NB15 dataset several reasonably well performing model combinations for both the autoencoder and the one-class SVM can be found, most of them yield unsatisfying results when the CIC-IDS-2017 dataset is used.
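The one-class training discipline the thesis insists on (benign traffic only in the training phase, attacks held out entirely) can be sketched as follows with scikit-learn. The synthetic flow features are invented stand-ins for UNSW-NB15 / CIC-IDS-2017 style flow records, and the small nu value reflects the stated emphasis on a low false detection rate.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Invented flow-feature vectors: benign traffic clusters near the origin,
# attack traffic is shifted. The training set contains benign flows only.
benign_train = rng.normal(0.0, 1.0, (1000, 8))
benign_test = rng.normal(0.0, 1.0, (200, 8))
attacks = rng.normal(4.0, 1.0, (200, 8))

scaler = StandardScaler().fit(benign_train)
# Small nu keeps the fraction of benign flows flagged as outliers low,
# matching the requirement of a low false detection rate.
model = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
model.fit(scaler.transform(benign_train))

false_alarms = (model.predict(scaler.transform(benign_test)) == -1).mean()
detections = (model.predict(scaler.transform(attacks)) == -1).mean()
print(f"false alarm rate: {false_alarms:.2%}, detection rate: {detections:.2%}")
```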
|
444 |
Adaptive Anomaly Detection for Large IoT Datasets with Machine Learning and Transfer Learning. Negus, Andra Stefania. January 2020.
As more IoT devices enter the market, it becomes increasingly important to develop reliable and adaptive ways of dealing with the data they generate, addressing both data quality and reliability. Such solutions could benefit both the device producers and their customers, who could as a result receive faster and better customer support. This project's goal is therefore twofold: first, to identify faulty data points generated by such devices; second, to evaluate whether the knowledge gained from available, known sensors and appliances is transferable to other sensors on similar devices. This would make it possible to evaluate the behaviour of new appliances as soon as they are first switched on, rather than only after sufficient data has been collected from them. The project uses time series data from three appliances: a washing machine, a washer & dryer, and a refrigerator. For these, two solutions are developed and tested: one for categorical and another for numerical variables. Categorical variables are analysed using the Average Value Frequency method and the pure frequency of state transitions; due to the limited number of possible states, the pure frequency proves to be the better solution, and the knowledge gained is transferred from the source device to the target one with moderate success (a sketch of this idea follows below). Numerical variables are analysed using a One-class Support Vector Machine pipeline, with very promising results. Further, learning and forgetting mechanisms are developed to allow the pipelines to adapt to changes in appliance behaviour patterns, including a decay function for the numerical variables solution. Interestingly, the different weights for the source and target have little to no impact on the quality of the classification.
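To make the categorical-variable idea concrete, here is a small hypothetical sketch of a pure state-transition-frequency model with a decay ("forgetting") factor, where transition counts learned on a source appliance seed the model for a target appliance. The class, states, and decay constant are all illustrative, not the thesis's implementation.

```python
from collections import defaultdict

class TransitionModel:
    """Pure state-transition-frequency model with a forgetting factor."""

    def __init__(self, decay=0.99, seed_counts=None):
        # seed_counts lets knowledge from a source appliance be transferred.
        self.counts = defaultdict(float, seed_counts or {})
        self.decay = decay

    def update(self, prev_state, state):
        for key in list(self.counts):      # slowly forget old behaviour
            self.counts[key] *= self.decay
        self.counts[(prev_state, state)] += 1.0

    def anomaly_score(self, prev_state, state):
        total = sum(self.counts.values()) or 1.0
        freq = self.counts.get((prev_state, state), 0.0) / total
        return 1.0 - freq                  # rare transition -> high score

# Transfer: counts learned on a source washing machine seed the model for a
# target washer & dryer before any target data has been seen.
source_counts = {("idle", "wash"): 120.0, ("wash", "rinse"): 118.0,
                 ("rinse", "spin"): 117.0, ("spin", "idle"): 119.0}
target = TransitionModel(decay=0.99, seed_counts=source_counts)
print(target.anomaly_score("idle", "wash"))  # common transition, low score
print(target.anomaly_score("wash", "idle"))  # unseen transition, score ~1.0
```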
|
445 |
Automated analysis of battery articles. Haglund, Robin. January 2020.
Journal articles are the formal medium for the communication of results among scientists and often contain valuable data. However, manually collecting article data from a large field like lithium-ion battery chemistry is tedious and time consuming, which is an obstacle when searching for statistical trends and correlations to inform research decisions. To address this, a platform for the automatic retrieval and analysis of large numbers of articles is created and applied to the field of lithium-ion battery chemistry. Example data produced by the platform are presented and evaluated, and sources of error limiting this type of platform are identified, with problems related to text extraction and pattern matching being especially significant. Some solutions to these problems are presented and potential future improvements are proposed.
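Pattern matching of the kind that causes the errors noted above typically means regular expressions over imperfectly extracted text. The following is a hypothetical miniature, with an invented snippet and the unit spellings PDF extraction tends to produce:

```python
import re

# Invented example of extracted article text with inconsistent unit forms.
text = ("The LiFePO4 cathode delivered 158 mAh g-1 at 0.1C, while the "
        "Si anode retained 1200 mAh/g after 100 cycles. Voltage: 3.4 V.")

# Capture specific capacities, tolerating 'mAh/g', 'mAh g-1' and the
# Unicode-minus variant 'mAh g\u22121'.
capacity = re.compile(r"(\d+(?:\.\d+)?)\s*mAh\s*(?:/\s*g|g-1|g\u22121)")
print(capacity.findall(text))   # ['158', '1200']
```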
|
446 |
ROZPOZNÁNÍ ČINNOSTÍ ČLOVĚKA VE VIDEU / HUMAN ACTION RECOGNITION IN VIDEO. Řezníček, Ivo. January 2014.
This doctoral thesis deals with improving systems for human action recognition. The current state of knowledge in the field is presented, covering the acquisition of digital images and video together with the ways these entities are represented in a computer. It is further shown how feature-vector extractors and spatio-temporal feature extractors are used, and how their output is prepared for subsequent processing, classification methods being an example of such processing. This processing generally operates on video segments of variable length. The main contribution of this work is the stated hypothesis that there is an optimal length of video sequence to analyse at which the quality of the solution is comparable to that obtained without restricting the sequence length. An algorithm to verify this hypothesis is designed, implemented, and tested, and the hypothesis is experimentally confirmed using it. While searching for the optimal length, a certain improvement in classification quality was also achieved. Experiments, results, and future applications of this work are presented as well.
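The core experiment, finding the shortest sequence length whose classification quality matches the unrestricted baseline, can be sketched as below. Everything here is an invented stand-in (random per-frame descriptors whose class-dependent mean plays the role of real spatio-temporal features), not the thesis's pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_videos, max_frames, dim = 200, 120, 32
labels = rng.integers(0, 4, n_videos)           # 4 synthetic action classes
# Per-frame descriptors whose mean drifts with the class label.
videos = rng.normal(labels[:, None, None] * 0.25, 1.0,
                    (n_videos, max_frames, dim))

def score_at_length(n_frames):
    # Restrict every sequence to its first n_frames, pool by averaging.
    X = videos[:, :n_frames, :].mean(axis=1)
    return cross_val_score(SVC(), X, labels, cv=5).mean()

baseline = score_at_length(max_frames)          # no length restriction
for n in (10, 20, 40, 80, 120):
    print(f"{n:3d} frames: accuracy={score_at_length(n):.3f} "
          f"(baseline {baseline:.3f})")
```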
|
447 |
Low Cost Open Source Modal Virtual Environment Interfaces Using Full Body Motion Tracking and Hand Gesture Recognition. Marangoni, Matthew J. 25 May 2013.
No description available.
|
448 |
Automated Gravel Road Condition Assessment: A Case Study of Assessing Loose Gravel using Audio Data. Saeed, Nausheen. January 2021.
Gravel roads connect sparse populations and provide highways for agriculture and the transport of forest goods. Gravel roads are an economical choice where traffic volume is low. In Sweden, 21% of all public roads are state-owned gravel roads, covering over 20,200 km; in addition, there are some 74,000 km of gravel roads and 210,000 km of forest roads owned by the private sector. The Swedish Transport Administration (Trafikverket) rates the condition of gravel roads according to the severity of irregularities (e.g. corrugations and potholes), dust, loose gravel, and gravel cross-sections. This assessment is carried out during the summertime, when roads are free of snow. One of the essential parameters for gravel road assessment is loose gravel, which can cause a tire to slip, leading to a loss of driver control. Assessment of gravel roads is carried out subjectively, by taking images of road sections and adding textual notes; a cost-effective, intelligent, and objective method is lacking. Expensive methods, such as laser profiler trucks, can offer road profiling with high accuracy, but they are not applied to gravel roads because of the need to maintain cost-efficiency. In this thesis, we explored the idea that, in addition to machine vision, machine hearing could be used to classify the condition of gravel roads with respect to loose gravel. Several suitable classical supervised learning methods and convolutional neural networks (CNNs) were tested. When people drive on gravel roads, they can make sense of the road condition by listening to the gravel hitting the bottom of the car: the more gravel we hear, the more loose gravel there is likely to be, and therefore the worse the road condition might be. Based on this idea, we hypothesized that machines could undertake such a classification when trained with labeled sound data, identifying gravel and non-gravel sounds. We used traditional machine learning algorithms, such as support vector machines (SVM), decision trees, and ensemble classification methods, and also explored CNNs for classifying spectrograms of audio and images of gravel roads; the results of both approaches were compared. Among the classical algorithms, ensemble bagged tree (EBT) classifiers performed best at classifying gravel and non-gravel sounds, and EBT is also useful in reducing the misclassification of non-gravel sounds. The CNN achieved an accuracy of 97.91%. Using a CNN makes the classification process more intuitive, because the network architecture takes responsibility for selecting the relevant training features. Furthermore, the classification results can be visualized on road maps, which can help road monitoring agencies assess road conditions and schedule maintenance activities for a particular road.
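As an illustrative sketch only, the classical branch might look like the following: simple spectral features computed from short audio clips, classified by bagged decision trees (scikit-learn's BaggingClassifier standing in for MATLAB's ensemble bagged trees). The synthetic impulsive signals are invented stand-ins for in-car recordings of gravel hits.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
fs, n_clips = 8000, 120

def make_clip(gravel):
    noise = rng.normal(0.0, 0.1, fs)           # 1 s of background noise
    if gravel:                                  # add impulsive "gravel hits"
        hits = rng.integers(0, fs, 40)
        noise[hits] += rng.uniform(1.0, 3.0, 40)
    return noise

def features(clip):
    # Coarse spectral summary of the clip: energy stats + spectral centroid.
    f, t, S = spectrogram(clip, fs=fs, nperseg=256)
    centroid = (f[:, None] * S).sum(axis=0) / (S.sum(axis=0) + 1e-12)
    return [S.mean(), S.std(), centroid.mean(), centroid.std()]

X = np.array([features(make_clip(i % 2 == 1)) for i in range(n_clips)])
y = np.array([i % 2 for i in range(n_clips)])  # 1 = loose gravel sound

ebt = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0)
print("cross-validated accuracy:", cross_val_score(ebt, X, y, cv=5).mean())
```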
|
449 |
Soft sensor for snow density measurements. Brandt, Filippa. January 2022.
The aim of this project was to examine whether a machine learning model could be used to predict snow density from weather parameters. Six variables were used: an artificially generated snow density (the prediction target), air temperature, ground temperature, relative humidity, wind speed, and the change in snow depth. The questions asked were which parameters correlate with snow density, which model performs best, and whether this approach could be a better alternative to measuring snow density manually. The research was performed in the Regression Learner application in MATLAB by testing five premade machine learning models on a dataset: Linear Regression, GPR Matern 5/2, SVM Medium Gaussian, Wide Neural Network, and Trilayered Neural Network. The project also includes data collection, data cleaning, data modification, data generation, and training, testing, and evaluating the models. The results show that air temperature and wind speed are overall the most important parameters, and that GPR Matern 5/2 and the Wide Neural Network had the highest performance. Lastly, it was concluded that the machine learning model could be a better alternative to measuring snow density with a real sensor.
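A hedged Python analogue of the Regression Learner comparison is sketched below: a Gaussian process regressor with a Matern 5/2 kernel against an RBF-kernel SVR, scored by RMSE. The weather inputs and the synthetic density target are invented stand-ins for the project's data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 400
# Columns: air temp, ground temp, relative humidity, wind speed, depth change.
X = rng.normal(0.0, 1.0, (n, 5))
# Synthetic density target dominated by air temperature and wind speed.
density = 0.3 * X[:, 0] + 0.2 * X[:, 3] + rng.normal(0.0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, density, random_state=0)
models = {
    "GPR Matern 5/2": GaussianProcessRegressor(
        kernel=Matern(nu=2.5) + WhiteKernel(), normalize_y=True),
    "SVR (Gaussian kernel)": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name}: RMSE = {rmse:.3f}")
```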
|
450 |
Utilizing Genetic Algorithm and Machine Learning to Optimize a Control System in Generators: Using a PID controller to damp terminal voltage oscillations. Strand, Fredrik. January 2022.
Hydropower is an important part of renewable power production in Sweden, and the voltage stability of existing hydropower plants needs to be improved. One way to do this is to improve the control system that damps terminal voltage oscillations: if oscillations in the power system are not damped, they can lead to lower power output or, in the worst case, a blackout. This thesis focuses on the automatic voltage regulator (AVR) system with a proportional-integral-derivative (PID) controller, whose parameters are optimized to damp terminal voltage instability in a generator. The aim is to develop a machine learning model that predicts the optimal gain parameters for a PID controller, using the gains tuned by the Ziegler-Nichols (Z-N) method and the amplifier gain as inputs and giving the optimal gains as output. A linearized, transfer-function-based model of an AVR system was developed in a MATLAB script and used to simulate the behaviour of an AVR system when a change in load occurs. The Z-N method and the genetic algorithm (GA), with different settings and fitness functions, were used to tune the PID controller. The best performing method is the GA with the fitness function developed by Zwe-Lee Gaing (ZL); the best performing settings are roulette selection, adapt-feasible mutation, and arithmetic crossover. The GA (ZL) was then used in the development of a machine learning model. Two models were developed and tested: support vector regression (SVR) and Gaussian process regression (GPR). The training data were generated by changing the transfer functions' time constants 4096 times, running the Z-N method and the GA (ZL) at each step. The GPR model proved superior, with a lower root mean square error (RMSE) and a higher coefficient of determination (R^2): the RMSE values for GPR are 0.1091, 0.0815, and 0.0717, and the R^2 values are 87%, 59%, and 86%. The results show that the developed model is capable of optimizing the PID controller gains of an AVR system without knowing the characteristics of its components.
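For illustration, here is a hedged Python sketch of the tuning loop: a linearized AVR model built from first-order transfer-function blocks, a ZL-style fitness of the published form W = (1 - e^(-beta))(overshoot + steady-state error) + e^(-beta)(settling time - rise time), and SciPy's differential evolution standing in for the thesis's MATLAB genetic algorithm. The block constants are common textbook values, not the thesis's own.

```python
import numpy as np
from scipy import signal
from scipy.optimize import differential_evolution

# Forward path: amplifier, exciter, generator as (gain, time constant);
# sensor in the feedback path. Illustrative textbook values.
FORWARD = [(10.0, 0.1), (1.0, 0.4), (1.0, 1.0)]
SENSOR = (1.0, 0.01)

def closed_loop_step(kp, ki, kd, t_end=10.0):
    num_f, den_f = [kd, kp, ki], [1.0, 0.0]   # PID: (Kd s^2 + Kp s + Ki) / s
    for gain, tau in FORWARD:
        num_f = np.polymul(num_f, [gain])
        den_f = np.polymul(den_f, [tau, 1.0])
    num_h, den_h = [SENSOR[0]], [SENSOR[1], 1.0]
    # Closed loop G / (1 + G*H) via polynomial algebra.
    num_cl = np.polymul(num_f, den_h)
    den_cl = np.polyadd(np.polymul(den_f, den_h), np.polymul(num_f, num_h))
    t = np.linspace(0.0, t_end, 2000)
    _, y = signal.step(signal.lti(num_cl, den_cl), T=t)
    return t, y

def zl_fitness(gains, beta=1.0):
    t, y = closed_loop_step(*gains)
    if not np.all(np.isfinite(y)):
        return 1e6                            # penalize unstable candidates
    y_final = y[-1]
    overshoot = max(0.0, y.max() - y_final)
    ess = abs(1.0 - y_final)                  # unit-step steady-state error
    rise = t[np.argmax(y >= 0.9)]             # crude time to first reach 90%
    outside = np.nonzero(np.abs(y - y_final) > 0.02)[0]
    settle = t[outside[-1]] if outside.size else 0.0
    return ((1.0 - np.exp(-beta)) * (overshoot + ess)
            + np.exp(-beta) * (settle - rise))

# Differential evolution as a stand-in for MATLAB's ga() in the thesis.
result = differential_evolution(zl_fitness, bounds=[(0.0, 1.5)] * 3,
                                seed=0, maxiter=40, tol=1e-7)
print("tuned [Kp, Ki, Kd]:", result.x.round(4), "fitness:", result.fun)
```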
|