201 |
Anomaly detection in user behavior of websites using Hierarchical Temporal Memories: Using Machine Learning to detect unusual behavior from users of a web service to quickly detect possible security hazards.
Berger, Victor January 2017
This Master's Thesis focuses on the recent Cortical Learning Algorithm (CLA), designed for temporal anomaly detection. It is here applied to the problem of anomaly detection in user behavior of web services, which is getting more and more important in a network security context. CLA is here compared to more traditional state-of-the-art algorithms of anomaly detection: Hidden Markov Models (HMMs) and t-stide (an N-gram-based anomaly detector), which are among the few algorithms compatible with the online processing constraint of this problem. It is observed that on the synthetic dataset used for this comparison, CLA performs significantly better than the other two algorithms in terms of precision of the detection. The two other algorithms don't seem to be able to handle this task at all. It appears that this anomaly detection problem (outlier detection in short sequences over a large alphabet) is considerably different from what has been extensively studied up to now.
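As a rough illustration of the N-gram baseline mentioned in this abstract, the following is a minimal sketch of a t-stide-style detector. It is not the thesis's implementation: the function names, the window length `n`, the rarity threshold `min_freq`, and the toy page-visit sequences are all assumptions made for the example.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous length-n windows of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def train_tstide(sequences, n=3, min_freq=0.05):
    """Build the set of 'normal' n-grams.

    An n-gram is kept only if its relative frequency in the training
    data is at least min_freq; rarer n-grams are treated as unseen.
    """
    counts = Counter(g for s in sequences for g in ngrams(s, n))
    total = sum(counts.values())
    return {g for g, c in counts.items() if c / total >= min_freq}


def anomaly_score(seq, normal, n=3):
    """Fraction of the sequence's n-grams that are not in the normal set."""
    windows = ngrams(seq, n)
    if not windows:
        return 0.0
    return sum(w not in normal for w in windows) / len(windows)


# Hypothetical training data: the same browsing session repeated 20 times.
training = [["login", "home", "search", "view", "logout"]] * 20
normal = train_tstide(training, n=3, min_freq=0.05)

print(anomaly_score(["login", "home", "search", "view", "logout"], normal))   # 0.0
print(anomaly_score(["login", "admin", "export", "export", "logout"], normal))  # 1.0
```

With a large alphabet of pages and short sessions, most test windows are simply unseen, which may be one reason such a detector struggles on the task described above.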
|
202 |
Sensor modelling for anomaly detection in time series data
JALIL POUR, ZAHRA January 2022
Mechanical devices in industry are equipped with numerous sensors to capture the health state of the machines. The reliability of the machine's health system depends on the quality of sensor data. In order to predict the health state of sensors, abnormal behaviour of sensors must be detected to avoid unnecessary cost. We propose an LSTM autoencoder whose objective is to reconstruct the input time series and predict the next time instance based on historical data, and we evaluate anomalies in multivariate time series via the reconstruction error. We also use an exponential moving average as a preprocessing step to smooth the trend of the time series and remove high frequency noise and low frequency deviation in multivariate time series data. Our experimental results, based on different datasets of multivariate time series from gas turbines, demonstrate that the proposed model works well at detecting both injected anomalies and anomalies in real-world data. The accuracy of the model with 5 percent injected anomalies is 98.45%.
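The exponential moving average preprocessing step mentioned in this abstract can be sketched as follows. This is only an illustration of the standard recursive smoothing formula, not the thesis's code: the function name and the smoothing factor `alpha` are assumptions, and the LSTM autoencoder itself is not shown.

```python
def exponential_moving_average(values, alpha=0.3):
    """Smooth a univariate series with s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    The first output equals the first input. A smaller alpha smooths
    more strongly; alpha must lie in (0, 1].
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# A single spike is damped and then decays over the following samples.
print(exponential_moving_average([1.0, 1.0, 9.0, 1.0, 1.0], alpha=0.5))
# [1.0, 1.0, 5.0, 3.0, 2.0]
```

For multivariate data, one plausible choice is to apply the function to each sensor channel independently before feeding the series to the model.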
|
203 |
Pattern-of-life extraction and anomaly detection using GMTI data
Liu, Tsa Chun January 2019
Ground Moving Target Indicator (GMTI) uses the concept of airborne surveillance of moving ground objects to observe and take actions accordingly. This concept was established in the late 20th century and was put to the test during the Gulf War to observe enemy movement on the other side of the mountain. During the war, due to limitations of technology, information such as enemy movement was usually observed through human readings. With the improvement of surveillance technology, tracking individual targets became possible, which allows the extraction of useful features for advanced usage. Such features, known as tracks, are the results of GMTI tracking. Although the quality of the tracker plays a crucial role in the performance of the system presented in this paper, the development of the tracker is not discussed here. The developed system uses simulated ideal GMTI tracks as its input dataset.
This paper presents an end-to-end system that includes Anomaly GMTI (AGMTI) track simulation, Pattern of Life (PoL) extraction and Anomaly Detection System (ADS). All the subsystems (AGMTI, PoL and ADS) are independent of each other, so they can either be replaced or disabled to resemble different real-world scenarios. The results from AGMTI will provide inputs for the rest of the subsystems. The results from PoL extraction will be used to improve the performance of ADS. The proposed ADS is a semi-supervised learning detection system in which the system takes prior information to support and improve detection performance, but will still operate without prior information.
The AGMTI track simulator will be built on an open-source software package called Simulation of Urban MObility (SUMO). The AGMTI track simulator subsystem will make use of SUMO's API to generate normal and anomalous GMTI tracks. The PoL extraction will be accomplished by using various clustering algorithms and statistical functions. The ADS will use a combination of anomaly detection algorithms for different anomaly events, including a statistical approach using Gaussian Mixture Model Expectation Maximization (GMM-EM), a Hidden Markov Model (HMM), a graphical approach using Weiler-Atherton Polygon Clipping (WAPC), and various clustering algorithms such as K-means clustering, spectral clustering and DBSCAN.
Finally, as extensions to the proposed system, this paper also presents Contextual Pattern of Life (CPoL) and Grouped Anomaly Detection. The CPoL is an extension to the PoL that enhances the quality and robustness of the extraction. The Grouped Anomaly is an extension to both the AGMTI track simulator and the ADS that diversifies the possible scenarios. The results from the ADS will be evaluated. Details of implementation will be provided so the system can be replicated. / Thesis / Master of Applied Science (MASc)
|
204 |
Early Anomaly Detection in Electrical Bushings Manufacturing at Hitachi Energy
Quintero Suárez, Felipe January 2022
The manufacturing of electrical bushings for high voltages is complicated and highly demanding technology-wise. This process has more than 10 steps where a single mistake in the chain could cause a complete failure of the final product. A faulty bushing represents high costs to the company both economically and in terms of public image. Nowadays, fault detection is corrective-oriented, which means that there is low traceability on where the problem happens, and it is only detected once the final product is tested. This thesis aims to test a machine learning tool from Imagimob® to determine if it is possible to detect faults during the manufacturing process using the existing captured data. To perform the test, a sample from 2019 was taken where the production of the bushings reached a 60% scrap rate. A deep-learning neural network with a 2D convolutional layer was implemented. The outcome of the system showed an efficiency of 80%. However, due to the complexity of the bushing manufacturing process, the few data samples, and the addition of different factors that can result in a faulty bushing, a range of probability is set depending on the number of anomalies detected. With such validation, the tool can label 18% of the bushings as surely faulty, and 27% as most likely faulty. The limitation of the tool is that the information must be analyzed after each step is done, and not continuously. Hence further research should be carried out on implementing a real-time tool.
|
205 |
Deep Quantile Regression for Unsupervised Anomaly Detection in Time-Series
Tambuwal, Ahmad I., Neagu, Daniel 18 November 2021
Time-series anomaly detection receives increasing research interest given the growing number of data-rich application domains. Recent additions to anomaly detection methods in the research literature include deep neural networks (DNNs: e.g., RNN, CNN, and Autoencoder). The nature and performance of these algorithms in sequence analysis enable them to learn hierarchical discriminative features and the temporal nature of time series. However, their performance is affected by the usual assumption of a Gaussian distribution on the prediction error, which is either ranked or thresholded to label data instances as anomalous or not. An exact parametric distribution is often not directly relevant in many applications, though. This can produce faulty decisions from false anomaly predictions due to high variations in data interpretation. The expectation is to produce outputs characterized by a level of confidence. Thus, implementations need a Prediction Interval (PI) that quantifies the level of uncertainty associated with the DNN point forecasts, which helps in making better-informed decisions and mitigates false anomaly alerts. An effort has been made to reduce false anomaly alerts through the use of quantile regression for identification of anomalies, but it is limited to the use of the quantile interval to identify uncertainties in the data. In this paper, an improved time-series anomaly detection method called deep quantile regression anomaly detection (DQR-AD) is proposed. The proposed method goes further by using the quantile interval (QI) as an anomaly score and comparing it with a threshold to identify anomalous points in time-series data. Test runs of the proposed method on publicly available anomaly benchmark datasets demonstrate its effective performance over other methods that assume a Gaussian distribution on the prediction or reconstruction cost for detection of anomalies.
This shows that our method is potentially less sensitive to data distribution than existing approaches. / Petroleum Technology Development Fund (PTDF) PhD Scholarship, Nigeria (Award Number: PTDF/ ED/PHD/IAT/884/16)
|
206 |
Anomaly Detection for Control Centers
Gyamfi, Cliff Oduro 06 1900
The control center is a critical location in the power system infrastructure. Decisions regarding the power system's operation and control are often made from the control center. These control actions are made possible through SCADA communication. This capability, however, makes the power system vulnerable to cyber attacks. Most of the decisions taken by the control center rely on the measurement data received from substations. These measurements are used to estimate the state of the power grid. Measurement-based cyber attacks are well established as a major threat to control center operations. Stealthy false data injection attacks are known to evade bad data detection. Due to the limitations of bad data detection at the control center, many approaches have been explored, especially in the cyber layer, to detect measurement-based attacks. Though helpful, these approaches do not look at the physical layer. This study proposes an anomaly detection system for the control center that operates on the laws of physics. The system also identifies the specific falsified measurement and proposes its estimated measurement value. / United States Department of Energy (DOE)
National Renewable Energy Laboratory (NREL) / Master of Science / Electricity is an essential need for human life. The power grid is one of the most important human inventions that fueled other technological innovations in the industrial revolution. Changing demands in usage have added to its operational complexity. Several modifications have been made to the power grid since its invention to make it robust and operationally safe. Integration of ICT has significantly improved the monitoring and operability of the power grid. Improvements through ICT have also exposed the power grid to cyber vulnerabilities. Since the power system is a critical infrastructure, there is a growing need to keep it secure and operable for the long run. The control center of the power system serves mainly as the decision-making hub of the grid. It operates through a communication link with the various dispersed devices and substations on the grid. This interconnection makes remote control and monitoring decisions possible from the control center. Data from the substations through the control center are also used in electricity markets and economic dispatch. The control center is however susceptible to cyber-attacks, particularly measurement-based attacks. When attackers launch measurement attacks, their goal is to force control actions from the control center that can make the system unstable. They make use of the vulnerabilities in the cyber layer to launch these attacks. They can inject falsified data packets through this link to usurp correct ones upon arrival at the control center. This study looks at an anomaly detection system that can detect falsified measurements at the control center. It will also indicate the specific falsified measurements and provide an estimated value for further analysis.
|
207 |
Network Anomaly Detection with Incomplete Audit Data
Patcha, Animesh 04 October 2006
With the ever increasing deployment and usage of gigabit networks, traditional network anomaly detection based intrusion detection systems have not scaled accordingly. Most, if not all, systems deployed assume the availability of complete and clean data for the purpose of intrusion detection. We contend that this assumption is not valid. Factors like noise in the audit data, mobility of the nodes, and the large amount of data generated by the network make it difficult to build a normal traffic profile of the network for the purpose of anomaly detection.
From this perspective, the leitmotif of the research effort described in this dissertation is the design of a novel intrusion detection system that has the capability to detect intrusions with high accuracy even when complete audit data is not available. In this dissertation, we take a holistic approach to anomaly detection to address the threats posed by network-based denial-of-service attacks by proposing improvements in every step of the intrusion detection process. At the data collection phase, we have implemented an adaptive sampling scheme that intelligently samples incoming network data to reduce the volume of traffic sampled, while maintaining the intrinsic characteristics of the network traffic. A Bloom-filter-based fast flow aggregation scheme is employed at the data pre-processing stage to further reduce the response time of the anomaly detection scheme. Lastly, this dissertation also proposes an expectation-maximization-based anomaly detection scheme that uses the sampled audit data to detect intrusions in the incoming network traffic. / Ph. D.
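To make the Bloom filter idea concrete, here is a minimal sketch of a Bloom filter used to test whether a flow key has been seen before. It is not the dissertation's aggregation scheme: the class, its parameters (`num_bits`, `num_hashes`), the use of SHA-256 to derive bit positions, and the flow-key string format are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


seen_flows = BloomFilter()
flow = "10.0.0.1:443->10.0.0.2:51000/tcp"  # hypothetical flow key
print(flow in seen_flows)  # False: nothing has been added yet
seen_flows.add(flow)
print(flow in seen_flows)  # True: an added key is always reported as present
```

The appeal for high-speed traffic is that membership tests use constant memory and a fixed number of hash computations per packet, at the cost of a small false-positive rate for keys that were never added.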
|
208 |
Evaluation of Scan Methods Used in the Monitoring of Public Health Surveillance Data
Fraker, Shannon E. 07 December 2007
With the recent increase in the threat of biological terrorism as well as the continual risk of other diseases, the research in public health surveillance and disease monitoring has grown tremendously. There is an abundance of data available in all sorts of forms. Hospitals, federal and local governments, and industries are all collecting data and developing new methods to be used in the detection of anomalies. Many of these methods are developed, applied to a real data set, and incorporated into software. This research, however, takes a different view of the evaluation of these methods.
We feel that there needs to be solid statistical evaluation of proposed methods no matter the intended area of application. Using proof-by-example does not seem reasonable as the sole evaluation criterion, especially for methods that have the potential to have a great impact on our lives. For this reason, this research focuses on determining the properties of some of the most common anomaly detection methods. A distinction is made between metrics used for retrospective historical monitoring and those used for prospective on-going monitoring, with the focus on the latter situation. Metrics such as the recurrence interval and time-to-signal measures are therefore the most applicable. These metrics, in conjunction with control charts such as exponentially weighted moving average (EWMA) charts and cumulative sum (CUSUM) charts, are examined. Two new time-to-signal measures, the average time-between-signal events and the average signal event length, are introduced to better compare the recurrence interval with the time-to-signal properties of surveillance schemes. The relationship commonly thought to exist between the recurrence interval and the average time to signal is shown not to exist once autocorrelation is present in the statistics used for monitoring. This means that closer attention needs to be paid to the selection of which of these metrics to report.
The properties of a commonly applied scan method are also studied carefully in the strictly temporal setting. The counts of incidences are assumed to occur independently over time and follow a Poisson distribution. Simulations are used to evaluate the method under changes in various parameters. In addition, there are two methods proposed in the literature for the calculation of the p-value: an adjustment based on the tests for previous time periods, and the use of the recurrence interval with no adjustment for previous tests. The difference between these two methods is also considered. The speed of the scan method in detecting an increase in the incidence rate, the number of false alarm events that occur, and how long the method continues to signal after the increased threat has passed are all of interest. These estimates from the scan method are compared to other attribute monitoring methods, mainly the Poisson CUSUM chart. It is shown that the Poisson CUSUM chart is typically faster in the detection of the increased incidence rate. / Ph. D.
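For readers unfamiliar with the Poisson CUSUM chart used as the comparison method, the following is a minimal sketch of the standard upper one-sided form, S_t = max(0, S_{t-1} + x_t - k), which signals when S_t reaches a decision limit h. The rates `lambda0` and `lambda1`, the limit `h`, and the count series are illustrative assumptions, not values from the dissertation.

```python
import math


def poisson_cusum_reference(lambda0, lambda1):
    """Reference value k for detecting a shift in the Poisson rate from
    lambda0 (in control) to lambda1 > lambda0 (out of control)."""
    return (lambda1 - lambda0) / (math.log(lambda1) - math.log(lambda0))


def poisson_cusum(counts, k):
    """Return the upper CUSUM statistic S_t = max(0, S_{t-1} + x_t - k)
    for every observed count x_t, starting from S = 0."""
    stats = []
    s = 0.0
    for x in counts:
        s = max(0.0, s + x - k)
        stats.append(s)
    return stats


# Hypothetical daily case counts: baseline rate near 2, then an increase.
counts = [2, 3, 2, 1, 2, 6, 7, 8, 6]
k = poisson_cusum_reference(lambda0=2.0, lambda1=4.0)
h = 5.0  # assumed decision limit
stats = poisson_cusum(counts, k)
signals = [t for t, s in enumerate(stats) if s >= h]
print(signals)  # [6, 7, 8]
```

Because the statistic is not reset here, the chart keeps signalling while the elevated counts persist, which is the kind of signal-event length the abstract's time-to-signal measures are concerned with.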
|
209 |
Anomaly crowd movement detection using machine learning techniques
Longberg, Victor January 2024
This master’s thesis investigates the application of anomaly detection techniques to analyze crowd movements using cell location data, a topic of growing interest in public safety and policymaking. This research uses machine learning algorithms, specifically Isolation Forest and DBSCAN, to identify unusual movement patterns within a large, unlabeled dataset. The study addresses the challenges inherent in processing and analyzing vast amounts of spatial and temporal data through a comprehensive methodology that includes data preprocessing, feature engineering, and optimizing algorithm parameters. The findings highlight the feasibility of employing anomaly detection in real-world scenarios, demonstrating the algorithms’ ability to detect anomalies and offering insights into crowd dynamics.
|
210 |
On the Effectiveness of Dimensionality Reduction for Unsupervised Structural Health Monitoring Anomaly Detection
Soleimani-Babakamali, Mohammad Hesam 19 April 2022
Dimensionality reduction (DR) techniques enhance data interpretability and reduce space complexity, though at the cost of information loss. Such methods have been prevalent in the Structural Health Monitoring (SHM) anomaly detection literature. While DR is favorable in supervised anomaly detection, where possible novelties are known a priori, the efficacy is less clear in unsupervised detection. In this work, we perform a detailed assessment of the DR performance trade-offs to determine whether the information loss imposed by DR can impact SHM performance for previously unseen novelties. As a basis for our analysis, we rely on an SHM anomaly detection method operating on the fast Fourier transform (FFT) of the input signals. The FFT is regarded as a raw, frequency-domain feature that allows studying various DR techniques. We design extensive experiments comparing various DR techniques, including neural autoencoder models, to capture the impact on two SHM benchmark datasets exclusively. Results imply the loss of information to be more detrimental, reducing the novelty detection accuracy by up to 60% with autoencoder-based DR. Regularization can alleviate some of the challenges, though its effect is unpredictable. Dimensions of substantial vibrational information mostly survive DR; thus, the regularization impact suggests that these dimensions are not reliable damage-sensitive features regarding unseen faults. Consequently, we argue that designing new SHM anomaly detection methods that can work with high-dimensional raw features is a necessary research direction and present open challenges and future directions. / M.S. / Structural health monitoring (SHM) aids the timely maintenance of infrastructures, saving human lives and natural resources. Infrastructure will undergo unseen damages in the future. Thus, data-driven SHM techniques for handling unlabeled data (i.e., unsupervised learning) are suitable for real-world usage.
Lacking labels and defined data classes, data instances are categorized through similarities, i.e., distances. Still, distance metrics in high-dimensional spaces can become meaningless. As a result, applying methods to reduce data dimensions is currently practiced, yet, at the cost of information loss. Naturally, a trade-off exists between the loss of information and the increased interpretability of low-dimensional spaces induced by dimensionality reduction procedures. This study proposes an unsupervised SHM technique that works with low and high-dimensional data to assess that trade-off. Results show the negative impacts of dimensionality reduction to be more severe than its benefits. Developing unsupervised SHM methods with raw data is thus encouraged for real-world applications.
|