Global ETD Search

11	Tuning and Optimising Concept Drift Detection Do, Ethan Quoc-Nam January 2021 Data drifts naturally occur in data streams due to seasonality, change in data usage, and the data generation process. Concepts modelled via the data streams will also experience such drift. The problem of differentiating concept drift from anomalies is important to identify normal vs abnormal behaviour. Existing techniques achieve poor responsiveness and accuracy towards this differentiation task. We take two approaches to address this problem. First, we extend an existing sliding window algorithm to include multiple windows to model recently seen data stream patterns, and define new parameters to compare the data streams. Second, we study a set of optimisers and tune the parameters of a Bi-LSTM model to maximize accuracy. / Thesis / Master of Applied Science (MASc) concept drift anomaly detection concept drift detection
12	Spatio-Temporal Anomaly Detection Das, Mahashweta January 2009 No description available. Computer Science Anomaly Detection Spatio-Temporal Mining
13	Building trustworthy machine learning systems in adversarial environments Wang, Ning 26 May 2023 Modern AI systems, particularly with the rise of big data and deep learning in the last decade, have greatly improved our daily life and at the same time created a long list of controversies. AI systems are often subject to malicious and stealthy subversion that jeopardizes their efficacy. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly boost the accuracy of machine learning models, they also create opportunities for adversaries to tamper with models or extract sensitive data. Malicious data providers can compromise machine learning systems by supplying false data and intermediate computation results. Even a well-trained model can be deceived into misbehaving by an adversary who provides carefully designed inputs. Furthermore, curious parties can derive sensitive information about the training data by interacting with a machine-learning model. These adversarial scenarios, known as poisoning attacks, adversarial example attacks, and inference attacks, have demonstrated that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. To address these problems, we proposed the following solutions: (1) FLARE, which detects and mitigates stealthy poisoning attacks by leveraging latent space representations; (2) MANDA, which detects adversarial examples by utilizing evaluations from diverse sources, i.e., model-based prediction and data-based evaluation; (3) FeCo, which enhances the robustness of machine learning-based network intrusion detection systems by introducing a novel representation learning method; and (4) DP-FedMeta, which preserves data privacy and improves the privacy-accuracy trade-off in machine learning systems through a novel adaptive clipping mechanism.
/ Doctor of Philosophy / Over the past few decades, machine learning (ML) has become increasingly popular for enhancing efficiency and effectiveness in data analytics and decision-making. Notable applications include intelligent transportation, smart healthcare, natural language generation, intrusion detection, etc. While machine learning methods are often employed for beneficial purposes, they can also be exploited with malicious intent. Well-trained language models have demonstrated generalizability deficiencies and intrinsic biases; generative ML models used for creating art have been repurposed by fraudsters to produce deepfakes; and facial recognition models trained on big data have been found to leak sensitive information about data owners. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly improve the accuracy of ML models, they also enable adversaries to corrupt models and infer sensitive data. This leads to various adversarial attacks, such as model poisoning during training, adversarially crafted data in testing, and data inference. It is evident that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. This research focuses on building trustworthy machine-learning systems in adversarial environments from a data perspective. It encompasses two themes: securing ML systems against security or privacy vulnerabilities (security of AI) and using ML as a tool to develop novel security solutions (AI for security). For the first theme, we studied adversarial attack detection in both the training and testing phases and proposed FLARE and MANDA to secure machine learning systems in the two phases, respectively. Additionally, we proposed a privacy-preserving learning system, dpfed, to defend against privacy inference attacks.
We achieved a good trade-off between accuracy and privacy by proposing an adaptive data clipping and perturbing method. In the second theme, the research is focused on enhancing the robustness of intrusion detection systems through data representation learning. adversarial machine learning anomaly detection differential privacy
14	Semi-supervised and Self-evolving Learning Algorithms with Application to Anomaly Detection in Cloud Computing Pannu, Husanbir Singh 12 1900 Semi-supervised learning (SSL) is the most practical approach for classification among machine learning algorithms. It is similar to the human way of learning and thus has great applications in text/image classification, bioinformatics, artificial intelligence, robotics etc. Labeled data is hard to obtain in real-life experiments and may need human experts with experimental equipment to mark the labels, which can be slow and expensive. But unlabeled data is easily available in the form of web pages, data logs, images, audio, video files and DNA/RNA sequences. SSL uses large amounts of unlabeled data and a small amount of labeled data to build better classifying functions, which achieve higher accuracy and need less human effort. Thus it is of great empirical and theoretical interest. We contribute two SSL algorithms, (i) adaptive anomaly detection (AAD) and (ii) hybrid anomaly detection (HAD), which are self-evolving and very efficient at detecting anomalies in large-scale and complex data distributions. Our algorithms are capable of modifying an existing classifier by both retiring old data and adding new data. This characteristic enables the proposed algorithms to handle massive and streaming datasets where other existing algorithms fail and run out of memory. As an application to semi-supervised anomaly detection and for experimental illustration, we have implemented a prototype of the AAD and HAD systems and conducted experiments in an on-campus cloud computing environment. Experimental results show that the detection accuracy of both algorithms improves as they evolve and can achieve 92.1% detection sensitivity and 83.8% detection specificity, which makes them well suited for anomaly detection in large and streaming datasets.
We compared our algorithms with two popular SSL methods: (i) subspace regularization and (ii) an ensemble of Bayesian sub-models and decision tree classifiers. Our contributed algorithms are easy to implement and significantly better in terms of space complexity, time complexity, and accuracy than these two methods for semi-supervised anomaly detection. Machine learning anomaly detection cloud computing
15	Threat Detection in Program Execution and Data Movement: Theory and Practice Shu, Xiaokui 25 June 2016 Program attacks are among the oldest and most fundamental cyber threats. They compromise the confidentiality of data, the integrity of program logic, and the availability of services. This threat becomes even more severe when followed by other malicious activities such as data exfiltration. The integration of primitive attacks constructs comprehensive attack vectors and forms advanced persistent threats. Along with the rapid development of defense mechanisms, program attacks and data leak threats survive and evolve. Stealthy program attacks can hide in long execution paths to avoid being detected. Sensitive data transformations weaken existing leak detection mechanisms. New adversaries, e.g., semi-honest service providers, emerge and form threats. This thesis presents theoretical analysis and practical detection mechanisms against stealthy program attacks and data leaks. The thesis presents a unified framework for understanding different branches of program anomaly detection and sheds light on possible future program anomaly detection directions. The thesis investigates modern stealthy program attacks hidden in long program executions and develops a program anomaly detection approach with data mining techniques to reveal the attacks. The thesis advances network-based data leak detection mechanisms by relaxing strong requirements in existing methods. The thesis presents practical solutions to outsource data leak detection procedures to semi-honest third parties and identify noisy or transformed data leaks in network traffic. / Ph. D. Cybersecurity Program Anomaly Detection Data Leak Detection
16	Discovery of Triggering Relations and Its Applications in Network Security and Android Malware Detection Zhang, Hao 30 November 2015 An increasing variety of malware, including spyware, worms, and bots, threatens data confidentiality and system integrity on computing devices ranging from backend servers to mobile devices. To address these threats, exacerbated by dynamic network traffic patterns and growing volumes, network security has been undergoing major changes to improve accuracy and scalability in the security analysis techniques. This dissertation addresses the problem of detecting network anomalies on a single device by inferring the traffic dependence to ensure the root-triggers. In particular, we propose a dependence model for illustrating the network traffic causality. This model depicts the triggering relation of network requests, and thus can be used to reason about the occurrences of network events and pinpoint stealthy malware activities. The triggering relationships can be inferred by means of both rule-based and learning-based approaches. The rule-based approach originates from several heuristic algorithms based on the domain knowledge. The learning-based approach discovers the triggering relationship using a pairwise comparison operation that converts the requests into event pairs with comparable attributes. Machine learning classifiers predict the triggering relationship and further reason about the legitimacy of requests by enforcing their root-triggers. We apply our dependence model to the network traffic from a single host and a mobile device. Evaluated with real-world malware samples and synthetic attacks, our findings confirm that the traffic dependence model provides a significant source of semantic and contextual information that detects zero-day malicious applications. This dissertation also studies the usability of visualizing the traffic causality for domain experts.
We design and develop a tool with a visual locality property. It supports different levels of visual-based querying and reasoning required for the sensemaking process on complex network data. The significance of this dissertation research is that it provides deep insights into the dependency of network requests, and leverages structural and semantic information, allowing us to reason about network behaviors and detect stealthy anomalies. / Ph. D. Network Security Stealthy Malware Anomaly Detection
17	Anomaly detection techniques for unsupervised machine learning Iivari, Albin January 2022 Anomalies in data can be of great importance as they often indicate faulty behaviour. Locating these can thus assist in finding the source of the issue. Isolation Forest, an unsupervised machine learning model used to detect anomalies, is evaluated against two other commonly used models. The data set consisted of log files from a company named Trimma. The log files contained information about different events that were executed. Different types of events could differ in execution time. The models were then used to find logs where some event took longer than usual to execute. The feature created for the models was the percentage difference from the median of each job type. The comparison, made on various data set sizes using one feature, showed that Isolation Forest did not perform the best with regard to execution time among the models. Isolation Forest classified data points similarly to the other models. However, the smallest classified anomaly differed a bit from the other models. This discrepancy was only seen in the smaller anomalies; the larger deviations were consistently classified as anomalies by all models. Machine learning unsupervised anomaly detection anomaly detection in logfiles isolation forest cblof Computer Sciences Datavetenskap (datalogi)
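The single feature described in the abstract above (percentage difference from the median of each job type) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the job names, and the fixed 100% cut-off are hypothetical. The thesis passes the feature to Isolation Forest and other models rather than applying a fixed threshold.

```python
from collections import defaultdict
from statistics import median


def pct_diff_from_median(records):
    """Return (job_type, seconds, pct_diff) for each (job_type, seconds) record.

    pct_diff is the percentage difference between the record's execution time
    and the median execution time of its job type. Assumes every job type has
    a positive median (hypothetical log layout, not the thesis's actual data).
    """
    times_by_type = defaultdict(list)
    for job_type, seconds in records:
        times_by_type[job_type].append(seconds)
    medians = {job: median(times) for job, times in times_by_type.items()}
    return [
        (job, secs, 100.0 * (secs - medians[job]) / medians[job])
        for job, secs in records
    ]


# Hypothetical log entries: (job type, execution time in seconds).
logs = [
    ("import", 10.0), ("import", 12.0), ("import", 11.0), ("import", 40.0),
    ("export", 2.0), ("export", 2.0), ("export", 2.2),
]
features = pct_diff_from_median(logs)

# Illustrative stand-in for a detector: flag entries more than 100% above
# their job type's median.
flagged = [(job, secs) for job, secs, pct in features if pct > 100.0]
```

Normalising by the per-job median keeps slow job types from dominating the feature, so a single one-dimensional detector can be applied across all job types.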
18	Botnet detection techniques: review, future trends, and issues Karim, A., Bin Salleh, R., Shiraz, M., Shah, S.A.A., Awan, Irfan U., Anuar, N.B. January 2014 No / In recent years, the Internet has enabled access to widespread remote services in the distributed computing environment; however, integrity of data transmission in the distributed computing platform is hindered by a number of security issues. For instance, the botnet phenomenon is a prominent threat to Internet security, including the threat of malicious codes. The botnet phenomenon supports a wide range of criminal activities, including distributed denial of service (DDoS) attacks, click fraud, phishing, malware distribution, spam emails, and building machines for illegitimate exchange of information/materials. Therefore, it is imperative to design and develop a robust mechanism for improving the botnet detection, analysis, and removal process. Currently, botnet detection techniques have been reviewed in different ways; however, such studies are limited in scope and lack discussions on the latest botnet detection techniques. This paper presents a comprehensive review of the latest state-of-the-art techniques for botnet detection and identifies the trends of previous and current research. It provides a thematic taxonomy for the classification of botnet detection techniques and highlights the implications and critical aspects by qualitatively analyzing such techniques. Related to our comprehensive review, we highlight future directions for improving the schemes that broadly span the entire botnet detection research field and identify the persistent and prominent research challenges that remain open. / University of Malaya, Malaysia (No. FP034-2012A) Botnet detection ; Anomaly detection ; Network security ; Attack ; Defense ; Taxonomy ; Remote-control behavior ; Command ; Dark
19	AUTOMATED HEALTH OPERATIONS FOR THE SAPPHIRE SPACECRAFT Swartwout, Michael A., Kitts, Christopher A. 10 1900 International Telemetering Conference Proceedings / October 27-30, 1997 / Riviera Hotel and Convention Center, Las Vegas, Nevada / Stanford’s Space Systems Development Laboratory is developing methods for automated spacecraft health operations. Such operations greatly reduce the need for ground-space communication links and full-time operators. However, new questions emerge about how to supply operators with the spacecraft information that is no longer available. One solution is to introduce a low-bandwidth health beacon and to develop new approaches in on-board summarization of health data for telemetering. This paper reviews the development of beacon operations and data summary, describes the implementation of beacon-based health management on board SAPPHIRE, and explains the mission operations response to health emergencies. Additional information is provided on the role of SSDL’s academic partners in developing a worldwide network of beacon receiving stations. Health management Beacon Spacecraft operations Data summary Anomaly detection Automation
20	Exploitation of signal information for mobile speed estimation and anomaly detection Afgani, Mostafa Z. January 2011 Although the primary purpose of the signal received by a mobile handset or smartphone is to enable wireless communication, the information extracted can be reused to provide a number of additional services. Two such services discussed in this thesis are mobile speed estimation and signal anomaly detection. The proposed algorithms exploit the propagation-environment-specific information that is already imprinted on the received signal and therefore do not incur any additional signalling overhead. Speed estimation is useful for providing navigation and location-based services in areas where global navigation satellite system (GNSS) based devices are unusable, while the proposed anomaly detection algorithms can be used to locate signal faults and aid spectrum sensing in cognitive radio systems. The speed estimation algorithms described within this thesis require a receiver with at least two antenna elements and a wideband radio frequency (RF) signal source. The channel transfer functions observed at the antenna elements are compared to yield an estimate of the device speed. The basic algorithm is a one-dimensional and unidirectional two-antenna solution. The speed of the mobile receiver is estimated from knowledge of the fixed inter-antenna distance and the time it takes for the trailing antenna to sense similar channel conditions previously observed at the leading antenna. A by-product of the algorithm is an environment-specific spatial correlation function which may be combined with theoretical models of spatial correlation to extend and improve the accuracy of the algorithm. Results obtained via computer simulations are provided. The anomaly detection algorithms proposed in this thesis highlight unusual signal features while ignoring events that are nominal.
When the test signal possesses a periodic frame structure, Kullback-Leibler divergence (KLD) analysis is employed to statistically compare successive signal frames. A method of automatically extracting the required frame period information from the signal is also provided. When the signal under test lacks a periodic frame structure, information content analysis of signal events can be used instead. Clean training data is required by this algorithm to initialise the reference event probabilities. In addition to the results obtained from extensive computer simulations, an architecture for field-programmable gate array (FPGA) based hardware implementations of the KLD based algorithm is provided. Results showing the performance of the algorithms against real test signals captured over the air are also presented. Both sets of algorithms are simple, effective and have low computational complexity – implying that real-time implementations on platforms with limited processing power and energy are feasible. This is an important quality since location based services are expected to be an integral part of next generation cognitive radio handsets. 621.382
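The frame-wise comparison described in the abstract above can be sketched roughly as follows. The abstract does not give the exact formulation, so the details here are assumptions: a signal already quantised to a small set of levels, a known frame period, smoothed histograms as the per-frame distributions, and the divergence taken from each frame to the one before it. All names are hypothetical.

```python
import math
from collections import Counter


def histogram(frame, levels, eps=1e-6):
    """Smoothed probability of each quantisation level within one frame."""
    counts = Counter(frame)
    total = len(frame) + eps * len(levels)
    return [(counts[level] + eps) / total for level in levels]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def frame_scores(signal, period, levels):
    """Score each frame (from the second onward) against its predecessor."""
    frames = [
        signal[i:i + period]
        for i in range(0, len(signal) - period + 1, period)
    ]
    hists = [histogram(frame, levels) for frame in frames]
    return [kl_divergence(hists[k], hists[k - 1]) for k in range(1, len(hists))]


# Hypothetical quantised signal: three nominal frames, one unusual frame,
# then a return to the nominal pattern.
levels = [0, 1, 2, 3]
signal = [0, 1, 2, 3] * 3 + [3, 3, 3, 3] + [0, 1, 2, 3]
scores = frame_scores(signal, period=4, levels=levels)
```

In this toy signal the scores stay at zero while consecutive frames match and rise at the transitions into and out of the unusual frame, which is the behaviour the abstract describes: nominal frames are ignored and unusual ones stand out.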
