111 |
Semi-supervised learning of bitmask pairs for an anomaly-based intrusion detection system
Ardolino, Kyle R. January 2008
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Electrical Engineering, 2008. / Includes bibliographical references.
|
112 |
Applications of GUI usage analysis
Imsand, Eric Shaun. Hamilton, John A., January 2008
Thesis (Ph. D.)--Auburn University, 2008. / Abstract. Includes bibliographical references (p. 119-122).
|
113 |
Anomaly-based intrusion detection using lightweight stateless payload inspection
Nwanze, Nnamdi Chike. January 2009
Thesis (Ph. D.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Electrical and Computer Engineering, 2009. / Includes bibliographical references.
|
114 |
On Learning from Collective Data
Xiong, Liang 01 December 2013
In many machine learning problems and application domains, the data are naturally organized by groups. For example, a video sequence is a group of images, an image is a group of patches, a document is a group of paragraphs/words, and a community is a group of people. We call such data collective data. In this thesis, we study how and what we can learn from collective data. Usually, machine learning focuses on individual objects, each of which is described by a feature vector and studied as a point in some metric space. When approaching collective data, researchers often reduce the groups to vectors to which traditional methods can be applied. We, on the other hand, try to develop machine learning methods that respect the collective nature of the data and learn from them directly. Several different approaches were taken to address this learning problem. When a group consists of unordered discrete data points, it can naturally be characterized by its sufficient statistic, the histogram. For this case we develop efficient methods to address the outliers and temporal effects in the data based on matrix and tensor factorization methods. To learn from groups that contain multi-dimensional real-valued vectors, we develop both generative methods based on hierarchical probabilistic models and discriminative methods using group kernels based on new divergence estimators. With these tools, we can accomplish various tasks such as classification, regression, clustering, anomaly detection, and dimensionality reduction on collective data. We further consider the practical side of the divergence-based algorithms. To reduce their time and space requirements, we evaluate and find methods that can effectively reduce the size of the groups with little impact on accuracy. We also propose the conditional divergence, along with an efficient estimator, in order to correct the sampling biases that might be present in the data.
Finally, we develop methods to learn in cases where some divergences are missing, caused by either insufficient computational resources or extreme sampling biases. In addition to designing new learning methods, we will use them to help the scientific discovery process. In our collaboration with astronomers and physicists, we see that the new techniques can indeed help scientists make the best of data.
|
115 |
Data gathering and anomaly detection in wireless sensors networks / Collecte de données et détection d’anomalies dans les réseaux de capteurs sans fil
Moussa, Mohamed Ali 10 November 2017
The use of Wireless Sensor Networks (WSNs) is steadily increasing and now covers a wide range of applications and domains. This trend is supported by technical advances in sensor manufacturing, which have considerably reduced the cost and size of these components. However, several challenges still face the deployment and proper functioning of this type of network. Indeed, WSN applications have to deal with the limited energy, memory, and processing capacities of sensor nodes as well as the imperfection of the probed data. This dissertation addresses the problem of collecting data and detecting anomalies in WSNs.
The aforementioned functionality needs to be achieved while ensuring reliable data quality at the collector node, good anomaly detection accuracy, a low false alarm rate, and efficient energy consumption. Throughout this work, we provide different solutions that meet these requirements. First, we propose a Compressive Sensing (CS) based solution that balances the traffic carried by nodes regardless of their distance from the sink. This solution promotes a longer lifespan for the WSN since it balances the energy consumption between sensor nodes. Our approach differs from existing CS-based solutions by taking into account the sparsity of the sensory representation in the temporal domain in addition to the spatial dimension. Moreover, we propose a new formulation to detect aberrant readings. The simulations carried out on real datasets demonstrate the efficiency of our approach in terms of data recovery and anomaly detection compared to existing solutions. Aiming to further optimize the use of WSN resources, we propose in our second contribution a Matrix Completion (MC) based data gathering and anomaly detection solution in which an arbitrary subset of nodes contributes to the data gathering process at each operating period. To fill in the missing values, we mainly rely on the low-rank structure of sensory data as well as the sparsity of readings in some transform domain. The developed algorithm also makes it possible to separate anomalies from the normal data structure. This solution is enhanced in our third contribution, where we propose a constrained formulation of the data gathering and anomaly detection problem. We reformulate the a priori knowledge about the target data as hard convex constraints. Thus, the parameters involved in the developed algorithm become easy to adjust since they are related to physical properties of the treated data.
Both MC-based approaches are tested on real datasets and demonstrate good capabilities in terms of data reconstruction quality and anomaly detection performance. Finally, we propose in the last contribution a position-based compressive data gathering scheme in which nodes cooperate to compute and transmit only the relevant positions of their sparse sensory representation. This technique provides an efficient tool for dealing with the noisy nature of the WSN environment as well as for detecting spikes in the sensory data. Furthermore, we validate the efficiency of our solution through a theoretical analysis corroborated by simulations on real data.
|
116 |
Concentration of measure, negative association, and machine learning
Root, Jonathan 07 December 2016
In this thesis we consider concentration inequalities and the concentration of measure phenomenon from a variety of angles. Sharp tail bounds on the deviation of Lipschitz functions of independent random variables about their mean are well known. We consider variations on this theme for dependent variables on the Boolean cube.
In recent years negatively associated probability distributions have been studied as potential generalizations of independent random variables. Results on this class of distributions have been sparse at best, even when restricting to the Boolean cube. We consider the class of negatively associated distributions topologically, as a subset of the general class of probability measures. Both the weak (distributional) topology and the total variation topology are considered, and the simpler notion of negative correlation is investigated.
The concentration of measure phenomenon began with Milman's proof of Dvoretzky's theorem, and is therefore intimately connected to the field of high-dimensional convex geometry. Recently this field has found application in the area of compressed sensing. We consider these applications and in particular analyze the use of Gordon's min-max inequality in various compressed sensing frameworks, including the Dantzig selector and the matrix uncertainty selector.
Finally we consider the use of concentration inequalities in developing a theoretically sound anomaly detection algorithm. Our method uses a ranking procedure based on KNN graphs of given data. We develop a max-margin learning-to-rank framework to train limited complexity models to imitate these KNN scores. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density.
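The KNN-based ranking mentioned in this abstract can be illustrated with a small sketch. It covers only the scoring and ranking step, not the max-margin learning-to-rank model described in the thesis. The function names, the use of the average distance to the k nearest neighbours as the score, and the toy data are all assumptions made for illustration.

```python
import math


def knn_scores(points, k):
    """Average Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points, k, alpha):
    """Flag the ceil(alpha * n) points with the largest k-NN scores."""
    scores = knn_scores(points, k)
    n_flag = math.ceil(alpha * len(points))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hypothetical data: a tight 3x3 grid of nominal points plus one distant point.
data = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)] + [(3.0, 3.0)]
flagged = flag_anomalies(data, k=2, alpha=0.1)
print(flagged)
```

Here `alpha` plays the role of the target false alarm rate: a larger value flags a larger fraction of the ranked points.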
|
117 |
Signal Processing and Robust Statistics for Fault Detection in Photovoltaic Arrays
January 2012
abstract: Photovoltaics (PV) is an important and rapidly growing area of research. With the advent of power system monitoring and communication technology collectively known as the "smart grid," an opportunity exists to apply signal processing techniques to monitoring and control of PV arrays. In this paper a monitoring system which provides real-time measurements of each PV module's voltage and current is considered. A fault detection algorithm formulated as a clustering problem and addressed using the robust minimum covariance determinant (MCD) estimator is described; its performance on simulated instances of arc and ground faults is evaluated. The algorithm is found to perform well on many types of faults commonly occurring in PV arrays. Among several types of detection algorithms considered, only the MCD shows high performance on both types of faults. / Dissertation/Thesis / M.S. Electrical Engineering 2012
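To make the role of the minimum covariance determinant (MCD) estimator concrete, here is a toy sketch under stated assumptions. It finds the MCD subset by exhaustive search, which is only feasible for a handful of modules and is not the thesis's algorithm. The module readings, the subset size `h`, and the cutoff (the 0.975 chi-square quantile for two degrees of freedom) are hypothetical choices, and the usual consistency correction of the MCD covariance is omitted.

```python
from itertools import combinations


def mean_cov(rows):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (voltage, current) pairs."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mcd_exact(rows, h):
    """Exhaustive MCD: mean and covariance of the h-subset with the smallest determinant."""
    best = None
    for subset in combinations(rows, h):
        mean, (sxx, sxy, syy) = mean_cov(subset)
        det = sxx * syy - sxy * sxy
        if best is None or det < best[0]:
            best = (det, mean, (sxx, sxy, syy))
    return best[1], best[2]


def mahalanobis_sq(row, mean, cov):
    """Squared Mahalanobis distance using the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = row[0] - mean[0], row[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical (voltage, current) readings; the last module simulates a fault.
modules = [(30.1, 8.0), (29.8, 8.2), (30.3, 7.9), (30.0, 8.1),
           (29.9, 7.8), (30.2, 8.3), (29.7, 8.0), (12.0, 2.5)]
mean, cov = mcd_exact(modules, h=6)
d2 = [mahalanobis_sq(m, mean, cov) for m in modules]
cutoff = 7.378  # assumed cutoff: 0.975 chi-square quantile, 2 degrees of freedom
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(flagged)
```

Because the location and scatter are estimated from the most concentrated subset only, the faulty module cannot pull the estimates toward itself, which is the property that makes the robust distance useful for fault detection.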
|
118 |
DETECÇÃO DE INTRUSÃO ATRAVÉS DA ANÁLISE DE SÉRIES TEMPORAIS E CORRELAÇÃO DO TRÁFEGO DE REDE / INTRUSION DETECTION THROUGH TIME SERIES ANALYSIS AND NETWORK TRAFFIC CORRELATION
Vogt, Francisco Carlos 09 December 2012
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This work presents a model for identifying anomalies in computer network behavior, applied to the problem of network traffic management and information security. Because traffic grows over time, some models fail to distinguish an anomaly from an attack, generating false positives that harm the security of the network and, consequently, its quality of service. As an alternative, this work explores the ARIMA model, which makes the time series stationary, and the CUSUM algorithm, which detects anomalies. This approach makes it possible to evaluate network behavior and identify an anomaly with better quality from traffic descriptor variables and their correlations. The results show that the approach requires a careful variable-selection step, since the variables can be influenced by the attacks of interest.
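The two steps named in this abstract can be sketched roughly as follows. This is not the author's model: plain first-order differencing stands in for the ARIMA stationarization step (no AR or MA terms are fitted), and a one-sided CUSUM runs on the differenced series. The traffic values and the `target`, `slack`, and `threshold` parameters are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing (the 'I' step of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def cusum_alarms(values, target, slack, threshold):
    """One-sided (upper) CUSUM: indices where the cumulative sum exceeds threshold."""
    alarms = []
    s = 0.0
    for i, x in enumerate(values):
        # Accumulate only deviations above target + slack; never drop below zero.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # restart the statistic after raising an alarm
    return alarms


# Hypothetical traffic counts: steady growth of 2 per step, then a surge of 12 per step.
traffic = [100, 102, 104, 106, 108, 110, 112, 114, 116, 118,
           130, 142, 154, 166, 178]
diffs = difference(traffic)
alarms = cusum_alarms(diffs, target=2.0, slack=1.0, threshold=15.0)
print(alarms)
```

Differencing removes the normal growth trend, so the CUSUM statistic stays at zero during ordinary growth and accumulates only once the surge begins. The `slack` term controls how much deviation is tolerated before the sum starts to rise.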
|
119 |
Energy Analytics for Infrastructure: An Application to Institutional Buildings
January 2017
abstract: Commercial buildings in the United States account for 19% of total annual energy consumption. The Commercial Building Energy Consumption Survey (CBECS), which serves as the benchmark for all commercial buildings, provides critical input for EnergyStar models. Smart energy management technologies, sensors, innovative demand response programs, and updated versions of certification programs increase the opportunity to mitigate energy-related problems (blackouts and overproduction) and guide energy managers in optimizing consumption characteristics. With increasing advancements in technologies relying on "Big Data," codes and certification programs such as those of the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and the Leadership in Energy and Environmental Design (LEED) carry out their evaluations during the pre-construction phase. These evaluations mostly rely on assumed quantitative and qualitative values calculated from energy models such as EnergyPlus and eQUEST. However, energy consumption analysis through Knowledge Discovery in Databases (KDD) is not commonly used by energy managers to perform complete implementation, creating the need for a better energy analytics framework.
The dissertation utilizes Interval Data (ID) and establishes three different frameworks to identify electricity losses, predict electricity consumption, and detect anomalies using data mining, deep learning, and mathematical models. The process of energy analytics integrates with computational science and contributes to several objectives, which are to:
1. Develop a framework to identify both technical and non-technical losses using clustering and semi-supervised learning techniques.
2. Develop an integrated framework to predict electricity consumption using wavelet based data transformation model and deep learning algorithms.
3. Develop a framework to detect anomalies using ensemble empirical mode decomposition and isolation forest algorithms.
Building on a thorough research background, the first phase details the data analytics performed on the demand-supply database to determine the potential for energy loss reduction. The data preprocessing and electricity prediction framework in the second phase integrates mathematical models and deep learning algorithms to accurately predict consumption. The third phase employs a data decomposition model and data mining techniques to detect anomalies in institutional buildings. / Dissertation/Thesis / Doctoral Dissertation Civil, Environmental and Sustainable Engineering 2017
|
120 |
Detecção de anomalias utilizando métodos paramétricos e múltiplos classificadores / Anomaly detection using parametric methods and multiple classifiers
Gabriel de Barros Paranhos da Costa 25 August 2014
Anomalies or outliers are examples or groups of examples whose behaviour differs from what is expected. In practice, these examples may represent diseases in individuals or populations, as well as other events such as fraud in banking operations and system failures. Several existing techniques seek to identify these anomalies, including adaptations of classification methods, statistical methods, and methods based on information theory. The main challenges are the imbalance in the number of samples in each class, the cases in which anomalies are disguised among normal samples, and the definition of normal behaviour together with the formalization of a model for this behaviour. In this dissertation, we propose the use of a new space to help with the detection task, called the parameter space. A parameter space is created using parameters estimated from the concatenation (chaining) of two examples. We also present a new framework to perform anomaly detection through the fusion of detectors that use convex hulls in multiple parameter spaces. The method is considered a framework because it is possible to choose which parameter spaces will be used according to the behaviour of the target data set. For the experiments, two sets of parameters were used (mean and standard deviation; mean, variance, skewness, and kurtosis), and the results were compared to some commonly used anomaly detection methods. The results achieved were comparable to or better than those obtained by the other methods. Furthermore, we believe that the use of parameter spaces gives the proposed method great flexibility, since it allows the user to choose a parameter space that best models the application.
Both the flexibility and extensibility provided by the use of parameter spaces, together with the good performance achieved by the proposed method in the experiments, make parameter spaces and, more specifically, the proposed methods appealing when solving anomaly detection problems.
|