About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Featured anomaly detection methods and applications

Huang, Chengqiang January 2018 (has links)
Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people's daily activities, e.g., mobile fraud detection, anomaly detection has become a vital first line of defence for protecting and securing public and personal property. Although anomaly detection methods have been under continual development over the years, the explosive growth of data volumes and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and fuel the demand for more intelligent anomaly detection methods, with distinct characteristics, to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods, elaborating the advantages and disadvantages of each. Afterwards, four distinctive anomaly detection methods, particularly for time series, are proposed, each aimed at a specific need, e.g., enhanced accuracy, interpretable results, or self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. More specifically, the key contributions of this thesis are as follows:

1) Support Vector Data Description (SVDD) is investigated as a primary method for accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined, and it is demonstrated that relaxing the decision boundary of SVDD consistently yields better accuracy in network time series anomaly detection. A theoretical analysis of the model's parameter is also presented to validate the relaxation of the decision boundary.

2) To support a clear explanation of detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of the time series is integrated into SVDD as contextual information. The resulting formulation maintains multiple discriminants, which help distinguish the root causes of the anomalies.

3) To further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed to perform one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with its extreme points, which constitute a dictionary of data representatives. Using this dictionary, CHDD can represent and cluster all normal data instances, so that anomaly detection comes with a degree of interpretation.

4) Beyond accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Within the framework of Reinforcement Learning (RL), a time series anomaly detector is designed that is continually trained to cope with evolving patterns. Because the detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies that plague time series anomaly detection tasks.
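To make the boundary-relaxation idea in 1) concrete, here is a minimal sketch using scikit-learn's OneClassSVM (equivalent to SVDD under the RBF kernel). The window width, nu, gamma, and relaxation margin below are illustrative assumptions, not the thesis's parameterisation.

```python
# Sketch: SVDD-style one-class anomaly detection on a time series,
# with a relaxed decision boundary. OneClassSVM with an RBF kernel
# is equivalent to SVDD; all parameter values here are illustrative.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 40, 2000)) + 0.1 * rng.standard_normal(2000)
series[1500:1510] += 3.0  # injected anomaly

# Embed the series as overlapping windows (one sample per window).
w = 20
X = np.lib.stride_tricks.sliding_window_view(series, w)

svdd = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X)
scores = svdd.decision_function(X)  # > 0 means inside the boundary

# "Relaxing" the boundary: accept points slightly outside it, trading
# a few missed borderline anomalies for fewer false alarms.
relax = 0.1
anomalous_windows = np.where(scores < -relax)[0]
print(anomalous_windows)
```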
2

Modèles statistiques non linéaires pour l'analyse de formes : application à l'imagerie cérébrale / Non-linear statistical models for shape analysis : application to brain imaging

Sfikas, Giorgos 07 September 2012 (has links)
This thesis addresses statistical shape analysis in the context of medical imaging. In the field of medical imaging, shape analysis is used to describe the morphological variability of various organs and tissues. Our focus in this thesis is the construction of a generative and discriminative, compact and non-linear model suitable for the representation of shapes. This model is evaluated in the context of a study of a population of Alzheimer's disease patients and a population of healthy controls. Our principal interest here is using the discriminative model to discover the morphological differences that are most characteristic and best discriminate between a given shape class and shapes not belonging to that class. The theoretical innovation of our work lies in two principal points: first, we propose a tool to extract the discriminative difference within the Support Vector Data Description (SVDD) framework; second, all generated reconstructions are anatomically correct. The latter point is due to the non-linear and compact character of the model, related to the hypothesis that the data (the shapes) lie on a low-dimensional, non-linear manifold. Application of our model to real medical data shows results coherent with well-known findings in related research.
3

Describing data patterns

Voß, Jakob 07 August 2013 (has links)
Many methods, technologies, standards, and languages exist to structure and describe data. The aim of this thesis is to find common features in these methods and so determine how data is actually structured and described. Existing studies are limited to notions of data as recorded observations and facts, or they require given structures to build on, such as the concept of a record or the concept of a schema. In this thesis, these presumed concepts are deconstructed from a semiotic point of view by analysing data as signs, communicated in the form of digital documents. The study was conducted using a phenomenological research method: conceptual properties of data structuring and description, such as encodings, identifiers, formats, schemas, and models, were first collected and examined critically. The analysis resulted in six prototypes that categorise data methods by their primary purpose. The study further revealed five basic paradigms that deeply shape how data is structured and described in practice. The third result is a pattern language of data structuring: twenty general patterns were identified and described, each with its benefits, consequences, pitfalls, and relations to other patterns, documenting problems and solutions that occur over and over again in data, independent of particular technologies. The results can help readers better understand data and its actual forms, both for consuming and creating data. Particular domains of application include data archaeology and data literacy.
4

Novel Pattern Recognition Techniques for Improved Target Detection in Hyperspectral Imagery

Sakla, Wesam Adel December 2009 (has links)
A fundamental challenge in target detection in hyperspectral imagery is spectral variability. In target detection applications, we are provided with a pure target signature; we do not have a collection of samples that characterize the spectral variability of the target. Another problem is that the performance of stochastic detection algorithms, such as the spectral matched filter, can be detrimentally affected by the assumption of multivariate normality of the data, which is often violated in practice. We address the lack of training samples by creating two models to characterize the target class's spectral variability: the first model makes no assumptions regarding inter-band correlation, while the second uses a first-order Markov-based scheme to exploit correlation between bands. Using these models, we present two techniques for meeting these challenges: the kernel-based support vector data description (SVDD) and spectral fringe-adjusted joint transform correlation (SFJTC). We have developed an algorithm that uses the kernel-based SVDD in full-pixel target detection scenarios, and we optimize the SVDD kernel-width parameter using the golden-section search algorithm for unconstrained optimization. We investigated the number of signatures N to generate for the SVDD target class and found that only a small number of training samples is required relative to the dimensionality (number of bands). We have extended decision-level fusion techniques using the majority-vote rule to alleviate the problem of selecting a proper value of σ² for either of our target variability models. We have shown that heavy spectral variability may cause SFJTC-based detection to suffer, and have addressed this by developing an algorithm that selects an optimal combination of the discrete wavelet transform (DWT) coefficients of the signatures to use as features for detection. For most scenarios, our results show that our SVDD-based detection scheme provides low false positive rates while maintaining higher true positive rates than popular stochastic detection algorithms. Our results also show that our SFJTC-based detection scheme using the DWT coefficients can yield significant detection improvement compared with SFJTC on the original signatures and with traditional stochastic and deterministic algorithms.
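As a sketch of the kernel-width optimization step named in the abstract, the following implements a generic golden-section search; the objective shown is a placeholder standing in for whatever SVDD performance criterion is actually optimized.

```python
# Sketch: golden-section search for an unconstrained 1-D optimum, as
# used to tune the SVDD kernel-width parameter. The objective below
# is a placeholder; in practice it would score an SVDD model (e.g.,
# by cross-validated detection error) at kernel width s.
import math

def golden_section_search(f, a, b, tol=1e-5):
    """Minimise a unimodal function f on the interval [a, b]."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c                 # minimum lies in [a, d]
            c = b - invphi * (b - a)
        else:
            a, c = c, d                 # minimum lies in [c, b]
            d = a + invphi * (b - a)
    return (a + b) / 2

# Placeholder objective with a minimum at s = 2.0.
objective = lambda s: (s - 2.0) ** 2 + 1.0
best_width = golden_section_search(objective, 0.1, 10.0)
print(f"selected kernel width: {best_width:.4f}")
```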
5

Modelos de aprendizado supervisionado usando métodos kernel, conjuntos fuzzy e medidas de probabilidade / Supervised machine learning models using kernel methods, probability measures and fuzzy sets

Guevara Díaz, Jorge Luis 04 May 2015 (has links)
This thesis proposes a methodology based on kernel methods, probability measures, and fuzzy sets to analyze datasets whose individual observations are themselves sets of points rather than individual points. Fuzzy sets and probability measures are used to model the observations, and kernel methods to analyze the data. Fuzzy sets are used when an observation contains imprecise, vague, or linguistic values, whereas probability measures are used when an observation is given as a set of multidimensional points in a D-dimensional Euclidean space. Using this methodology, it is possible to address a wide range of machine learning problems for such datasets. In particular, this work presents data description models for observations modeled by probability measures: the probability measures are embedded into reproducing kernel Hilbert spaces, where minimum enclosing balls are constructed. These description models are applied as one-class classifiers to the group anomaly detection task. This work also proposes a new class of kernels, the kernels on fuzzy sets, which are reproducing kernels able to map fuzzy sets into geometric feature spaces and which act as similarity measures between fuzzy sets. We cover these kernels from basic definitions to applications in machine learning problems such as supervised classification and a kernel two-sample test, including supervised classification of interval data and a kernel two-sample test for data containing imprecise attributes. Potential applications of these kernels include machine learning and pattern recognition tasks over fuzzy data, as well as computational tasks requiring the estimation of a similarity measure between fuzzy sets.
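One way such a kernel can be built is the cross-product construction K(A, B) = Σᵢ Σⱼ μ_A(xᵢ) μ_B(yⱼ) k(xᵢ, yⱼ) over an ordinary base kernel k; the sketch below assumes fuzzy sets given as finite supports with membership degrees and an RBF base kernel, and is only one illustrative member of such a family, not the thesis's full formulation.

```python
# Sketch: a cross-product kernel between two fuzzy sets, each given
# by a finite support and membership degrees, with an RBF base
# kernel on the underlying points. Illustrative construction only.
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def fuzzy_cross_product_kernel(support_a, mu_a, support_b, mu_b, gamma=1.0):
    """support_*: (n, d) arrays of points; mu_*: (n,) membership degrees."""
    k = 0.0
    for xi, mi in zip(support_a, mu_a):
        for yj, mj in zip(support_b, mu_b):
            k += mi * mj * rbf(xi, yj, gamma)
    return k

# Two toy fuzzy sets over 1-D supports (e.g., interval-valued attributes).
A = np.array([[0.0], [0.5], [1.0]])
mu_A = np.array([0.2, 1.0, 0.2])
B = np.array([[0.8], [1.2]])
mu_B = np.array([0.7, 0.9])
print(fuzzy_cross_product_kernel(A, mu_A, B, mu_B))
```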
6

Mahalanobis kernel-based support vector data description for detection of large shifts in mean vector

Nguyen, Vu 01 January 2015 (has links)
Statistical process control (SPC) applies the science of statistics to process control in order to provide higher-quality products and better services. The K chart is one of the many important tools that SPC offers. The K chart is based on Support Vector Data Description (SVDD), a popular data-classification method inspired by the Support Vector Machine (SVM). Like any method associated with SVM, SVDD benefits from a wide variety of kernel choices, and the kernel determines the effectiveness of the whole model. Among the most popular choices is the Euclidean distance-based Gaussian kernel, which enables SVDD to obtain a flexible data description and thus enhances its overall predictive capability. This thesis explores a more robust approach by incorporating a Mahalanobis distance-based kernel (hereinafter the Mahalanobis kernel) into SVDD and compares it with SVDD using the traditional Gaussian kernel. The method's sensitivity is benchmarked by average run lengths obtained from multiple Monte Carlo simulations, whose data are generated from multivariate normal, multivariate Student's t, and multivariate gamma populations using R, a popular software environment for statistical computing. A case study is also discussed using a real data set received from the Halberg Chronobiology Center. Compared with the Gaussian kernel, the Mahalanobis kernel makes SVDD, and thus the K chart, significantly more sensitive to shifts in the mean vector and in the covariance matrix.
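A minimal sketch of a Mahalanobis-kernel Gram matrix follows: it is the usual Gaussian kernel with the Euclidean distance replaced by the Mahalanobis distance under the sample covariance. The bandwidth and regularization choices are illustrative assumptions, not those used in the thesis.

```python
# Sketch: Gram matrix of a Mahalanobis distance-based Gaussian kernel,
# K[i, j] = exp(-d_M(x_i, x_j)^2 / (2 sigma^2)), where d_M uses the
# inverse sample covariance. Parameter values are illustrative.
import numpy as np

def mahalanobis_kernel_gram(X, sigma=1.0, reg=1e-6):
    cov = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = X[:, None, :] - X[None, :, :]           # (n, n, d) pairwise diffs
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=50)
K = mahalanobis_kernel_gram(X, sigma=1.5)
print(K.shape, K[0, 0])  # (50, 50); diagonal entries are 1.0
```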
7

Computer aided diagnosis of epilepsy lesions based on multivariate and multimodality data analysis / Recherche de biomarqueurs par l’analyse multivariée d’images paramétriques multimodales pour le bilan non-invasif préchirurgical de l’épilepsie focale pharmaco-résistante

El Azami, Meriem 23 September 2016 (has links)
One third of patients suffering from epilepsy are resistant to medication. For these patients, surgical removal of the epileptogenic zone offers the possibility of a cure, and surgical success relies heavily on accurate localisation of that zone. The analysis of neuroimaging data such as magnetic resonance imaging (MRI) and positron emission tomography (PET) is increasingly used in the pre-surgical work-up of patients and may offer an alternative to the invasive reference standard of stereo-electroencephalography (SEEG) monitoring. To assist clinicians in screening these lesions, we developed a computer-aided diagnosis (CAD) system based on a multivariate data analysis approach. Our first contribution was to formulate the problem of epileptogenic lesion detection as an outlier detection problem, the main motivation being to avoid dependence on labelled data and the class imbalance inherent to this detection task. The proposed system builds upon the one-class support vector machine (OC-SVM) classifier: the OC-SVM is trained on features extracted from MRI scans of healthy control subjects, allowing a voxel-wise assessment of how far a test subject's pattern deviates from the learned patterns. System performance was evaluated using realistic simulations of challenging detection tasks as well as clinical data from patients with intractable epilepsy.

The outlier detection framework was further extended to take into account the specificities of neuroimaging data and the detection task at hand. We first proposed a reformulation of the support vector data description (SVDD) method to deal with the presence of uncertain observations in the training data. Second, to handle the multi-parametric nature of neuroimaging data, we proposed an optimal fusion approach for combining multiple base one-class classifiers. Finally, to help with score interpretation, threshold selection, and score combination, we proposed to transform the score outputs of the outlier detection algorithm into well-calibrated probabilities.
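As a sketch of the final step, converting outlier scores into probabilities, the following fits a Platt-style sigmoid to one-class SVM scores on held-out labelled data. The simulated data, feature dimensions, and the use of plain logistic regression are illustrative assumptions rather than the thesis's calibration procedure.

```python
# Sketch: calibrating OC-SVM outlier scores into probabilities with a
# Platt-style sigmoid (logistic fit of labels on the 1-D score).
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
healthy = rng.normal(0.0, 1.0, size=(200, 5))       # "control" features
test = rng.normal(0.0, 1.0, size=(50, 5))
test[:10] += 3.0                                     # simulated lesions
labels = np.r_[np.ones(10), np.zeros(40)]            # 1 = anomalous

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(healthy)
scores = -ocsvm.decision_function(test)              # higher = more abnormal

# Platt-style calibration: logistic regression of label on score.
calib = LogisticRegression().fit(scores.reshape(-1, 1), labels)
probs = calib.predict_proba(scores.reshape(-1, 1))[:, 1]
print(np.round(probs[:12], 3))
```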
