101

Apports bioinformatiques et statistiques à l'identification d'inhibiteurs du récepteur MET / Bioinformatics and statistical contributions to the identification of inhibitors for the MET receptor

Apostol, Costin 21 December 2010 (has links)
The effect of polysaccharides on the HGF-MET interaction was studied using an experimental design with several protein microarrays under different experimental conditions. The purpose of the analysis is the selection of the best polysaccharide inhibitors of the HGF-MET interaction. From a statistical point of view this is a classification problem. Statistical and computational processing of the resulting microarrays required implementing the PASE platform with statistical-analysis plug-ins for this type of data. The main statistical feature of these data is repetition: the experiment was repeated on 5 microarrays, and each polysaccharide is replicated 3 times within each microarray. We are therefore no longer in the classical case of globally independent data; independence holds only at the inter-subject and intra-subject levels. We propose mixed models for data normalization and represent each subject by its empirical cumulative distribution function. The Kolmogorov-Smirnov statistic arises naturally in this context, and we study its behavior in hierarchical and k-means classification algorithms. The choice of the number of clusters, and of the number of repetitions needed for a robust classification, is treated in detail. The effectiveness of this methodology is measured on simulations and applied to the HGF-MET data. The results helped the biologists and chemists of the Institut de Biologie de Lille choose the best polysaccharides in their assays, and some of the results also confirmed these researchers' intuition. The R scripts implementing this methodology are integrated into the PASE platform. Applying functional data analysis to this type of data is part of the immediate future work.
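As a rough illustration of the clustering step described above, the sketch below measures dissimilarity between subjects by the Kolmogorov-Smirnov statistic, i.e. the sup-norm distance between empirical distribution functions, and feeds it to hierarchical clustering. It is written in Python rather than the thesis's R, and the data layout (15 pooled replicate measurements per polysaccharide) and all names are assumptions for the example.

```python
# Sketch: clustering replicated measurements by the Kolmogorov-Smirnov
# distance between empirical CDFs. Illustrative data, not the thesis code.
import numpy as np
from scipy.stats import ks_2samp
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# One entry per polysaccharide: pooled replicates (assumed 5 chips x 3 spots).
samples = [rng.normal(loc=mu, scale=1.0, size=15) for mu in (0, 0, 2, 2)]

n = len(samples)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # KS statistic = sup-norm distance between the two empirical CDFs.
        dist[i, j] = dist[j, i] = ks_2samp(samples[i], samples[j]).statistic

# Hierarchical clustering on the condensed KS distance matrix.
labels = fcluster(linkage(squareform(dist), method="average"),
                  t=2, criterion="maxclust")
print(labels)  # two groups, e.g. [1 1 2 2]
```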
102

Maskininlärning och bildtolkning för ökad tillförlitlighet i strömavtagarlarm / Machine Learning and Image Interpretation for Increased Reliability in Pantograph Alarms

Clase, Christian January 2018 (has links)
This master's degree project was carried out at Trafikverket and concerns machine learning and image-based detection of defective pantographs on trains. Today, Trafikverket has a system for detecting damage to the carbon strip (the "coal rail") mounted on the pantograph. This strip presses against the contact wire and may wear in such a way that damage forms in it, creating a risk of tearing down the contact wire, which causes major disruption and high costs. Approximately 10 contact-wire teardowns occur annually due to missed detections. Today's system, KIKA2, was developed in 2011 and incorporates a 12 MP digital camera and a trigger radar; detection of a damaged pantograph is done using various well-known image processing techniques. The shortcoming of today's system is that the proportion of false alarms is high, and on these occasions a person must manually review the pictures. The purpose of this degree project is to propose improvements and explore the possibilities of machine learning with TensorFlow. I used different image processing techniques on the KIKA2 images to prepare them for TensorFlow machine learning. After some TensorFlow classification tests on the raw images, I realized that preprocessing is necessary to obtain realistic values in the classification step. My plan was to clean the pictures of noise, in other words to crop out the carbon strip and improve the contrast to make the damage in the strip more visible. I used Fourier analysis and correlation techniques to crop the carbon strip, and the k-means algorithm to improve the contrast of the images. Google's TensorFlow is an open-source framework, and using preprocessed RGB images from today's KIKA2 system gives reasonable classification values. I also captured some IR images of the pantograph with an external thermal camera (FLIR E60). The thermal camera provides very clear contours of the pantograph, which is very good for machine learning. My recommendation for further studies is to evaluate the IR technique further and use IR images taken from different angles, at different distances, and with different backgrounds. The segmentation of the images can be done with either Hu's moments or Fourier analysis with correlation, refined with, for example, classification techniques. IR images could be used to complement today's system, or machine learning could be combined with today's RGB images. A robust and proven preprocessing technique is very important for obtaining good results in machine learning, and further studies and real-life tests are required to handle different types of pantographs, different light conditions, and other variations in the images.
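As a hedged illustration of the contrast step mentioned above, the sketch below posterizes a grayscale image by clustering pixel intensities with k-means, so each pixel is replaced by its cluster centre and edges stand out. The array sizes and function name are made up for the demo; nothing here is KIKA2-specific.

```python
# Sketch: k-means intensity quantization as a simple contrast-enhancement
# step for pantograph images. Synthetic data stands in for a real crop.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_posterize(gray: np.ndarray, k: int = 3) -> np.ndarray:
    """Quantize a grayscale image into k intensity levels.

    Pixels are clustered on intensity alone; each pixel is replaced by its
    cluster centre, which flattens background texture so that damage-like
    deviations in the carbon strip become easier to see.
    """
    flat = gray.reshape(-1, 1).astype(np.float64)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(flat)
    quantized = km.cluster_centers_[km.labels_]
    return quantized.reshape(gray.shape)

# Demo on random data standing in for a cropped carbon-strip image.
img = np.random.rand(64, 256)
print(np.unique(kmeans_posterize(img)).size)  # -> 3 intensity levels
```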
103

Transportation Techniques for Geometric Clustering

January 2020 (has links)
This thesis introduces new techniques for clustering distributional data according to their geometric similarities. This work builds upon the optimal transportation (OT) problem that seeks global minimum cost for matching distributional data and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on the variational principle to differentiate hard cluster assignments, which was missing in the literature. This thesis shows multiple techniques to regularize and generalize OT to cope with various tasks including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing. / Dissertation/Thesis / Doctoral Dissertation Computer Engineering 2020
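For readers unfamiliar with the underlying problem, a minimal sketch of discrete optimal transport follows: the Kantorovich linear program min <C, P> subject to fixed row and column sums on the plan P. This is illustrative only, solved directly with SciPy, and is not the thesis's formulation or code.

```python
# Sketch: discrete optimal transport between two weighted point sets,
# solved as a linear program. Illustrative, not the thesis implementation.
import numpy as np
from scipy.optimize import linprog

def ot_cost(a, b, C):
    """Kantorovich OT: min <C, P> with row sums a, column sums b, P >= 0."""
    n, m = C.shape
    # Row-sum constraints (n of them) and column-sum constraints (m of them)
    # on the flattened (row-major) transport plan P.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)

x = np.array([[0.0], [1.0]])   # two source points on the line
y = np.array([[0.0], [2.0]])   # two target points
C = (x - y.T) ** 2             # squared-distance cost matrix
cost, plan = ot_cost(np.full(2, 0.5), np.full(2, 0.5), C)
print(cost)  # -> 0.5 (match 0->0 for free, move 1->2 at cost 1, half mass)
```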
104

Optimalité statistique du partitionnement par l'optimisation convexe / Statistically Optimal Clustering through Convex Optimisation

Royer, Martin 16 November 2018 (has links)
This work focuses on the problem of point and variable clustering, that is, the grouping of either similar vectors or similar components of a vector in a metric space. This has applications in many fields, including pattern recognition in image analysis and gene expression data classification. Through adequate modeling of the similarity between points or variables within a cluster, we analyze the statistical properties of known clustering algorithms such as K-means. When all groups are homoscedastic, the K-means algorithm is equivalent to a maximum-likelihood procedure; otherwise the estimator is biased, in the sense that it tends to separate groups with larger dispersion regardless of the actual group separation. Using an equivalent formulation based on positive semidefinite optimization, we propose an operational correction of this bias, and construct polynomial-time algorithms with quasi-minimax properties for exact clustering of points or variables. These results can be interpreted under standard frameworks such as the mixture model or the latent variable model, and extend to the more general and more robust class of $G$-block models. The stochastic controls can be made adaptive to the unknown number of classes as well as to the effective dimension of the problem, and they help understand the behavior of the widely used class of spectral estimators for clustering. They are supported by extensive simulation studies as well as analyses of real datasets from the biological field. Finally, to improve the computational efficiency of the studied algorithms, we exploit a strong connection with the domain of convex optimization, in particular low-rank relaxation techniques motivated by high-dimensional problems.
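A minimal sketch of one standard positive semidefinite reformulation of K-means (often attributed to Peng and Wei, and in the spirit of the correction discussed above) follows; the use of cvxpy and all names here are assumptions for the illustration, not the thesis's implementation.

```python
# Sketch: SDP relaxation of K-means. Minimize <D, Z> over Z PSD, Z >= 0,
# Z @ 1 = 1, tr(Z) = k, where D holds squared pairwise distances.
import cvxpy as cp
import numpy as np

def sdp_kmeans(X: np.ndarray, k: int) -> np.ndarray:
    n = X.shape[0]
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Z = cp.Variable((n, n), PSD=True)
    constraints = [Z >= 0, cp.sum(Z, axis=1) == 1, cp.trace(Z) == k]
    cp.Problem(cp.Minimize(cp.trace(D @ Z)), constraints).solve()
    # For well-separated data the solution is near block-diagonal,
    # one block per cluster; clusters can be read off the rows.
    return Z.value

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])
Z = sdp_kmeans(X, k=2)
print(np.round(Z[:3, :3], 2))  # within-cluster entries near 1/5 = 0.2
```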
105

Vocation Clustering for Heavy-Duty Vehicles

Daniel Patrick Kobold Jr (9719936) 07 January 2021 (has links)
The identification of the vocation of an unknown heavy-duty vehicle is valuable to parts manufacturers who may not otherwise have consistent access to this information. This study proposes a methodology for vocation identification based on clustering techniques. Two clustering algorithms are considered: K-Means and Expectation Maximization. These algorithms are used first to construct the operating profile of each vocation from a set of vehicles with known vocations. The vocation of an unknown vehicle is then determined using different assignment methods.

These methods fall under two main categories: one-versus-all and one-versus-one. The one-versus-all approach compares an unknown vehicle to all potential vocations. The one-versus-one approach compares the unknown vehicle to two vocations at a time in a tournament fashion. Two types of tournaments are investigated: round-robin and bracket. The accuracy and efficiency of each method is evaluated using the NREL FleetDNA dataset.

The study revealed that some vocations may have unique operating profiles and are therefore easily distinguishable from others. Other vocations, however, can have confounding profiles. This indicates that different vocations may benefit from profiles with varying numbers of clusters. Determining the optimal number of clusters for each vocation can not only improve the assignment accuracy but also enhance the computational efficiency of the application. The optimal number of clusters for each vocation is determined using both static and dynamic techniques. Static approaches refer to methods completed prior to training and may require multiple iterations; dynamic techniques involve clusters being split or removed during training. The results show that the accuracy of dynamic techniques is comparable to that of static approaches while benefiting from reduced computational time.
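A hedged sketch of the one-versus-one round-robin idea follows, with Gaussian mixtures standing in for the per-vocation operating profiles; the vocation names, features, and scoring rule are illustrative assumptions, not the study's actual models.

```python
# Sketch: round-robin one-versus-one vocation assignment over per-vocation
# mixture-model profiles. All data here is synthetic.
from itertools import combinations
import numpy as np
from sklearn.mixture import GaussianMixture

def round_robin_vocation(x, profiles):
    """Each vocation pair plays a match; the profile with the higher
    average log-likelihood on the unknown vehicle's data wins a point."""
    wins = {name: 0 for name in profiles}
    for a, b in combinations(profiles, 2):
        winner = a if profiles[a].score(x) >= profiles[b].score(x) else b
        wins[winner] += 1
    return max(wins, key=wins.get)

rng = np.random.default_rng(0)
profiles = {
    "delivery": GaussianMixture(2, random_state=0).fit(rng.normal(0, 1, (200, 3))),
    "refuse":   GaussianMixture(2, random_state=0).fit(rng.normal(4, 1, (200, 3))),
}
unknown = rng.normal(4, 1, (50, 3))  # telematics features of one vehicle
print(round_robin_vocation(unknown, profiles))  # -> "refuse"
```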
106

Contributions to Optimal Experimental Design and Strategic Subdata Selection for Big Data

January 2020 (has links)
In this dissertation two research questions in the field of applied experimental design were explored. First, methods for augmenting the three-level screening designs called Definitive Screening Designs (DSDs) were investigated. Second, schemes for strategic subdata selection for nonparametric predictive modeling with big data were developed. Under sparsity, the structure of DSDs can allow for the screening and optimization of a system in one step, but in non-sparse situations estimation of second-order models requires augmentation of the DSD. In this work, augmentation strategies for DSDs were considered, given the assumption that the correct form of the model for the response of interest is quadratic. Series of augmented designs were constructed and explored, and power calculations, model-robustness criteria, model-discrimination criteria, and simulation study results were used to identify the number of augmented runs necessary for (1) effectively identifying active model effects, and (2) precisely predicting a response of interest. When the goal is identification of active effects, it is shown that supersaturated designs are sufficient; when the goal is prediction, it is shown that little is gained by augmenting beyond the design that is saturated for the full quadratic model. Surprisingly, augmentation strategies based on the I-optimality criterion do not lead to better predictions than strategies based on the D-optimality criterion. Computational limitations can render standard statistical methods infeasible in the face of massive datasets, necessitating subsampling strategies. In the big data context, the primary objective is often prediction but the correct form of the model for the response of interest is likely unknown. Here, two new methods of subdata selection were proposed. The first is based on clustering, the second is based on space-filling designs, and both are free from model assumptions. The performance of the proposed methods was explored visually via low-dimensional simulated examples; via real data applications; and via large simulation studies. In all cases the proposed methods were compared to existing, widely used subdata selection methods. The conditions under which the proposed methods provide advantages over standard subdata selection strategies were identified. / Dissertation/Thesis / Doctoral Dissertation Statistics 2020
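A minimal sketch of the clustering-based subdata selection idea follows: cluster the full dataset and keep the observation nearest each cluster centre as a model-free subsample. Sizes and names are assumptions for the illustration, not the dissertation's procedure.

```python
# Sketch: model-free subdata selection by clustering. Each cluster centre
# contributes its nearest real observation to the subsample.
import numpy as np
from sklearn.cluster import KMeans

def cluster_subdata(X: np.ndarray, n_sub: int) -> np.ndarray:
    """Return indices of up to n_sub representative rows of X."""
    km = KMeans(n_clusters=n_sub, n_init=10, random_state=0).fit(X)
    d = km.transform(X)               # distances to every cluster centre
    return np.unique(d.argmin(axis=0))  # closest observation per centre

X = np.random.default_rng(2).normal(size=(20_000, 5))
idx = cluster_subdata(X, n_sub=100)
print(idx.size)  # about 100 representative rows for downstream modelling
```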
107

Abstractive Representation Modeling for Image Classification

Li, Xin 05 October 2021 (has links)
No description available.
108

Statistická analýza anomálií v senzorových datech / Statistical Analysis of Anomalies in Sensor Data

Gregorová, Kateřina January 2019 (has links)
This thesis deals with failure mode detection for aircraft engines. The main approach to detection is searching for anomalies in the sensor data. To give a comprehensive idea of the system and the particular sensors, the thesis begins with a description of the whole system, namely the HTF7000 aircraft engine, as well as a description of the sensors. A proposal for an anomaly detection algorithm based on three different detection methods is discussed in the second chapter. The methods are SVM (Support Vector Machine), k-means, and ARIMA (Autoregressive Integrated Moving Average). The implementation of the algorithm, including a proposed graphical user interface, is elaborated on in the next part of the thesis. Finally, a statistical analysis of the results, a comparison of the efficiency of the particular models, and a discussion of the outputs of the proposed algorithm can be found at the end of the thesis.
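The sketch below illustrates, on synthetic data and with assumed thresholds, how the three detection methods named in the abstract could each flag anomalies in one sensor channel; the majority vote at the end is an assumption for the demo, not necessarily the thesis's combination rule.

```python
# Sketch: three anomaly detectors (one-class SVM, k-means cluster size,
# ARIMA residuals) on one synthetic sensor channel, combined by voting.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = rng.normal(0, 1, 500)
y[250] += 8.0                                  # injected fault for the demo
X = y.reshape(-1, 1)

# 1) One-class SVM: flags roughly the nu-fraction most atypical samples.
svm_flags = OneClassSVM(nu=0.01).fit_predict(X) == -1

# 2) k-means: points landing in a tiny cluster are treated as anomalous.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_)
km_flags = sizes[km.labels_] < 0.01 * len(y)

# 3) ARIMA: large one-step residuals indicate anomalies (assumed 3-sigma rule).
resid = ARIMA(y, order=(1, 0, 1)).fit().resid
arima_flags = np.abs(resid) > 3 * resid.std()

votes = svm_flags.astype(int) + km_flags.astype(int) + arima_flags.astype(int)
print(np.where(votes >= 2)[0])  # indices flagged by >= 2 methods; should include 250
```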
109

Data-driven persona development for a knowledge management system

Baldi, Annika January 2021 (has links)
Generating personas based entirely on data has gained popularity. Personas describe the characteristics of a user group in a human-like format. This project presents the persona creation process, from raw data to evaluated personas, for Zapiens' knowledge management system. The objective of the personas is to learn about customer behavior and to aid customer communication. In the described methodology, platform log data was clustered to group the users; this quantitative approach is fast, updatable, and scalable. The analysis was split across two features of the Zapiens platform: an attempt was made to create persona sets for both the training component and the chatbot component. The group characteristics were then enriched with data from user surveys. This approach proved successful only for the training analysis. The collected data is presented in a web-based persona template to make the personas easily accessible and sharable. The finished training persona set was evaluated using the Persona Perception Scale, and the results showed three personas of satisfying quality. The project aims to provide a complete overview of the data-driven persona development process.
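A brief sketch of the quantitative core of such a pipeline follows: cluster log-derived user features and read each cluster centroid as a persona skeleton. The feature names are invented for the example and are not the actual Zapiens log schema.

```python
# Sketch: data-driven persona skeletons from clustered platform-log features.
# Synthetic features stand in for real usage logs.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
logs = pd.DataFrame({
    "sessions_per_week": rng.gamma(2.0, 2.0, 300),
    "lessons_completed": rng.poisson(5, 300),
    "chatbot_queries":   rng.poisson(2, 300),
})

# Standardize so no single feature dominates the distance metric.
Z = StandardScaler().fit_transform(logs)
logs["persona"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

# Centroids in original units become the quantitative core of each persona,
# later enriched with survey data and written up in a human-like template.
print(logs.groupby("persona").mean().round(1))
```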
110

Segmentace řeči / Speech segmentation

Kašpar, Ladislav January 2015 (has links)
My diploma thesis is devoted to the problem of speech segmentation. It includes the basic theory on this topic, focusing on the calculation of parameters for speech segmentation that are used in the practical part. An application for speech segmentation has been written in Matlab. It uses techniques such as segmentation of the signal into frames, the short-time energy of the signal, and the zero-crossing function; these parameters are used as input for the k-means algorithm.
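A short sketch of the described pipeline follows, in Python rather than the thesis's Matlab: compute short-time energy and the zero-crossing rate per frame, then cluster the frames with k-means. The frame sizes and the synthetic signal are assumptions for the demo.

```python
# Sketch: frame-level energy and zero-crossing rate fed to k-means to
# separate silence from speech-like frames. Synthetic signal for the demo.
import numpy as np
from sklearn.cluster import KMeans

def frame_features(x, frame=256, hop=128):
    feats = []
    for start in range(0, len(x) - frame, hop):
        w = x[start:start + frame]
        energy = np.mean(w ** 2)                        # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2  # zero-crossing rate
        feats.append((energy, zcr))
    return np.array(feats)

# Synthetic signal: low-level noise ("silence"), then a 200 Hz tone ("voiced").
sr = 8000
t = np.arange(sr) / sr
x = np.concatenate([0.01 * np.random.randn(sr), np.sin(2 * np.pi * 200 * t)])

F = frame_features(x)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(F)
print(labels[:5], labels[-5:])  # the two halves fall into different clusters
```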
