111

Maskininlärning och bildtolkning för ökad tillförlitlighet i strömavtagarlarm / Machine Learning and Image Interpretation for Increased Reliability in Pantograph Alarms

Clase, Christian January 2018 (has links)
This master's degree project was carried out at Trafikverket and concerns machine learning and image-based detection of defective pantographs on trains. Trafikverket currently has a system for detecting damage to the coal rail mounted on the pantograph. The coal rail presses against the contact wire and can wear in such a way that damage forms in it, creating a risk of tearing down the contact wire, which causes major disruption and high costs. Approximately ten contact-wire teardowns occur annually because of missed detections. The current system, KIKA2, was developed in 2011 and combines a 12 MP digital camera with a target radar; damaged pantographs are detected using a number of well-known image-processing techniques. Its main shortcoming is a high proportion of false alarms, each of which requires a person to manually review the images.

The purpose of this degree project is to propose improvements and explore what machine learning with TensorFlow can offer. I applied different image-processing techniques to the KIKA2 images to prepare them for machine learning with TensorFlow. Classification tests on the raw images showed that preprocessing is necessary to obtain realistic classification results. The plan was therefore to clean the pictures of noise, that is, to crop out the coal rail and improve the contrast so that damage in the coal rail becomes more visible. I used Fourier analysis and correlation techniques to crop the coal rail, and the k-means clustering algorithm to improve the contrast of the images. Google's TensorFlow is an open-source framework, and using preprocessed RGB images from the existing KIKA2 system gives reasonable classification results. I also captured IR images of the pantograph with an external thermal camera (FLIR E60); the thermal camera provides very clear contours of the pantograph, which is well suited to machine learning.

My recommendation for further studies is to evaluate the IR technique further, using IR images taken from different angles, at different distances, and with different backgrounds. Segmentation of the images can be done with either Hu's moments or Fourier analysis with correlation, refined with, for example, classification techniques. IR images could be used to complement today's system, or machine learning could be combined with today's RGB images. A robust and proven preprocessing technique is essential for good machine-learning results, and further studies and real-life tests are required to handle different types of pantographs, different lighting conditions, and other variation in the images.
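To make the k-means contrast step concrete, here is a minimal Python sketch of contrast enhancement by intensity clustering. It is an illustration under stated assumptions, not the thesis's pipeline: the function name and parameters are invented, and the Fourier/correlation cropping stage is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_contrast(gray, k=3):
    """Quantize a grayscale image with k-means and spread the cluster
    levels over the full intensity range (contrast by clustering)."""
    h, w = gray.shape
    flat = gray.reshape(-1, 1).astype(float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(flat)
    # order clusters by mean intensity, then map to evenly spaced grey levels
    means = np.array([flat[labels == i].mean() for i in range(k)])
    rank = np.argsort(np.argsort(means))
    levels = np.linspace(0, 255, k)
    return levels[rank[labels]].reshape(h, w).astype(np.uint8)

# toy usage on a synthetic low-contrast image
img = np.random.default_rng(0).normal(120, 8, (64, 64)).clip(0, 255)
enhanced = kmeans_contrast(img, k=3)
```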
112

ALGORITMOS DE CLUSTERING PARALELOS EN SISTEMAS DE RECUPERACIÓN DE INFORMACIÓN DISTRIBUIDOS / Parallel Clustering Algorithms in Distributed Information Retrieval Systems

Jiménez González, Daniel 20 July 2011 (has links)
Information is useful if it is available when needed and can actually be used. Availability usually comes easily when the information is well structured and ordered, and not too extensive. But that situation is not the most common: the amount of information on offer tends to grow out of all proportion, to be unstructured, and to present no clear order. Manual structuring or ordering is unfeasible given the dimensions of the information to be handled. All of this makes clear the usefulness, and even the necessity, of good information retrieval systems (IRS). Another important characteristic is that information naturally tends to appear in distributed form, which implies the need for IRS that can work in distributed environments and with parallelization techniques. This thesis addresses all of these aspects, developing and improving methods that yield IRS with better performance, both in retrieval quality and in computational efficiency, and that can also operate on already distributed systems. The main objective of an IRS is to return relevant documents and omit those considered irrelevant with respect to a given query. Some of the most notable problems for IRS are: polysemy and synonymy; related words (words that have one meaning together and another separately); the sheer volume of information to handle; the heterogeneity of the documents; etc. Of these, this thesis focuses on polysemy and synonymy, on related words (indirectly, through semantic lemmatization), and on the sheer volume of information to handle. Developing an IRS basically comprises four distinct phases: preprocessing, modeling, evaluation, and use. Preprocessing, which involves the actions needed to transform the documents of the collection into a data structure holding their relevant information, has been an important part of the study in this thesis; in this phase we focused on reducing the data and structures to be handled while maximizing the information they contain. Modeling, the most thoroughly analyzed and developed phase in this thesis, defines the structure and behavior of the IRS. / Jiménez González, D. (2011). ALGORITMOS DE CLUSTERING PARALELOS EN SISTEMAS DE RECUPERACIÓN DE INFORMACIÓN DISTRIBUIDOS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11234 / Palancia
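As a toy illustration of the preprocessing and modeling phases described above (a sketch only; the thesis develops parallel algorithms far beyond this), a collection can be reduced to a TF-IDF term-weight structure and clustered:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["parallel clustering of large document collections",
        "distributed information retrieval systems",
        "clustering algorithms for information retrieval",
        "retrieval quality in distributed systems"]

# preprocessing: collection -> sparse TF-IDF matrix (reduced, weighted structure)
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# modeling: group similar documents with k-means
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```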
113

Transportation Techniques for Geometric Clustering

January 2020 (has links)
This thesis introduces new techniques for clustering distributional data according to their geometric similarities. The work builds on the optimal transportation (OT) problem, which seeks the globally minimal cost for matching distributional data, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on a variational principle for differentiating hard cluster assignments, something previously missing in the literature. The thesis presents multiple techniques to regularize and generalize OT for various tasks, including clustering, aligning, and interpolating distributional data. It also discusses how the new formulation relates to other OT and clustering formulations, to better understand their gaps and the means to close them. Finally, the thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing. / Dissertation/Thesis / Doctoral Dissertation Computer Engineering 2020
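A rough sketch of OT-based clustering in Python follows. It is generic entropic OT with Sinkhorn iterations plus a balanced-assignment k-means loop, not the thesis's variational formulation or its power-diagram machinery; all parameter values are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=200):
    """Entropic-regularized OT between histograms a and b with cost
    matrix C; returns the transport plan."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_kmeans(X, k, eps=0.05, n_iter=20, seed=0):
    """Balanced clustering via OT: the marginal b forces each cluster to
    carry equal mass; centroids update by barycentric projection."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        C = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        P = sinkhorn(a, b, C / C.max(), eps)   # normalize cost for stability
        centers = (P.T @ X) / P.sum(axis=0)[:, None]
    return centers, P.argmax(axis=1)

centers, labels = ot_kmeans(np.random.default_rng(1).normal(size=(60, 2)), k=3)
```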
114

Optimalité statistique du partitionnement par l'optimisation convexe / Statistically Optimal Clustering through Convex Optimisation

Royer, Martin 16 November 2018 (has links)
This work focuses on the problem of point and variable clustering, that is, the grouping of either similar vectors or similar components of a vector in a metric space. This has applications in many relevant fields, including pattern recognition in image analysis and gene-expression data classification. Through adequate modeling of the similarity between points or variables within a cluster, we analyze the statistical properties of known clustering algorithms such as K-means. When the elements of all groups are homoscedastic, the K-means algorithm is equivalent to a maximum-likelihood procedure. Otherwise the algorithm is biased, in the sense that it tends to separate groups with larger dispersion regardless of the actual group separation. Using a positive semidefinite reformulation of the estimator, we propose a correction of this bias that leads to polynomial-time algorithms with quasi-minimax properties for exact clustering of points or variables. These results can be studied under the classical mixture model or latent-variable model, and extend to the more general and robust class of $G$-block models. The stochastic controls can be made adaptive to the unknown number of classes as well as to the effective dimension of the problem. They help explain the behavior of spectral estimators, which are also widely used for clustering, and are supported by extensive simulation studies as well as data analyses from the biological field. When the focus turns to the computational aspect of these algorithms, we exploit a strong connection with the domain of convex optimization, specifically low-rank relaxation techniques, which are important when dealing with high-dimensional problems.
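A minimal sketch of a semidefinite reformulation of K-means, in the spirit of the Peng-Wei relaxation (the bias-corrected estimator studied in the thesis differs in its details); it assumes cvxpy is available and is only practical for small n in this naive form.

```python
import numpy as np
import cvxpy as cp
from sklearn.cluster import KMeans

def sdp_kmeans(X, k):
    """SDP relaxation of k-means: maximize <XX^T, Z> over the convex set
    {Z PSD, Z >= 0 entrywise, Z1 = 1, tr(Z) = k}; Z approximates the
    normalized cluster-membership matrix."""
    n = X.shape[0]
    A = X @ X.T
    Z = cp.Variable((n, n), PSD=True)
    constraints = [Z >= 0, cp.sum(Z, axis=1) == 1, cp.trace(Z) == k]
    cp.Problem(cp.Maximize(cp.trace(A @ Z)), constraints).solve()
    # round the relaxed solution: k-means on the top-k eigenvectors of Z
    _, V = np.linalg.eigh(Z.value)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(V[:, -k:])
```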
115

Vocation Clustering for Heavy-Duty Vehicles

Daniel Patrick Kobold Jr (9719936) 07 January 2021 (has links)
The identification of the vocation of an unknown heavy-duty vehicle is valuable to parts manufacturers, who may not otherwise have consistent access to this information. This study proposes a methodology for vocation identification based on clustering techniques. Two clustering algorithms are considered: K-Means and Expectation Maximization. These algorithms are first used to construct the operating profile of each vocation from a set of vehicles with known vocations. The vocation of an unknown vehicle is then determined using different assignment methods.

These methods fall under two main categories: one-versus-all and one-versus-one. The one-versus-all approach compares an unknown vehicle to all potential vocations. The one-versus-one approach compares the unknown vehicle to two vocations at a time in a tournament fashion. Two types of tournaments are investigated: round-robin and bracket. The accuracy and efficiency of each method is evaluated using the NREL FleetDNA dataset.

The study revealed that some vocations have unique operating profiles and are therefore easily distinguishable from others. Other vocations, however, can have confounding profiles. This indicates that different vocations may benefit from profiles with varying numbers of clusters. Determining the optimal number of clusters for each vocation can not only improve assignment accuracy, but also enhance the computational efficiency of the application. The optimal number of clusters for each vocation is determined using both static and dynamic techniques. Static approaches are completed prior to training and may require multiple iterations; dynamic techniques split or remove clusters during training. The results show that the accuracy of dynamic techniques is comparable to that of static approaches while benefiting from reduced computational time.
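A hedged sketch of the profile-construction and one-versus-all assignment steps; the data layout, cluster count, and scoring rule here are illustrative, not the study's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_profiles(fleet, k=3):
    """Fit one k-means operating profile per known vocation.
    `fleet` maps vocation name -> (n_vehicles, n_features) array."""
    return {voc: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            for voc, X in fleet.items()}

def assign_one_vs_all(profiles, x):
    """One-versus-all: score the unknown vehicle against every profile by
    the distance to its nearest cluster centre, and pick the best."""
    def score(km):
        return np.min(np.linalg.norm(km.cluster_centers_ - x, axis=1))
    return min(profiles, key=lambda voc: score(profiles[voc]))
```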
116

Contributions to Optimal Experimental Design and Strategic Subdata Selection for Big Data

January 2020 (has links)
In this dissertation, two research questions in the field of applied experimental design were explored. First, methods for augmenting the three-level screening designs called Definitive Screening Designs (DSDs) were investigated. Second, schemes for strategic subdata selection for nonparametric predictive modeling with big data were developed. Under sparsity, the structure of DSDs can allow for the screening and optimization of a system in one step, but in non-sparse situations estimation of second-order models requires augmentation of the DSD. In this work, augmentation strategies for DSDs were considered, given the assumption that the correct form of the model for the response of interest is quadratic. Series of augmented designs were constructed and explored, and power calculations, model-robustness criteria, model-discrimination criteria, and simulation study results were used to identify the number of augmented runs necessary for (1) effectively identifying active model effects, and (2) precisely predicting a response of interest. When the goal is identification of active effects, it is shown that supersaturated designs are sufficient; when the goal is prediction, it is shown that little is gained by augmenting beyond the design that is saturated for the full quadratic model. Surprisingly, augmentation strategies based on the I-optimality criterion do not lead to better predictions than strategies based on the D-optimality criterion. Computational limitations can render standard statistical methods infeasible in the face of massive datasets, necessitating subsampling strategies. In the big data context, the primary objective is often prediction, but the correct form of the model for the response of interest is likely unknown. Here, two new methods of subdata selection were proposed. The first is based on clustering, the second on space-filling designs, and both are free from model assumptions. The performance of the proposed methods was explored visually via low-dimensional simulated examples, via real data applications, and via large simulation studies. In all cases the proposed methods were compared to existing, widely used subdata selection methods, and the conditions under which the proposed methods provide advantages over standard subdata selection strategies were identified. / Dissertation/Thesis / Doctoral Dissertation Statistics 2020
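As an illustration of the clustering-based subdata selection idea (one plausible reading; the dissertation's actual selection rules may differ), a model-free, space-covering subsample can be drawn as follows:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_subdata(X, m, seed=0):
    """Model-free subdata selection sketch: cluster the big dataset into m
    groups and keep the point nearest each centroid, so the subsample
    covers the design space rather than oversampling dense regions."""
    km = KMeans(n_clusters=m, n_init=4, random_state=seed).fit(X)
    idx = [int(np.argmin(np.linalg.norm(X - c, axis=1)))
           for c in km.cluster_centers_]
    return np.unique(idx)   # deduplicate; may return slightly fewer than m

# toy usage: select ~50 representative rows from 10,000
X = np.random.default_rng(0).normal(size=(10_000, 4))
sub = X[cluster_subdata(X, m=50)]
```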
117

Abstractive Representation Modeling for Image Classification

Li, Xin 05 October 2021 (has links)
No description available.
118

Statistická analýza anomálií v senzorových datech / Statistical Analysis of Anomalies in Sensor Data

Gregorová, Kateřina January 2019 (has links)
This thesis deals with failure-mode detection for aircraft engines. The main approach to detection is searching for anomalies in the sensor data. To give a comprehensive picture of the system and the particular sensors, the thesis begins with a description of the whole system, namely the HTF7000 aircraft engine, as well as a description of the sensors. A proposed anomaly-detection algorithm based on three different detection methods is discussed in the second chapter. These methods are SVM (Support Vector Machine), K-means, and ARIMA (Autoregressive Integrated Moving Average). The implementation of the algorithm, including a proposed graphical user interface, is elaborated in the next part of the thesis. Finally, a statistical analysis of the results, a comparison of the efficiency of the particular models, and a discussion of the outputs of the proposed algorithm can be found at the end of the thesis.
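Of the three methods, the SVM is typically used in its one-class form for this kind of anomaly detection; here is a minimal sketch on synthetic stand-in data (not the HTF7000 sensor data):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 3))      # stand-in healthy readings
test = np.vstack([rng.normal(0.0, 1.0, (20, 3)),  # nominal test windows
                  rng.normal(6.0, 1.0, (5, 3))])  # injected anomalies

# train on nominal data only; nu bounds the fraction of flagged outliers
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal)
print(model.predict(test))   # +1 = nominal, -1 = flagged anomaly
```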
119

Data-driven persona development for a knowledge management system

Baldi, Annika January 2021 (has links)
Generating personas based entirely on data has gained popularity. Personas describe the characteristics of a user group in a human-like format. This project presents the persona-creation process, from raw data to evaluated personas, for Zapiens' knowledge management system. The objective of the personas is to learn about customer behavior and to aid customer communication. In the described methodology, platform log data was clustered to group the users; this quantitative approach is fast, updatable, and scalable. The analysis was split across two features of the Zapiens platform, with the aim of creating persona sets for both the training component and the chatbot component. The group characteristics were then enriched with data from user surveys. This approach proved successful only for the training analysis. The collected data is presented in a web-based persona template to make the personas easily accessible and shareable. The finished training persona set was evaluated using the Persona Perception Scale, and the results showed three personas of satisfying quality. The project aims to provide a complete overview of the data-driven persona development process.
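A compact sketch of the quantitative core of such a pipeline: cluster standardized log features and read each centroid as a draft persona. The feature names are hypothetical; the abstract does not disclose Zapiens' log schema.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# hypothetical per-user log features: sessions/week, quiz completion, chatbot queries
logs = np.array([[12, 0.9, 1], [11, 0.8, 0], [2, 0.2, 9],
                 [3, 0.1, 8], [7, 0.5, 4], [6, 0.6, 5]], dtype=float)

X = StandardScaler().fit_transform(logs)          # put features on one scale
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# each centroid is a candidate persona's behavioral profile
for i, c in enumerate(km.cluster_centers_):
    print(f"persona {i}: standardized profile {np.round(c, 2)}")
```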
120

Segmentace řeči / Speech segmentation

Kašpar, Ladislav January 2015 (has links)
My diploma thesis is devoted to the problem of speech segmentation. It includes the basic theory on this topic, focused on the calculation of parameters for speech segmentation that are used in the practical part. An application for speech segmentation has been written in Matlab. It uses techniques such as segmentation of the signal, signal energy, and the zero-crossing function; these parameters are used as input to the k-means algorithm.
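The thesis's Matlab implementation is not reproduced here, but the same idea, short-time energy and zero-crossing rate per frame fed to k-means, looks roughly like this Python sketch (frame and hop sizes are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def frame_features(signal, frame=400, hop=160):
    """Short-time energy and zero-crossing rate for each analysis frame."""
    feats = []
    for start in range(0, len(signal) - frame, hop):
        seg = signal[start:start + frame]
        energy = np.sum(seg ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(seg))) > 0)
        feats.append([energy, zcr])
    return np.array(feats)

# toy signal: silence, a 'voiced' tone, then noise-like 'unvoiced' sound
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2),
                         0.5 * np.sin(2 * np.pi * 150 * t),
                         0.1 * np.random.default_rng(0).standard_normal(sr // 2)])

# cluster frames into silence / voiced / unvoiced segments
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(frame_features(signal))
```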
