201 |
Interprétation sémantique d'images hyperspectrales basée sur la réduction adaptative de dimensionnalité / Semantic interpretation of hyperspectral images based on the adaptive reduction of dimensionality
Sellami, Akrem 11 December 2017 (has links)
Hyperspectral imaging acquires rich spectral information about a scene in several hundred or even thousands of narrow, contiguous spectral bands. However, the high number of spectral bands, the strong inter-band correlation, and the redundancy of spectro-spatial information make the interpretation of these massive hyperspectral data one of the major challenges for the remote sensing community. The key problem is to reduce the number of unnecessary spectral bands, that is, to reduce their redundancy and high correlation while preserving the relevant information. Projection approaches transform the hyperspectral data into a reduced subspace by combining all the original spectral bands, whereas band selection approaches seek a subset of relevant spectral bands. In this thesis, we first address hyperspectral image classification, integrating spectro-spatial information into dimensionality reduction in order to improve classification performance and to overcome the loss of spatial information inherent in projection approaches. We propose a hybrid model that preserves spectro-spatial information by exploiting tensors in the locality preserving projection approach (TLPP) and uses unsupervised constraint band selection (CBS) to select discriminant spectral bands. To model the uncertainty and imperfection affecting these reduction approaches and the classifiers, we propose an evidential approach based on Dempster-Shafer theory (DST). In a second step, we extend the hybrid model by exploiting semantic knowledge, extracted through the features obtained by the TLPP approach, to enrich the CBS technique. The resulting approach selects relevant spectral bands that are at once informative, discriminant, distinctive, and minimally redundant: the semantics extracted by knowledge extraction techniques are injected into CBS so that the optimal subset of relevant spectral bands is selected automatically and adaptively. The performance of our approach is evaluated on several real hyperspectral data sets.
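As a rough illustration of the Dempster-Shafer fusion step described in this abstract, the sketch below combines two mass functions with Dempster's rule of combination. It is a minimal sketch, not the thesis's implementation; the class names and mass values are invented for the example.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass)
    using Dempster's rule of combination."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("Totally conflicting evidence")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical beliefs of two classifiers over classes {veg, soil}
V, S = frozenset({"veg"}), frozenset({"soil"})
m1 = {V: 0.7, S: 0.2, V | S: 0.1}   # e.g. classifier on TLPP features
m2 = {V: 0.6, S: 0.3, V | S: 0.1}   # e.g. classifier on CBS-selected bands
print(dempster_combine(m1, m2))
```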
|
202 |
Využití pokročilých statistických metod pro zpracování obrazu fluorescenční emise rostlin ovlivněných lokálním biotickým stresem / Utilization of advanced statistical methods for processing of fluorescence emission of plants affected by local biotic stress
MATOUŠ, Karel January 2008 (has links)
Chlorophyll fluorescence imaging is a noninvasive technique often used in plant physiology, molecular biology, and precision farming. Captured image sequences record the dynamics of chlorophyll fluorescence emission, which contain information about spatial and temporal changes in the photosynthetic activity of the plant. The goal of this Ph.D. thesis is to contribute to the development of chlorophyll fluorescence imaging through the application of advanced statistical techniques. Methods of statistical pattern recognition make it possible to identify the images in a captured sequence that are rich in information about the observed biotic stress and to find small subsets of fluorescence images suitable for subsequent analysis. I focused on methods for identifying small sets of images that provide high performance at realistic computational cost.
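One generic way to rank frames of a fluorescence sequence by informativeness is a Fisher-type separability score between control and stressed pixels, sketched below on synthetic data. This is an illustrative stand-in, not the specific selection method developed in the thesis.

```python
import numpy as np

def fisher_score_per_frame(seq_control, seq_stressed):
    """Score each frame by how well it separates control from stressed
    pixels (Fisher criterion); seq_*: arrays (n_frames, n_pixels)."""
    mu_c, mu_s = seq_control.mean(axis=1), seq_stressed.mean(axis=1)
    var_c, var_s = seq_control.var(axis=1), seq_stressed.var(axis=1)
    return (mu_c - mu_s) ** 2 / (var_c + var_s + 1e-12)

rng = np.random.default_rng(0)
control = rng.normal(1.0, 0.1, (50, 400))            # 50 frames, 400 pixels
# stress effect that grows over the sequence (synthetic)
stressed = control + rng.normal(0.3, 0.1, (50, 400)) * np.linspace(0, 1, 50)[:, None]
scores = fisher_score_per_frame(control, stressed)
print("most informative frames:", np.argsort(scores)[-5:])
```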
|
203 |
Contributions to decision tree based learning / Contributions à l'apprentissage de l'arbre des décisions
Qureshi, Taimur 08 July 2010 (has links)
Advances in data collection methods, storage, and processing technology provide a unique challenge and opportunity for automated data learning techniques, which aim at producing high-level information, or models, from data. A typical knowledge discovery process consists of data selection, data preparation, data transformation, data mining, and interpretation/validation of the results. We develop automatic learning techniques that contribute to the data preparation, transformation, and mining tasks of knowledge discovery, with the aim of improving the prediction accuracy of the overall learning process. Our work focuses on decision tree based learning, and we introduce various preprocessing and transformation techniques such as discretization, fuzzy partitioning, and dimensionality reduction to improve this type of learning. These techniques can also be used in other learning methods; discretization, for example, can be used with naive Bayes classifiers. The data preparation step represents almost 80 percent of the overall effort and is both time consuming and critical for the quality of modeling. Discretization of continuous features is an important problem that affects the accuracy, complexity, variance, and understandability of the induced models. In this thesis, we propose and develop resampling-based aggregation techniques that improve the quality of discretization, and we validate them by comparison with other discretization techniques and with an optimal partitioning method on 10 benchmark data sets. The second part of the thesis concerns automatic fuzzy partitioning for soft decision tree induction. A soft, or fuzzy, decision tree extends classical crisp tree induction by embedding fuzzy logic into the induction process, yielding more accurate models with reduced variance while remaining interpretable and autonomous. We modify the above resampling-based partitioning method to generate fuzzy partitions, and we also propose, develop, and validate another fuzzy partitioning method that improves the accuracy of the decision tree. Finally, we adopt a topological learning scheme and perform nonlinear dimensionality reduction: we modify an existing manifold learning based technique and examine whether it can enhance the predictive power and interpretability of classification.
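The general idea of resampling-based aggregation for discretization can be sketched as follows: cut points are computed on bootstrap resamples and averaged to reduce their variance. This is a simplified, quantile-based stand-in for the techniques developed in the thesis, with all parameters chosen for illustration.

```python
import numpy as np

def bagged_cut_points(x, n_bins=4, n_resamples=100, rng=None):
    """Aggregate quantile cut points over bootstrap resamples of x.
    Averaging the per-resample cut points reduces their variance."""
    rng = rng or np.random.default_rng(0)
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]        # interior quantiles
    cuts = np.array([
        np.quantile(rng.choice(x, size=x.size, replace=True), qs)
        for _ in range(n_resamples)
    ])
    return cuts.mean(axis=0)

x = np.random.default_rng(1).exponential(scale=2.0, size=500)
print("aggregated cut points:", bagged_cut_points(x))
```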
|
204 |
Análise da influência de funções de distância para o processamento de consultas por similaridade em recuperação de imagens por conteúdo / Analysis of the influence of distance functions to answer similarity queries in content-based image retrieval
Pedro Henrique Bugatti 16 April 2008 (has links)
Content-based image retrieval (CBIR) relies on a feature extractor to provide the most meaningful intrinsic characteristics (features) of the data and on a distance function to quantify the similarity between them. A central challenge in answering similarity queries is how best to integrate these two key aspects. There is plenty of research on algorithms for feature extraction from images, but little attention has been paid to the importance of pairing a well-suited distance function with a given feature extractor. This Master's dissertation was conceived to fill that gap. It investigates the behavior of different distance functions with respect to distinct feature vector types, evaluating the three main types of image features: color distribution, texture, and shape. Two new techniques for feature selection over the feature vectors are also proposed in order to improve precision when answering similarity queries. The first technique employs statistical association rules and achieved a gain of up to 38% in precision, while the second, employing Shannon entropy, achieved a gain of approximately 71% while significantly reducing the dimensionality of the feature vectors. This work also shows that proper use of distance functions effectively improves similarity query results, opening new ways to enhance the design of CBIR systems.
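To illustrate how the choice of distance function can change retrieval precision for a fixed feature extractor, the sketch below measures precision-at-k for nearest-neighbour retrieval under several metrics on synthetic feature vectors. The data and metric choices are illustrative only and do not reproduce the dissertation's experiments.

```python
import numpy as np
from scipy.spatial.distance import cdist

def retrieval_precision(features, labels, metric, k=10):
    """Mean precision@k of nearest-neighbour retrieval under a metric:
    fraction of the k closest images sharing the query's label."""
    d = cdist(features, features, metric=metric)
    np.fill_diagonal(d, np.inf)                 # exclude the query itself
    nn = np.argsort(d, axis=1)[:, :k]
    return (labels[nn] == labels[:, None]).mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 5, 300)
features = rng.normal(labels[:, None], 1.0, (300, 32))  # toy feature vectors
for metric in ("euclidean", "cityblock", "canberra", "chebyshev"):
    print(metric, round(retrieval_precision(features, labels, metric), 3))
```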
|
205 |
Analýza kvality ovzduší v kancelářských a obytných prostorech / Air Quality Analysis in Office and Residential Areas
Tisovčík, Peter January 2019 (has links)
The goal of this thesis was to study indoor air quality measurement, focusing on the concentration of carbon dioxide. The theoretical part introduces data mining, including basic classification methods and approaches to dimensionality reduction, and reviews the principles of the system developed within the IoTCloud project together with the available options for measuring the necessary quantities. In the practical part, suitable sensors were selected for the given rooms and long-term measurements were performed. The measured data were used to build a system for window-opening detection and to design an appropriate way of regulating air exchange in a room, with the aim of improving air quality through natural ventilation.
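A window-opening event typically shows up as a sharp drop in CO2 concentration. The sketch below flags such drops with a simple rate threshold on synthetic data; the thesis's detector and its parameters may well differ, and the 50 ppm/min threshold is an assumption.

```python
import numpy as np

def detect_window_openings(co2_ppm, t_minutes, drop_rate=-50.0):
    """Flag samples where the CO2 concentration falls faster than
    drop_rate (ppm per minute), a typical signature of an opened window."""
    rate = np.gradient(co2_ppm, t_minutes)
    return np.flatnonzero(rate < drop_rate)

t = np.arange(0.0, 120.0)                      # one sample per minute
co2 = 400.0 + 8.0 * t                          # occupants raise CO2 steadily
# window opened at minute 60: concentration decays back toward 400 ppm
co2[60:] = 400.0 + (co2[59] - 400.0) * np.exp(-0.15 * (t[60:] - 59.0))
print("opening detected near minutes:", detect_window_openings(co2, t))
```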
|
206 |
Investigating the Correlation Between Marketing Emails and Receivers Using Unsupervised Machine Learning on Limited Data: A comprehensive study using state of the art methods for text clustering and natural language processing / Undersökning av samband mellan marknadsföringsemail och dess mottagare med hjälp av oövervakad maskininlärning på begränsad data
Pettersson, Christoffer January 2016 (has links)
The goal of this project is to investigate whether there is any correlation between marketing emails and their receivers, using machine learning and only a limited amount of initial data. The data consist of roughly 1,200 emails and 98,000 receivers of these. Initially, the emails are grouped together based on their content using text clustering. They contain no prior labeling or categorization, which calls for an unsupervised learning approach using solely the raw text content as data. The project investigates established concepts such as bag-of-words for calculating term importance and the gap statistic for determining an optimal number of clusters. The data are vectorized using term frequency-inverse document frequency (TF-IDF) to determine the importance of terms relative to each document and to all documents combined. An inherent problem of this approach is high dimensionality, which is reduced using latent semantic analysis in conjunction with singular value decomposition. Once the resulting clusters have been obtained, the most frequently occurring terms in each cluster are analyzed and compared. Due to the absence of initial labeling, an alternative approach is required to evaluate the clusters' validity: for each cluster, the receivers who actively opened an email are collected and investigated. Each receiver has different attributes regarding their purpose in using the service, along with some personal information. From this analysis it could be concluded that distinguishable connections exist between the resulting email clusters and their receivers, but only to a limited extent. Receivers from the same cluster showed attributes similar to each other and distinguishable from those of receivers in other clusters. Hence, the resulting email clusters and their receivers are specific enough to distinguish themselves from each other but too general to handle more detailed information. With more data, this could become a useful tool for determining which users of a service should receive a particular email, increasing the conversion rate and thereby reaching more relevant people based on previous trends.
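The core pipeline described above (TF-IDF vectorization, dimensionality reduction via latent semantic analysis with truncated SVD, then clustering) can be sketched in a few lines with scikit-learn. The toy emails and parameter choices are invented for illustration and do not reflect the project's actual corpus or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

emails = [
    "spring sale on running shoes",
    "discount running gear this week",
    "your invoice for march is ready",
    "payment receipt and invoice attached",
]
# TF-IDF weights the terms, truncated SVD performs the LSA step,
# and k-means groups the low-dimensional documents.
lsa = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),
)
X = lsa.fit_transform(emails)
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
```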
|
207 |
Linear and Nonlinear Dimensionality-Reduction-Based Surrogate Models for Real-Time Design Space Exploration of Structural Responses
Bird, Gregory David 03 August 2020
Design space exploration (DSE) is a tool used to evaluate and compare designs as part of the design selection process. While evaluating every possible design in a design space is infeasible, understanding design behavior and response throughout the design space may be accomplished by evaluating a subset of designs and interpolating between them using surrogate models. Surrogate modeling is a technique that uses low-cost calculations to approximate the outcome of more computationally expensive calculations or analyses, such as finite element analysis (FEA). While surrogates make quick predictions, accuracy is not guaranteed and must be considered. This research addressed the need to improve the accuracy of surrogate predictions in order to improve DSE of structural responses. This was accomplished by performing comparative analyses of linear and nonlinear dimensionality-reduction-based radial basis function (RBF) surrogate models for emulating various FEA nodal results. A total of four dimensionality reduction methods were investigated, namely principal component analysis (PCA), kernel principal component analysis (KPCA), isometric feature mapping (ISOMAP), and locally linear embedding (LLE). These methods were used in conjunction with surrogate modeling to predict nodal stresses and coordinates of a compressor blade. The research showed that using an ISOMAP-based dual-RBF surrogate model for predicting nodal stresses decreased the estimated mean error of the surrogate by 35.7% compared to PCA. Using nonlinear dimensionality-reduction-based surrogates did not reduce surrogate error for predicting nodal coordinates. A new metric, the manifold distance ratio (MDR), was introduced to measure the nonlinearity of the data manifolds. When applied to the stress and coordinate data, the stress space was found to be more nonlinear than the coordinate space for this application. The upfront training cost of the nonlinear dimensionality-reduction-based surrogates was larger than that of their linear counterparts but small enough to remain feasible. After training, all the dual-RBF surrogates were capable of making real-time predictions. This same process was repeated for a separate application involving the nodal displacements of mode shapes obtained from a FEA modal analysis. The modal assurance criterion (MAC) calculation was used to compare the predicted mode shapes, as well as their corresponding true mode shapes obtained from FEA, to a set of reference modes. The research showed that two nonlinear techniques, namely LLE and KPCA, resulted in lower surrogate error in the more complex design spaces. Using a RBF kernel, KPCA achieved the largest average reduction in error of 13.57%. The results also showed that surrogate error was greatly affected by mode shape reversal. Four different approaches of identifying reversed mode shapes were explored, all of which resulted in varying amounts of surrogate error. Together, the methods explored in this research were shown to decrease surrogate error when performing DSE of a turbomachine compressor blade. As surrogate accuracy increases, so does the ability to correctly make engineering decisions and judgements throughout the design process. Ultimately, this will help engineers design better turbomachines.
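The linear variant of a dimensionality-reduction-based surrogate can be sketched as follows: PCA compresses the high-dimensional nodal field, and a radial basis function interpolator maps design variables to the reduced coefficients. This is a minimal sketch with synthetic data standing in for FEA results, not the dual-RBF model developed in the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
params = rng.uniform(-1, 1, (80, 3))             # design variables
# Toy stand-in for expensive FEA: 500 nodal responses per design
field = np.sin(params @ rng.normal(size=(3, 500))) + 0.5 * params[:, :1]

pca = PCA(n_components=5).fit(field)             # compress the output field
coeffs = pca.transform(field)
surrogate = RBFInterpolator(params, coeffs)      # RBF: design -> coefficients

new_design = rng.uniform(-1, 1, (1, 3))
prediction = pca.inverse_transform(surrogate(new_design))
print(prediction.shape)                          # (1, 500) predicted nodal values
```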
|
208 |
Evaluation of Archetypal Analysis and Manifold Learning for Phenotyping of Acute Kidney Injury
Dylan M Rodriquez (10695618) 07 May 2021
Disease subtyping has been a critical aim of precision and personalized medicine. With the potential to improve patient outcomes, unsupervised and semi-supervised methods for determining subtype phenotypes have emerged, with a recent focus on matrix and tensor factorization. However, the interpretability of the proposed models is debatable. Principal component analysis (PCA), a traditional method of dimensionality reduction, does not impose non-negativity constraints, so the coefficients of the principal components are in some cases difficult to translate into real physical units. Non-negative matrix factorization (NMF) constrains the factorization to positive numbers, so that the representative types resulting from the factorization are additive. Archetypal analysis (AA) extends this idea and seeks to identify pure types, the archetypes, at the extremes of the data, from which all other data can be expressed as a convex combination, i.e., by proportion, of the archetypes. Using AA, this study sought to evaluate the sufficiency of acute kidney injury (AKI) staging criteria through unsupervised subtyping. Archetypal analysis failed to find a direct 1:1 mapping of archetypes to physician staging and did not provide additional insight into patient outcomes. Several factors, such as the quality of the data source and the difficulty of selecting features, contributed to this outcome. Additionally, after performing feature selection with the lasso across data subsets, it was determined that the current staging criteria are sufficient to determine patient phenotype, with serum creatinine at the time of diagnosis being a necessary factor.
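The defining property of archetypal analysis, that each sample is expressed as a convex combination of archetypes, can be illustrated with a small constrained least-squares problem. The archetypes and patient values below are hypothetical, and this sketch only computes the convex weights for fixed archetypes rather than fitting the archetypes themselves.

```python
import numpy as np
from scipy.optimize import minimize

def convex_weights(x, archetypes):
    """Express sample x as a convex combination of archetype rows:
    minimise ||x - w @ archetypes|| s.t. w >= 0 and sum(w) = 1."""
    k = archetypes.shape[0]
    res = minimize(
        lambda w: np.sum((x - w @ archetypes) ** 2),
        x0=np.full(k, 1.0 / k),
        bounds=[(0, 1)] * k,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    )
    return res.x

# Hypothetical 'pure type' patients (e.g. creatinine, urine output extremes)
archetypes = np.array([[4.0, 0.1], [0.8, 1.5], [2.0, 0.3]])
patient = np.array([2.4, 0.9])
print(convex_weights(patient, archetypes).round(3))
```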
|
209 |
Deep Scenario Generation of Financial Markets / Djup scenario generering av finansiella marknader
Carlsson, Filip, Lindgren, Philip January 2020 (has links)
The goal of this thesis is to explore a new clustering algorithm, VAE-clustering, and to examine whether it can be applied to find differences in the distribution of stock returns, augment the distribution of a current stock portfolio, and see how it performs under different market conditions. VAE-clustering is a newly introduced method that has not been widely tested, especially not on time series. The first step is therefore to see if and how well the clustering works. We first apply the algorithm to a dataset containing monthly time series of the power demand in Italy, focusing on how well the method works technically. Once the model works well and generates proper results on the Italian power demand data, we move on and apply it to stock return data. In the latter application we are unable to find meaningful clusters and are therefore unable to move toward the goal of the thesis. The results show that the VAE-clustering method is applicable to time series: the power demand has clear differences from season to season, and the model successfully identifies those differences. For the financial data we hoped that the model would be able to find different market regimes based on time periods, but it is not able to distinguish different time periods from each other. We therefore conclude that the VAE-clustering method is applicable to time series data, but that the structure and setting of the financial data in this thesis make it too hard to find meaningful clusters. The major finding is that the VAE-clustering method can be applied to time series. We strongly encourage further research into whether the method can be successfully used on financial data in settings other than those tested in this thesis.
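A minimal sketch of the idea behind VAE-based clustering, assuming PyTorch and scikit-learn: a small variational autoencoder is trained on fixed-length series and k-means is then run on the latent means. The actual VAE-clustering algorithm is more involved; all dimensions, hyperparameters, and the synthetic data here are invented.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class TinyVAE(nn.Module):
    """Minimal fully connected VAE; clustering is done on the latent
    means after training (a simplification of VAE-clustering)."""
    def __init__(self, n_in, n_latent=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 32), nn.ReLU())
        self.mu = nn.Linear(32, n_latent)
        self.logvar = nn.Linear(32, n_latent)
        self.dec = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(), nn.Linear(32, n_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

x = torch.randn(256, 12)           # 256 series of 12 monthly values (synthetic)
vae = TinyVAE(12)
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(200):
    recon, mu, logvar = vae(x)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    loss = ((recon - x) ** 2).sum(dim=1).mean() + kl
    opt.zero_grad(); loss.backward(); opt.step()

labels = KMeans(n_clusters=4, n_init=10).fit_predict(mu.detach().numpy())
print(labels[:10])
```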
|
210 |
Monitoring Vehicle Suspension Elements Using Machine Learning Techniques / Tillståndsövervakning av komponenter i fordonsfjädringssystem genom maskininlärningstekniker
Karlsson, Henrik January 2019 (has links)
Condition monitoring (CM) is widely used in industry, and there is a growing interest in applying CM to rail vehicle systems. Condition-based maintenance has the potential to increase system safety and availability while at the same time reducing total maintenance costs. This thesis investigates the feasibility of condition monitoring of suspension elements, in this case dampers, in rail vehicles. Different methods can be used to detect degradations, ranging from mathematical modelling of the system to purely "knowledge-based" methods that use only large amounts of data to detect patterns on a larger scale. In this thesis the latter approach is explored: acceleration signals are evaluated at several places on the axleboxes, bogie frames, and carbody of a rail vehicle simulation model. These signals are picked up close to the dampers monitored in this study, and frequency response functions (FRF) are computed between axleboxes and bogie frames as well as between bogie frames and carbody. The idea is that the FRF will change as the condition of the dampers changes and can thus act as fault indicators. The FRF are then fed to different classification algorithms, which are trained and tested to distinguish between the different damper faults. The thesis further investigates which classification algorithms show promising results for this problem and which perform best in terms of classification accuracy as well as two other measures. Another aspect explored is the possibility of applying dimensionality reduction to the extracted indicators (features). The thesis also examines how the three performance measures are affected by typical variations in the operational conditions of a rail vehicle, such as varying track excitation and carbody mass. The linear support vector machine classifier using the whole feature space, and the linear discriminant analysis classifier combined with principal component analysis dimensionality reduction of the feature space, both show promising results for the task of correctly classifying upcoming damper degradations.
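The FRF-based feature extraction described above can be sketched with a standard H1 estimator, where the FRF between two acceleration signals is the cross-spectral density divided by the input auto-spectral density. The signals below are synthetic stand-ins for axlebox and bogie-frame accelerations, and the sampling rate and window length are assumptions.

```python
import numpy as np
from scipy.signal import csd, welch

def frf_h1(x, y, fs, nperseg=1024):
    """H1 frequency response function estimate between two
    acceleration signals: H1(f) = S_xy(f) / S_xx(f)."""
    f, s_xy = csd(x, y, fs=fs, nperseg=nperseg)
    _, s_xx = welch(x, fs=fs, nperseg=nperseg)
    return f, s_xy / s_xx

fs = 500.0
t = np.arange(0, 60, 1 / fs)
x = np.random.default_rng(0).normal(size=t.size)                 # axlebox signal
y = np.convolve(x, np.exp(-np.arange(50) / 10.0), mode="same")   # bogie response
f, h = frf_h1(x, y, fs)
features = np.abs(h)          # FRF magnitude as input to a classifier
print(features.shape)
```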
|