221

Modified Kernel Principal Component Analysis and Autoencoder Approaches to Unsupervised Anomaly Detection

Merrill, Nicholas Swede 01 June 2020
Unsupervised anomaly detection is the task of identifying examples that differ from the normal or expected pattern without the use of labeled training data. Our research addresses shortcomings in two existing anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes novel solutions to improve their performance in the unsupervised setting. Anomaly detection has several useful applications, such as intrusion detection, fault monitoring, and vision processing. More specifically, anomaly detection can be used in autonomous driving to identify obscured signage or to monitor intersections. Kernel techniques are desirable because of their ability to model highly non-linear patterns, but they are limited in the unsupervised setting due to their sensitivity to parameter choices and the absence of a validation step. Additionally, conventional KPCA suffers from quadratic time and memory complexity in the construction of the Gram matrix and cubic time complexity in its eigendecomposition. The problem of tuning the Gaussian kernel parameter, $\sigma$, is solved using mini-batch stochastic gradient descent (SGD) optimization of a loss function that maximizes the dispersion of the kernel matrix entries. Secondly, the computational time is greatly reduced, while high accuracy is maintained, by using an ensemble of small, "skeleton" models and combining their scores. The performance of traditional machine learning approaches to anomaly detection plateaus as the volume and complexity of data increase. Deep anomaly detection (DAD) involves the application of multilayer artificial neural networks to identify anomalous examples. AEs are fundamental to most DAD approaches. Conventional AEs rely on the assumption that a trained network will learn to reconstruct normal examples better than anomalous ones. In practice, however, given sufficient capacity and training time, an AE will generalize to reconstruct even very rare examples. Three methods are introduced to more reliably train AEs for unsupervised anomaly detection: Cumulative Error Scoring (CES) leverages the entire history of training errors to minimize the importance of early stopping; Percentile Loss (PL) training aims to prevent anomalous examples from contributing to parameter updates; and early stopping via knee detection aims to limit the risk of overtraining. Ultimately, the two modified methods proposed in this research, Unsupervised Ensemble KPCA (UE-KPCA) and the Modified Training and Scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets. / Master of Science / Anomaly detection is the task of identifying examples that differ from the normal or expected pattern. The challenge of unsupervised anomaly detection is distinguishing normal and anomalous data without the use of labeled examples to demonstrate their differences. This thesis addresses shortcomings in two anomaly detection algorithms, Kernel Principal Component Analysis (KPCA) and Autoencoders (AE), and proposes new solutions to apply them in the unsupervised setting. Ultimately, the two modified methods, Unsupervised Ensemble KPCA (UE-KPCA) and the Modified Training and Scoring AE (MTS-AE), demonstrate improved detection performance and reliability compared to many baseline algorithms across a number of benchmark datasets.
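To make the skeleton-ensemble idea concrete, here is a minimal sketch of KPCA anomaly scoring with an ensemble of small subsampled models, assuming scikit-learn. The subsample size, model count, and the median-distance heuristic (standing in for the SGD-tuned $\sigma$) are illustrative assumptions, not the thesis's UE-KPCA settings.

```python
# Sketch: ensemble of small "skeleton" KPCA models; each scores all
# points by input-space reconstruction error, and the z-normalized
# scores are averaged. Not the thesis's exact UE-KPCA procedure.
import numpy as np
from sklearn.decomposition import KernelPCA

def ensemble_kpca_scores(X, n_models=10, subsample=100, n_components=5, seed=0):
    rng = np.random.default_rng(seed)
    scores = np.zeros((n_models, len(X)))
    for m in range(n_models):
        idx = rng.choice(len(X), size=min(subsample, len(X)), replace=False)
        Xs = X[idx]
        # Median-distance heuristic stands in for the SGD-tuned sigma.
        d = np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=-1)
        gamma = 1.0 / (2.0 * np.median(d[d > 0]) ** 2)
        kpca = KernelPCA(n_components=n_components, kernel="rbf",
                         gamma=gamma, fit_inverse_transform=True)
        kpca.fit(Xs)
        # Reconstruction error in input space as the anomaly score.
        err = np.linalg.norm(X - kpca.inverse_transform(kpca.transform(X)), axis=1)
        scores[m] = (err - err.mean()) / (err.std() + 1e-12)  # z-normalize per model
    return scores.mean(axis=0)  # higher = more anomalous

# Example: scores = ensemble_kpca_scores(np.random.rand(500, 8))
```

Each skeleton model only ever builds a Gram matrix on its small subsample, which is the source of the claimed speedup over a single full-data KPCA.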
222

Effects of Manufacturing Deviations on Core Compressor Blade Performance

De Losier, Clayton Ray 20 April 2009
There has been recent incentive for understanding the possible deleterious effects that manufacturing deviations can have on compressor blade performance. This is of particular importance today, as compressor designs push operating limits by employing fewer stages with higher loadings and are designed to operate at ever higher altitudes. Deviations in these advanced designs, as well as in legacy designs, could negatively affect the performance and operation of a core compressor; thus, a numerical investigation to quantify manufacturing deviations and their effects is undertaken. Data from three radial sections of every compressor blade in a single row of a production compressor are used as the basis for this investigation. Deviations from the compressor blade design intent to the as-manufactured blades are quantified with a statistical method known as principal component analysis (PCA). MISES, an Euler solver coupled with integral boundary-layer calculations, is used to analyze the effects that the aforementioned deviations have on compressor blade performance when the inlet flow conditions produce a Mach number of approximately 0.7 and a Reynolds number of approximately 6.5e5. It was found that the majority of manufacturing deviations were within a range of plus or minus 4 percent of the design intent, and that deviations at the leading edge had a critical effect on performance. Of particular interest is the fact that deviations at the leading edge not only degraded performance but significantly changed the boundary-layer behavior from that of the design case. / Master of Science
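As an illustration of how PCA can quantify such deviations, the following sketch extracts dominant deviation modes from stacked blade-section measurements. The array shapes and the measured/nominal inputs are hypothetical placeholders; real data would come from coordinate measurements of the production blades.

```python
# Sketch: PCA (via SVD) on blade-to-design deviations, yielding the
# dominant deviation shapes, per-blade mode amplitudes, and the
# fraction of scatter each mode explains.
import numpy as np

def deviation_modes(measured, nominal, n_modes=3):
    """measured: (n_blades, n_points) stacked section coordinates;
    nominal: (n_points,) design-intent coordinates at the same points."""
    D = measured - nominal            # deviation of each blade from design intent
    D = D - D.mean(axis=0)            # center so PCA captures blade-to-blade scatter
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    modes = Vt[:n_modes]              # dominant deviation shapes
    weights = D @ modes.T             # per-blade amplitude of each mode
    var_explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return modes, weights, var_explained
```

A few leading modes typically capture most of the manufacturing scatter, so perturbing the nominal geometry along them is a compact way to generate inputs for a flow solver such as MISES.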
223

A machine learning approach for ethnic classification: the British Pakistani face

Khalid Jilani, Shelina, Ugail, Hassan, Bukar, Ali M., Logan, Andrew J., Munshi, Tasnim January 2017
Ethnicity is one of the most salient cues to face identity. Analysis of ethnicity-specific facial data is a challenging problem, predominantly carried out using computer-based algorithms. Current published literature focuses on the use of frontal face images. We addressed the challenge of binary (British Pakistani or other ethnicity) ethnicity classification using profile facial images. The proposed framework is based on the extraction of geometric features using 10 anthropometric facial landmarks, within a purpose-built, novel database of 135 multi-ethnic and multi-racial subjects and a total of 675 face images. Image dimensionality was reduced using Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression. Classification was performed using a linear Support Vector Machine (SVM). The results of this framework are promising, with 71.11% ethnic classification accuracy using PCA + SVM and 76.03% using PLS + SVM.
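A minimal sketch of the reduce-then-classify pipeline described above (PCA or PLS features feeding a linear SVM), assuming scikit-learn. The random arrays, component counts, and split are placeholders, not the study's actual landmark features or settings.

```python
# Sketch: dimensionality reduction (PCA or PLS) followed by a linear
# SVM. PLS is supervised, so it is fit on training data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.random.rand(675, 45)           # placeholder for landmark-derived features
y = np.random.randint(0, 2, 675)      # placeholder labels: 1 = British Pakistani

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# PCA + linear SVM
pca_svm = make_pipeline(StandardScaler(), PCA(n_components=10),
                        LinearSVC(max_iter=5000))
print("PCA+SVM accuracy:", pca_svm.fit(X_tr, y_tr).score(X_te, y_te))

# PLS + linear SVM
pls = PLSRegression(n_components=10).fit(X_tr, y_tr)
svm = LinearSVC(max_iter=5000).fit(pls.transform(X_tr), y_tr)
print("PLS+SVM accuracy:", svm.score(pls.transform(X_te), y_te))
```

The PLS branch uses the class label when learning the projection, which is one plausible reason a supervised reduction can outperform PCA on the same features.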
224

Unsupervised Learning for Efficient Underwriting

Dalla Torre, Elena January 2024
In the field of actuarial science, statistical methods have been extensively studied to estimate insurance risk. These methods are good at estimating the risk of typical insurance policies, for which historical data is available. However, their performance can be poor on unique insurance policies, which require the manual assessment of an underwriter. A classification of insurance policies on a unique/typical scale would help insurance companies allocate manual resources more efficiently and validate the goodness of fit of the pricing models on unique objects. The aim of this thesis is to use outlier detection methods to identify unique non-life insurance policies. The many categorical nominal variables present in insurance policy data sets represent a challenge when applying outlier detection methods. Therefore, we also explore different ways to derive informative numerical representations of categorical nominal variables. First, as a baseline, we use principal component analysis of mixed data to find a numerical representation of categorical nominal variables and principal component analysis to identify unique insurances. Then, we see whether better performance can be achieved using autoencoders, which can capture complex non-linearities. In particular, we learn a numerical representation of categorical nominal variables using the encoder layer of an autoencoder, and we use a different autoencoder to identify unique insurances. Since we are in an unsupervised setting, the two methods are compared by performing a simulation study and using the NSL-KDD data set. The analysis shows that autoencoders are superior to principal component analysis at identifying unique objects. We conclude that the ability of autoencoders to model complex non-linearities between the variables allows this class of methods to achieve superior performance.
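The following is a minimal sketch of autoencoder-style outlier scoring, using scikit-learn's MLPRegressor trained to reconstruct its own input. One-hot encoding stands in for the thesis's learned categorical representation, and the architecture is an illustrative assumption rather than its model; the sparse_output argument assumes scikit-learn 1.2+.

```python
# Sketch: score policies by reconstruction error of a narrow
# autoencoder over mixed numerical and categorical features.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def autoencoder_outlier_scores(X_num, X_cat, bottleneck=4):
    """X_num: (n, d_num) numerical columns; X_cat: (n, d_cat) categorical columns."""
    Xc = OneHotEncoder(sparse_output=False,
                       handle_unknown="ignore").fit_transform(X_cat)
    Xn = StandardScaler().fit_transform(X_num)
    X = np.hstack([Xn, Xc])
    # A narrow middle layer forces a compressed representation.
    ae = MLPRegressor(hidden_layer_sizes=(16, bottleneck, 16),
                      activation="relu", max_iter=500)
    ae.fit(X, X)                                   # train to reconstruct the input
    err = np.mean((X - ae.predict(X)) ** 2, axis=1)
    return err                                     # higher error = more "unique" policy
```

Policies whose feature combinations are rare reconstruct poorly through the bottleneck, which is the property the thesis exploits on the unique/typical scale.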
225

Emprego de técnicas de análise exploratória de dados utilizados em Química Medicinal / Use of different techniques for exploratory data analysis in Medicinal Chemistry

Gertrudes, Jadson Castro 10 September 2013
Research in Medicinal Chemistry has focused on the search for methods that accelerate the process of drug discovery. Among the several steps in the discovery of bioactive substances is the analysis of the relationships between the chemical structure and the biological activity of compounds. In this process, researchers in Medicinal Chemistry analyze data sets characterized by high dimensionality and a small number of observations. Within this context, this work presents a computational approach that aims to contribute to the analysis of chemical data and, consequently, to the discovery of new drugs for the treatment of chronic diseases. The exploratory data analysis approaches employed in this work combine dimensionality reduction and clustering techniques to detect natural structures that reflect the biological activity of the analyzed compounds. Among the many existing dimensionality reduction techniques, the Fisher score, principal component analysis, and sparse principal component analysis are discussed. As for the clustering algorithms, k-means, fuzzy c-means, and the enhanced ICA mixture model are evaluated. Four data sets containing information on bioactive substances were used: two related to the treatment of diabetes mellitus and metabolic syndrome, a third related to cardiovascular diseases, and a last one containing substances that can be used in cancer treatment. In the experiments, the results obtained suggest using dimensionality reduction techniques together with unsupervised algorithms for clustering chemical data, since these experiments made it possible to describe different levels of biological activity of the studied compounds. Therefore, dimensionality reduction and clustering techniques can be used as guides in the process of discovery and development of new compounds in Medicinal Chemistry.
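A minimal sketch of the reduce-then-cluster procedure discussed above (PCA followed by k-means), assuming scikit-learn; the descriptor matrix and the component and cluster counts are placeholders, not the study's settings.

```python
# Sketch: project high-dimensional, low-sample chemical descriptors
# onto a few principal components, then cluster in the reduced space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = np.random.rand(60, 500)          # placeholder: few compounds, many descriptors
Z = PCA(n_components=5).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Z)
# Clusters can then be inspected for agreement with known activity levels.
```

Reducing first matters here because, with far more descriptors than compounds, distance-based clustering in the raw space is dominated by noise.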
226

O uso de recursos linguísticos para mensurar a semelhança semântica entre frases curtas através de uma abordagem híbrida / The use of linguistic resources to measure the semantic similarity between short sentences through a hybrid approach

Silva, Allan de Barcelos 14 December 2017
In Natural Language Processing (NLP), assessing Semantic Textual Similarity (STS) is a long-standing challenge that plays an increasingly important role in related applications. STS is a fundamental part of techniques and approaches in several areas, such as information retrieval, text classification, document clustering, translation, and duplicate detection. The literature describes experiments almost exclusively for the English language, with a priority on probabilistic resources and only incipient exploration of linguistic ones. Linguistic knowledge plays a fundamental role in assessing semantic similarity between short sentences, because purely probabilistic approaches fail in certain cases (e.g., distinguishing distantly from closely related sentences, or handling anaphora) due to their limited treatment of the language. This is aggravated in short sentences, which carry a reduced amount of information and thus limit the effectiveness of purely probabilistic treatment. It is therefore vital to identify and apply linguistic resources, grounded in a deeper study of the language, to better understand what makes two or more sentences similar or not. This work presents a hybrid approach to evaluating semantic textual similarity between short sentences in Brazilian Portuguese, in which distributed representations are used together with lexical and linguistic aspects. A methodology was defined that allows analyzing several combinations of resources, making it possible to evaluate the gains introduced by broader linguistic coverage and by its combination with knowledge generated by other techniques. The proposed approach was evaluated on well-known datasets from the literature (PROPOR 2016) and obtained good results.
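As a sketch of such a hybrid score, the function below blends a distributed-representation component with a lexical-overlap component. The embed callable (a word-vector lookup) and the 0.7/0.3 weights are hypothetical assumptions, not the thesis's trained combination, and the linguistic features it uses are omitted here.

```python
# Sketch: hybrid sentence similarity = weighted sum of an embedding
# cosine (distributed component) and Jaccard token overlap (lexical
# component).
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def hybrid_similarity(s1, s2, embed, w_dist=0.7, w_lex=0.3):
    """embed: assumed callable mapping a token to a fixed-size vector."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    # Distributed component: cosine of averaged word vectors.
    dist = cosine(np.mean([embed(t) for t in t1], axis=0),
                  np.mean([embed(t) for t in t2], axis=0))
    # Lexical component: Jaccard overlap of the token sets.
    lex = len(set(t1) & set(t2)) / len(set(t1) | set(t2))
    return w_dist * dist + w_lex * lex
```

The weighting is where such an approach earns its "hybrid" label: the lexical term rescues pairs the embeddings conflate, while the embeddings rescue paraphrases with no token overlap.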
228

Analys av punktmoln i tre dimensioner / Analysis of point clouds in three dimensions

Rasmussen, Johan, Nilsson, David January 2017
Purpose: To develop a method to help smaller sawmills recover as much wood as possible from a log. Method: A quantitative study in which three iterations were carried out following Design Science. Findings: For an algorithm that performs volume calculations on a point cloud of roughly two million points to be effective in an industrial setting, it must be fast and report correct data. The primary means of making the algorithm fast is to process the point cloud a minimal number of times. The algorithm that meets the goals of this study is Algorithm C: it is both fast and has a low standard deviation in its measurement errors, and it has complexity O(n) when analyzing sub-point clouds. Implications: Based on this study's algorithm, it would be possible to use stereo camera technology to help smaller sawmills recover as much wood as possible from a log. Limitations: The study's algorithm assumes that no points have been created inside the log, which could otherwise lead to misplaced points. If a log is crooked, its center does not coincide with the position of the z-axis; in extreme cases the z-value could fall outside the log, which the algorithm cannot handle.
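To illustrate the single-pass idea, here is a minimal O(n) volume estimate that buckets points into z-slices and sums disc volumes. It follows the stated assumptions (log axis along z, no interior points) but is a sketch, not the thesis's Algorithm C.

```python
# Sketch: one pass over the point cloud assigns each point to a
# z-slice and tracks the slice's maximum squared radius about the
# z-axis; the volume is the sum of the resulting disc volumes.
import numpy as np

def log_volume(points, n_slices=100):
    """points: (n, 3) array of x, y, z coordinates on the log surface."""
    z = points[:, 2]
    z0, z1 = z.min(), z.max()
    dz = (z1 - z0) / n_slices
    idx = np.clip(((z - z0) / dz).astype(int), 0, n_slices - 1)
    r2 = points[:, 0] ** 2 + points[:, 1] ** 2    # squared radius about the z-axis
    r2_max = np.zeros(n_slices)
    np.maximum.at(r2_max, idx, r2)                # single pass over all points
    return float(np.sum(np.pi * r2_max * dz))     # sum of disc volumes
```

Approximating each slice by its maximum radius biases the estimate upward on non-circular cross sections; a per-slice mean radius is the obvious variant if an unbiased figure matters more than a safe upper bound.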
229

Odlišení pozadí a pohybujících se objektů ve videosekvenci / Separation of background and moving objects in videosequence

Martincová, Lucia January 2017
This diploma thesis deals with the separation of background and moving objects in video. A video can be represented as a series of frames, and each frame as a matrix with low-rank structure. The thesis describes the sparse representation of signals and robust principal component analysis (RPCA). It also presents and implements algorithms and models for the reconstruction of real video.
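A minimal sketch of robust PCA via principal component pursuit, solved with a basic augmented-Lagrangian loop: singular-value thresholding recovers the low-rank part (the static background), soft thresholding the sparse part (the moving objects). The parameter choices follow common defaults from the RPCA literature, not any particular thesis implementation.

```python
# Sketch: decompose M into low-rank L plus sparse S by minimizing
# ||L||_* + lam * ||S||_1 subject to M = L + S.
import numpy as np

def shrink(X, tau):                       # soft thresholding
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):                          # singular value thresholding
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca(M, n_iter=200, tol=1e-7):
    lam = 1.0 / np.sqrt(max(M.shape))     # standard PCP weight
    mu = 0.25 * M.size / (np.abs(M).sum() + 1e-12)
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)        # low-rank update (background)
        S = shrink(M - L + Y / mu, lam / mu)     # sparse update (moving objects)
        Y += mu * (M - L - S)                    # dual ascent on the constraint
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S

# For video, each column of M is a vectorized frame; L then holds the
# static background and S the moving objects.
```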
230

Odlišení pozadí a pohybujících se objektů ve videosekvenci / Separation of background and moving objects in videosequence

Komůrková, Lucia January 2018
This diploma thesis deals with the separation of background and moving objects in video. A video can be represented as a series of frames, and each frame as a matrix with low-rank structure. The thesis describes the sparse representation of signals and robust principal component analysis (RPCA). It also presents and implements algorithms and models for the reconstruction of real video.
