Global ETD Search

101	Sparse Bayesian Time-Varying Covariance Estimation in Many Dimensions Kastner, Gregor 18 September 2016 (has links) (PDF) Dynamic covariance estimation for multivariate time series suffers from the curse of dimensionality. This renders parsimonious estimation methods essential for conducting reliable statistical inference. In this paper, the issue is addressed by modeling the underlying co-volatility dynamics of a time series vector through a lower dimensional collection of latent time-varying stochastic factors. Furthermore, we apply a Normal-Gamma prior to the elements of the factor loadings matrix. This hierarchical shrinkage prior effectively pulls the factor loadings of unimportant factors towards zero, thereby increasing parsimony even more. We apply the model to simulated data as well as daily log-returns of 300 S&P 500 stocks and demonstrate the effectiveness of the shrinkage prior to obtain sparse loadings matrices and more precise correlation estimates. Moreover, we investigate predictive performance and discuss different choices for the number of latent factors. Additionally to being a stand-alone tool, the algorithm is designed to act as a "plug and play" extension for other MCMC samplers; it is implemented in the R package factorstochvol. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics JEL C32; C51; C58
102	Multi-Label Dimensionality Reduction January 2011 (has links) abstract: Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed when the data comes sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully in the Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms. / Dissertation/Thesis / Ph.D. Computer Science 2011 Computer Science canonical correlation analysis dimensionality reduction hypergraph spectral learning multi-label learning partial least squares
103	Dimensionality Reduction and Fusion Strategies for the Design of Parametric Signal Classifiers Kota, Srinivas 01 December 2010 (has links) This dissertation focuses on two specific problems related to the design of parametric signal classifiers: dimensionality reduction to overcome the curse of dimensionality and information fusion to improve classification by exploiting complementary information from multiple sensors or multiple classifiers. Dimensionality reduction is achieved by introducing a strategy to rank and select a subset of principal component transform (PCT) coefficients that carry the most useful discriminatory information. The criteria considered for ranking transform coefficients include magnitude, variance, inter-class separation, and classification accuracies of individual transform coefficients. The ranking strategy not only facilitates overcoming the dimensionality curse for multivariate classifier implementation but also provides a means to further select, out of a rank-ordered set, a smaller set of features that give the best classification accuracies. Because the class-conditional densities of transform feature vectors are often assumed to be multivariate Gaussian, the dimensionality reduction strategy focuses on overcoming the specific problems encountered in the design of practical multivariate Gaussian classifiers using transform feature vectors. Through experiments with event related potentials (ERPs) and ear pressure signals, it is shown that the dimension of the feature space can be decreased quite significantly by means of the feature ranking and selection strategy. Furthermore, the resulting Gaussian classifiers yield higher classification accuracies than those reported in previous classification studies on the same signal sets. Amongst the four feature selection criteria, Gaussian classifiers using the maximum magnitude and maximum variance selection criteria gave the best classification accuracies across the two sets of classification experiments. For the multisensor case, dimensionality reduction is achieved by introducing a spatio-temporal array model to observe the signals across channels and time, simultaneously. A two-step process which uses the Kolmogrov-Smirnov test and the Lilliefors test is formulated to select the array elements which have different Gaussian densities across all signal categories. Selecting spatio-temporal elements that fit the assumed model and also statistically differ across the signal categories not only decreases the dimensionality significantly but also ensures high classification accuracies. The selection is dynamic in the sense that selecting spatio-temporal array elements corresponds to selecting samples of different sensors at different time-instants. Each selected array element is classified using a univariate Gaussian classifier and the resulting decisions are fused into a decision fusion vector which is classified using a discrete Bayes classifier. The application of the resulting dynamic channel selection-based classification strategy is demonstrated by designing and testing classifiers for multi-channel ERPs and it is shown that strategy yields high classification accuracies. Most noteworthy of the two dimensionality reduction strategies is the fact that the multivariate Gaussian signal classifiers developed can be implemented without having to collect a prohibitively large number of training signals simply to satisfy the dimensionality conditions. Consequently, the classification strategies can be beneficial for designing personalized human-machine-interface (HMI) signal classifiers for individuals from whom only a limited number of training signals can reliably be collected due to severe disabilities. The information fusion strategy introduced is aimed at improving the performance of signal classifiers by combining signals from multiple sensors or by combining decisions of multiple classifiers. Fusion classifiers with diverse components (classifiers or data sets) outperform those with less diverse components. Determining component diversity, therefore, is of the utmost importance in the design of fusion classifiers which are often employed in clinical diagnostic and numerous other pattern recognition problems. A new pairwise diversity-based ranking strategy is introduced to select a subset of ensemble components, which when combined, will be more diverse than any other component subset of the same size. The strategy is unified in the sense that the components can be either polychotomous classifiers or polychotomous data sets. Classifier fusion and data fusion systems are formulated based on the diversity selection strategy and the application of the two fusion strategies are demonstrated through the classification of multi-channel ERPs. From the results it is concluded that data fusion outperforms classifier fusion. It is also shown that the diversity-based data fusion system outperforms the system using randomly selected data components. Furthermore, it is demonstrated that the combination of data components that yield the best performance, in a relative sense, can be determined through the diversity selection strategy. Dimensionality Reduction Event Related Potentials Information Fusion Pattern Recognition Signal Processing Tongue-Movement Ear Pressure Signals
104	Um modelo neural de aprimoramento progressivo para redução de dimensionalidade / A Progressive Enhancement Neural Model for dimensionality reduction Camargo, Sandro da Silva January 2010 (has links) Nas últimas décadas, avanços em tecnologias de geração, coleta e armazenamento de dados têm contribuído para aumentar o tamanho dos bancos de dados nas diversas áreas de conhecimento humano. Este aumento verifica-se não somente em relação à quantidade de amostras de dados, mas principalmente em relação à quantidade de características descrevendo cada amostra. A adição de características causa acréscimo de dimensões no espaço matemático, conduzindo ao crescimento exponencial do hipervolume dos dados, problema denominado “maldição da dimensionalidade”. A maldição da dimensionalidade tem sido um problema rotineiro para cientistas que, a fim de compreender e explicar determinados fenômenos, têm se deparado com a necessidade de encontrar estruturas significativas ocultas, de baixa dimensão, dentro de dados de alta dimensão. Este processo denomina-se redução de dimensionalidade dos dados (RDD). Do ponto de vista computacional, a conseqüência natural da RDD é uma diminuição do espaço de busca de hipóteses, melhorando o desempenho e simplificando os resultados da modelagem de conhecimento em sistemas autônomos de aprendizado. Dentre as técnicas utilizadas atualmente em sistemas autônomos de aprendizado, as redes neurais artificiais (RNAs) têm se tornado particularmente atrativas para modelagem de sistemas complexos, principalmente quando a modelagem é difícil ou quando a dinâmica do sistema não permite o controle on-line. Apesar de serem uma poderosa técnica, as RNAs têm seu desempenho afetado pela maldição da dimensionalidade. Quando a dimensão do espaço de entradas é alta, as RNAs podem utilizar boa parte de seus recursos para representar porções irrelevantes do espaço de busca, dificultando o aprendizado. Embora as RNAs, assim como outras técnicas de aprendizado de máquina, consigam identificar características mais informativas para um processo de modelagem, a utilização de técnicas de RDD frequentemente melhora os resultados do processo de aprendizado. Este trabalho propõe um wrapper que implementa um modelo neural de aprimoramento progressivo para RDD em sistemas autônomos de aprendizado supervisionado visando otimizar o processo de modelagem. Para validar o modelo neural de aprimoramento progressivo, foram realizados experimentos com bancos de dados privados e de repositórios públicos de diferentes domínios de conhecimento. A capacidade de generalização dos modelos criados é avaliada por meio de técnicas de validação cruzada. Os resultados obtidos demonstram que o modelo neural de aprimoramento progressivo consegue identificar características mais informativas, permitindo a RDD, e tornando possível criar modelos mais simples e mais precisos. A implementação da abordagem e os experimentos foram realizados no ambiente Matlab, utilizando o toolbox de RNAs. / In recent decades, advances on data generation, collection and storing technologies have contributed to increase databases size in different knowledge areas. This increase is seen not only regarding samples amount, but mainly regarding dimensionality, i.e. the amount of features describing each sample. Features adding causes dimension increasing in mathematical space, leading to an exponential growth of data hypervolume. This problem is called “the curse of dimensionality”. The curse of dimensionality has been a routine problem for scientists, that in order to understand and explain some phenomena, have faced with the demand to find meaningful low dimensional structures hidden in high dimensional search spaces. This process is called data dimensionality reduction (DDR). From computational viewpoint, DDR natural consequence is a reduction of hypothesis search space, improving performance and simplifying the knowledge modeling results in autonomous learning systems. Among currently used techniques in autonomous learning systems, artificial neural networks (ANNs) have becoming particularly attractive to model complex systems, when modeling is hard or when system dynamics does not allow on-line control. Despite ANN being a powerful tool, their performance is affected by the curse of dimensionality. When input space dimension is high, ANNs can use a significant part of their resources to represent irrelevant parts of input space making learning process harder. Although ANNs, and other machine learning techniques, can identify more informative features for a modeling process, DDR techniques often improve learning results. This thesis proposes a wrapper which implements a Progressive Enhancement Neural Model to DDR in supervised autonomous learning systems in order to optimize the modeling process. To validate the proposed approach, experiments were performed with private and public databases, from different knowledge domains. The generalization ability of developed models is evaluated by means of cross validation techniques. Obtained results demonstrate that the proposed approach can identify more informative features, allowing DDR, and becoming possible to create simpler and more accurate models. The implementation of the proposed approach and related experiments were performed in Matlab Environment, using ANNs toolbox. Redes neurais Inteligência artificial Heurística Lógica modal Heuristics Wrapper Dimensionality reduction Feature selection Neural modeling
105	CLASSIFICATION OF ONE-DIMENSIONAL AND TWO-DIMENSIONAL SIGNALS Kanneganti, Raghuveer 01 August 2014 (has links) This dissertation focuses on the classification of one-dimensional and two-dimensional signals. The one-dimensional signal classification problem involves the classification of brain signals for identifying the emotional responses of human subjects under given drug conditions. A strategy is developed to accurately classify ERPs in order to identify human emotions based on brain reactivity to emotional, neutral, and cigarette-related stimuli in smokers. A multichannel spatio-temporal model is employed to overcome the curse of dimensionality that plagues the design of parametric multivariate classifiers for multi-channel ERPs. The strategy is tested on the ERPs of 156 smokers who participated in a smoking cessation program. One half of the subjects were given nicotine patches and the other half were given placebo patches. ERPs were collected from 29 channel in response to the presentation of the pictures with emotional (pleasant and unpleasant), neutral/boring, and cigarette-related content. It is shown that human emotions can be classified accurately and the results also show that smoking cessation causes a drop in the classification accuracies of emotions in the placebo group, but not in the nicotine patch group. Given that individual brain patterns were compared with group average brain patterns, the findings support the view that individuals tend to have similar brain reactions to different types of emotional stimuli. Overall, this new classification approach to identify differential brain responses to different emotional types could lead to new knowledge concerning brain mechanisms associated with emotions common to most or all people. This novel classification technique for identifying emotions in the present study suggests that smoking cessation without nicotine replacement results in poorer differentiation of brain responses to different emotional stimuli. Future, directions in this area would be to use these methods to assess individual differences in responses to emotional stimuli and to different drug treatments. Advantages of this and other brain-based assessment include temporal precision (e.g, 400-800 ms post stimulus), and the elimination of biases related to self-report measures. The two-dimensional signal classification problems include the detection of graphite in testing documents and the detection of fraudulent bubbles in test sheets. A strategy is developed to detect graphite responses in optical mark recognition (OMR) documents using inexpensive visible light scanners. The main challenge in the formulation of the strategy is that the detection should be invariant to the numerous background colors and artwork in typical optical mark recognition documents. A test document is modeled as a superposition of a graphite response image and a background image. The background image in turn is modeled as superposition of screening artwork, lines, and machine text components. A sequence of image processing operations and a pattern recognition algorithm are developed to estimate the graphite response image from a test document by systematically removing the components of the background image. The proposed strategy is tested on a wide range of scanned documents and it is shown that the estimated graphite response images are visually similar to those scanned by very expensive infra-red scanners currently employed for optical mark recognition. The robustness of the detection strategy is also demonstrated by testing a large number of simulated test documents. A procedure is also developed to autonomously determine if cheating has occurred by detecting the presence of aberrant responses in scanned OMR test books. The challenges introduced by the significant imbalance in the numbers of typical and aberrant bubbles were identified. The aberrant bubble detection problem is formulated as an outlier detection problem. A feature based outlier detection procedure in conjunction with a one-class SVM classifier is developed. A multi-criteria rank-of-rank-sum technique is introduced to rank and select a subset of features from a pool of candidate features. Using the data set of 11 individuals, it is shown that a detection accuracy of over 90% is possible. Experiments conducted on three real test books flagged for suspected cheating showed that the proposed strategy has the potential to be deployed in practice. dimensionality reduction electroencephalogram forensic analysis optical mark recognition outlier detection pattern recognition
106	Redução de dimensionalidade aplicada à diarização de locutor / Dimensionality reduction applied to speaker diarization Silva, Sérgio Montazzolli January 2013 (has links) Atualmente existe uma grande quantidade de dados multimídia sendo geradas todos os dias. Estes dados são oriundos de diversas fontes, como transmissões de rádio ou televisão, gravações de palestras, encontros, conversas telefônicas, vídeos e fotos capturados por celular, entre outros. Com isto, nos últimos anos o interesse pela transcrição de dados multimídia tem crescido, onde, no processamento de voz, podemos destacar as áreas de Reconhecimento de Locutor, Reconhecimento de Fala, Diarização de Locutor e Rastreamento de Locutores. O desenvolvimento destas áreas vem sendo impulsionado e direcionado pelo NIST, que periodicamente realiza avaliações sobre o estado-da-arte. Desde 2000, a tarefa de Diarização de Locutor tem se destacado como uma das principáis frentes de pesquisa em transcrição de dados de voz, tendo sido avaliada pelo NIST por diversas vezes na última década. O objetivo desta tarefa é encontrar o número de locutores presentes em um áudio, e rotular seus respectivos trechos de fala, sem que nenhuma informação tenha sido previamente fornecida. Em outras palavras, costuma-se dizer que o objetivo é responder a questão "Quem falou e quando?". Um dos grandes problemas nesta área é se conseguir obter um bom modelo para cada locutor presente no áudio, dada a pouca quantidade de informações e a alta dimensionalidade dos dados. Neste trabalho, além da criação de um Sistema de Diarização de Locutor, iremos tratar este problema mediante à redução de dimensionalidade através de análises estatísticas. Usaremos a Análise de Componentes Principáis, a Análise de Discriminantes Lineares e a recém apresentada Análise de Semi-Discriminantes Lineares. Esta última utiliza um método de inicialização estático, iremos propor o uso de um método dinâmico, através da detecção de pontos de troca de locutor. Também investigaremos o comportamento destas análises sob o uso simultâneo de múltiplas parametrizações de curto prazo do sinal acústico. Os resultados obtidos mostram que é possível preservar - ou até melhorar - o desempenho do sistema, mesmo reduzindo substâncialmente o número de dimensões. Isto torna mais rápida a execução de algoritmos de Aprendizagem de Máquina e reduz a quantidade de memória necessária para armezenar os dados. / Currently, there is a large amount of multimedia data being generated everyday. These data come from various sources, such as radio or television, recordings of lectures and meetings, telephone conversations, videos and photos captured by mobile phone, among others. Because of this, interest in automatic multimedia data transcription has grown in recent years, where, for voice processing, we can highlight the areas of Speaker Recognition, Speech Recognition, Speaker Diarization and Speaker Tracking. The development of such areas is being conducted by NIST, which periodically promotes state-of-the-art evaluations. Since 2000, the task of Speaker Diarization has emerged as one of the main research fields in voice data transcription, having been evaluated by NIST several times in the last decade. The objective of this task is to find the number of speakers in an audio recording, and properly label their speech segments without the use of any training information. In other words , it is said that the goal of Speaker Diarization is to answer the question "Who spoke when?". A major problem in this area is to obtain a good speaker model from the audio, given the limited amount of information available and the high dimensionality of the data. In the current work, we will describe how our Speaker Diarization System was built, and we will address the problem mentioned by lowering the dimensionality of the data through statistical analysis. We will use the Principal Component Analysis, the Linear Discriminant Analysis and the newly presented Fisher Linear Semi-Discriminant Analysis. The latter uses a static method for initialization, and here we propose the use of a dynamic method by the use of a speaker change points detection algorithm. We also investigate the behavior of these data analysis techniques under the simultaneous use of multiple short term features. Our results show that it is possible to maintain - and even improve - the system performance, by substantially reducing the number of dimensions. As a consequence, the execution of Machine Learning algorithms is accelerated while reducing the amount of memory required to store the data. Processamento : Linguagem natural Voz computacional Speaker diarization Discriminant analysis Dimensionality reduction
107	Evaluating immersive approaches to multidimensional information visualization / Avaliando abordagens imersivas para visualização de informações multidimensionais Wagner Filho, Jorge Alberto January 2018 (has links) O uso de novos recursos de display e interação para suportar a visualização imersiva de dados e incrementar o raciocínio analítico é uma tendência de pesquisa em Visualização de Informações. Neste trabalho, avaliamos o uso de ambientes baseados em HMD para a exploração de dados multidimensionais, representados em scatterplots 3D como resultado de redução de dimensionalidade. Nós apresentamos uma nova modelagem para o problema de avaliação neste contexto, levando em conta os dois fatores cuja interação determina o impacto no desempenho total nas tarefas: a diferença nos erros introduzidos ao se realizar redução de dimensionalidade para 2D ou 3D, e a diferença nos erros de percepção humana sob diferentes condições de visualização. Este framework em duas etapas oferece uma abordagem simples para estimar os benefícios de se utilizar um setup 3D imersivo para um dado conjunto de dados. Como caso de uso, os erros de redução de dimensionalidade para uma série de conjuntos de dados de votações na Câmara dos Deputados, ao se utilizar duas ou três dimensões, são avaliados por meio de uma abordagem empírica baseada em tarefas. O erro de percepção e o desempenho geral de tarefa, por sua vez, são avaliados através de estudos controlados comparativos com usuários. Comparando-se visualizações baseadas em desktop (2D e 3D) e em HMD (3D), resultados iniciais indicaram que os erros de percepção foram baixos e similares em todas abordagens, resultando em benefícios para o desempenho geral em ambas técnicas 3D A condição imersiva, no entanto, demonstrou requerer menor esforço para encontrar as informações e menos navegação, além de prover percepções subjetivas de precisão e engajamento muito maiores. Todavia, o uso de navegação por voo livre resultou em tempos ineficientes e frequente desconforto nos usuários. Em um segundo momento, implementamos e avaliamos uma abordagem alternativa de exploração de dados, onde o usuário permanece sentado e mudanças no ponto de vista só são possíveis por meio de movimentos físicos. Toda a manipulação é realizada diretamente por gestos aéreos naturais, com os dados sendo renderizados ao alcance dos braços. A reprodução virtual de uma cópia exata da mesa de trabalho do analista visa aumentar a imersão e possibilitar a interação tangível com controles e informações bidimensionais associadas. Um segundo estudo com usuários foi conduzido em comparação a uma versão equivalente baseada em desktop, explorando um conjunto de 9 tarefas representativas de percepção e interação, baseadas em literatura prévia. Nós demonstramos que o nosso protótipo, chamado VirtualDesk, apresentou resultados excelentes em relação a conforto e imersão, e desempenho equivalente ou superior em todas tarefas analíticas, enquanto adicionando pouco ou nenhum tempo extra e ampliando a exploração dos dados. / The use of novel displays and interaction resources to support immersive data visualization and improve the analytical reasoning is a research trend in Information Visualization. In this work, we evaluate the use of HMD-based environments for the exploration of multidimensional data, represented in 3D scatterplots as a result of dimensionality reduction. We present a new modelling for the evaluation problem in such a context, accounting for the two factors whose interplay determine the impact on the overall task performance: the difference in errors introduced by performing dimensionality reduction to 2D or 3D, and the difference in human perception errors under different visualization conditions. This two-step framework offers a simple approach to estimate the benefits of using an immersive 3D setup for a particular dataset. As use case, the dimensionality reduction errors for a series of roll calls datasets when using two or three dimensions are evaluated through an empirical task-based approach. The perception error and overall task performance are assessed through controlled comparative user studies. When comparing desktop-based (2D and 3D) with an HMD-based (3D) visualization, initial results indicated that perception errors were low and similar in all approaches, resulting in overall performance benefits in both 3D techniques. The immersive condition, however, was found to require less effort to find information and less navigation, besides providing much larger subjective perception of accuracy and engagement. Nonetheless, the use of flying navigation resulted in inefficient times and frequent user discomfort In a second moment, we implemented and evaluated an alternative data exploration approach where the user remains seated and viewpoint change is only realisable through physical movements. All manipulation is done directly by natural mid-air gestures, with the data being rendered at arm’s reach. The virtual reproduction of an exact copy of the analyst’s desk aims to increase immersion and enable tangible interaction with controls and two dimensional associated information. A second user study was carried out comparing this scenario to a desktop-based equivalent, exploring a set of 9 representative perception and interaction tasks based on previous literature. We demonstrate that our prototype setup, named VirtualDesk, presents excellent results regarding user comfort and immersion, and performs equally or better in all analytical tasks, while adding minimal or no time overhead and amplifying data exploration. 3D Visualização Immersive visualization Abstract information visualization Dimensionality reduction 3D scatterplots Virtual reality
108	Átomos hidrogenoides em espaços com dimensionalidade D6=3: os casos não relativístico e relativístico / Hydrogenic atoms in space with dimensionality D6=3 : cases non-relativistic and relativistic Jordan Martins 16 December 2014 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / A possibilidade da existência de átomos de hidrogênio estáveis em dimensões superiores a três é abordada. O problema da dimensionalidade é visto como um problema de Física, no qual relacionam-se algumas leis físicas com a dimensão espacial. A base da análise deste trabalho faz uso das equações de Schrödinger (não relativística) e de Dirac (relativística). Nos dois casos, utiliza-se a generalização tanto do setor cinemático bem como o setor de interação coulombiana para variar o parâmetro topológico dimensão. Para o caso não relativístico, os auto-valores de energia e as auto-funções são obtidas através do método numérico de Numerov. Embora existam soluções em espaços com dimensões superiores, os resultados obtidos no presente trabalho indicam que a natureza deve, de alguma maneira, se manifestar em um espaço tridimensional. / The question of whether hydrogen atoms can exist or not in at spaces with a number of dimensions D greater than 3 is revisited. The dimensionality problem is taken as physical one, which physics laws are related to the space dimension. The main framework of the analysis through out this tesis are the Schrödinger Equation (non relativistic) and the Dirac Equation (relativistic). Both cases use the kinematic and Coulombian interaction sectors generalized in order to vary the topological paramenter dimension. For the non relativistic case, eigenvalues of energy and eigenfunctions are evaluated using the Numerov numerical method. Although there are solutions in higher dimensions, the results obtained in this tesis indicate that nature should somehow prefer a tridimensional space. Espaço Dimensionalidade Átomo de hidrogênio Space Dimensionality Hydrogen atom
109	Um modelo neural de aprimoramento progressivo para redução de dimensionalidade / A Progressive Enhancement Neural Model for dimensionality reduction Camargo, Sandro da Silva January 2010 (has links) Nas últimas décadas, avanços em tecnologias de geração, coleta e armazenamento de dados têm contribuído para aumentar o tamanho dos bancos de dados nas diversas áreas de conhecimento humano. Este aumento verifica-se não somente em relação à quantidade de amostras de dados, mas principalmente em relação à quantidade de características descrevendo cada amostra. A adição de características causa acréscimo de dimensões no espaço matemático, conduzindo ao crescimento exponencial do hipervolume dos dados, problema denominado “maldição da dimensionalidade”. A maldição da dimensionalidade tem sido um problema rotineiro para cientistas que, a fim de compreender e explicar determinados fenômenos, têm se deparado com a necessidade de encontrar estruturas significativas ocultas, de baixa dimensão, dentro de dados de alta dimensão. Este processo denomina-se redução de dimensionalidade dos dados (RDD). Do ponto de vista computacional, a conseqüência natural da RDD é uma diminuição do espaço de busca de hipóteses, melhorando o desempenho e simplificando os resultados da modelagem de conhecimento em sistemas autônomos de aprendizado. Dentre as técnicas utilizadas atualmente em sistemas autônomos de aprendizado, as redes neurais artificiais (RNAs) têm se tornado particularmente atrativas para modelagem de sistemas complexos, principalmente quando a modelagem é difícil ou quando a dinâmica do sistema não permite o controle on-line. Apesar de serem uma poderosa técnica, as RNAs têm seu desempenho afetado pela maldição da dimensionalidade. Quando a dimensão do espaço de entradas é alta, as RNAs podem utilizar boa parte de seus recursos para representar porções irrelevantes do espaço de busca, dificultando o aprendizado. Embora as RNAs, assim como outras técnicas de aprendizado de máquina, consigam identificar características mais informativas para um processo de modelagem, a utilização de técnicas de RDD frequentemente melhora os resultados do processo de aprendizado. Este trabalho propõe um wrapper que implementa um modelo neural de aprimoramento progressivo para RDD em sistemas autônomos de aprendizado supervisionado visando otimizar o processo de modelagem. Para validar o modelo neural de aprimoramento progressivo, foram realizados experimentos com bancos de dados privados e de repositórios públicos de diferentes domínios de conhecimento. A capacidade de generalização dos modelos criados é avaliada por meio de técnicas de validação cruzada. Os resultados obtidos demonstram que o modelo neural de aprimoramento progressivo consegue identificar características mais informativas, permitindo a RDD, e tornando possível criar modelos mais simples e mais precisos. A implementação da abordagem e os experimentos foram realizados no ambiente Matlab, utilizando o toolbox de RNAs. / In recent decades, advances on data generation, collection and storing technologies have contributed to increase databases size in different knowledge areas. This increase is seen not only regarding samples amount, but mainly regarding dimensionality, i.e. the amount of features describing each sample. Features adding causes dimension increasing in mathematical space, leading to an exponential growth of data hypervolume. This problem is called “the curse of dimensionality”. The curse of dimensionality has been a routine problem for scientists, that in order to understand and explain some phenomena, have faced with the demand to find meaningful low dimensional structures hidden in high dimensional search spaces. This process is called data dimensionality reduction (DDR). From computational viewpoint, DDR natural consequence is a reduction of hypothesis search space, improving performance and simplifying the knowledge modeling results in autonomous learning systems. Among currently used techniques in autonomous learning systems, artificial neural networks (ANNs) have becoming particularly attractive to model complex systems, when modeling is hard or when system dynamics does not allow on-line control. Despite ANN being a powerful tool, their performance is affected by the curse of dimensionality. When input space dimension is high, ANNs can use a significant part of their resources to represent irrelevant parts of input space making learning process harder. Although ANNs, and other machine learning techniques, can identify more informative features for a modeling process, DDR techniques often improve learning results. This thesis proposes a wrapper which implements a Progressive Enhancement Neural Model to DDR in supervised autonomous learning systems in order to optimize the modeling process. To validate the proposed approach, experiments were performed with private and public databases, from different knowledge domains. The generalization ability of developed models is evaluated by means of cross validation techniques. Obtained results demonstrate that the proposed approach can identify more informative features, allowing DDR, and becoming possible to create simpler and more accurate models. The implementation of the proposed approach and related experiments were performed in Matlab Environment, using ANNs toolbox. Redes neurais Inteligência artificial Heurística Lógica modal Heuristics Wrapper Dimensionality reduction Feature selection Neural modeling
110	Deep Learning based Classification of FDG-PET Data for Alzheimer's Disease January 2017 (has links) abstract: Alzheimer’s Disease (AD), a neurodegenerative disease is a progressive disease that affects the brain gradually with time and worsens. Reliable and early diagnosis of AD and its prodromal stages (i.e. Mild Cognitive Impairment(MCI)) is essential. Fluorodeoxyglucose (FDG) positron emission tomography (PET) measures the decline in the regional cerebral metabolic rate for glucose, offering a reliable metabolic biomarker even on presymptomatic AD patients. PET scans provide functional information that is unique and unavailable using other types of imaging. The computational efficacy of FDG-PET data alone, for the classification of various Alzheimer’s Diagnostic categories (AD, MCI (LMCI, EMCI), Control) has not been studied. This serves as motivation to correctly classify the various diagnostic categories using FDG-PET data. Deep learning has recently been applied to the analysis of structural and functional brain imaging data. This thesis is an introduction to a deep learning based classification technique using neural networks with dimensionality reduction techniques to classify the different stages of AD based on FDG-PET image analysis. This thesis develops a classification method to investigate the performance of FDG-PET as an effective biomarker for Alzheimer's clinical group classification. This involves dimensionality reduction using Probabilistic Principal Component Analysis on max-pooled data and mean-pooled data, followed by a Multilayer Feed Forward Neural Network which performs binary classification. Max pooled features result into better classification performance compared to results on mean pooled features. Additionally, experiments are done to investigate if the addition of important demographic features such as Functional Activities Questionnaire(FAQ), gene information helps improve performance. Classification results indicate that our designed classifiers achieve competitive results, and better with the additional of demographic features. / Dissertation/Thesis / Masters Thesis Computer Science 2017 Computer science Medical imaging Artificial intelligence Alzheimer's Deep Learning Dimensionality Reduction FDG-PET Multilayer Perceptron Pooling

Search results