61 |
Multilinear Subspace Learning for Face and Gait Recognition. Lu, Haiping. 19 January 2009.
Face and gait recognition problems are challenging due to largely varying appearances, highly complex pattern distributions, and insufficient training samples. This dissertation focuses on multilinear subspace learning for face and gait recognition, where low-dimensional representations are learned directly from tensorial face or gait objects.
This research introduces a unifying multilinear subspace learning framework for systematic treatment of the multilinear subspace learning problem. Three multilinear projections are categorized according to the input-output space mapping as: vector-to-vector projection, tensor-to-tensor projection, and tensor-to-vector projection. Techniques for subspace learning from tensorial data are then proposed and analyzed. Multilinear principal component analysis (MPCA) seeks a tensor-to-tensor projection that maximizes the variation captured in the projected space, and it is further combined with linear discriminant analysis and boosting for better recognition performance. Uncorrelated MPCA (UMPCA) solves for a tensor-to-vector projection that maximizes the captured variation in the projected space while enforcing the zero-correlation constraint. Uncorrelated multilinear discriminant analysis (UMLDA) aims to produce uncorrelated features through a tensor-to-vector projection that maximizes a ratio of the between-class scatter over the within-class scatter defined in the projected space. Regularization and aggregation are incorporated in the UMLDA solution for enhanced performance.
Experimental studies and comparative evaluations are presented and analyzed on the PIE and FERET face databases and the USF gait database. The results indicate that the MPCA-based solution achieves the best overall performance across various learning scenarios, the UMLDA-based solution produces the most stable and competitive results under a single parameter setting, and the UMPCA algorithm is effective for unsupervised learning in low-dimensional subspaces. Besides advancing the state of the art in multilinear subspace learning for face and gait recognition, this dissertation also has potential impact on the development of new multilinear subspace learning algorithms and on other applications involving tensor objects.
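As an illustration of the tensor-to-tensor projection at the core of MPCA, the sketch below alternates over the two modes of a set of second-order tensor samples (e.g., face images), at each step keeping the leading eigenvectors of the mode-wise scatter matrix. It is only a minimal sketch of the general scheme: the random toy data, the identity initialization, the fixed iteration count, and the function name are assumptions of this example, not the dissertation's algorithm or code.

```python
# Minimal MPCA-style sketch for second-order tensor samples (illustrative only).
import numpy as np

def mpca(samples, ranks, n_iter=5):
    """samples: (M, I1, I2) array of tensor objects; ranks: (P1, P2) target sizes."""
    X = samples - samples.mean(axis=0)                          # centre the tensor samples
    M, I1, I2 = X.shape
    U = [np.eye(I1)[:, :ranks[0]], np.eye(I2)[:, :ranks[1]]]    # initial projections
    for _ in range(n_iter):                                     # alternate over the two modes
        for mode in (0, 1):
            size = X.shape[mode + 1]
            S = np.zeros((size, size))                          # mode-wise scatter matrix
            for Xi in X:
                # project along the *other* mode, then accumulate this mode's scatter
                Y = Xi @ U[1] if mode == 0 else U[0].T @ Xi
                S += Y @ Y.T if mode == 0 else Y.T @ Y
            w, V = np.linalg.eigh(S)
            # leading eigenvectors capture the most variation in the projected space
            U[mode] = V[:, np.argsort(w)[::-1][:ranks[mode]]]
    return U

faces = np.random.rand(50, 32, 24)                              # toy stand-in for face images
U1, U2 = mpca(faces, ranks=(8, 6))
low_dim = np.einsum('mij,ik,jl->mkl', faces - faces.mean(0), U1, U2)
print(low_dim.shape)                                            # (50, 8, 6): tensor-to-tensor projection
```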
|
62 |
Duomenų dimensijos mažinimas naudojant autoasociatyvinius neuroninius tinklus / Data dimensionality reduction using autoassociative neural networks. Bendinskienė, Janina. 31 July 2012.
Šiame magistro darbe apžvelgiami daugiamačių duomenų dimensijos mažinimo (vizualizavimo) metodai, tarp kurių nagrinėjami dirbtiniai neuroniniai tinklai. Pateikiamos pagrindinės dirbtinių neuroninių tinklų sąvokos (biologinis neuronas ir dirbtinio neurono modelis, mokymo strategijos, daugiasluoksnis neuronas ir pan.). Nagrinėjami autoasociatyviniai neuroniniai tinklai. Darbo tikslas – išnagrinėti autoasociatyviųjų neuroninių tinklų taikymo daugiamačių duomenų dimensijos mažinimui ir vizualizavimui galimybes bei ištirti gaunamų rezultatų priklausomybę nuo skirtingų parametrų. Siekiant šio tikslo atlikti eksperimentai naudojant kelias daugiamačių duomenų aibes. Tyrimų metu nustatyti parametrai, įtakojantys autoasociatyvinio neuroninio tinklo veikimą. Be to, gauti rezultatai lyginti pagal dvi skirtingas tinklo daromas paklaidas – MDS ir autoasociatyvinę. MDS paklaida parodo, kaip gerai išlaikomi atstumai tarp analizuojamų taškų (vektorių) pereinant iš daugiamatės erdvės į mažesnės dimensijos erdvę. Autoasociatyvinio tinklo išėjimuose gautos reikšmės turi sutapti su įėjimo reikšmėmis, taigi autoasociatyvinė paklaida parodo, kaip gerai tai gaunama (vertinamas skirtumas tarp įėjimų ir išėjimų). Tirta, kaip paklaidas įtakoja šie autoasociatyvinio neuroninio tinklo parametrai: aktyvacijos funkcija, minimizuojama funkcija, mokymo funkcija, epochų skaičius, paslėptų neuronų skaičius ir dimensijos mažinimo skaičiaus pasirinkimas. / This master's thesis reviews methods for dimensionality reduction (visualization) of multivariate data, among which artificial neural networks are examined. The main concepts of artificial neural networks are presented (the biological neuron and the artificial neuron model, learning strategies, multilayer networks, and so on), and autoassociative neural networks are analyzed. The aim of this work is to examine the possibilities of applying autoassociative neural networks to dimensionality reduction and visualization of multidimensional data, and to investigate how the results obtained depend on different parameters. To achieve this, experiments were carried out on several multidimensional data sets, and the parameters influencing the behaviour of an autoassociative neural network were identified. In addition, the results were compared using two different errors produced by the network: the MDS error and the autoassociative error. The MDS error shows how well the distances between the analyzed points (vectors) are preserved in the transition from the multidimensional space to a lower-dimensional space. The values obtained at the outputs of an autoassociative network should coincide with its input values, so the autoassociative error shows how well this is achieved (the difference between inputs and outputs is evaluated). The study investigated how these errors are influenced by the following parameters of the autoassociative neural network: the activation function, the minimized function, the training function, the number of epochs, the number of hidden neurons, and the choice of the reduced dimensionality.
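As a rough sketch of the two error measures described above, the example below trains a small autoassociative network (inputs also used as targets, with a two-neuron bottleneck) and reports a reconstruction error and a distance-preservation (stress) error. The thesis worked with Matlab's neural network toolbox; here scikit-learn's MLPRegressor, the Iris data, and this particular stress formula are stand-ins chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler
from scipy.spatial.distance import pdist

X = MinMaxScaler().fit_transform(load_iris().data)

# autoassociative network: the inputs are also the training targets
net = MLPRegressor(hidden_layer_sizes=(2,), activation='tanh',
                   max_iter=5000, random_state=0).fit(X, X)

# autoassociative error: how well the outputs reproduce the inputs
reconstruction = net.predict(X)
auto_error = np.mean((X - reconstruction) ** 2)

# low-dimensional representation = activations of the 2-neuron bottleneck layer
Z = np.tanh(X @ net.coefs_[0] + net.intercepts_[0])

# MDS (stress) error: how well pairwise distances are preserved after reduction
d_high, d_low = pdist(X), pdist(Z)
mds_error = np.sqrt(np.sum((d_high - d_low) ** 2) / np.sum(d_high ** 2))

print(f"autoassociative error: {auto_error:.4f}, MDS error: {mds_error:.4f}")
```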
|
63 |
Isometry and convexity in dimensionality reduction. Vasiloglou, Nikolaos. 30 March 2009.
The amount of data generated every year grows exponentially. Both the number of data points and the number of dimensions have increased dramatically over the past 15 years, and the gap between the industry's demand for data processing and the solutions provided by the machine learning community keeps widening. Despite the growth in memory and computational power, advanced statistical processing of data on the order of gigabytes remains out of reach, since most sophisticated machine learning algorithms have at least quadratic complexity. With current computer architectures, algorithms with complexity higher than linear O(N) or O(N log N) are not considered practical. Dimensionality reduction is a challenging problem in machine learning. Data represented as multidimensional points often have high dimensionality, yet the information they carry can be expressed with far fewer dimensions. Moreover, the reduced dimensions of the data can be more interpretable than the original ones. There is a great variety of dimensionality reduction algorithms under the theory of manifold learning. Most of these methods, such as Isomap, Locally Linear Embedding, Local Tangent Space Alignment, and Diffusion Maps, have been extensively studied under the framework of Kernel Principal Component Analysis (KPCA). In this dissertation we study two state-of-the-art dimensionality reduction methods, Maximum Variance Unfolding (MVU) and Non-Negative Matrix Factorization (NMF), which do not fit under the umbrella of Kernel PCA. MVU is cast as a semidefinite program, a modern convex nonlinear optimization formulation that offers more flexibility and power compared to KPCA. Although MVU and NMF seem to be two disconnected problems, we show that there is a connection between them: both are special cases of a general nonlinear factorization algorithm that we developed. Two aspects of the algorithms are of particular interest: computational complexity and interpretability. Computational complexity answers the question of how fast we can find the best solution of MVU/NMF for large data volumes. Since we are dealing with optimization programs, we need to find the global optimum, which is strongly connected with the convexity of the problem. Interpretability is strongly connected with local isometry, which gives meaning to the relationships between data points; another aspect of interpretability is the association of data with labeled information. The contributions of this thesis are the following:
1. MVU is modified so that it scales more efficiently. Results are shown on a speech dataset of 1 million points, and limitations of the method are highlighted.
2. An algorithm for fast computation of furthest neighbors is presented for the first time in the literature.
3. Construction of optimal kernels for Kernel Density Estimation with modern convex programming is presented. For the first time we show that the Leave-One-Out Cross-Validation (LOOCV) function is quasi-concave.
4. For the first time, NMF is formulated as a convex optimization problem.
5. An algorithm for the problem of Completely Positive Matrix Factorization is presented.
6. A hybrid algorithm of MVU and NMF, isoNMF, is presented, combining the advantages of both methods.
7. Isometric Separation Maps (ISM), a variation of MVU that contains classification information, is presented.
8. Large-scale nonlinear dimensionality analysis on the TIMIT speech database is performed.
9. A general nonlinear factorization algorithm based on sequential convex programming is presented.
Despite the efforts to scale the proposed methods up to 1 million data points in reasonable time, the gap between the industrial demand and the current state of the art remains orders of magnitude wide.
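To make the phrase "MVU is cast as a semidefinite program" concrete, the toy sketch below states the standard MVU SDP (maximize the trace of a centered Gram matrix while preserving local distances over a k-nearest-neighbor graph) with cvxpy. This naive formulation is only practical for a few dozen points; the synthetic data, neighborhood size, and solver defaults are assumptions of the example, and the scalable variants developed in the thesis are not reproduced.

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                       # 30 points in 5-D
n = X.shape[0]
G = kneighbors_graph(X, n_neighbors=4, mode='connectivity').toarray()

K = cp.Variable((n, n), PSD=True)                  # Gram matrix of the embedding
constraints = [cp.sum(K) == 0]                     # centre the embedding
for i in range(n):
    for j in range(n):
        if G[i, j]:                                # preserve local distances (isometry)
            dij2 = np.sum((X[i] - X[j]) ** 2)
            constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == dij2)

# "unfolding" = maximising the variance (trace) subject to local isometry
prob = cp.Problem(cp.Maximize(cp.trace(K)), constraints)
prob.solve()

# low-dimensional coordinates from the top eigenvectors of the learned Gram matrix
w, V = np.linalg.eigh(K.value)
Y = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0))     # 2-D embedding
print(Y.shape)
```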
|
64 |
Métodos de redução de dimensionalidade aplicados na seleção genômica para características de carcaça em suínos / Dimensionality reduction methods applied to genomic selection for carcass traits in pigs. Azevedo, Camila Ferreira. 26 July 2012.
The main contribution of molecular genetics is the direct use of DNA information to identify genetically superior individuals. Under this approach, genome-wide selection (GWS) can be used for this purpose. GWS consists of analyzing a large number of SNP markers widely distributed across the genome; its use is challenging because the number of markers is much larger than the number of genotyped individuals (high dimensionality) and because such markers are highly correlated (multicollinearity). The success of genome-wide selection therefore depends on methodologies that address these adversities. In view of this, the aim of this dissertation was to propose the application of Independent Component Regression (ICR), Principal Component Regression (PCR), Partial Least Squares (PLS) and the Random Regression Best Linear Unbiased Predictor (RR-BLUP), considering carcass traits in an F2 population of pigs originating from the cross of two males of the naturalized Brazilian breed Piau with 18 females of a commercial line (Large White × Landrace × Pietrain), developed at the Federal University of Viçosa. The specific objectives were to estimate the Genomic Breeding Value (GBV) of each individual and to estimate the effects of the SNP markers, in order to compare the methods. The results showed that the ICR method was the most efficient, since it provided the most accurate genomic breeding value estimates for most carcass traits. / A principal contribuição da genética molecular no melhoramento animal é a utilização direta das informações de DNA no processo de identificação de animais geneticamente superiores. Sob esse enfoque, a seleção genômica ampla (Genome Wide Selection GWS), a qual consiste na análise de um grande número de marcadores SNPs (Single Nucleotide Polymorphisms) amplamente distribuídos no genoma, foi idealizada. A utilização dessas informações é um desafio, uma vez que o número de marcadores é muito maior que o número de animais genotipados (alta dimensionalidade) e tais marcadores são altamente correlacionados (multicolinearidade). No entanto, o sucesso da seleção genômica ampla deve-se a escolha de metodologias que contemplem essas adversidades. Diante do exposto, o presente trabalho teve por objetivo propor a aplicação dos métodos de regressão via Componentes Independentes (Independent Component Regression ICR), regressão via componentes principais (Principal Component Regression PCR), regressão via Quadrados Mínimos Parciais (Partial Least Squares PLSR) e RR-BLUP, considerando características de carcaça em uma população F2 de suínos proveniente do cruzamento de dois varrões da raça naturalizada brasileira Piau com 18 fêmeas de linhagem comercial (Landrace × Large White × Pietrain), desenvolvida na Universidade Federal de Viçosa. Os objetivos específicos foram estimar Valores Genéticos Genômicos (Genomic Breeding Values GBV) para cada indivíduo avaliado e estimar efeitos de marcadores SNPs, visando a comparação dos métodos. Os resultados indicaram que o método ICR se mostrou mais eficiente, uma vez que este proporcionou maiores valores de acurácia na estimação do GBV para a maioria das características de carcaça.
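For illustration only (this is not the dissertation's code or data), the sketch below applies two of the compared approaches, principal component regression and partial least squares, to a simulated SNP matrix with far more markers than individuals, and scores them by cross-validation. The marker counts, number of components, and simulated phenotype are invented for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
n_animals, n_markers = 60, 2000
snps = rng.integers(0, 3, size=(n_animals, n_markers)).astype(float)  # 0/1/2 genotypes
effects = rng.normal(0, 0.05, size=n_markers)
phenotype = snps @ effects + rng.normal(0, 1.0, size=n_animals)       # simulated carcass trait

pcr = make_pipeline(PCA(n_components=20), LinearRegression())         # principal component regression
pls = PLSRegression(n_components=20)                                  # partial least squares

for name, model in [("PCR", pcr), ("PLS", pls)]:
    # cross-validated fit is a common proxy for the accuracy of genomic predictions
    r2 = cross_val_score(model, snps, phenotype, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {r2.mean():.3f}")
```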
|
65 |
Multi-Label Dimensionality Reduction. January 2011.
Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed when the data comes sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully in the Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms. / Dissertation/Thesis / Ph.D. Computer Science 2011
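As a small illustration of label-aware dimensionality reduction in the multi-label setting, the sketch below runs CCA between a feature matrix and a binary label matrix to obtain a low-dimensional projection of the features. The synthetic data and component count are assumptions of the example; the hypergraph spectral learning and least-squares formulations developed in the thesis are not shown.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.cross_decomposition import CCA

# synthetic multi-label data: 200 samples, 50 features, 6 binary labels
X, Y = make_multilabel_classification(n_samples=200, n_features=50,
                                      n_classes=6, random_state=0)

cca = CCA(n_components=5)                     # at most min(n_features, n_labels) components
Xc, Yc = cca.fit_transform(X, Y.astype(float))

print(X.shape, "->", Xc.shape)                # (200, 50) -> (200, 5)
# Xc can now replace X as a reduced representation whose directions are
# maximally correlated with the label information.
```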
|
66 |
Dimensionality Reduction and Fusion Strategies for the Design of Parametric Signal Classifiers. Kota, Srinivas. 01 December 2010.
This dissertation focuses on two specific problems related to the design of parametric signal classifiers: dimensionality reduction to overcome the curse of dimensionality and information fusion to improve classification by exploiting complementary information from multiple sensors or multiple classifiers. Dimensionality reduction is achieved by introducing a strategy to rank and select a subset of principal component transform (PCT) coefficients that carry the most useful discriminatory information. The criteria considered for ranking transform coefficients include magnitude, variance, inter-class separation, and classification accuracies of individual transform coefficients. The ranking strategy not only facilitates overcoming the dimensionality curse for multivariate classifier implementation but also provides a means to further select, out of a rank-ordered set, a smaller set of features that give the best classification accuracies. Because the class-conditional densities of transform feature vectors are often assumed to be multivariate Gaussian, the dimensionality reduction strategy focuses on overcoming the specific problems encountered in the design of practical multivariate Gaussian classifiers using transform feature vectors. Through experiments with event-related potentials (ERPs) and ear pressure signals, it is shown that the dimension of the feature space can be decreased quite significantly by means of the feature ranking and selection strategy. Furthermore, the resulting Gaussian classifiers yield higher classification accuracies than those reported in previous classification studies on the same signal sets. Amongst the four feature selection criteria, Gaussian classifiers using the maximum magnitude and maximum variance selection criteria gave the best classification accuracies across the two sets of classification experiments. For the multisensor case, dimensionality reduction is achieved by introducing a spatio-temporal array model to observe the signals across channels and time, simultaneously. A two-step process which uses the Kolmogorov-Smirnov test and the Lilliefors test is formulated to select the array elements which have different Gaussian densities across all signal categories. Selecting spatio-temporal elements that fit the assumed model and also statistically differ across the signal categories not only decreases the dimensionality significantly but also ensures high classification accuracies. The selection is dynamic in the sense that selecting spatio-temporal array elements corresponds to selecting samples of different sensors at different time-instants. Each selected array element is classified using a univariate Gaussian classifier and the resulting decisions are fused into a decision fusion vector which is classified using a discrete Bayes classifier. The application of the resulting dynamic channel selection-based classification strategy is demonstrated by designing and testing classifiers for multi-channel ERPs and it is shown that the strategy yields high classification accuracies. Most noteworthy of the two dimensionality reduction strategies is the fact that the multivariate Gaussian signal classifiers developed can be implemented without having to collect a prohibitively large number of training signals simply to satisfy the dimensionality conditions.
Consequently, the classification strategies can be beneficial for designing personalized human-machine-interface (HMI) signal classifiers for individuals from whom only a limited number of training signals can reliably be collected due to severe disabilities. The information fusion strategy introduced is aimed at improving the performance of signal classifiers by combining signals from multiple sensors or by combining decisions of multiple classifiers. Fusion classifiers with diverse components (classifiers or data sets) outperform those with less diverse components. Determining component diversity, therefore, is of the utmost importance in the design of fusion classifiers which are often employed in clinical diagnostic and numerous other pattern recognition problems. A new pairwise diversity-based ranking strategy is introduced to select a subset of ensemble components, which when combined, will be more diverse than any other component subset of the same size. The strategy is unified in the sense that the components can be either polychotomous classifiers or polychotomous data sets. Classifier fusion and data fusion systems are formulated based on the diversity selection strategy and the application of the two fusion strategies are demonstrated through the classification of multi-channel ERPs. From the results it is concluded that data fusion outperforms classifier fusion. It is also shown that the diversity-based data fusion system outperforms the system using randomly selected data components. Furthermore, it is demonstrated that the combination of data components that yield the best performance, in a relative sense, can be determined through the diversity selection strategy.
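The sketch below gives a rough idea of the ranking-and-selection strategy: compute the principal component transform, score each coefficient by an inter-class separation criterion, keep the top few, and train a Gaussian (quadratic discriminant) classifier on them. The dataset, the particular separation score, and the number of retained coefficients are illustrative assumptions, not the ERP or ear-pressure experiments of the dissertation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
Z = PCA(n_components=40).fit_transform(X)        # PCT coefficients for each sample

# inter-class separation score per coefficient: variance of the class means
# divided by the pooled within-class variance
classes = np.unique(y)
means = np.array([Z[y == c].mean(axis=0) for c in classes])
within = np.mean([Z[y == c].var(axis=0) for c in classes], axis=0)
score = means.var(axis=0) / (within + 1e-12)

top = np.argsort(score)[::-1][:10]               # keep the 10 most discriminative coefficients
clf = QuadraticDiscriminantAnalysis()            # a multivariate Gaussian classifier
acc = cross_val_score(clf, Z[:, top], y, cv=5).mean()
print(f"accuracy with 10 selected PCT coefficients: {acc:.3f}")
```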
|
67 |
Um modelo neural de aprimoramento progressivo para redução de dimensionalidade / A Progressive Enhancement Neural Model for dimensionality reduction. Camargo, Sandro da Silva. January 2010.
Nas últimas décadas, avanços em tecnologias de geração, coleta e armazenamento de dados têm contribuído para aumentar o tamanho dos bancos de dados nas diversas áreas de conhecimento humano. Este aumento verifica-se não somente em relação à quantidade de amostras de dados, mas principalmente em relação à quantidade de características descrevendo cada amostra. A adição de características causa acréscimo de dimensões no espaço matemático, conduzindo ao crescimento exponencial do hipervolume dos dados, problema denominado “maldição da dimensionalidade”. A maldição da dimensionalidade tem sido um problema rotineiro para cientistas que, a fim de compreender e explicar determinados fenômenos, têm se deparado com a necessidade de encontrar estruturas significativas ocultas, de baixa dimensão, dentro de dados de alta dimensão. Este processo denomina-se redução de dimensionalidade dos dados (RDD). Do ponto de vista computacional, a conseqüência natural da RDD é uma diminuição do espaço de busca de hipóteses, melhorando o desempenho e simplificando os resultados da modelagem de conhecimento em sistemas autônomos de aprendizado. Dentre as técnicas utilizadas atualmente em sistemas autônomos de aprendizado, as redes neurais artificiais (RNAs) têm se tornado particularmente atrativas para modelagem de sistemas complexos, principalmente quando a modelagem é difícil ou quando a dinâmica do sistema não permite o controle on-line. Apesar de serem uma poderosa técnica, as RNAs têm seu desempenho afetado pela maldição da dimensionalidade. Quando a dimensão do espaço de entradas é alta, as RNAs podem utilizar boa parte de seus recursos para representar porções irrelevantes do espaço de busca, dificultando o aprendizado. Embora as RNAs, assim como outras técnicas de aprendizado de máquina, consigam identificar características mais informativas para um processo de modelagem, a utilização de técnicas de RDD frequentemente melhora os resultados do processo de aprendizado. Este trabalho propõe um wrapper que implementa um modelo neural de aprimoramento progressivo para RDD em sistemas autônomos de aprendizado supervisionado visando otimizar o processo de modelagem. Para validar o modelo neural de aprimoramento progressivo, foram realizados experimentos com bancos de dados privados e de repositórios públicos de diferentes domínios de conhecimento. A capacidade de generalização dos modelos criados é avaliada por meio de técnicas de validação cruzada. Os resultados obtidos demonstram que o modelo neural de aprimoramento progressivo consegue identificar características mais informativas, permitindo a RDD, e tornando possível criar modelos mais simples e mais precisos. A implementação da abordagem e os experimentos foram realizados no ambiente Matlab, utilizando o toolbox de RNAs. / In recent decades, advances in data generation, collection, and storage technologies have contributed to the increase in database sizes in different areas of knowledge. This increase is seen not only in the number of samples but mainly in dimensionality, i.e. the number of features describing each sample. Adding features increases the dimension of the mathematical space, leading to an exponential growth of the data hypervolume; this problem is called the "curse of dimensionality". The curse of dimensionality has been a routine problem for scientists who, in order to understand and explain certain phenomena, have been faced with the need to find meaningful low-dimensional structures hidden in high-dimensional data. This process is called data dimensionality reduction (DDR). From a computational viewpoint, the natural consequence of DDR is a reduction of the hypothesis search space, improving performance and simplifying the results of knowledge modeling in autonomous learning systems. Among the techniques currently used in autonomous learning systems, artificial neural networks (ANNs) have become particularly attractive for modeling complex systems, especially when modeling is hard or when the system dynamics does not allow on-line control. Although ANNs are a powerful technique, their performance is affected by the curse of dimensionality. When the input space dimension is high, ANNs can spend a significant part of their resources representing irrelevant parts of the input space, making the learning process harder. Although ANNs, like other machine learning techniques, can identify the more informative features of a modeling process, the use of DDR techniques often improves the learning results. This thesis proposes a wrapper that implements a Progressive Enhancement Neural Model for DDR in supervised autonomous learning systems, in order to optimize the modeling process. To validate the proposed approach, experiments were performed with private and public databases from different knowledge domains. The generalization ability of the resulting models is evaluated by means of cross-validation techniques. The results obtained demonstrate that the proposed approach can identify the more informative features, enabling DDR and making it possible to create simpler and more accurate models. The implementation of the proposed approach and the related experiments were carried out in the Matlab environment, using its neural network toolbox.
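As a generic illustration of the wrapper idea (not the thesis's Matlab implementation of the Progressive Enhancement Neural Model), the sketch below progressively grows a feature subset, keeping a candidate feature only when it improves the cross-validated score of a small neural model. The dataset, subset size, and network settings are assumptions of the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :12]                                        # limit the candidate pool to keep the toy example fast

def cv_score(cols):
    """Cross-validated accuracy of a small neural model on a feature subset."""
    net = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0))
    return cross_val_score(net, X[:, cols], y, cv=3).mean()

selected, best = [], 0.0
for _ in range(5):                                   # progressively grow the subset
    candidates = [j for j in range(X.shape[1]) if j not in selected]
    score, j = max((cv_score(selected + [j]), j) for j in candidates)
    if score <= best:                                # keep a feature only if it actually helps
        break
    selected, best = selected + [j], score

print(f"selected features {selected} with cross-validated accuracy {best:.3f}")
```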
|
68 |
CLASSIFICATION OF ONE-DIMENSIONAL AND TWO-DIMENSIONAL SIGNALS. Kanneganti, Raghuveer. 01 August 2014.
This dissertation focuses on the classification of one-dimensional and two-dimensional signals. The one-dimensional signal classification problem involves the classification of brain signals for identifying the emotional responses of human subjects under given drug conditions. A strategy is developed to accurately classify ERPs in order to identify human emotions based on brain reactivity to emotional, neutral, and cigarette-related stimuli in smokers. A multichannel spatio-temporal model is employed to overcome the curse of dimensionality that plagues the design of parametric multivariate classifiers for multi-channel ERPs. The strategy is tested on the ERPs of 156 smokers who participated in a smoking cessation program. One half of the subjects were given nicotine patches and the other half were given placebo patches. ERPs were collected from 29 channels in response to the presentation of pictures with emotional (pleasant and unpleasant), neutral/boring, and cigarette-related content. It is shown that human emotions can be classified accurately, and the results also show that smoking cessation causes a drop in the classification accuracies of emotions in the placebo group, but not in the nicotine patch group. Given that individual brain patterns were compared with group average brain patterns, the findings support the view that individuals tend to have similar brain reactions to different types of emotional stimuli. Overall, this new classification approach to identify differential brain responses to different emotional types could lead to new knowledge concerning brain mechanisms associated with emotions common to most or all people. This novel classification technique for identifying emotions in the present study suggests that smoking cessation without nicotine replacement results in poorer differentiation of brain responses to different emotional stimuli. Future directions in this area would be to use these methods to assess individual differences in responses to emotional stimuli and to different drug treatments. Advantages of this and other brain-based assessments include temporal precision (e.g., 400-800 ms post-stimulus) and the elimination of biases related to self-report measures. The two-dimensional signal classification problems include the detection of graphite in testing documents and the detection of fraudulent bubbles in test sheets. A strategy is developed to detect graphite responses in optical mark recognition (OMR) documents using inexpensive visible light scanners. The main challenge in the formulation of the strategy is that the detection should be invariant to the numerous background colors and artwork in typical optical mark recognition documents. A test document is modeled as a superposition of a graphite response image and a background image. The background image in turn is modeled as a superposition of screening artwork, lines, and machine text components. A sequence of image processing operations and a pattern recognition algorithm are developed to estimate the graphite response image from a test document by systematically removing the components of the background image. The proposed strategy is tested on a wide range of scanned documents and it is shown that the estimated graphite response images are visually similar to those scanned by very expensive infra-red scanners currently employed for optical mark recognition. The robustness of the detection strategy is also demonstrated by testing a large number of simulated test documents.
A procedure is also developed to autonomously determine if cheating has occurred by detecting the presence of aberrant responses in scanned OMR test books. The challenges introduced by the significant imbalance in the numbers of typical and aberrant bubbles were identified. The aberrant bubble detection problem is formulated as an outlier detection problem. A feature-based outlier detection procedure in conjunction with a one-class SVM classifier is developed. A multi-criteria rank-of-rank-sum technique is introduced to rank and select a subset of features from a pool of candidate features. Using a data set of 11 individuals, it is shown that a detection accuracy of over 90% is possible. Experiments conducted on three real test books flagged for suspected cheating showed that the proposed strategy has the potential to be deployed in practice.
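A simplified sketch of the aberrant-response idea is given below: a one-class SVM is trained on features of typical answer bubbles and then flags outliers. The two features (fill ratio and darkness) and the simulated bubbles are invented for illustration; the dissertation's feature pool and rank-of-rank-sum selection are not reproduced.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# typical bubbles: consistent fill ratio and darkness
typical = rng.normal(loc=[0.85, 0.70], scale=[0.05, 0.05], size=(500, 2))
# a few aberrant bubbles (e.g., erased-and-refilled marks) drawn from a different distribution
aberrant = rng.normal(loc=[0.55, 0.40], scale=[0.10, 0.10], size=(10, 2))

# learn the boundary of "typical" behaviour from the imbalanced majority class only
detector = OneClassSVM(kernel="rbf", nu=0.02, gamma="scale").fit(typical)

test = np.vstack([typical[:20], aberrant])
flags = detector.predict(test)            # +1 = typical, -1 = flagged as aberrant
print("flagged indices:", np.where(flags == -1)[0])
```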
|
69 |
Redução de dimensionalidade aplicada à diarização de locutor / Dimensionality reduction applied to speaker diarization. Silva, Sérgio Montazzolli. January 2013.
Atualmente existe uma grande quantidade de dados multimídia sendo geradas todos os dias. Estes dados são oriundos de diversas fontes, como transmissões de rádio ou televisão, gravações de palestras, encontros, conversas telefônicas, vídeos e fotos capturados por celular, entre outros. Com isto, nos últimos anos o interesse pela transcrição de dados multimídia tem crescido, onde, no processamento de voz, podemos destacar as áreas de Reconhecimento de Locutor, Reconhecimento de Fala, Diarização de Locutor e Rastreamento de Locutores. O desenvolvimento destas áreas vem sendo impulsionado e direcionado pelo NIST, que periodicamente realiza avaliações sobre o estado-da-arte. Desde 2000, a tarefa de Diarização de Locutor tem se destacado como uma das principáis frentes de pesquisa em transcrição de dados de voz, tendo sido avaliada pelo NIST por diversas vezes na última década. O objetivo desta tarefa é encontrar o número de locutores presentes em um áudio, e rotular seus respectivos trechos de fala, sem que nenhuma informação tenha sido previamente fornecida. Em outras palavras, costuma-se dizer que o objetivo é responder a questão "Quem falou e quando?". Um dos grandes problemas nesta área é se conseguir obter um bom modelo para cada locutor presente no áudio, dada a pouca quantidade de informações e a alta dimensionalidade dos dados. / Currently, a large amount of multimedia data is being generated every day. These data come from various sources, such as radio or television broadcasts, recordings of lectures and meetings, telephone conversations, and videos and photos captured by mobile phone, among others. Because of this, interest in automatic multimedia data transcription has grown in recent years, and, in voice processing, we can highlight the areas of Speaker Recognition, Speech Recognition, Speaker Diarization and Speaker Tracking. The development of these areas has been driven and directed by NIST, which periodically promotes state-of-the-art evaluations. Since 2000, the task of Speaker Diarization has emerged as one of the main research fronts in voice data transcription, having been evaluated by NIST several times in the last decade. The objective of this task is to find the number of speakers in an audio recording and properly label their speech segments without any training information being provided beforehand. In other words, it is said that the goal of Speaker Diarization is to answer the question "Who spoke when?". A major problem in this area is obtaining a good model for each speaker present in the audio, given the limited amount of information available and the high dimensionality of the data.
In the current work, we describe how our Speaker Diarization System was built, and we address the aforementioned problem by lowering the dimensionality of the data through statistical analysis. We use Principal Component Analysis, Linear Discriminant Analysis, and the recently introduced Fisher Linear Semi-Discriminant Analysis. The latter uses a static initialization method; here we propose a dynamic method based on a speaker change point detection algorithm. We also investigate the behavior of these data analysis techniques under the simultaneous use of multiple short-term features. Our results show that it is possible to maintain, and even improve, the system performance while substantially reducing the number of dimensions. As a consequence, the execution of Machine Learning algorithms is accelerated and the amount of memory required to store the data is reduced.
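As a toy sketch of the dimensionality-reduction step in such a pipeline, the example below projects short-term feature vectors (simulated here in place of real acoustic features) with PCA before clustering them into speakers. The Fisher semi-discriminant analysis and the speaker change point detection used in the actual system are not shown, and all data and settings are invented for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
# two simulated "speakers", each a Gaussian cloud in a 60-D short-term feature space
spk_means = rng.normal(size=(2, 60)) * 3
frames = np.vstack([rng.normal(m, 1.0, size=(300, 60)) for m in spk_means])
labels = np.repeat([0, 1], 300)

Z = PCA(n_components=10).fit_transform(frames)     # reduce 60 -> 10 dimensions

pred = AgglomerativeClustering(n_clusters=2).fit_predict(Z)
print("agreement with true speakers:", adjusted_rand_score(labels, pred))
```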
|
70 |
Evaluating immersive approaches to multidimensional information visualization / Avaliando abordagens imersivas para visualização de informações multidimensionais. Wagner Filho, Jorge Alberto. January 2018.
O uso de novos recursos de display e interação para suportar a visualização imersiva de dados e incrementar o raciocínio analítico é uma tendência de pesquisa em Visualização de Informações. Neste trabalho, avaliamos o uso de ambientes baseados em HMD para a exploração de dados multidimensionais, representados em scatterplots 3D como resultado de redução de dimensionalidade. Nós apresentamos uma nova modelagem para o problema de avaliação neste contexto, levando em conta os dois fatores cuja interação determina o impacto no desempenho total nas tarefas: a diferença nos erros introduzidos ao se realizar redução de dimensionalidade para 2D ou 3D, e a diferença nos erros de percepção humana sob diferentes condições de visualização. Este framework em duas etapas oferece uma abordagem simples para estimar os benefícios de se utilizar um setup 3D imersivo para um dado conjunto de dados. Como caso de uso, os erros de redução de dimensionalidade para uma série de conjuntos de dados de votações na Câmara dos Deputados, ao se utilizar duas ou três dimensões, são avaliados por meio de uma abordagem empírica baseada em tarefas. O erro de percepção e o desempenho geral de tarefa, por sua vez, são avaliados através de estudos controlados comparativos com usuários. Comparando-se visualizações baseadas em desktop (2D e 3D) e em HMD (3D), resultados iniciais indicaram que os erros de percepção foram baixos e similares em todas abordagens, resultando em benefícios para o desempenho geral em ambas técnicas 3D A condição imersiva, no entanto, demonstrou requerer menor esforço para encontrar as informações e menos navegação, além de prover percepções subjetivas de precisão e engajamento muito maiores. Todavia, o uso de navegação por voo livre resultou em tempos ineficientes e frequente desconforto nos usuários. Em um segundo momento, implementamos e avaliamos uma abordagem alternativa de exploração de dados, onde o usuário permanece sentado e mudanças no ponto de vista só são possíveis por meio de movimentos físicos. Toda a manipulação é realizada diretamente por gestos aéreos naturais, com os dados sendo renderizados ao alcance dos braços. A reprodução virtual de uma cópia exata da mesa de trabalho do analista visa aumentar a imersão e possibilitar a interação tangível com controles e informações bidimensionais associadas. Um segundo estudo com usuários foi conduzido em comparação a uma versão equivalente baseada em desktop, explorando um conjunto de 9 tarefas representativas de percepção e interação, baseadas em literatura prévia. Nós demonstramos que o nosso protótipo, chamado VirtualDesk, apresentou resultados excelentes em relação a conforto e imersão, e desempenho equivalente ou superior em todas tarefas analíticas, enquanto adicionando pouco ou nenhum tempo extra e ampliando a exploração dos dados. / The use of novel displays and interaction resources to support immersive data visualization and improve the analytical reasoning is a research trend in Information Visualization. In this work, we evaluate the use of HMD-based environments for the exploration of multidimensional data, represented in 3D scatterplots as a result of dimensionality reduction. We present a new modelling for the evaluation problem in such a context, accounting for the two factors whose interplay determine the impact on the overall task performance: the difference in errors introduced by performing dimensionality reduction to 2D or 3D, and the difference in human perception errors under different visualization conditions. 
This two-step framework offers a simple approach to estimate the benefits of using an immersive 3D setup for a particular dataset. As a use case, the dimensionality reduction errors for a series of roll-call datasets when using two or three dimensions are evaluated through an empirical task-based approach. The perception error and overall task performance are assessed through controlled comparative user studies. When comparing desktop-based (2D and 3D) with an HMD-based (3D) visualization, initial results indicated that perception errors were low and similar in all approaches, resulting in overall performance benefits in both 3D techniques. The immersive condition, however, was found to require less effort to find information and less navigation, besides providing a much larger subjective perception of accuracy and engagement. Nonetheless, the use of flying navigation resulted in inefficient times and frequent user discomfort. Subsequently, we implemented and evaluated an alternative data exploration approach where the user remains seated and viewpoint change is only realisable through physical movements. All manipulation is done directly by natural mid-air gestures, with the data being rendered at arm’s reach. The virtual reproduction of an exact copy of the analyst’s desk aims to increase immersion and enable tangible interaction with controls and associated two-dimensional information. A second user study was carried out comparing this scenario to a desktop-based equivalent, exploring a set of 9 representative perception and interaction tasks based on previous literature. We demonstrate that our prototype setup, named VirtualDesk, presents excellent results regarding user comfort and immersion, and performs equally well or better in all analytical tasks, while adding minimal or no time overhead and amplifying data exploration.
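The first step of the framework, estimating how much is lost by reducing to two rather than three dimensions before any perception study, can be sketched as below. MDS stress and neighborhood trustworthiness are used here as example error measures on a generic dataset; the roll-call data and the task-based error estimation of the thesis are not reproduced.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import MDS, trustworthiness

X = StandardScaler().fit_transform(load_wine().data)

for dims in (2, 3):
    mds = MDS(n_components=dims, random_state=0)
    Y = mds.fit_transform(X)
    t = trustworthiness(X, Y, n_neighbors=10)      # how well local neighborhoods survive
    print(f"{dims}-D embedding: stress={mds.stress_:.1f}, trustworthiness={t:.3f}")

# If the 3-D embedding is clearly more faithful, an immersive 3-D view may be worth
# its extra perceptual cost; if not, a 2-D desktop view is likely sufficient.
```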
|