Global ETD Search

41	Extração de tópicos baseado em agrupamento de regras de associação / Topic extraction based on association rule clustering Santos, Fabiano Fernandes dos 29 May 2015
A structured representation of documents in a format suitable for automatic knowledge extraction, without loss of relevant information relative to the original unstructured format, is one of the most important steps in text mining, since the quality of the results obtained with automatic approaches to knowledge extraction from text is strongly related to the quality of the attributes selected to represent the document collection. The Vector Space Model (VSM) is a traditional model for obtaining a structured representation of documents. In this model, each document is represented as a vector of weights corresponding to the features of the text. The bag-of-words model is the most popular VSM approach because of its simplicity and general applicability. However, the bag-of-words model does not capture dependencies among terms and has high dimensionality. Several document representation models have been proposed in the literature to capture the relations among terms, notably models based on phrases or compound terms, the Generalized Vector Space Model (GVSM) and its extensions, non-probabilistic topic models such as Latent Semantic Analysis (LSA) and Non-negative Matrix Factorization (NMF), and probabilistic topic models such as Latent Dirichlet Allocation (LDA) and its extensions. Topic-model representations are among the most interesting approaches, since they provide a structure that describes the document collection in a way that reveals its internal structure and interrelationships. Topic extraction approaches also provide a dimensionality reduction strategy that aims to build new dimensions representing the main topics or subjects identified in the collection. However, efficiently extracting information about the relations among terms for document representation remains a major research challenge. Models that exploit term correlation usually struggle to keep a good balance among (i) the number of dimensions obtained, (ii) the computational effort, and (iii) the interpretability of the new dimensions.
This work therefore proposes the Latent Association Rule Cluster based Model (LARCM). LARCM is a non-probabilistic topic model that exploits association rule clustering to build a low-dimensional representation of the document collection in which the new dimensions are extracted from information about the relations among terms. In the proposed model, association rules are extracted from each document to obtain correlated terms that form multi-word expressions. These relations form the local context of the relations among terms. A clustering process is then applied to all association rules to form the general context of the relations, and each resulting cluster of rules becomes a topic, that is, a dimension of the new representation. This work also proposes an evaluation methodology for selecting topic models that maximize both text classification results and the interpretability of the obtained topics. LARCM was compared with the traditional LDA model and with LDA using a document representation that includes compound terms (bag-of-related-words). The experimental results indicate that LARCM produces a document representation that significantly improves text classification results while retaining good interpretability of the extracted topics. LARCM also performed very well as a method for extracting contextual information for context-aware recommender systems.
Association rule clustering; Dimensionality reduction; Text mining; Topic extraction
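The two bag-of-words limitations named in the abstract of entry 41 (high dimensionality and no term dependencies) can be illustrated with a minimal sketch. This is only an illustration of the general bag-of-words idea, not code from the thesis; the function name `bag_of_words` and the sample documents are assumptions made for the example.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, vectors): one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is every distinct term in the collection, so the
    # number of dimensions grows with the size of the collection.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

# Hypothetical sample documents.
docs = ["dog bites man", "man bites dog", "topic models reduce dimensionality"]
vocabulary, vectors = bag_of_words(docs)
print(len(vocabulary))           # one dimension per distinct term
print(vectors[0] == vectors[1])  # True: word order, and so term dependency, is lost
```

The first two documents mean different things but receive identical vectors, which is the kind of lost relation between terms that phrase-based and topic-based representations try to recover.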
42	Mapeamento e visualização de dados em alta dimensão com mapas auto-organizados. / Mapping and visualization of high dimensional data with self-organized maps. Kitani, Edson Caoru 14 June 2013
Living beings have an impressive capacity to deal autonomously with complex environments containing large amounts of information. This makes them an ideal model for the development of bioinspired artificial systems. Kohonen's self-organizing artificial neural network is an excellent example of a system based on biological models. This thesis discusses, by way of illustration, pattern recognition and generalization in high-dimensional spaces in biological systems, and how such systems handle dimensionality reduction to optimize the storage of and access to memorized information for pattern recognition and categorization; this discussion serves only to put the proposals of the thesis in context. The new proposals developed in this thesis are useful for applications of unsupervised knowledge extraction from self-organizing maps. They build on Kohonen's model, but some of the proposed methodologies also apply to other self-organizing neural network approaches. A technique is presented for visually reconstructing the neurons of the Kohonen map generated by the hybrid PCA+SOM method, which is useful when working with image databases. A method is also proposed for improving the data representation of the SOM, and the result of the SOM mapping is discussed as a generalization of the information in the data space. Finally, a method is proposed for exploring high-dimensional data spaces in a self-organized way based on the data manifold; this proposal is called Self Organizing Manifold Mapping (SOMM). The results of computational experiments with each of these proposals are presented and evaluated using known quality metrics, as well as a new metric proposed in this thesis.
Dimensionality reduction; Manifold learning; Self Organizing Maps; SOM
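For readers unfamiliar with the Kohonen model that entry 42 builds on, the following is a minimal sketch of a one-dimensional self-organizing map. It shows only the textbook update rule (find the best matching unit, then pull it and its neighbours toward the sample); it is not the PCA+SOM or SOMM method of the thesis, and the function names, decay schedules, and default parameters are assumptions chosen for brevity.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)))

def train_som(data, n_units, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D self-organizing map; returns the list of unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or n_units / 2
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):  # visit samples in random order
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of unit indices.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical toy data: two small groups of 2-D points.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som_weights = train_som(data, n_units=4)
for x in data:
    print(x, "->", best_matching_unit(som_weights, x))
```

Each update moves a unit part of the way toward a sample, so the trained weights stay inside the range spanned by the data and the initial weights.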
43	Robust subspace estimation via low-rank and sparse decomposition and applications in computer vision Ebadi, Salehe Erfanian January 2018
Recent advances in robust subspace estimation have made dimensionality reduction and noise and outlier suppression an area of interest for research, along with continuous improvements in computer vision applications. Because image and video signals need a high-dimensional representation, storing, processing, transmitting, and analysing them is often difficult. It is therefore desirable to obtain a low-dimensional representation for such signals and, at the same time, to correct for corruptions, errors, and outliers, so that the signals can be readily used for later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Candès et al. [17], where the authors provided a solution to the long-standing problem of decomposing a matrix into low-rank and sparse components in a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex and/or may not yield desirable results. The low-rank component obtained by RPCA usually has an unnecessarily high rank, while certain tasks require lower-dimensional representations. RPCA can robustly estimate noise and outliers and separate them from the low-rank component as a sparse part, but it has no mechanism for providing insight into the structure of the sparse solution, nor a way to further decompose the sparse part into random noise and a structured sparse component, which would be advantageous in many computer vision tasks. Moreover, because video signals are usually captured by a moving camera, obtaining a low-rank component by RPCA becomes impossible. In this thesis, novel Approximated RPCA algorithms are presented, targeting different shortcomings of RPCA.
The Approximated RPCA was analysed to identify the most time-consuming RPCA solutions and replace them with simpler yet tractable alternatives. The proposed method is able to obtain the exact desired rank for the low-rank component while estimating a global transformation to describe camera-induced motion. Furthermore, it is able to decompose the sparse part into a foreground sparse component and a random noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained by several novel structured sparsity-inducing norms that better encapsulate the needed pixel structure in visual signals. Moreover, algorithms for reducing the complexity of low-rank estimation have been proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information. The proposed algorithms are applied to several fundamental computer vision tasks, namely high efficiency video coding; batch image alignment, inpainting, and recovery; video stabilisation; background modelling and foreground segmentation; robust subspace clustering and motion estimation; face recognition; and ultra high definition image and video super-resolution. The algorithms proposed in this thesis, including those for batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra high definition image and video super-resolution, achieve results that are either state-of-the-art or comparable to existing methods.
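As background for the low-rank plus sparse decomposition described in entry 43, many iterative RPCA solvers update the sparse component with entry-wise soft thresholding, the proximal operator of the l1 norm. The sketch below shows only that generic building block; it is not one of the Approximated RPCA algorithms of the thesis, and the function name, the threshold value, and the sample residual are assumptions.

```python
def soft_threshold(matrix, tau):
    """Entry-wise shrinkage: sign(x) * max(|x| - tau, 0)."""
    def shrink(x):
        if x > tau:
            return x - tau
        if x < -tau:
            return x + tau
        return 0.0
    return [[shrink(x) for x in row] for row in matrix]

# Hypothetical residual (observed data minus the current low-rank estimate):
# small entries look like noise, large entries look like outliers.
residual = [[0.1, -0.05, 3.0],
            [0.0, 0.2, -2.5]]
sparse = soft_threshold(residual, tau=0.5)
print(sparse)  # [[0.0, 0.0, 2.5], [0.0, 0.0, -2.0]]
```

Small entries are set to zero and large ones are kept (shrunk by the threshold), which is how the sparse part ends up holding outliers rather than dense noise.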
44	Evaluating immersive approaches to multidimensional information visualization / Avaliando abordagens imersivas para visualização de informações multidimensionais Wagner Filho, Jorge Alberto January 2018
The use of novel display and interaction resources to support immersive data visualization and improve analytical reasoning is a research trend in Information Visualization. In this work, we evaluate the use of HMD-based environments for the exploration of multidimensional data, represented in 3D scatterplots as a result of dimensionality reduction. We present a new model of the evaluation problem in this context, accounting for the two factors whose interplay determines the impact on overall task performance: the difference in the errors introduced by reducing dimensionality to 2D or to 3D, and the difference in human perception errors under different visualization conditions. This two-step framework offers a simple approach to estimating the benefits of using an immersive 3D setup for a particular dataset. As a use case, the dimensionality reduction errors for a series of datasets of roll-call votes in the Chamber of Deputies (Câmara dos Deputados), when using two or three dimensions, are evaluated through an empirical task-based approach. The perception error and overall task performance are assessed through controlled comparative user studies. When comparing desktop-based (2D and 3D) with HMD-based (3D) visualization, initial results indicated that perception errors were low and similar in all approaches, resulting in overall performance benefits for both 3D techniques. The immersive condition, however, was found to require less effort to find information and less navigation, besides providing much higher subjective perceptions of accuracy and engagement. Nonetheless, the use of free-flight navigation resulted in inefficient times and frequent user discomfort.
In a second stage, we implemented and evaluated an alternative data exploration approach in which the user remains seated and the viewpoint can be changed only through physical movements. All manipulation is done directly by natural mid-air gestures, with the data rendered at arm's reach. The virtual reproduction of an exact copy of the analyst's desk aims to increase immersion and enable tangible interaction with controls and associated two-dimensional information. A second user study compared this scenario with a desktop-based equivalent, exploring a set of 9 representative perception and interaction tasks based on previous literature. We demonstrate that our prototype setup, named VirtualDesk, presents excellent results regarding user comfort and immersion, and performs equally well or better in all analytical tasks, while adding minimal or no time overhead and amplifying data exploration.
3D visualization; Immersive visualization; Abstract information visualization; Dimensionality reduction; 3D scatterplots; Virtual reality
45	Técnicas computacionais de apoio à classificação visual de imagens e outros dados / Computational techniques to support classification of images and other data Paiva, José Gustavo de Souza 20 December 2012
Automatic data classification in general, and image classification in particular, is a computationally intensive task whose precision varies considerably with the classifier's configuration and the data representation used. Many of the factors that affect the adequate application of classification or categorization methods to images point to the need for more user intervention in the process. This requires more tools to support the various stages of the classification process, such as, but not limited to, feature extraction, parametrization of the classification algorithms, and selection of adequate training instances. This doctoral thesis presents a Visual Image Classification methodology based on inserting the user into the automatic classification process through visualization techniques. The idea is to allow the user to participate in every step of classifying a given collection, making adjustments and consequently improving the results according to his or her needs. A study of several candidate visualization techniques is presented, with emphasis on similarity trees, along with improvements to the tree construction algorithm in terms of both visual scalability and processing time. Additionally, a visual semi-supervised dimensionality reduction methodology is presented to support, through visual tools, the creation of reduced spaces that improve the segregation characteristics of the original feature set.
The main contribution of this work is an incremental visual classification system that incorporates all the steps of the proposed methodology and provides interactive visual tools that allow user-controlled classification of incremental collections with an evolving class configuration. This makes it possible to use human knowledge in building classifiers that adapt to different user needs in different scenarios, producing satisfactory results for a variety of data collections. The focus of this thesis is the categorization of image collections, with examples also given for textual data sets.
Visual data classification; Dimensionality reduction; Information visualization
47	Random neural networks for dimensionality reduction and regularized supervised learning Hu, Renjie 01 August 2019
This dissertation explores several aspects of Random Neural Networks (RNNs) and their applications. First, novel RNNs are proposed for dimensionality reduction and visualization. Based on Extreme Learning Machines (ELMs) and Self-Organizing Maps (SOMs), a new method is created to identify the important variables and visualize the data. This technique mitigates the curse of dimensionality, further improves the interpretability of the visualization, and is tested on real nursing survey datasets. ELM-SOM+ is an autoencoder created to preserve the intrinsic quality of the SOM while bringing continuity to the projection using two ELMs. This new methodology shows considerable improvement over SOM on real datasets. Second, as a supervised learning method, ELMs have been applied to the hierarchical multiscale method to bridge molecular dynamics to continua. The method is tested on simulation data and shown to be efficient at passing information from one scale to another. Lastly, the regularization of ELMs has been studied, and a new regularization algorithm for ELMs is created using a modified Lanczos algorithm. On average, the Lanczos ELM divides computational time by 20 and reduces the normalized MSE by 14% compared with regular ELMs.
Data Visualization; Dimensionality Reduction; Feature Selection; Lanczos Algorithm; Random Neural Networks; Regularization; Industrial Engineering
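The abstract of entry 47 reports its improvement in terms of normalized MSE without defining it. One common definition divides the mean squared error by the variance of the targets; the sketch below assumes that definition, which may differ from the one used in the dissertation, and the function name is likewise an assumption.

```python
def normalized_mse(y_true, y_pred):
    """Mean squared error divided by the variance of the true values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean_y) ** 2 for t in y_true) / n
    return mse / variance

y_true = [1.0, 2.0, 3.0, 4.0]
print(normalized_mse(y_true, [2.5, 2.5, 2.5, 2.5]))  # 1.0: no better than predicting the mean
print(normalized_mse(y_true, y_true))                # 0.0: perfect prediction
```

Under this definition a value of 1 corresponds to always predicting the mean of the targets, so values below 1 indicate a model that explains part of the variance.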
48	Machine learning with the cancer genome atlas head and neck squamous cell carcinoma dataset: improving usability by addressing inconsistency, sparsity, and high-dimensionality Rendleman, Michael 01 May 2019
In recent years, more data has become available for historical oncology case analysis. A large dataset that describes over 500 patient cases of Head and Neck Squamous Cell Carcinoma is a potential goldmine for finding ways to improve oncological decision support. Unfortunately, the best approaches for finding useful inferences are unknown. With so much information, from DNA and RNA sequencing to clinical records, we must use computational learning to find associations and biomarkers. The available data is sparse, contains inconsistencies, and is very large for some data types. We processed clinical records with an expert oncologist and used complex modeling methods to substitute (impute) data for cases missing treatment information. We used machine learning algorithms to see whether imputed data is useful for predicting patient survival. We saw no difference in the ability to predict patient survival with the imputed data, though imputed treatment variables were more important to survival models. To deal with the large number of features in RNA expression data, we used two approaches: using all the data with High Performance Computers, and transforming the data into a smaller set of features (sparse principal components, or SPCs). We compared the performance of survival models with both datasets and saw no differences. However, the SPC models trained more quickly while also allowing us to pinpoint the biological processes each SPC is involved in to inform future biomarker discovery. We also examined ten processed molecular features for survival prediction ability and found some predictive power, though not enough to be clinically useful.
cancer; dimensionality reduction; head and neck cancer; imputation; machine learning; mutation significance; Electrical and Computer Engineering
49	MULTIFACTOR DIMENSIONALITY REDUCTION WITH P RISK SCORES PER PERSON Li, Ye 01 January 2018 After reviewing Multifactor Dimensionality Reduction (MDR) and its extensions, we propose an approach to obtain P (greater than 1) risk scores to predict the continuous outcome for each subject. We study the mean square error (MSE) of dimensionality-reduced models fitted with sets of 2 risk scores and investigate the MSE for several special cases of the covariance matrix. A methodology is proposed to select a best set of P risk scores when P is specified a priori. Simulation studies based on true models of different dimensions (greater than 3) demonstrate that the selected set of P (greater than 1) risk scores outperforms the single aggregated risk score generated in AQMDR and illustrate that our methodology can determine a best set of P risk scores effectively. Under different assumptions on the dimension of the true model, we consider which is preferable: the best set of two risk scores or the best set of three risk scores. Further, we present a methodology to assess a set of P risk scores when P is not given a priori. Expressions for the asymptotic estimated mean square error of prediction (MSPE) are derived for a 1-dimensional model and a 2-dimensional model. In the last main chapter, we apply the methodology for selecting a best set of risk scores when P has been specified a priori to Alzheimer’s Disease data and obtain a set of two risk scores and a set of three risk scores for each subject to predict measurements on biomarkers that are crucially involved in Alzheimer’s Disease. Multifactor Dimensionality Reduction Risk Score Continuous outcome Gene-gene Interaction Statistics and Probability
50	Bringing interpretability and visualization with artificial neural networks Gritsenko, Andrey 01 August 2017 Extreme Learning Machine (ELM) is a training algorithm for Single-Layer Feed-forward Neural Networks (SLFNs). In theory, ELM differs from other training algorithms in that an explicit solution exists, because the initialized weights are kept fixed. In practice, ELMs achieve performance similar to that of other state-of-the-art training techniques while taking much less time to train a model. Experiments show that ELM training is up to 5 orders of magnitude faster than the standard error back-propagation algorithm. ELM is a recently discovered technique that has proved its efficiency in classic regression and classification tasks, including multi-class cases. In this thesis, extensions of ELMs to problems that are not typical for Artificial Neural Networks (ANNs) are presented. The first extension, described in the third chapter, allows ELMs to be used to obtain probabilistic outputs for multi-class classification problems. The standard way of solving this type of problem is based on a 'majority vote' of the classifier's raw outputs. This approach can raise issues if the penalty for misclassification differs between classes. In this case, having probabilistic outputs would be more useful. Within the scope of this extension, two methods are proposed. Additionally, an alternative way of interpreting probabilistic outputs is proposed. ELM methods prove useful for non-linear dimensionality reduction and visualization, based on repetitive re-training and re-evaluation of the model. The fourth chapter introduces adaptations of ELM-based visualization for classification and regression tasks. A set of experiments has been conducted to show that these adaptations provide better visualization results, which can then be used to perform classification or regression on previously unseen samples. 
Shape registration of 3D models with non-isometric distortion is an open problem in 3D Computer Graphics and Computational Geometry. The fifth chapter discusses a novel approach to this problem that introduces a similarity metric for spectral descriptors. In practice, this approach has been implemented in two methods. The first uses a Siamese Neural Network to embed the original spectral descriptors into a lower-dimensional metric space, for which the Euclidean distance provides a good measure of similarity. The second method uses Extreme Learning Machines to learn a similarity metric directly on the original spectral descriptors. Over a set of experiments, the consistency of the proposed approach for solving the deformable registration problem has been demonstrated. Big Data Data Visualization Dimensionality Reduction Extreme Learning Machines Probabilistic Classification Shape Registration Industrial Engineering
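
The abstract of entry 50 notes that ELM training admits an explicit solution because the randomly initialized hidden weights stay fixed. The following is a minimal sketch of that idea, not the thesis author's implementation. It assumes a one-dimensional input, tanh hidden units, uniform random weights in [-3, 3], and a small ridge term added for numerical stability; the function names train_elm, predict_elm, hidden_layer, and solve_linear are illustrative only.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copying keeps the caller's lists untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def hidden_layer(xs, weights, biases):
    """Map scalar inputs through the fixed random hidden layer (tanh units)."""
    return [[math.tanh(w * x + b) for w, b in zip(weights, biases)] for x in xs]


def train_elm(xs, ys, n_hidden=20, ridge=1e-3, seed=0):
    """Fit only the output weights; hidden weights are random and never updated."""
    rng = random.Random(seed)
    weights = [rng.uniform(-3, 3) for _ in range(n_hidden)]  # assumed range
    biases = [rng.uniform(-3, 3) for _ in range(n_hidden)]   # assumed range
    h = hidden_layer(xs, weights, biases)
    # Closed-form ridge least squares: (H^T H + ridge * I) beta = H^T y
    hth = [
        [sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
         for j in range(n_hidden)]
        for i in range(n_hidden)
    ]
    hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve_linear(hth, hty)
    return weights, biases, beta


def predict_elm(model, xs):
    """Evaluate the trained network on new scalar inputs."""
    weights, biases, beta = model
    h = hidden_layer(xs, weights, biases)
    return [sum(hv * bv for hv, bv in zip(row, beta)) for row in h]


if __name__ == "__main__":
    xs = [-3 + 6 * i / 99 for i in range(100)]
    ys = [math.sin(x) for x in xs]
    model = train_elm(xs, ys)
    preds = predict_elm(model, xs)
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    print("training MSE:", mse)
```

Because training reduces to a single linear solve over the output weights, no iterative gradient updates are needed, which is consistent with the training-time advantage over back-propagation described in the abstract. A real implementation would typically use a numerical linear algebra library rather than the hand-written solver shown here.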
