Global ETD Search

181	Composable, Distributed-state Models for High-dimensional Time Series Taylor, Graham William 03 March 2010 In this thesis we develop a class of nonlinear generative models for high-dimensional time series. The first key property of these models is their distributed, or "componential", latent state, which is characterized by binary stochastic variables that interact to explain the data. The second key property is the use of an undirected graphical model to represent the relationship between latent state (features) and observations. The final key property is composability: the proposed class of models can form the building blocks of deep networks by successively training each model on the features extracted by the previous one. We first propose a model based on the Restricted Boltzmann Machine (RBM) that uses an undirected model with binary latent variables and real-valued "visible" variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time steps. This "conditional" RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various motion sequences and by performing on-line filling in of data lost during motion capture. We also explore CRBMs as priors in the context of Bayesian filtering applied to multi-view and monocular 3D person tracking. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. In separate but related work, we revisit Products of Hidden Markov Models (PoHMMs). We show how the partition function can be estimated reliably via Annealed Importance Sampling. This enables us to demonstrate that PoHMMs outperform various flavours of HMMs on a variety of tasks and metrics, including log likelihood. machine learning time series neural networks unsupervised learning restricted Boltzmann machines hidden Markov models graphical models motion capture dynamical models generative models computer vision tracking 0984
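To make the "directed connections from the last few time steps" in the abstract above concrete, the following is a minimal sketch of how a CRBM-style hidden layer could be inferred for one frame. It is an illustration only, not the thesis code: the function and parameter names (`crbm_hidden_probs`, `W`, `B`, `b_hid`) and all numbers are hypothetical, and it assumes real-valued visible units with unit variance so that the hidden conditional reduces to a logistic function.

```python
import math

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def crbm_hidden_probs(v_t, history, W, B, b_hid):
    """Return P(h_j = 1 | v_t, history) for every hidden unit j.

    v_t     : current visible frame (list of floats)
    history : the last few visible frames, concatenated into one list
    W       : undirected visible-to-hidden weights, W[i][j]
    B       : directed history-to-hidden weights, B[k][j]
    b_hid   : static hidden biases
    (All names are hypothetical; unit-variance visibles are assumed.)
    """
    probs = []
    for j in range(len(b_hid)):
        # The history only shifts the hidden bias (a "dynamic bias").
        dynamic_bias = b_hid[j] + sum(B[k][j] * history[k] for k in range(len(history)))
        activation = dynamic_bias + sum(W[i][j] * v_t[i] for i in range(len(v_t)))
        probs.append(sigmoid(activation))
    return probs

# Toy, made-up numbers: 2 visible units, 2 past frames (4 history values), 2 hidden units.
v_t = [0.5, -1.0]
history = [0.2, 0.1, 0.0, -0.3]
W = [[0.0, 1.0], [0.0, -1.0]]
B = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
b_hid = [0.0, 0.0]
probs = crbm_hidden_probs(v_t, history, W, B, b_hid)
```

Under these assumptions the hidden units are conditionally independent given the current frame and the history, so inference is a single pass over the units, which is consistent with the efficient on-line inference the abstract describes.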
182	Acoustic segment modeling and preference ranking for music information retrieval Reed, Jeremy T. 27 October 2010 This dissertation focuses on improving content-based recommendation systems for music. Specifically, progress in the development of content-based music recommendation systems has stalled in recent years because of three faulty assumptions: (1) most acoustic content-based systems for music information retrieval (MIR) assume a bag-of-frames model, in which a song is assumed to contain a simplistic, global audio texture; (2) genre, style, mood, and author are appropriate categories for machine-oriented recommendation; and (3) similarity is a universal construct that does not vary among users. The main contribution of this dissertation is to address these faulty assumptions by describing a novel approach in MIR that provides user-centric, content-based recommendations based on statistics of acoustic sound elements. First, this dissertation presents the acoustic segment modeling framework, which describes a piece of music as a temporal sequence of acoustic segment models (ASMs) that represent individual polyphonic sound elements. A dictionary of ASMs generated in an unsupervised process defines a vocabulary of acoustic tokens that can transcribe new musical pieces. Next, standard text-based information retrieval algorithms use statistics of ASM counts to perform various retrieval tasks. Despite a simple feature set compared to other content-based genre recommendation algorithms, the acoustic segment modeling approach is highly competitive on standard genre classification databases. Fundamental to the success of the acoustic segment modeling approach is the ability to model acoustical semantics in a musical piece, which is demonstrated by the detection of musical attributes on temporal characteristics. Further, it is shown that the acoustic segment modeling procedure is able to capture the inherent structure of melody by providing near state-of-the-art performance on an automatic chord recognition task. This dissertation demonstrates that some classification tasks, such as genre, depend on information that is not contained in the acoustic signal; therefore, attempts at modeling these categories using only the acoustic content are ill-fated. Further, notions of music similarity are personal in nature and are not derived from a universal ontology. Therefore, this dissertation addresses the second and third limitations of previous content-based retrieval approaches by presenting a user-centric preference rating algorithm. Individual users possess their own cognitive construct of similarity; therefore, retrieval algorithms must demonstrate this flexibility. The proposed rating algorithm is based on the principle of minimum classification error (MCE) training, which has been demonstrated to be robust against outliers and also minimizes the Parzen estimate of the theoretical classification risk. The outlier immunity property limits the effect of labels that arise from non-content-based sources. The MCE-based algorithm performs better than a similar ratings prediction algorithm. Finally, this dissertation discusses extensions and future work. Acoustic modeling Music information retrieval Preference ranking Unsupervised learning Acoustic segment modeling Music and technology Music and the Internet Automatic speech recognition Acoustic models
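The idea of applying text-retrieval statistics to ASM token counts, mentioned in the abstract above, can be illustrated with a small sketch. The abstract does not say which weighting or similarity measure the dissertation uses; TF-IDF weighting and cosine similarity are swapped in here as common text-retrieval choices, and the token names and song transcriptions are invented for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn token sequences into sparse TF-IDF vectors (dicts: token -> weight)."""
    n = len(docs)
    # Document frequency: in how many sequences each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({tok: (cnt / len(doc)) * math.log(n / df[tok])
                        for tok, cnt in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical ASM transcriptions of three songs (token IDs are made up).
songs = [
    ["a1", "a2", "a1", "a3"],
    ["a1", "a2", "a2", "a3"],
    ["a7", "a8", "a7", "a9"],
]
vecs = tfidf_vectors(songs)
sim_01 = cosine(vecs[0], vecs[1])  # songs sharing most of their tokens
sim_02 = cosine(vecs[0], vecs[2])  # songs with no tokens in common
```

In this toy setup the first two songs share their acoustic tokens and score as similar, while the third shares none and scores zero. Note that simple counts discard token order, so this sketch does not capture the temporal sequence of ASMs that the abstract emphasizes.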
184	Étude de techniques d'apprentissage non-supervisé pour l'amélioration de l'entraînement supervisé de modèles connexionnistes [A study of unsupervised learning techniques for improving the supervised training of connectionist models] Larochelle, Hugo January 2008 Thesis digitized by the Records Management and Archives Division of the Université de Montréal. Unsupervised learning Artificial neural network Restricted Boltzmann machine Autoassociator Autoencoder Deep architecture Deep learning
185	Machine Learning Approaches to Refining Post-translational Modification Predictions and Protein Identifications from Tandem Mass Spectrometry Chung, Clement 11 December 2012 Tandem mass spectrometry (MS/MS) is the dominant approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. The computational analysis of MS/MS spectra involves the identification of peptides from experimental spectra, especially those with post-translational modifications (PTMs), as well as the inference of protein composition based on the putative identified peptides. In this thesis, we tackle two major challenges associated with MS/MS analysis: 1) the refinement of PTM predictions from MS/MS spectra and 2) the inference of protein composition based on peptide predictions. We propose two PTM prediction refinement algorithms, PTMClust and its Bayesian nonparametric extension iPTMClust, and a protein identification algorithm, pro-HAP, that is based on a novel two-layer hierarchical clustering approach that leverages prior knowledge about protein function. Individually, we show that our two PTM refinement algorithms outperform state-of-the-art algorithms and that our protein identification algorithm performs on par with the state of the art. Collectively, as a demonstration of our end-to-end MS/MS computational analysis on a study of human chromatin protein complexes, we show that our analysis pipeline can find high-confidence putative novel protein complex members. Moreover, it can provide valuable insights into the formation and regulation of protein complexes by detailing the specificity of different PTMs for the members of each complex. Machine Learning Unsupervised Learning Clustering Mass Spectrometry Protein Identification Nonparametric Bayesian method Hierarchical clustering 0800 0984 0715
187	Unsupervised Identification of the User’s Query Intent in Web Search Calderón-Benavides, Liliana 27 September 2011 This doctoral work focuses on identifying and understanding the intents that motivate a user to perform a search on the Web. To this end, we apply machine learning models that require no information beyond that provided by the users' own needs, which in this work are represented by their queries. The knowledge and interpretation of this invaluable information can help search engines retrieve resources that are especially relevant to users, and thus improve their satisfaction. By means of unsupervised learning techniques, selected according to the context of the problem being solved and shown to be effective for each of the problems addressed, we show that it is not only possible to identify users' intents, but that this process can be conducted automatically. The research conducted in this thesis followed an evolutionary process that starts from the manual analysis of different sets of real user queries from a search engine. The work proceeds through the proposal of a new classification of users' query intents and the application of different unsupervised learning techniques to identify those intents, and concludes that user intent, rather than being treated as a one-dimensional problem, should be conceived as a composition of several aspects, or dimensions (i.e., as a multi-dimensional problem), each of which helps to clarify and establish what the user's intent is. Building on this last proposal, we have built a framework for the on-line identification of the user's query intent. Overall, the results of this research have proven effective for the problem of identifying users' query intent. Unsupervised Learning User’s query intent Web Usage Mining Data mining Seeking Behavior 62
188	Towards Understanding ICU Procedures using Similarities in Patient Trajectories : An exploratory study on the MIMIC-III intensive care database Galozy, Alexander January 2018 Recent advancements in Artificial Intelligence have prompted a sheer explosion of new research initiatives and applications, not only improving existing technologies but also opening up opportunities for new and exciting applications. This thesis explores the MIMIC-III intensive care unit database and conducts experiments on an interpretable feature space based on severity scores, which define a patient health state and are commonly used to predict mortality in an ICU setting. Patient health state trajectories are clustered and correlated with administered medication and performed procedures to better understand their potential usefulness in evaluating the effect of treatments on that health state, so that commonalities and deviations in treatment can be understood. Furthermore, medication and procedure classification is carried out to explore their predictability using the severity subscore feature space. ICU Severity Scores Patient Clustering Mortality Health State Patient Trajectory Clustering Unsupervised learning Classification Data Mining AI Engineering and Technology
189	[en] HYBRID GENETIC ALGORITHM FOR THE MINIMUM SUM-OF-SQUARES CLUSTERING PROBLEM / [pt] ALGORITMO GENÉTICO HÍBRIDO PARA O PROBLEMA DE CLUSTERIZAÇÃO MINIMUM SUM-OF-SQUARES DANIEL LEMES GRIBEL 27 July 2017 Clustering plays an important role in data mining and is useful in many fields that deal with exploratory data analysis, such as information retrieval, document extraction, and image segmentation. Although they are essential in data mining applications, most clustering algorithms are ad hoc methods. They lack guarantees on solution quality, which in many cases is related to premature convergence to a local minimum of the search space. In this research, we address the problem of data clustering from an optimization perspective and propose a hybrid genetic algorithm to solve the Minimum Sum-of-Squares Clustering (MSSC) problem. This meta-heuristic is capable of escaping from local minima and generating near-optimal solutions to the MSSC problem. Results show that the proposed method outperformed the best current results in the literature, in terms of solution quality, for almost all considered sets of benchmark instances for the MSSC objective. DATA MINING META-HEURISTICS CLUSTERING UNSUPERVISED LEARNING MINIMUM SUM-OF-SQUARES
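For reference, the MSSC objective named in the abstract above is the sum, over all points, of the squared Euclidean distance from each point to the centroid of its assigned cluster. The sketch below only evaluates that objective for a given assignment; it is not the hybrid genetic algorithm from the thesis, and the function name `mssc_cost` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict

def mssc_cost(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = mssc_cost(points, [0, 0, 1, 1])  # groups nearby points together
bad = mssc_cost(points, [0, 1, 0, 1])   # groups distant points together
```

Here the sensible assignment has a cost of 4.0 and the poor one a cost of 100.0. A search method for MSSC, whether a local search or a meta-heuristic like the one the abstract proposes, compares candidate assignments by this kind of cost.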
190	Uma abordagem interativa guiada por semântica para identificação e recuperação de imagens / A semantic guided interactive image retrieval approach Gonçalves, Filipe Marcel Fernandes [UNESP] 17 August 2016 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / A large number of images are currently generated in many domains, and identifying and analyzing them requires specialized knowledge. On the one hand, many advances have been made in the development of image retrieval techniques based on visual image properties. However, the semantic gap between low-level features and high-level concepts remains a major challenge. On the other hand, knowledge in many fields has been structured by ontologies. A promising solution for bridging the semantic gap consists in combining the information from low-level features with semantic knowledge. This work proposes a new approach, called Semantic Interactive Image Retrieval (SIIR), which combines Content Based Image Retrieval (CBIR) and unsupervised learning with ontology techniques. We present a novel approach that aims to simulate the biologist's role in classifying Angiosperm families from an image and its content. To achieve this goal, we developed a domain ontology of the structures and properties of flowering and fruiting plants, relating those attributes for the classification of Angiosperm families. Low-level feature extraction methods were used to analyze the visual characteristics of the images. For unsupervised learning, we used the RL-Sim algorithm to improve the effectiveness of image retrieval. The proposed approach combines CBIR techniques with ontologies using a bipartite graph and a discriminative attribute graph. The discriminative attribute graph enables the semantic analysis used to select the attribute that best classifies the plant in the query image. The selected attributes are used to formulate interactions with the user, improving retrieval effectiveness and reducing the effort required to identify the plant. The proposed method was evaluated on the public Oxford Flowers 17 and 102 Classes datasets, yielding high effectiveness on both datasets when compared to other approaches. Unsupervised Learning Semantic Gap Ontology CBIR Angiosperms Information retrieval systems
