1

Bayesian matrix factorisation : inference, priors, and data integration

Brouwer, Thomas Alexander January 2017 (has links)
In recent years the amount of biological data has increased exponentially. Most of these data can be represented as matrices relating two different entity types, such as drug-target interactions (relating drugs to protein targets), gene expression profiles (relating drugs or cell lines to genes), and drug sensitivity values (relating drugs to cell lines). Not only is the size of these datasets increasing, but so is the number of different entity types that they relate. Furthermore, not all values in these datasets are typically observed, and some datasets are very sparse. Matrix factorisation is a popular group of methods that can be used to analyse these matrices. The idea is that each matrix can be decomposed into two or more smaller matrices, such that their product approximates the original one. This factorisation of the data reveals patterns in the matrix and gives us a lower-dimensional representation. Not only can we use this technique to identify clusters and other biological signals, but we can also predict the unobserved entries, allowing us to prune biological experiments. In this thesis we introduce and explore several Bayesian matrix factorisation models, focusing on how best to use them for predicting these missing values in biological datasets. Our main hypothesis is that matrix factorisation methods, and in particular Bayesian variants, are an extremely powerful paradigm for predicting values in biological datasets, as well as in other applications, and especially for sparse and noisy data. We demonstrate the competitiveness of these approaches compared to other state-of-the-art methods, and explore the conditions under which they perform best. We consider several aspects of the Bayesian approach to matrix factorisation. Firstly, we study the effect of the inference approach used to find the factorisation on predictive performance. Secondly, we identify different likelihood and Bayesian prior choices that can be used for these models, and explore when each is most appropriate. Finally, we introduce a Bayesian matrix factorisation model that can be used to integrate multiple biological datasets, and hence improve predictions; this model is a hybrid that combines different matrix factorisation models and Bayesian priors. Through these models and experiments we support our hypothesis and provide novel insights into the best ways to use Bayesian matrix factorisation methods for predictive purposes.
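
A minimal sketch of the core idea in this abstract, not the thesis's Bayesian models: a partially observed matrix is approximated as the product of two smaller matrices fitted on the observed entries only, and that product then supplies predictions for the missing entries. All shapes, data and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.random((20, 15))                 # e.g. drugs x cell lines (toy data)
mask = rng.random(R.shape) < 0.6         # True where an entry was observed

K, lr, reg = 4, 0.05, 0.01               # latent dimension, step size, L2 penalty (assumed)
U = 0.1 * rng.standard_normal((R.shape[0], K))
V = 0.1 * rng.standard_normal((R.shape[1], K))

for _ in range(500):
    E = mask * (R - U @ V.T)             # residuals on observed entries only
    U += lr * (E @ V - reg * U)
    V += lr * (E.T @ U - reg * V)

predictions = U @ V.T                    # estimates for the unobserved entries
E = mask * (R - predictions)
print("RMSE on observed entries:", np.sqrt((E ** 2).sum() / mask.sum()))
```
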
2

Super-resolution methods for fluorescence microscopy

Mandula, Ondrej January 2013 (has links)
Fluorescence microscopy is an important tool for biological research. However, the resolution of a standard fluorescence microscope is limited by diffraction, which makes it difficult to observe small details of a specimen’s structure. We have developed two fluorescence microscopy methods that achieve resolution beyond the classical diffraction limit. The first method is an extension of localisation microscopy. We used non-negative matrix factorisation (NMF) to model a noisy dataset of highly overlapping fluorophores with intermittent intensities. We can recover images of individual sources from the optimised model, despite their high mutual overlap in the original dataset. This allows us to treat blinking quantum dots as bright and stable fluorophores for localisation microscopy. Moreover, NMF allows recovery of sources that each have a unique shape. Such a situation can arise, for example, when the sources are located in different focal planes, and NMF can potentially be used for three-dimensional super-resolution imaging. We discuss the practical aspects of applying NMF to real datasets, and show super-resolution images of biological samples labelled with quantum dots. Notably, this technique can be performed on any wide-field epifluorescence microscope equipped with a camera, which makes this super-resolution method accessible to a wide scientific community. The second optical microscopy method discussed in this thesis belongs to the growing family of structured illumination techniques. Our main goal is to apply structured illumination to thick fluorescent samples that generate a large out-of-focus background. The out-of-focus fluorescence background degrades the illumination pattern, and the reconstructed images suffer from noise. We present a combination of structured illumination microscopy and line scanning. This technique reduces the out-of-focus fluorescence background, which improves the quality of the illumination pattern and therefore facilitates reconstruction. We present super-resolution, optically sectioned images of a thick fluorescent sample, revealing details of the specimen’s inner structure. In addition, we discuss a theoretical resolution limit for noisy and pixelated data. We correct a previously published expression for the so-called fundamental resolution measure (FREM) and derive the FREM for two fluorophores with intermittent intensity. We show that intensity intermittency of the sources (observed, for example, for quantum dots) can increase the “resolution” defined in terms of the FREM.
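
As a rough illustration of the first method's starting point (not the authors' pipeline), the sketch below factorises a synthetic stack of frames with scikit-learn's NMF, so that each recovered component pairs a source image with a per-frame blinking intensity. The synthetic data and component count are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
n_pixels, n_frames, n_sources = 64 * 64, 200, 3

true_sources = rng.random((n_pixels, n_sources))          # fixed source images
blinking = rng.random((n_sources, n_frames)) ** 3          # intermittent intensities
frames = true_sources @ blinking + 0.01 * rng.random((n_pixels, n_frames))

model = NMF(n_components=n_sources, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(frames)   # recovered source images (pixels x sources)
H = model.components_             # recovered per-frame intensities (sources x frames)
print(W.shape, H.shape)
```
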
3

Exploiting piano acoustics in automatic transcription

Cheng, Tian January 2016 (has links)
In this thesis we exploit piano acoustics to automatically transcribe piano recordings into a symbolic representation: the pitch and timing of each detected note. To do so, we use approaches based on non-negative matrix factorisation (NMF). To motivate the main contributions of this thesis, we provide two preparatory studies: a study of using a deterministic annealing EM algorithm in a matrix factorisation-based system, and a study of the decay patterns of partials in real-world piano tones. Based on these studies, we propose two generative NMF-based models that explicitly model different piano acoustical features. The first is an attack/decay model that takes into account the time-varying timbre and decaying energy of piano sounds. The system divides a piano note into a percussive attack stage and a harmonic decay stage, and models the two parts separately using two sets of templates and amplitude envelopes. The two parts are coupled by the note activations. We approximate the decay envelope with an exponentially decaying function. The proposed method improves the performance of supervised piano transcription. The second model uses the spectral width of partials as an independent indicator of the duration of piano notes. Each partial is represented by a Gaussian function, with the spectral width given by the standard deviation. The spectral width is large in the attack part, but gradually decreases to a stable value and remains constant in the decay part. The model provides a new way to understand the time-varying timbre of piano notes, but further investigation is needed to use it effectively to improve piano transcription. We demonstrate the utility of the proposed systems in piano music transcription and analysis. Results show that explicitly modelling piano acoustical features, especially temporal features, can improve transcription performance.
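
The exponentially decaying envelope used for the decay stage can be pictured with the small sketch below: a rank-1 spectrogram model in which a fixed harmonic template is scaled over time by an envelope a(t) = exp(-αt). The frame rate, decay rate and template values are illustrative assumptions, not parameters from the thesis.

```python
import numpy as np

frames_per_second = 100
t = np.arange(0, 2.0, 1 / frames_per_second)     # two seconds of analysis frames
decay_rate = 3.0                                  # per-second energy decay (assumed)
envelope = np.exp(-decay_rate * t)                # a(t) = exp(-alpha * t)

n_bins = 512
harmonic_template = np.zeros(n_bins)
harmonic_template[[10, 20, 30, 40]] = [1.0, 0.6, 0.4, 0.25]   # toy partial amplitudes

# Rank-1 model of the decay stage: spectrogram ≈ outer product of template and envelope
decay_spectrogram = np.outer(harmonic_template, envelope)
print(decay_spectrogram.shape)                    # (frequency bins, frames)
```
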
4

Use and development of matrix factorisation techniques in the field of brain imaging

Pearce, Matthew Craig January 2018 (has links)
Matrix factorisation treats observations as linear combinations of basis vectors together with, possibly, additive noise. Notable techniques in this family are Principal Components Analysis and Independent Components Analysis. Applied to brain images, matrix factorisation provides insight into the spatial and temporal structure of the data. We improve on current practice with methods that unify different stages of analysis, simultaneously for all subjects in a dataset, including dimension estimation and reduction. As a result, uncertainty information is carried coherently through the analysis. A computationally efficient approach to correlated multivariate normal distributions is set out. This enables spatial smoothing during the inference of basis vectors, to a level determined by the data. Applied to neuroimaging, this reduces the need for blurring of the data during preprocessing. Orthogonality constraints on the basis are relaxed, allowing for overlapping ‘networks’ of activity. We consider a nonparametric matrix factorisation model inferred using Markov Chain Monte Carlo (MCMC). This approach incorporates dimensionality estimation into the inference process. Novel parallelisation strategies for MCMC on repeated graphs are provided to expedite inference. In simulations, modelling correlation structure is seen to improve source separation where the latent basis vectors are not orthogonal. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) project obtained fMRI data while subjects watched a short film; we demonstrate the approach on 30 of these recordings. To conduct inference on larger datasets, we provide a fixed-dimension Structured Matrix Factorisation (SMF) model, inferred through Variational Bayes (VB). By modelling the components as a mixture, more general distributions can be expressed. The VB approach scaled to 600 subjects from Cam-CAN, enabling a comparison to, and validation of, the main findings of an earlier analysis; notably, that subjects’ responses to movie watching became less synchronised with age. We discuss differences in the results obtained under the MCMC- and VB-inferred models.
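
For readers unfamiliar with the baseline techniques named above, the sketch below shows the simplest member of the family, PCA of a time-by-voxel matrix, where each component pairs a spatial map with a time course; the thesis's MCMC and VB models generalise well beyond this. The data and dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_timepoints, n_voxels, n_components = 300, 1000, 10
data = rng.standard_normal((n_timepoints, n_voxels))   # stand-in for one subject's fMRI

pca = PCA(n_components=n_components)
time_courses = pca.fit_transform(data)      # (time, components): temporal structure
spatial_maps = pca.components_              # (components, voxels): spatial structure
print(time_courses.shape, spatial_maps.shape)
```
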
5

Generalised Bayesian matrix factorisation models

Mohamed, Shakir January 2011 (has links)
Factor analysis and related models for probabilistic matrix factorisation are of central importance to the unsupervised analysis of data, with a colourful history more than a century long. Probabilistic models for matrix factorisation allow us to explore the underlying structure in data, and have relevance in a vast number of application areas, including collaborative filtering, source separation, missing data imputation, gene expression analysis, information retrieval, computational finance and computer vision, amongst others. This thesis develops generalisations of matrix factorisation models that advance our understanding and enhance the applicability of this important class of models. The generalisation of models for matrix factorisation focuses on three concerns: widening the applicability of latent variable models to the diverse types of data that are currently available; considering alternative structural forms in the underlying representations that are inferred; and including higher-order data structures in the matrix factorisation framework. These three issues reflect the reality of modern data analysis, and we develop new models that allow for a principled exploration and use of data in these settings. We place emphasis on Bayesian approaches to learning and the advantages that come with the Bayesian methodology. Our point of departure is a generalisation of latent variable models to members of the exponential family of distributions. This generalisation allows for the analysis of data that may be real-valued, binary, counts, non-negative, or a heterogeneous set of these data types. The model unifies various existing models and constructs for unsupervised settings, forming the framework complementary to generalised linear models in regression. Moving to structural considerations, we develop Bayesian methods for learning sparse latent representations. We define notions of weakly and strongly sparse vectors and investigate the classes of prior distributions that give rise to these forms of sparsity, namely the scale mixture of Gaussians and the spike-and-slab distribution. Based on these sparsity-favouring priors, we develop and compare methods for sparse matrix factorisation and present the first comparison of these sparse learning approaches. As a second structural consideration, we develop models with the ability to generate correlated binary vectors. Moment matching is used to allow binary data with a specified correlation to be generated, based on dichotomisation of the Gaussian distribution. We then develop a novel and simple method for binary PCA based on Gaussian dichotomisation. The third generalisation extends matrix factorisation models to the multi-dimensional arrays of data that are increasingly prevalent. We develop the first Bayesian model for non-negative tensor factorisation and explore the relationship between this model and the previously described models for matrix factorisation.
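
A minimal sketch of the spike-and-slab prior mentioned above, assuming illustrative values for the inclusion probability and slab scale: each loading is exactly zero with probability 1 − π (the “spike”) and Gaussian otherwise (the “slab”), which is one way such priors induce strongly sparse latent representations.

```python
import numpy as np

rng = np.random.default_rng(3)
n_features, n_factors = 50, 5
pi, slab_std = 0.2, 1.0                     # inclusion probability and slab scale (assumed)

# Draw a matrix of factor loadings from the spike-and-slab prior.
include = rng.random((n_features, n_factors)) < pi
loadings = include * rng.normal(0.0, slab_std, size=(n_features, n_factors))
print("fraction of non-zero loadings:", include.mean())
```
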
6

On the QR Decomposition of H-Matrices

Benner, Peter, Mach, Thomas 28 August 2009 (has links)
The hierarchical (H-)matrix format allows a variety of dense matrices from certain applications to be stored in a special data-sparse way with linear-polylogarithmic complexity. Many operations from linear algebra, such as matrix-matrix and matrix-vector products, matrix inversion and LU decomposition, can be implemented efficiently in the H-matrix format. Because the QR decomposition is important for solving many problems in numerical linear algebra, such as least-squares problems, an efficient QR decomposition of H-matrices is also desirable. In the past, two different approaches for this task have been suggested. We review the resulting methods and suggest a new algorithm to compute the QR decomposition of an H-matrix. Like other H-arithmetic operations, the H-QR decomposition is of linear-polylogarithmic complexity. We compare our new algorithm with the older ones using two series of test examples and discuss the benefits and drawbacks of the new approach.
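
As a dense-matrix point of reference (not the H-matrix algorithm itself), the sketch below computes A = QR with NumPy and checks the defining properties; the H-QR methods discussed in the abstract target the same factorisation at linear-polylogarithmic cost by exploiting hierarchical low-rank structure.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))
Q, R = np.linalg.qr(A)

print(np.allclose(Q @ R, A))              # A is reconstructed exactly (up to rounding)
print(np.allclose(Q.T @ Q, np.eye(6)))    # Q has orthonormal columns
print(np.allclose(R, np.triu(R)))         # R is upper triangular
```
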
7

Predicting future purchases with matrix factorization

Hojlas, Azer, Paulsrud, August January 2022 (has links)
This thesis aims to establish the efficacy of using matrix factorization to predict future purchases. Matrix factorisation is a machine learning method commonly used to implement collaborative-filtering recommendation systems. It finds items that a user may be interested in by looking at items that other, similar users have rated highly, explicitly or implicitly. To fulfill the purpose of the thesis, a qualitative and comparative approach was taken. First, three different implementations of matrix factorisation were created and trained on one year of purchase histories. Two generic methods of predicting future purchases, picking a random item and picking the top-selling items, were also created to serve as points of comparison. The ability to predict future purchases was measured as the proportion of correct predictions a method could make. All five methods were then tested on a separate data set and the results compared. The results clearly show that the matrix factorisation models are better at predicting future purchases than the generic models. However, the difference between the matrix factorization models themselves was comparatively small. A notable finding was that the gap between all methods' ability to predict future purchases decreased as more predictions were made. The method of predicting a random item fared poorly: its cumulative proportion of correct predictions was less than one tenth that of any other method.
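
A minimal sketch of the collaborative-filtering idea described above, assuming a toy binary purchase matrix: factorise users × items into latent factors and rank unpurchased items for a user by their predicted scores. The data, latent dimension and training schedule are illustrative assumptions, not the thesis's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(5)
purchases = (rng.random((100, 40)) < 0.1).astype(float)   # 1.0 = user bought the item

K, lr, reg = 8, 0.05, 0.02                 # latent dimension, step size, L2 penalty (assumed)
U = 0.1 * rng.standard_normal((purchases.shape[0], K))
V = 0.1 * rng.standard_normal((purchases.shape[1], K))

for _ in range(300):
    E = purchases - U @ V.T                # residuals (non-purchases treated as zeros)
    U += lr * (E @ V - reg * U)
    V += lr * (E.T @ U - reg * V)

user = 0
scores = U[user] @ V.T                     # predicted affinity for every item
scores[purchases[user] == 1] = -np.inf     # exclude items already purchased
print("top-5 recommended items for user 0:", np.argsort(scores)[::-1][:5])
```
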
8

S³niffer : un système de recherche de service basé sur leur description / S3niffer : A text description-based service search system

Caicedo-Castro, Isaac 12 May 2015 (has links)
In this research, we address the problem of retrieving services that fulfil users' needs expressed as free-text queries. Our goal is to cope with the term mismatch problems that affect the effectiveness of the service retrieval models applied in prior research on text description-based service retrieval. These problems arise because service descriptions are brief: service providers use few terms to describe the desired services, so when these descriptions differ from the sentences in queries, term mismatch reduces the effectiveness of classical models, which depend on observable text features instead of the latent semantic features of the text. We apply a family of Information Retrieval (IR) models with the aim of improving on the effectiveness achieved by the models used in prior research on service retrieval. In addition, we have conducted systematic experiments to compare our family of IR models with those used in the state of the art in service discovery. From the outcomes of the experiments, we conclude that our model based on query expansion via a co-occurrence thesaurus outperforms, in terms of the classical IR measures, all the other models studied in this research. Therefore, we have implemented this model in S3niffer, a text description-based service search engine.
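
A minimal sketch of query expansion via a co-occurrence thesaurus, the idea behind the best-performing model described above (not the S3niffer implementation): count term co-occurrences in the service descriptions, then expand each query term with its most frequent co-occurring terms before retrieval. The toy descriptions below are assumed for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

descriptions = [                      # toy service descriptions (assumed data)
    "send email message notification",
    "send sms message alert",
    "convert currency exchange rate",
]

# Build the co-occurrence thesaurus: counts of term pairs within a description.
cooc = defaultdict(Counter)
for text in descriptions:
    terms = set(text.split())
    for a, b in combinations(terms, 2):
        cooc[a][b] += 1
        cooc[b][a] += 1

def expand(query, per_term=2):
    """Return the query terms plus the top co-occurring terms for each of them."""
    expanded = set(query.split())
    for term in query.split():
        expanded.update(t for t, _ in cooc[term].most_common(per_term))
    return expanded

print(expand("send message"))
```
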
