331

Computational Gene Expression Deconvolution

Otto, Dominik 23 August 2021 (has links)
Technologies such as expression microarrays and high-throughput sequencing assays have accelerated research of genetic transcription in biological cells. Furthermore, many links between gene expression levels and the phenotypic characteristics of cells have been discovered. Our current understanding of transcriptomics as an intermediate regulatory layer between genomics and proteomics raises hope that we will soon be able to decipher many more cellular mechanisms through the exploration of gene transcription. However, although large amounts of expression data are measured, only limited information can be extracted. One general problem is the large set of considered genomic features. Expression levels are often analyzed individually because of limited computational resources and unknown statistical dependencies among the features. This leads to multiple testing issues or can lead to overfitting models, commonly referred to as the "curse of dimensionality." Another problem can arise from ignorance of measurement uncertainty. In particular, approaches that consider statistical significance can suffer from underestimating uncertainty for weakly expressed genes and consequently require subjective manual measures to produce consistent results (e.g., domain-specific gene filters). In this thesis, we lay out a theoretical foundation for a Bayesian interpretation of gene expression data based on subtle assumptions. Expression measurements are related to latent information (e.g., the transcriptome composition), which we formulate as a probability distribution that represents the uncertainty over the composition of the original sample. Instead of analyzing univariate gene expression levels, we use the multivariate transcriptome composition space. To realize computational feasibility, we develop a scalable dimensional reduction that aims to produce the best approximation that can be used with the computational resources available. To enable the deconvolution of gene expression, we describe subtissue-specific probability distributions of expression profiles. We demonstrate the suitability of our approach with two deconvolution applications: first, we infer the composition of immune cells, and second, we reconstruct tumor-specific expression patterns from bulk-RNA-seq data of prostate tumor tissue samples.

Table of contents:
1 Introduction 1
  1.1 State of the Art and Motivation 2
  1.2 Scope of this Thesis 5
2 Notation and Abbreviations 7
  2.1 Notations 7
  2.2 Abbreviations 9
3 Methods 10
  3.1 The Convolution Assumption 10
  3.2 Principal Component Analysis 11
  3.3 Expression Patterns 11
  3.4 Bayes' Theorem 12
  3.5 Inference Algorithms 13
    3.5.1 Inference Through Sampling 13
    3.5.2 Variational Inference 14
4 Prior and Conditional Probabilities 16
  4.1 Mixture Coefficients 16
  4.2 Distribution of Tumor Cell Content 18
    4.2.1 Optimal Tumor Cell Content Drawing 20
  4.3 Transcriptome Composition Distribution 21
    4.3.1 Sequencing Read Distribution 21
      4.3.1.1 Empirical Plausibility Investigation 25
    4.3.2 Dirichlet and Normality 29
    4.3.3 Theta ◦ log Transformation 29
    4.3.4 Variance Stabilization 32
  4.4 Cell- and Tissue-Type-Specific Expression Pattern Distributions 32
    4.4.1 Method of Moments and Factor Analysis 33
      4.4.1.1 Tumor-Free Cells 33
      4.4.1.2 Tumor Cells 34
    4.4.2 Characteristic Function 34
    4.4.3 Gaussian Mixture Model 37
  4.5 Prior Covariance Matrix Distribution 37
  4.6 Bayesian Survival Analysis 38
  4.7 Demarcation from Existing Methods 40
    4.7.1 Negative Binomial Distribution 40
    4.7.2 Steady State Assumption 41
    4.7.3 Partial Correlation 41
    4.7.4 Interaction Networks 42
5 Feasibility via Dimensional Reduction 43
  5.1 DR for Deconvolution of Expression Patterns 44
    5.1.1 Systematically Differential Expression 45
    5.1.2 Internal Distortion 46
    5.1.3 Choosing a DR 46
    5.1.4 Testing the DR 47
  5.2 Transformed Density Functions 49
  5.3 Probability Distribution of Mixtures in DR Space 50
    5.3.1 Likelihood Gradient 51
    5.3.2 The Theorem 52
    5.3.3 Implementation 52
  5.4 DR for Inference of Cell Composition 53
    5.4.1 Problem Formalization 53
    5.4.2 Naive PCA 54
    5.4.3 Whitening 55
      5.4.3.1 Covariance Inflation 56
    5.4.4 DR Through Optimization 56
      5.4.4.1 Starting Point 57
      5.4.4.2 The Optimization Process 58
    5.4.5 Results 59
  5.5 Interpretation of DR 61
  5.6 Comparison to Other DRs 62
    5.6.1 Weighted Correlation Network Analysis 62
    5.6.2 t-Distributed Stochastic Neighbor Embedding 65
    5.6.3 Diffusion Map 66
    5.6.4 Non-negative Matrix Factorization 66
  5.7 Conclusion 67
6 Data for Example Application 68
  6.1 Immune Cell Data 68
    6.1.1 Provided List of Publicly Available Data 68
    6.1.2 Obtaining the Publicly Available RNA-seq Data 69
    6.1.3 Obtaining the Publicly Available Expression Microarray Data 71
    6.1.4 Data Sanitization 71
      6.1.4.1 A Tagging Tool 72
      6.1.4.2 Tagging Results 73
      6.1.4.3 Automatic Sanitization 74
    6.1.5 Data Unification 75
      6.1.5.1 Feature Mapping 76
      6.1.5.2 Feature Selection 76
  6.2 Examples of Mixtures with Gold Standard 79
    6.2.1 Expression Microarray Data 81
    6.2.2 Normalized Expression 81
    6.2.3 Composition of the Gold Standard 82
  6.3 Tumor Expression Data 82
    6.3.1 Tumor Content 82
  6.4 Benchmark Reference Study 83
    6.4.1 Methodology 83
    6.4.2 Reproduction 84
    6.4.3 Reference Hazard Model 85
7 Bayesian Models in Example Applications 87
  7.1 Inference of Cell Composition 87
    7.1.1 The Expression Pattern Distributions (EPDs) 88
    7.1.2 The Complete Model 89
    7.1.3 Start Values 89
    7.1.4 Resource Limits 90
  7.2 Deconvolution of Expression Patterns 91
    7.2.1 The Distribution of Expression Pattern Distributions 91
    7.2.2 The Complete Model 92
    7.2.3 Single Sample Deconvolution 93
    7.2.4 A Simplification 94
    7.2.5 Start Values 94
8 Results of Example Applications 96
  8.1 Inference of Cell Composition 96
    8.1.1 Single Composition Output 96
    8.1.2 ELBO Convergence in Variational Inference 97
    8.1.3 Difficulty: Divergence 97
      8.1.3.1 Implementing an Alternative Stick-Breaking 98
      8.1.3.2 Using More General Inference Methods 99
      8.1.3.3 Using Better Data 100
      8.1.3.4 Restriction of Variance of Cell-Type-Specific EPDs 100
      8.1.3.5 Doing Fewer Iterations 100
    8.1.4 Difficulty: Bias 101
    8.1.5 Comparison to Gold Standard 101
    8.1.6 Comparison to Competitors 101
      8.1.6.1 Submission-Aginome-XMU 105
      8.1.6.2 Submission-Biogem 105
      8.1.6.3 Submission-DA505 105
      8.1.6.4 Submission-AboensisIV 105
      8.1.6.5 Submission-mittenTDC19 106
      8.1.6.6 Submission-CancerDecon 106
      8.1.6.7 Submission-CCB 106
      8.1.6.8 Submission-D3Team 106
      8.1.6.9 Submission-ICTD 106
      8.1.6.10 Submission-Patrick 107
      8.1.6.11 Conclusion for the Competitor Review 107
    8.1.7 Implementation 107
    8.1.8 Conclusion 108
  8.2 Deconvolution of Expression Patterns 108
    8.2.1 Difficulty: Multimodality 109
      8.2.1.1 Order of Kernels 109
      8.2.1.2 Posterior EPD Complexity 110
      8.2.1.3 Tumor Cell Content Estimate 110
    8.2.2 Difficulty: Time 110
    8.2.3 The Inference Process 111
      8.2.3.1 ELBO Convergence in Variational Inference 111
    8.2.4 Posterior of Tumor Cell Content 112
    8.2.5 Posterior of Tissue-Specific Expression 112
    8.2.6 Posterior Hazard Model 113
    8.2.7 Gene Marker Study with Deconvoluted Tumor Expression 115
    8.2.8 Hazard Model Comparison Overview 116
    8.2.9 Implementation 116
9 Discussion 117
  9.1 Limitations 117
    9.1.1 Simplifying Assumptions 117
    9.1.2 Computation Resources 118
    9.1.3 Limited Data and Suboptimal Format 118
    9.1.4 It Is Just Consistency 119
    9.1.5 ADVI Uncertainty Estimation 119
  9.2 Outlook 119
  9.3 Conclusion 121
A Appendix 123
  A.1 Optimal α 123
  A.2 Digamma Function and Logarithm 123
  A.3 Common Normalization 124
    A.3.1 CPM Normalization 124
    A.3.2 TPM Normalization 124
    A.3.3 VST Normalization 125
    A.3.4 PCA After Different Normalizations 125
  A.4 Mixture Prior Per Tissue Source 125
  A.5 Data 125
  A.6 Cell Type Characterization without Whitening 133
B Proofs 137
Bibliography 140
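The convolution assumption at the heart of this thesis can be illustrated with a deliberately simple non-Bayesian baseline: bulk expression is modeled as a non-negative mixture of cell-type signature profiles and unmixed with non-negative least squares. All matrices below are fabricated for illustration; the thesis's actual approach is a Bayesian model over the transcriptome composition space, not this least-squares sketch.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy problem: 5 genes, 3 cell types.
rng = np.random.default_rng(0)
S = rng.uniform(1.0, 10.0, size=(5, 3))   # reference expression profile per cell type
w_true = np.array([0.6, 0.3, 0.1])        # true cell-type proportions
bulk = S @ w_true                         # noiseless bulk expression (convolution assumption)

# Unmix: find non-negative weights w_hat with S @ w_hat ≈ bulk.
w_hat, _ = nnls(S, bulk)
w_hat /= w_hat.sum()                      # renormalize to proportions

print(np.round(w_hat, 3))                 # recovers w_true up to numerical error
```

In this noiseless setting NNLS recovers the proportions exactly; the Bayesian treatment in the thesis additionally propagates measurement uncertainty, which this baseline ignores.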
332

A Lagrangian Meshfree Simulation Framework for Additive Manufacturing of Metals

Fan, Zongyue 21 June 2021 (has links)
No description available.
333

Branching Processes: Optimization, Variational Characterization, and Continuous Approximation

Wang, Ying 27 October 2010 (has links)
In this thesis, we use multitype Galton-Watson branching processes in random environments as individual-based models for the evolution of structured populations with both demographic stochasticity and environmental stochasticity, and investigate the phenotype allocation problem. We explore a variational characterization for the stochastic evolution of a structured population modeled by a multitype Galton-Watson branching process. When the population under consideration is large and the time scale is fast, we deduce the continuous approximation for multitype Markov branching processes in random environments. Many problems in evolutionary biology involve the allocation of some limited resource among several investments. It is often of interest to know whether, and how, allocation strategies can be optimized for the evolution of a structured population with randomness. In our work, the investments represent different types of offspring, or alternative strategies for allocations to offspring. As payoffs we consider the long-term growth rate, the expected number of descendants with some future discount factor, the extinction probability of the lineage, or the expected survival time. Two different kinds of population randomness are considered: demographic stochasticity and environmental stochasticity. In Chapter 2, we solve the allocation problem w.r.t. the above payoff functions in three stochastic population models depending on different kinds of population randomness. Evolution is often understood as an optimization problem, and there is a long tradition of looking at evolutionary models from a variational perspective. In Chapter 3, we deduce a variational characterization for the stochastic evolution of a structured population modeled by a multitype Galton-Watson branching process. In particular, the so-called retrospective process plays an important role in the description of the equilibrium state used in the variational characterization. 
We define the retrospective process associated with a multitype Galton-Watson branching process and identify it with the mutation process describing the type evolution along typical lineages of the multitype Galton-Watson branching process. Continuous approximation of branching processes is of both practical and theoretical interest. However, to our knowledge, there is no literature on the approximation of multitype branching processes in random environments. In Chapter 4, we first construct a multitype Markov branching process in a random environment. Conditioned on the random environment, we deduce the Kolmogorov equations and the mean matrix for the conditioned branching process. We then introduce a parallel mutation-selection Markov branching process in a random environment and analyze its instability property. Finally, we deduce a weak convergence result for a sequence of such parallel Markov branching processes in random environments and give examples of applications.
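A multitype Galton-Watson process in a random environment, as used above, can be sketched in a few lines: each generation an environment is drawn independently, and each individual reproduces according to that environment's mean offspring matrix. The matrices, Poisson offspring law, and environment distribution below are illustrative assumptions, not the thesis's models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mean offspring matrices (entry [i, j] = mean number of type-j children
# per type-i parent) for a "good" and a "bad" environment; values are made up.
M_good = np.array([[1.2, 0.3],
                   [0.2, 1.1]])
M_bad  = np.array([[0.4, 0.1],
                   [0.1, 0.5]])
envs = [M_good, M_bad]

def run(generations=30):
    z = np.array([1, 1])                  # start with one individual of each type
    for _ in range(generations):
        if z.sum() == 0:
            return True                   # lineage went extinct
        M = envs[rng.integers(2)]         # i.i.d. random environment each generation
        # total type-j offspring is Poisson with mean sum_i z_i * M[i, j]
        z = np.array([rng.poisson(z @ M[:, j]) for j in range(2)])
    return z.sum() == 0

# Monte Carlo estimate of the extinction probability within 30 generations
extinct = np.mean([run() for _ in range(500)])
print(f"estimated extinction probability: {extinct:.2f}")
```

Payoffs such as the long-term growth rate or expected survival time can be estimated from the same simulation by recording population sizes or extinction times instead of only the extinction indicator.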
334

Multivariate analysis of the parameters in a handwritten digit recognition LSTM system / Multivariat analys av parametrarna i ett LSTM-system för igenkänning av handskrivna siffror

Zervakis, Georgios January 2019 (has links)
Throughout this project, we perform a multivariate analysis of the parameters of a long short-term memory (LSTM) system for handwritten digit recognition in order to understand the model's behaviour. In particular, we are interested in explaining how this behaviour precipitates from its parameters, and what in the network is responsible for the model arriving at a certain decision. This problem is often referred to as the interpretability problem and falls under the scope of explainable AI (XAI). The motivation is to make AI systems more transparent, so that humans can establish trust in them. For this purpose, we make use of the MNIST dataset, which has been used successfully in the past for tackling the digit recognition problem. Moreover, the balance and simplicity of the data make it an appropriate dataset for carrying out this research. We start by investigating the linear output layer of the LSTM, which is directly associated with the model's predictions. The analysis includes several experiments in which we apply various methods from linear algebra, such as principal component analysis (PCA) and singular value decomposition (SVD), to interpret the parameters of the network. For example, we experiment with different setups of low-rank approximations of the output weight matrix in order to see the importance of each singular vector for each digit class. We found that after cutting off the fifth left and right singular vectors, the model practically loses its ability to predict eights. Finally, we present a framework for analysing the parameters of the hidden layer, along with our implementation of an LSTM-based variational autoencoder that serves this purpose.
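The singular-vector ablation described above can be sketched with a random stand-in for the output weight matrix (the real matrix comes from the trained LSTM, which is not reproduced here): zero one singular value, rebuild the matrix, and measure how much each class's logit direction changes.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(10, 64))        # placeholder: 10 digit classes x hidden units

# SVD of the output weight matrix
U, s, Vt = np.linalg.svd(W, full_matrices=False)

k = 4                                # the fifth singular component (0-indexed)
s_cut = s.copy()
s_cut[k] = 0.0
W_cut = U @ np.diag(s_cut) @ Vt      # low-rank approximation without component k

# Per-class change in the logit direction after removing the component;
# classes with large change depend heavily on this singular vector.
change = np.linalg.norm(W - W_cut, axis=1)
print(np.round(change, 3))
```

On the trained model, the class whose row changes most when the fifth component is removed would correspond to the thesis's observation about the digit eight.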
335

Some Mathematical Reasoning on the Artificial Force Induced Reaction Method

Quapp, Wolfgang, Bofill, Josep Maria 03 July 2023 (has links)
There are works of the Maeda–Morokuma group that propose the artificial force induced reaction (AFIR) method (Maeda et al., J. Comput. Chem. 2014, 35, 166 and 2018, 39, 233). We study this important method from a theoretical point of view. The proposers' understanding does not use the barrier breakdown point of the AFIR parameter, which usually lies about halfway along the reaction path between the minimum and the transition state that is searched for. Based on a comparison with the theory of Newton trajectories, we arrive at a better understanding of the method. It allows us to follow some reaction pathways from minimum to saddle point, or vice versa. We discuss some well-known two-dimensional test surfaces on which we calculate full AFIR pathways. If one has special AFIR curves at hand, one can also study the behavior of the ansatz. © 2019 The Authors. Journal of Computational Chemistry published by Wiley Periodicals, Inc.
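The barrier breakdown idea can be seen in a one-dimensional toy model (this is a sketch of the general principle, not the Maeda–Morokuma implementation or the paper's test surfaces): add a linear artificial-force term α·x to a double-well energy and watch the reactant-side minimum vanish once α passes the breakdown value.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def E(x):
    # double well: minima at x = -1 and x = +1, barrier at x = 0
    return (x**2 - 1.0) ** 2

def afir_min(alpha):
    # minimize the AFIR-like function E(x) + alpha * x on the reactant side
    res = minimize_scalar(lambda x: E(x) + alpha * x,
                          bounds=(0.0, 2.0), method="bounded")
    return res.x

# As alpha grows, the minimizer drifts from x = 1 toward the barrier and,
# past the breakdown value (here 8 / (3 * sqrt(3)) ≈ 1.54), the well disappears.
for alpha in (0.0, 0.5, 2.0):
    print(f"alpha={alpha}: minimizer near x = {afir_min(alpha):.2f}")
```

Tracking the minimizer as a function of α is the toy analogue of following an AFIR pathway from the minimum toward the saddle point.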
336

Modelling user interaction at scale with deep generative methods / Storskalig modellering av användarinteraktion med djupa generativa metoder

Ionascu, Beatrice January 2018 (has links)
Understanding how users interact with a company's service is essential for data-driven businesses that want to cater better to their users and improve their offering. By using a generative machine learning approach, it is possible to model user behaviour and generate new data to simulate, or to recognize and explain, typical usage patterns. In this work we introduce an approach for modelling users' interaction behaviour at scale in a client-service model. We propose a novel representation of multivariate time-series data as time pictures, which express temporal correlations through spatial organization. This representation exhibits two key properties that convolutional networks have been built to exploit, which allows us to develop an approach based on deep generative models with convolutional networks as a backbone. By introducing this approach to feature learning for time-series data, we extend the application of convolutional neural networks to the multivariate time-series domain, and specifically to user interaction data. We adopt a variational approach inspired by the β-VAE framework in order to learn hidden factors that define different user behaviour patterns. We explore different values of the regularization parameter β and show that it is possible to construct a model that learns a latent representation of identifiable and distinct user behaviours. We show on real-world data that the model generates realistic samples that capture the true population-level statistics of the interaction behaviour data, learns different user behaviours, and provides accurate imputations of missing data.
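The β-VAE objective mentioned above weights the KL regularizer by β. A minimal numpy sketch of the loss for a diagonal-Gaussian encoder is shown below; the tensors are toys, whereas the thesis's model would compute mu and logvar with a convolutional network from a "time picture".

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    # reconstruction term: squared error between input and reconstruction
    recon = np.sum((x - x_recon) ** 2)
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), in closed form
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return recon + beta * kl

x = np.ones(4)
x_recon = np.ones(4)                 # perfect reconstruction
mu = np.zeros(2)
logvar = np.zeros(2)                 # q(z|x) = N(0, I), so KL = 0

print(beta_vae_loss(x, x_recon, mu, logvar, beta=4.0))  # → 0.0
```

Raising β above 1 penalizes latent codes that stray from the prior more heavily, which is what encourages the disentangled, identifiable behaviour factors the abstract reports.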
337

Generative Models and Feature Extraction on Patient Images and Structure Data in Radiation Therapy / Generativamodeller för patientbilder inom strålterapi

Gruselius, Hanna January 2018 (has links)
This Master's thesis focuses on generative models for medical patient data in radiation therapy. The objective of the project is to implement and investigate the characteristics of a variational autoencoder (VAE) applied to this diverse and versatile data. The questions this thesis aims to answer are: (i) whether the VAE can capture salient features of medical image data, and (ii) whether these features can be used to compare similarity between patients; furthermore, (iii) whether the VAE network can successfully reconstruct its input, and lastly (iv) whether the VAE can generate artificial data with a reasonable anatomical appearance. The experiments carried out showed that the VAE is a promising method for feature extraction, since it appeared to capture similarity between patient images. Moreover, the reconstruction of training inputs demonstrated that the method is capable of identifying and preserving anatomical details. Regarding the generative abilities, the artificial samples generally exhibited fairly realistic anatomical structures. Future work could investigate the VAE's ability to generalize, with respect both to the amount of data required and to the underlying probabilistic assumptions.
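The patient-similarity comparison described above reduces, once the encoder is trained, to comparing latent vectors. A sketch under the assumption that each patient image has already been mapped to a latent mean vector (the vectors and patient names below are fabricated):

```python
import numpy as np

# Hypothetical latent means produced by a trained VAE encoder, one per patient.
latents = {
    "patient_A": np.array([0.9, 0.1, -0.2]),
    "patient_B": np.array([1.0, 0.0, -0.1]),   # anatomically similar to A
    "patient_C": np.array([-1.2, 2.0, 0.8]),   # dissimilar
}

def nearest(query, latents):
    # most similar other patient by Euclidean distance in latent space
    others = {k: v for k, v in latents.items() if k != query}
    return min(others, key=lambda k: np.linalg.norm(latents[query] - others[k]))

print(nearest("patient_A", latents))  # → patient_B
```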
338

Deep Synthetic Noise Generation for RGB-D Data Augmentation

Hammond, Patrick Douglas 01 June 2019 (has links)
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively smaller dataset using synthetically damaged depth-data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically so that the network is able to learn to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement. 
We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets.
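The naive augmentation baselines the abstract contrasts against (random dropout and Gaussian noise) can be sketched in a few lines on a synthetic depth image; the learned noise-generating CNN replaces exactly this damage step with camera-specific noise. The image size, units, and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
depth = np.full((8, 8), 1000.0)   # toy depth map: flat surface at 1000 mm

def naive_damage(depth, dropout_p=0.1, sigma=5.0, rng=rng):
    # additive Gaussian noise on the depth values
    noisy = depth + rng.normal(0.0, sigma, depth.shape)
    # random dropout: zero depth marks missing measurements, as in many sensors
    mask = rng.random(depth.shape) < dropout_p
    noisy[mask] = 0.0
    return noisy

damaged = naive_damage(depth)
print((damaged == 0).mean())      # fraction of dropped pixels
```

Because real sensor noise is structured (edge fattening, range-dependent holes), such uniform damage generalizes poorly, which is the motivation for learning the noise distribution instead.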
339

Adversarial Deep Neural Networks Effectively Remove Nonlinear Batch Effects from Gene-Expression Data

Dayton, Jonathan Bryan 01 July 2019 (has links)
Gene-expression profiling enables researchers to quantify transcription levels in cells, thus providing insight into functional mechanisms of diseases and other biological processes. However, because of the high dimensionality of these data and the sensitivity of measuring equipment, expression data often contains unwanted confounding effects that can skew analysis. For example, collecting data in multiple runs causes nontrivial differences in the data (known as batch effects), known covariates that are not of interest to the study may have strong effects, and there may be large systemic effects when integrating multiple expression datasets. Additionally, many of these confounding effects represent higher-order interactions that may not be removable using existing techniques that identify linear patterns. We created Confounded to remove these effects from expression data. Confounded is an adversarial variational autoencoder that removes confounding effects while minimizing the amount of change to the input data. We tested the model on artificially constructed data and commonly used gene expression datasets and compared against other common batch adjustment algorithms. We also applied the model to remove cancer-type-specific signal from a pan-cancer expression dataset. Our software is publicly available at https://github.com/jdayton3/Confounded.
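For contrast with the adversarial approach above, the simplest linear batch correction is per-batch mean-centering: it removes additive batch effects but, unlike an adversarial autoencoder, cannot remove nonlinear or higher-order ones. The data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
expr = rng.normal(size=(6, 3))        # 6 samples x 3 genes
batch = np.array([0, 0, 0, 1, 1, 1])  # two sequencing runs
expr[batch == 1] += 2.0               # simulate an additive batch effect

# per-batch mean-centering
corrected = expr.copy()
for b in np.unique(batch):
    corrected[batch == b] -= corrected[batch == b].mean(axis=0)

# after centering, the per-batch gene means coincide
print(np.allclose(corrected[batch == 0].mean(axis=0),
                  corrected[batch == 1].mean(axis=0)))  # → True
```

If the batch effect instead scaled or warped expression nonlinearly, this correction would leave residual batch signal, which is the gap the adversarial autoencoder is designed to close.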
340

Mathematical Structures of Cohomological Field Theories

Jiang, Shuhan 29 August 2023 (has links)
In this dissertation, we developed a mathematical framework for cohomological field theories (CohFTs) in the language of "QK-manifolds", which unifies the previous ones in (Baulieu and Singer 1988; Baulieu and Singer 1989; Ouvry, Stora, and Van Baal 1989; Atiyah and Jeffrey 1990; Birmingham et al. 1991; Kalkman 1993; Blau 1993). Within this new framework, we classified the (gauge-invariant) solutions to the descent equations in CohFTs (with gauge symmetries). We revisited Witten's idea of topological twisting and showed that the twisted super-Poincaré algebra gives rise naturally to a "QK-structure". We also generalized the Mathai–Quillen construction of the universal Thom class via a variational bicomplex lift of the equivariant cohomology. Our framework enables a uniform treatment of examples like topological quantum mechanics, the topological sigma model, and topological Yang–Mills theory.
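For orientation, the descent equations referenced above take the following standard form (a generic sketch in conventional notation, not the dissertation's QK-manifold formulation): Q is the BRST-type scalar supercharge, d the de Rham differential, and O^(k) a k-form observable.

```latex
% Standard descent equations for a tower of observables O^{(0)}, ..., O^{(n)}:
\begin{align*}
  Q\,\mathcal{O}^{(0)} &= 0, \\
  Q\,\mathcal{O}^{(k+1)} &= \mathrm{d}\,\mathcal{O}^{(k)}, \qquad k = 0, \dots, n-1.
\end{align*}
% Consequently, for any closed k-cycle \gamma_k, the integrated observable
% \int_{\gamma_k} \mathcal{O}^{(k)} is Q-closed, i.e. a valid CohFT observable.
```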
