
Statistical and Machine Learning Methods for Precision Medicine

Chen, Yuan, January 2021
Heterogeneous treatment responses are commonly observed in patients with mental disorders. Thus, a universal treatment strategy may not be adequate, and tailored treatments adapted to individual characteristics could improve treatment responses. The theme of the dissertation is to develop statistical and machine learning methods to address patient heterogeneity and derive robust and generalizable individualized treatment strategies by integrating evidence from multi-domain data and multiple studies to achieve precision medicine. Unique challenges arising from the research of mental disorders need to be addressed in order to facilitate personalized medical decision-making in clinical practice. This dissertation contains four projects to achieve these goals while addressing the challenges: (i) a statistical method to learn dynamic treatment regimes (DTRs) by synthesizing independent trials over different stages when sequential randomization data are not available; (ii) a statistical method to learn optimal individualized treatment rules (ITRs) for mental disorders by modeling patients' latent mental states using probabilistic generative models; (iii) an integrative learning algorithm to incorporate multi-domain and multi-treatment-phase measures for optimizing individualized treatments; (iv) a statistical machine learning method to optimize ITRs that can benefit subjects in a target population for mental disorders with improved learning efficiency and generalizability.

DTRs adaptively prescribe treatments based on patients' intermediate responses and evolving health status over multiple treatment stages. Data from sequential multiple assignment randomization trials (SMARTs) are recommended for learning DTRs. However, due to the re-randomization of the same patients over multiple treatment stages and a prolonged follow-up period, SMARTs are often difficult to implement and costly to manage, and patient adherence is always a concern in practice. To lessen these practical challenges, in the first part of the dissertation we propose an alternative approach that learns optimal DTRs by synthesizing independent trials over different stages without using data from SMARTs. Specifically, at each stage, data from a single randomized trial, along with patients' natural medical history and health status in previous stages, are used. We use a backward learning method to estimate optimal treatment decisions at a particular stage, where patients' future optimal outcome increment is estimated using data observed from independent trials with future stages' information. Under some conditions, we show that the proposed method yields consistent estimation of the optimal DTRs and attains the same learning rates as learning from SMARTs. We conduct simulation studies to demonstrate the advantage of the proposed method. Finally, we learn DTRs for treating major depressive disorder (MDD) by stage-wise synthesis of two randomized trials. A validation study on independent subjects shows that the synthesized DTRs lead to the greatest MDD symptom reduction compared with alternative methods.

The second part of the dissertation focuses on optimizing individualized treatments for mental disorders. Due to disease complexity, substantial diversity in patients' symptomatology within the same diagnostic category is widely observed. Leveraging measurement model theory in psychiatry and psychology, we learn patients' intrinsic latent mental states from psychological or clinical symptoms under a probabilistic generative model, the restricted Boltzmann machine (RBM), through which patients' heterogeneous symptoms are represented by an economical number of latent variables while remaining flexible. These latent mental states serve as a better characterization of the underlying disorder status than a simple summary score of the symptoms, and as more reliable and representative features for differentiating treatment responses. We then optimize a value function defined by the latent states after treatment by exploiting a transformation of the observed symptoms based on the RBM, without modeling the relationship between the latent mental states before and after treatment. The optimal treatment rules are derived using a weighted large-margin classifier. We derive the convergence rate of the proposed estimator under the latent models. Simulation studies are conducted to test the performance of the proposed method. Finally, we apply the developed method to real-world studies, demonstrate its utility and advantage in tailoring treatments for patients with major depression, and identify patient subgroups informative for treatment recommendations.

In the third part of the dissertation, building on the general framework introduced in the previous part, we propose an integrated learning algorithm that can simultaneously learn patients' underlying mental states and recommend optimal treatments for each individual with improved learning efficiency. It allows incorporation of both pre- and post-treatment outcomes in learning the invariant latent structure and allows integration of outcome measures from different domains to characterize patients' mental health more comprehensively. A multi-layer neural network is used to allow complex treatment effect heterogeneity. The optimal treatment policy can be inferred for future patients by comparing their potential mental states under different treatments given the observed multi-domain pre-treatment measurements. Experiments on simulated data and real-world clinical trial data show that the learned treatment policies compare favorably to alternative methods under heterogeneous treatment effects and have broad utility, leading to better patient outcomes across multiple domains.

The fourth part of the dissertation aims to infer optimal treatments of mental disorders for a target population, taking into account potential distribution disparities between the collected patient data and the target population of interest. To achieve this, we propose a learning approach that connects measurement theory, an efficient weighting procedure, and a flexible neural network architecture through latent variables. In our method, patients' underlying mental states are represented by a reduced number of latent state variables, allowing domain knowledge to be incorporated, and the invariant latent structure is preserved for interpretability and validity. Subject-specific weights to balance population differences are constructed from these compact latent variables, which capture the major variations and facilitate the weighting procedure thanks to the reduced dimensionality. Data from multiple studies can be integrated to learn the latent structure, improving learning efficiency and generalizability. Extensive simulation studies demonstrate the consistent superiority of the proposed method and its weighting scheme over alternative methods when applied to the target population. Applying our method to real-world studies to recommend treatments to patients with major depressive disorder shows the broader utility of the ITRs learned by the proposed method in improving the mental states of patients in the target population.
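The backward learning step described above can be illustrated with a plain two-stage Q-learning sketch on simulated data; the dissertation's contribution is performing this backward step while synthesizing separate single-stage trials, which is not shown here. The data-generating model, linear Q-functions, and variable names below are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000

# Simulated two-stage data: baseline covariate, randomized treatments, intermediate state.
X1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.integers(0, 2, size=n)
Y = X1 + X2 + A1 * (1.0 - X1) + A2 * (0.5 + X2) + rng.normal(size=n)  # larger is better

def design(x, a):
    # Q-model features: covariate and treatment main effects plus their interaction
    return np.column_stack([x, a, x * a])

# Backward step 1: fit the stage-2 Q-function, then compute each patient's optimal stage-2 value.
q2 = LinearRegression().fit(design(X2, A2), Y)
v2 = np.maximum(q2.predict(design(X2, np.zeros(n))), q2.predict(design(X2, np.ones(n))))

# Backward step 2: fit the stage-1 Q-function against the estimated future optimal outcome.
q1 = LinearRegression().fit(design(X1, A1), v2)

def dtr(x1, x2):
    """Recommend (a1, a2) for a new patient by maximizing each fitted Q-function."""
    x1, x2 = np.atleast_1d(x1), np.atleast_1d(x2)
    a1 = int((q1.predict(design(x1, np.ones(1))) > q1.predict(design(x1, np.zeros(1))))[0])
    a2 = int((q2.predict(design(x2, np.ones(1))) > q2.predict(design(x2, np.zeros(1))))[0])
    return a1, a2

print(dtr(1.2, -0.4))
```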

Feature Selection for High Dimensional Causal Inference

Lu, Rui, January 2020
Selecting an appropriate covariate set for confounding control is essential for causal inference. Strong ignorability is a strong assumption, and with observational data researchers cannot be sure that it holds. To reduce the possibility of bias caused by unmeasured confounders, one solution is to include the widest possible range of pre-treatment covariates, but this has been demonstrated to be problematic. Subjective, knowledge-based covariate screening is a common and widely applied approach; however, in high dimensional settings it becomes difficult for domain experts to screen thousands of covariates. Machine learning based methods make automatic high dimensional causal estimation possible. While the theoretical properties of these techniques are desirable, they are only guaranteed to hold asymptotically (i.e., they require large sample sizes), and their performance in smaller samples is sometimes less clear. Data-based pre-processing approaches may fill this gap. Nevertheless, there is no clear guidance on when and how covariate selection should be involved in high dimensional causal estimation. In this dissertation, I address the above issues by (a) providing a classification scheme for major causal covariate selection methods, (b) extending the causal covariate selection framework, (c) conducting a comprehensive empirical Monte Carlo simulation study to illustrate theoretical properties of causal covariate selection and estimation methods, and (d) following up with a case study to compare different covariate selection approaches in a real-data testing ground. Under small-sample and/or high dimensional settings, the results indicate that choosing an appropriate covariate selection method as a pre-processing tool is necessary for causal estimation. Under relatively large-sample and low dimensional settings, covariate selection is not necessary for machine learning based automatic causal estimation. Careful pre-processing guided by subjective knowledge is essential.
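As a toy illustration of why covariate selection matters in small-sample, high dimensional causal estimation, the sketch below simulates a few true confounders among many irrelevant covariates and compares a naive difference in means with a regression estimate after lasso-based covariate screening. This is one simplistic selection strategy chosen for brevity, not one of the specific methods classified in the dissertation; all names and parameter values are made up.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(1)
n, p = 500, 200                         # small sample, many candidate covariates

X = rng.normal(size=(n, p))
# Only the first three covariates drive both treatment assignment and the outcome.
propensity = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] - X[:, 2])))
T = rng.binomial(1, propensity)
Y = 2.0 * T + X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(size=n)   # true effect = 2

# Naive estimate ignores confounding.
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Outcome-based screening: keep covariates with nonzero lasso coefficients.
lasso = LassoCV(cv=5).fit(X, Y)
selected = np.flatnonzero(lasso.coef_ != 0)

# Adjusted estimate: regress Y on treatment plus the selected covariates.
Z = np.column_stack([T, X[:, selected]])
adjusted = LinearRegression().fit(Z, Y).coef_[0]

print(f"naive: {naive:.2f}, adjusted (selected {len(selected)} covariates): {adjusted:.2f}")
```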

Probabilistic Programming for Deep Learning

Tran, Dustin, January 2020
We propose the idea of deep probabilistic programming, a synthesis of advances for systems at the intersection of probabilistic modeling and deep learning. Such systems enable the development of new probabilistic models and inference algorithms that would otherwise be impossible: scaling to billions of parameters and to distributed, mixed-precision, and AI-accelerator environments; integration with neural architectures for modeling massive and high-dimensional datasets; and the use of computation graphs for automatic differentiation and arbitrary manipulation of probabilistic programs for flexible inference and model criticism. After describing deep probabilistic programming, we discuss applications in novel variational inference algorithms and deep probabilistic models. First, we introduce the variational Gaussian process (VGP), a Bayesian nonparametric variational family, which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to the varying complexity of the true posterior. Second, we introduce hierarchical implicit models (HIMs). HIMs combine the idea of implicit densities with hierarchical Bayesian modeling, thereby defining models via simulators of data with rich hidden structure.
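The role of computation graphs and automatic differentiation in this style of inference can be seen in a minimal reparameterization-gradient variational inference example. The sketch below uses PyTorch rather than the probabilistic programming systems discussed in the dissertation, and the normal-mean toy model is an arbitrary choice made for illustration.

```python
import torch

torch.manual_seed(0)

# Toy model: mu ~ Normal(0, 1);  y_i ~ Normal(mu, 1).  Fifty observations centered near 2.
y = torch.randn(50) + 2.0

# Variational family q(mu) = Normal(m, softplus(s)); the ELBO is maximized by gradient
# ascent, with the reparameterization trick letting gradients flow through the sample.
m = torch.zeros(1, requires_grad=True)
s = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([m, s], lr=0.05)

for step in range(2000):
    opt.zero_grad()
    sd = torch.nn.functional.softplus(s)
    mu = m + sd * torch.randn(1)                     # reparameterized sample from q
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(mu).sum()
    log_lik = torch.distributions.Normal(mu, 1.0).log_prob(y).sum()
    entropy = torch.distributions.Normal(m, sd).entropy().sum()
    loss = -(log_lik + log_prior + entropy)          # negative ELBO
    loss.backward()
    opt.step()

print(m.item(), torch.nn.functional.softplus(s).item())
```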

Optimization Foundations of Reinforcement Learning

Bhandari, Jalaj, January 2020
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and artificial intelligence communities in the past decade. With tremendous success already demonstrated for game AI, RL offers great potential for applications in more complex, real-world domains, for example in robotics, autonomous driving, and even drug discovery. Although researchers have devoted a lot of engineering effort to deploying RL methods at scale, many state-of-the-art RL techniques still seem mysterious, with limited theoretical guarantees on their behaviour in practice. In this thesis, we focus on understanding convergence guarantees for two key ideas in reinforcement learning, namely temporal difference learning and policy gradient methods, from an optimization perspective. In Chapter 2, we provide a simple and explicit finite-time analysis of temporal difference (TD) learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. Our convergence results extend seamlessly to the study of TD learning with eligibility traces, known as TD(λ), and to Q-learning for a class of high-dimensional optimal stopping problems. In Chapter 3, we turn our attention to policy gradient methods and present a simple and general understanding of their global convergence properties. The main challenge here is that even for simple control problems, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point of the objective. We identify structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that, despite non-convexity, any stationary point of the policy gradient objective is globally optimal. In the final chapter, we extend our analysis for finite MDPs to show linear convergence guarantees for many popular variants of policy gradient methods, including projected policy gradient, Frank-Wolfe, mirror descent, and natural policy gradients.
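For concreteness, a bare-bones version of TD(0) with linear function approximation is sketched below on a made-up three-state Markov reward process; the transition matrix, rewards, features, and step size are all arbitrary illustrative choices, and the chapter's contribution is the finite-time analysis of such updates rather than the algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 3-state Markov reward process with fixed rewards, and hand-picked
# 2-dimensional features for linear value-function approximation.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
r = np.array([0.0, 1.0, 2.0])
phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
gamma, alpha = 0.9, 0.05

theta = np.zeros(2)
s = 0
for t in range(50_000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update: move theta along the semi-gradient of the one-step Bellman error.
    td_error = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td_error * phi[s]
    s = s_next

print("estimated state values:", phi @ theta)
```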

Partition-based Model Representation Learning

Hsu, Yayun, January 2020
Modern machine learning draws on both classical statistics and modern computation. On the one hand, the field has become rich and fast-growing; on the other hand, the differing conventions of different schools have become harder and harder to reconcile over time. Often the problem is not about who is absolutely right or wrong, but about the angle from which one should approach the problem. This motivates a unifying machine learning framework that can hold the different schools under the same umbrella, and we propose one such framework, which we call ``representation learning''. Representations describe the data, much like a statistical model. Philosophically, however, we distinguish them from classical statistical modeling in that (1) representations are interpretable to the scientist, (2) representations convey the pre-existing subject view that the scientist holds about the data before seeing it (in other words, representations may not align with the true data-generating process), and (3) representations are task-oriented. To build such representations, we propose to use partition-based models. Partition-based models are easy to interpret and useful for uncovering interactions between variables. The major challenge, however, lies in computation, since the number of partitions can grow exponentially with the number of variables. To address this, we need a model/representation selection method over different partition models, and we propose to use the I-Score with the backward dropping algorithm. In this work, we explore the connection between the I-Score variable selection methodology and other existing methods, and we extend the idea to develop other objective functions that can be used in other applications. We apply these ideas to three datasets: a genome-wide association study (GWAS), the New York City Vision Zero data, and the MNIST handwritten digit database. On these applications, we show that the interpretability of the representations can be useful in practice and provides practitioners with much more intuition in explaining their results. We also show a novel way to look at causal inference problems from the view of partition-based models. We hope this work serves as an invitation to approach problems from a different angle and to take interpretability into consideration when building a model, so that the model can more easily be used to communicate with people from other fields.
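A rough sketch of the I-Score and the backward dropping algorithm on discrete variables is given below. The normalization of the I-Score varies across papers, so the version here (cell-size-squared deviations of cell means from the grand mean, scaled by n times the outcome variance) should be read as one plausible convention; the toy XOR data and function names are illustrative.

```python
import numpy as np

def i_score(X_subset, y):
    """Influence (I) score of a variable subset; partition cells are the distinct joint levels."""
    n = len(y)
    cells = {}
    for row, yi in zip(map(tuple, X_subset), y):
        cells.setdefault(row, []).append(yi)
    grand = y.mean()
    score = sum(len(v) ** 2 * (np.mean(v) - grand) ** 2 for v in cells.values())
    return score / (n * y.var())

def backward_dropping(X, y, start_vars):
    """Greedily drop the variable whose removal increases the I-score the most."""
    current = list(start_vars)
    best_vars, best_score = current[:], i_score(X[:, current], y)
    while len(current) > 1:
        scores = [(i_score(X[:, [v for v in current if v != d]], y), d) for d in current]
        s, drop = max(scores)
        current.remove(drop)
        if s > best_score:
            best_vars, best_score = current[:], s
    return best_vars, best_score

# Toy example: y depends only on the interaction (XOR) of binary variables 0 and 1.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 6))
y = (X[:, 0] ^ X[:, 1]).astype(float) + 0.1 * rng.normal(size=500)
print(backward_dropping(X, y, start_vars=range(6)))
```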

Algorithm Design and Localization Analysis in Sequential and Statistical Learning

Xu, Yunbei, January 2023
Learning theory is a dynamic and rapidly evolving field that aims to provide mathematical foundations for designing and understanding the behavior of algorithms and procedures that can learn from data automatically. At the heart of this field lies the interplay between algorithm design and statistical complexity analysis, with sharp statistical complexity characterizations often requiring localization analysis. This dissertation aims to advance the fields of machine learning and decision making by contributing to two key directions: principled algorithm design and localized statistical complexity. Our research develops novel algorithmic techniques and analytical frameworks to build more effective and robust learning systems. Specifically, we focus on studying uniform convergence and localization in statistical learning theory, developing efficient algorithms using the optimism principle for contextual bandits, and creating Bayesian design principles for bandit and reinforcement learning problems.

Overlapping Communities on Large-Scale Networks: Benchmark Generation and Learning via Adaptive Stochastic Optimization

Grande, Alessandro Antonio, January 2022
This dissertation builds on two lines of research related to the task of community detection on large-scale network data. Our first contribution is a novel generator for large-scale networks with overlapping communities. Synthetic generators are essential for algorithm testing and simulation studies for networks, as these data are scarce and constantly evolving. We propose a generator based on a flexible random graph model that allows for the control of two complementary measures of centrality -- the degree centrality and the eigencentrality. For an arbitrary centrality target and community structure, we study the problem of recovering the model parameters that enforce such targets in expectation. We find that this problem always admits a solution in the parameter space, which is also unique for large graphs. We propose to recover this solution via a properly initialized multivariate Newton-Raphson algorithm. The resulting benchmark generator is able to simulate networks with a billion edges and hundreds of millions of nodes in 30 seconds, while reproducing a wide spectrum of network topologies -- including assortative mixing and power-law centrality distributions. Our second contribution involves variance reduction techniques for stochastic variational inference (SVI). SVI scales approximate inference to large-scale data -- including massive networks -- via stochastic optimization. SVI is efficient because, at each iteration, it only uses a random minibatch of the data to produce a noisy estimate of the gradient. However, such estimates can suffer from high variance, which slows down convergence. One strategy to reduce the variance of the gradient is importance sampling: biasing the distribution of data for each minibatch towards the data points that are most influential to the inference at hand. Here, we develop an importance sampling strategy for SVI. Our adaptive stochastic variational inference algorithm (AdaSVI) reweights the sampling distribution to minimize the variance of the stochastic natural gradient. We couple the importance sampling strategy with an adaptive learning rate, providing a parameter-free stochastic optimization algorithm where the only user input required is the minibatch size. We study AdaSVI on a matrix factorization model and find that it significantly improves SVI, leading to faster convergence on synthetic data.
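The variance-reduction idea behind this line of work can be illustrated with a standalone importance-sampling sketch: sampling minibatch indices with non-uniform probabilities and reweighting keeps the estimate unbiased, while the variance shrinks when the probabilities track the magnitude of the per-data-point contributions. The scalar stand-ins for gradients and the idealized sampling distribution below are assumptions for illustration; AdaSVI's actual reweighting targets the stochastic natural gradient and adapts during inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-data-point "gradients" (scalars here) whose sum we estimate from a minibatch,
# as stochastic variational inference does at each iteration. A few points dominate.
g = rng.exponential(size=10_000) * rng.choice([1.0, 50.0], size=10_000, p=[0.99, 0.01])
n, batch = len(g), 100

def minibatch_estimate(q):
    """Sample a minibatch with probabilities q and reweight so the estimate stays unbiased."""
    idx = rng.choice(n, size=batch, p=q)
    return np.mean(g[idx] / (n * q[idx])) * n

uniform = np.full(n, 1 / n)
importance = np.abs(g) / np.abs(g).sum()     # idealized: sample large contributions more often

est_u = [minibatch_estimate(uniform) for _ in range(500)]
est_i = [minibatch_estimate(importance) for _ in range(500)]
print("true sum:", g.sum())
print("uniform    mean/sd:", np.mean(est_u), np.std(est_u))
print("importance mean/sd:", np.mean(est_i), np.std(est_i))
```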

Identifying Patterns in Behavioral Public Health Data Using Mixture Modeling with an Informative Number of Repeated Measures

Yu, Gary, January 2014
Finite mixture modeling is a useful statistical technique for clustering individuals based on patterns of responses. The fundamental idea of the mixture modeling approach is to assume that there are latent clusters of individuals in the population, each generating its own distinct distribution of observations (multivariate or univariate), which are then mixed together in the full population. Hence, the name mixture comes from the fact that what we observe is a mixture of distributions. The goal of this model-based clustering technique is to identify what the mixture of distributions is so that, given a particular response pattern, individuals can be clustered accordingly. Commonly, finite mixture models, as well as the special case of latent class analysis, are used on data that inherently involve repeated measures. The purpose of this dissertation is to extend the finite mixture model so that the number of repeated measures can be incorporated and contribute to the clustering of individuals rather than measures. The dimension of the repeated measures, or simply the count of responses, is assumed to follow a truncated Poisson distribution, and this information can be incorporated into what we call a dimension informative finite mixture model (DIMM). The outline of this dissertation is as follows.

Paper 1 is entitled "Dimension Informative Mixture Modeling (DIMM) for questionnaire data with an informative number of repeated measures." This paper describes the type of data structures considered and introduces the dimension informative mixture model (DIMM). A simulation study is performed to examine how well the DIMM fits the known specified truth. In the first scenario, we specify a mixture of three univariate normal distributions with different means and similar variances, under both differing and similar counts of repeated measurements. We found that the DIMM predicts the true underlying class membership better than the traditional finite mixture model using a predicted value metric score. In the second scenario, we specify a mixture of two univariate normal distributions with the same means and variances, under both differing and similar counts of repeated measurements. We found that the count-informative finite mixture model predicts the truth much better than the non-informative finite mixture model.

Paper 2 is entitled "Patterns of Physical Activity in the Northern Manhattan Study (NOMAS) Using Multivariate Finite Mixture Modeling (MFMM)." This study applies a multivariate finite mixture modeling approach to examine and elucidate underlying latent clusters of physical activity profiles based on four dimensions: total frequency of activities, average duration per activity, total energy expenditure, and the total count of the number of different activities conducted. We found a five-cluster solution to describe the complex patterns of physical activity levels, as measured by fifteen different physical activity items, among a US-based elderly cohort. Adding a class of individuals who were not doing any physical activity, the labels of these six clusters are: no exercise, very inactive, somewhat inactive, slightly under guidelines, meet guidelines, and above guidelines. This methodology improves upon previous work, which utilized only the total metabolic equivalent (a proxy of energy expenditure) to classify individuals into inactive, active, and highly active.
Paper 3 is entitled, "Complex Drug Use Patterns and Associated HIV Transmission Risk Behaviors in an Internet Sample of US Men Who Have Sex With Men." This is a study that applies the count-informative information into a latent class analysis on nineteen binary drug items of drugs consumed within the past year before a sexual encounter. In addition to the individual drugs used, the mixture model incorporated a count of the total number of drugs used. We found a six class solution: low drug use, some recreational drug use, nitrite inhalants (poppers) with prescription erectile dysfunction (ED) drug use, poppers with prescription/non-prescription ED drug use and high polydrug use. Compared to participants in the low drug use class, participants in the highest drug use class were 5.5 times more likely to report unprotected anal intercourse (UAI) in their last sexual encounter and approximately 4 times more likely to report a new sexually transmitted infection (STI) in the past year. Younger men were also less likely to report UAI than older men but more likely to report an STI.
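A compact EM sketch of a count-informative mixture in the spirit of the DIMM is given below: the number of repeated measures enters the E-step through a zero-truncated Poisson term alongside the response likelihood. The simulation design, fixed unit variances, and the simplified M-step for the Poisson rate (which ignores the truncation correction) are assumptions made for brevity, not the dissertation's estimator.

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(0)

# Simulate two latent classes differing in BOTH the response mean and the number
# of repeated measures per subject (counts clipped at >= 1 for simplicity).
n_subj = 400
z = rng.binomial(1, 0.4, n_subj)
counts = np.maximum(rng.poisson(np.where(z == 1, 8.0, 3.0)), 1)
obs = [rng.normal(1.0 if z[i] else 0.0, 1.0, counts[i]) for i in range(n_subj)]
resp_sums = np.array([y.sum() for y in obs])

# EM for a two-class mixture; the count likelihood joins the E-step ("count informative").
pi, mu, lam = 0.5, np.array([-1.0, 2.0]), np.array([2.0, 6.0])
for _ in range(200):
    log_resp = np.zeros((n_subj, 2))
    for k, pk in enumerate([pi, 1 - pi]):
        resp_ll = np.array([norm.logpdf(y, mu[k], 1.0).sum() for y in obs])
        count_ll = poisson.logpmf(counts, lam[k]) - np.log1p(-np.exp(-lam[k]))  # zero-truncated
        log_resp[:, k] = np.log(pk) + resp_ll + count_ll
    w = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # posterior class memberships
    pi = w[:, 0].mean()
    mu = np.array([w[:, k] @ resp_sums / (w[:, k] @ counts) for k in range(2)])
    lam = np.array([np.average(counts, weights=w[:, k]) for k in range(2)])  # untruncated update

print(np.round([pi, *mu, *lam], 2))
```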

Statistical and Machine Learning Methods for Pattern Identification in Environmental Mixtures

Gibson, Elizabeth Atkeson, January 2021
Background: Statistical and machine learning techniques are now being incorporated into high-dimensional mixture research to overcome issues with traditional methods. Though some methods perform well on specific tasks, no method consistently outperforms all others in complex mixture analyses, largely because different methods were developed to answer different research questions. The research presented here concentrates on answering a single mixtures question: Are there exposure patterns within a mixture corresponding with sources or behaviors that give rise to exposure?

Objective: This dissertation details work to design, adapt, and apply pattern recognition methods to environmental mixtures and introduces two methods adapted to specific challenges of environmental health data: (1) Principal Component Pursuit (PCP) and (2) Bayesian non-parametric non-negative matrix factorization (BN²MF). We build on this work to characterize the relationship between identified patterns of in utero endocrine disrupting chemical (EDC) exposure and child neurodevelopment.

Methods: PCP---a dimensionality reduction technique in computer vision---decomposes the exposure mixture into a low-rank matrix of consistent patterns and a sparse matrix of unique or extreme exposure events. We incorporated two existing PCP extensions that suit environmental data: (1) a non-convex rank penalty, and (2) a formulation that removes the need for parameter tuning. We further adapted PCP to accommodate environmental mixtures by including (1) a non-negativity constraint, (2) a modified algorithm that allows for missing values, and (3) a separate penalty for measurements below the limit of detection (PCP-LOD). BN²MF decomposes the exposure mixture into three parts: (1) a matrix of chemical loadings on identified patterns, (2) a matrix of individual scores on identified patterns, and (3) a diagonal matrix of pattern weights. It places non-negative continuous priors on pattern loadings, weights, and individual scores, and uses a non-parametric sparse prior on the pattern weights to estimate the optimal number of patterns. We extended BN²MF to explicitly account for uncertainty in identified patterns by estimating the full distributions of scores and loadings. To test both methods, we simulated data to represent environmental mixtures with various structures, altering the level of complexity in the patterns, the noise level, the number of patterns, the size of the mixture, and the sample size. We evaluated PCP-LOD's performance against principal component analysis (PCA), and we evaluated BN²MF's performance against PCA, factor analysis, and frequentist non-negative matrix factorization (NMF). For all methods, we compared their solutions with the true simulated values to measure performance. We further assessed BN²MF's coverage of the true simulated scores. We applied PCP-LOD to an exposure mixture of 21 persistent organic pollutants (POPs) measured in 1,000 U.S. adults from the 2001--2002 National Health and Nutrition Examination Survey (NHANES). We applied BN²MF to an exposure mixture of 17 EDCs measured in 343 pregnant women in the Columbia Center for Children’s Environmental Health's Mothers and Newborns Cohort. Finally, we designed a two-stage Bayesian hierarchical model to estimate health effects of environmental exposure patterns while incorporating the uncertainty of pattern identification. In the first stage, we identified EDC exposure patterns using BN²MF. In the second stage, we included individual pattern scores and their distributions as exposures of interest in a hierarchical regression model, with child IQ as the outcome, adjusting for potential confounders. We present sex-specific results.

Results: PCP-LOD recovered the true number of patterns through cross-validation for all simulations; based on an a priori specified criterion, PCA recovered the true number of patterns in 32% of simulations. PCP-LOD achieved lower relative predictive error than PCA for all simulated datasets with up to 50% of the data < LOD. When 75% of values were < LOD, PCP-LOD outperformed PCA only when noise was low. In the POP mixture, PCP-LOD identified a rank-three underlying structure. One pattern represented comprehensive exposure to all POPs. The other two patterns grouped chemicals based on known properties such as structure and toxicity. PCP-LOD also separated 6% of values as extreme events. Most participants had no extreme exposures (44%) or only extremely low exposures (18%). BN²MF estimated the true number of patterns for 99% of simulated datasets. BN²MF's variational confidence intervals achieved 95% coverage across all levels of structural complexity with up to 40% added noise. BN²MF performed comparably with frequentist methods in terms of overall prediction and estimation of the underlying loadings and scores. We identified two patterns of EDC exposure in pregnant women, corresponding with diet and personal care product use as potentially separate sources or behaviors leading to exposure. The diet pattern expressed exposure to phthalates and BPA. A one standard deviation increase in this pattern was associated with a decrease of 3.5 IQ points (95% credible interval: -6.7, -0.3), on average, in female children but not in males. The personal care product pattern represented exposure to phenols, including parabens, and diethyl phthalate. We found no associations between this pattern and child cognition.

Conclusion: PCP-LOD and BN²MF address limitations of existing pattern recognition methods employed in this field, such as user-specified pattern number, lack of interpretability of patterns in terms of human understanding, influence of outlying values, and lack of uncertainty quantification. Both methods identified patterns that grouped chemicals based on known sources (e.g., diet), behaviors (e.g., personal care product use), or properties (e.g., structure and toxicity). Phthalates and BPA found in food packaging and can linings formed a BN²MF-identified pattern of EDC exposure negatively associated with female child intelligence in the Mothers and Newborns cohort. Results may be used to inform interventions designed to target modifiable behavior or regulations to act on dietary exposure sources.
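For reference, the classical PCP decomposition (without the non-negativity, missing-data, or limit-of-detection extensions developed here) can be sketched with the standard augmented Lagrangian iterations below; the penalty and step-size choices follow common defaults from the robust PCA literature, and the toy "exposure" matrix is simulated.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def pcp(M, n_iter=500):
    """Classical Principal Component Pursuit via inexact augmented Lagrangian iterations.

    Splits M into a low-rank part L (consistent patterns) and a sparse part S
    (unique/extreme events), minimizing ||L||_* + lambda * ||S||_1.
    """
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2))
    mu = n1 * n2 / (4.0 * np.abs(M).sum())
    L, S, Y = np.zeros_like(M), np.zeros_like(M), np.zeros_like(M)
    for _ in range(n_iter):
        # Singular-value thresholding step for the low-rank component
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Entrywise soft-thresholding step for the sparse component
        S = soft_threshold(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
    return L, S

# Toy mixture: a rank-2 "pattern" matrix plus a few extreme exposure events.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
sparse = np.zeros((100, 20))
sparse[rng.integers(0, 100, 30), rng.integers(0, 20, 30)] = 10.0
L, S = pcp(low_rank + sparse)
print(np.linalg.matrix_rank(L), int((np.abs(S) > 1.0).sum()))
```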

The role of model implementation in neuroscientific applications of machine learning

Abe, Taiga, January 2024
In modern neuroscience, large scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large scale machine learning models relative to seemingly trivial differences in their implementation. Existing research has shown that the performance of large scale machine learning models (more so than that of traditional models like linear regression) is meaningfully entangled with design choices such as the hardware components, operating system, software dependencies, and random seed that the corresponding model depends upon. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis unreliable). This dissertation will present two broad research directions that address these shortcomings. First, I will describe a novel, cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which can then be used automatically by any other user on custom-built cloud resources. I show that this approach is able to efficiently support a wide variety of existing data analysis tools, as well as novel tools which would not be feasible to build and share outside of a platform like NeuroCAAS. Second, I conduct two large-scale studies on the behavior of deep ensembles. Deep ensembles are a class of machine learning model that uses implementation variability to improve the quality of model predictions; in particular, by aggregating the predictions of deep networks over stochastic initialization and training. Deep ensembles simultaneously provide a way to control the impact of implementation variability (by aggregating predictions across random seeds) and to understand what kind of predictive diversity is generated by this particular form of implementation variability. I present a number of surprising results that contradict widely held intuitions about the performance of deep ensembles as well as the mechanisms behind their success, and show that in many respects the behavior of deep ensembles is similar to that of an appropriately chosen single neural network. As a whole, this dissertation presents novel methods and insights focused on the role of implementation variability in large scale machine learning models, and more generally on the challenges of working with such large models in neuroscience data analysis. I conclude by discussing other ongoing efforts to improve the reproducibility and accessibility of large scale machine learning in neuroscience, as well as long-term goals to speed the adoption and reliability of such methods in a scientific context.
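A miniature deep ensemble, using small scikit-learn networks in place of full-scale deep models, is sketched below: the same architecture and data are trained under different random seeds (the form of implementation variability discussed above) and their predicted probabilities are averaged. The dataset, architecture, and seed count are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Identical architecture and data, different random seeds: the seed controls weight
# initialization and minibatch shuffling, i.e., implementation variability.
members = [MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000,
                         random_state=seed).fit(X_tr, y_tr)
           for seed in range(5)]

probs = np.stack([m.predict_proba(X_te)[:, 1] for m in members])
single_acc = [(p.round() == y_te).mean() for p in probs]
ensemble_acc = (probs.mean(axis=0).round() == y_te).mean()
# Fraction of test points on which the five seeds do not all make the same prediction.
disagreement = np.mean(probs.round().min(axis=0) != probs.round().max(axis=0))

print("single-model accuracies:", np.round(single_acc, 3))
print("ensemble accuracy:", round(float(ensemble_acc), 3))
print("seed-to-seed disagreement rate:", round(float(disagreement), 3))
```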
