321 |
Multiple Causal Inference with Bayesian Factor Models
Wang, Yixin January 2020
Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods assume that we observe all confounders, variables that affect both the cause variables and the outcome variables. But whether we have observed all confounders is a famously untestable assumption. In this dissertation, we develop algorithms for causal inference from observational data, allowing for unobserved confounding. These algorithms focus on problems of multiple causal inference: scientific studies that involve many causes or many outcomes that are simultaneously of interest.
We begin with multiple causal inference in the setting of many causes. We develop the deconfounder, an algorithm that accommodates unobserved confounding by leveraging the multiplicity of the causes. How does the deconfounder work? It uses the correlation among the multiple causes as evidence for unobserved confounders, combining Bayesian factor models and predictive model checking to perform causal inference.
We study the theoretical requirements for the deconfounder to provide unbiased causal estimates, along with its limitations and trade-offs. We also show how the deconfounder connects to the proxy-variable strategy for causal identification (Miao et al., 2018) by treating subsets of causes as proxies of the unobserved confounder. We demonstrate the deconfounder in simulation studies and real-world data. As an application, we develop the deconfounded recommender, a variant of the deconfounder tailored to causal inference on recommender systems.
Finally, we consider multiple causal inference with many outcomes. We develop the control-outcome deconfounder, an algorithm that corrects for unobserved confounders using multiple negative control outcomes. Negative control outcomes are outcome variables for which the cause is a priori known to have no effect. The control-outcome deconfounder uses the correlation among these outcomes as evidence for unobserved confounders. We discuss the theoretical and empirical properties of the control-outcome deconfounder. We also show how the control-outcome deconfounder generalizes the method of synthetic controls (Abadie et al., 2010, 2015; Abadie and Gardeazabal, 2003), expanding its scope to nonlinear settings and non-panel data.
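The idea that correlation among causes carries evidence about a shared unobserved confounder can be illustrated with a small simulation. The sketch below is not the dissertation's algorithm: it assumes a single hidden confounder with equal linear loadings, uses the leading principal component (found by power iteration) as a simple stand-in for a Bayesian factor model, and omits predictive model checking and the outcome model entirely. All function names and simulation settings are hypothetical.

```python
import math
import random


def simulate_causes(n=2000, m=8, seed=0):
    """Draw a hidden confounder z and m causes that each depend on it."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    causes = [[zi + rng.gauss(0.0, 1.0) for _ in range(m)] for zi in z]
    return z, causes


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def substitute_confounder(rows, iters=50):
    """Score each row on the leading principal component (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(m)]
           for i in range(m)]
    w = [1.0 / (j + 1) for j in range(m)]  # arbitrary positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(r[j] * w[j] for j in range(m)) for r in centered]


z, causes = simulate_causes()
z_hat = substitute_confounder(causes)

# Causes are correlated only because they share z (assumed setup).
pairwise = pearson([r[0] for r in causes], [r[1] for r in causes])
# The one-factor summary of the causes tracks the hidden confounder.
recovery = pearson(z, z_hat)
print(f"corr(cause 0, cause 1) = {pairwise:.2f}")
print(f"corr(z, z_hat)         = {recovery:.2f}")
```

In this toy setting the factor summary recovers the confounder well because every cause loads on it; whether such a substitute supports unbiased causal estimates in general depends on the theoretical requirements the abstract refers to.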
|
322 |
Bayesian Modelling Frameworks for Simultaneous Estimation, Registration, and Inference for Functions and Planar Curves
Matuk, James Arthur January 2021
No description available.
|
323 |
Learning COVID-19 network from literature databases using core decomposition
Guo, Yang 22 July 2021
The SARS-CoV-2 coronavirus is responsible for millions of deaths around the world. To contribute to the understanding of SARS-CoV-2 and human protein interactions and to generate new hypotheses about them, we use the information-rich Biomine probabilistic database to extend the experimentally identified SARS-CoV-2-human protein-protein interaction (PPI) network in silico. We generate an extended network by integrating information from the Biomine database with the PPI network. To generate novel hypotheses, we focus on the high-connectivity sub-communities of the extended network that overlap most with the PPI network. To this end, we propose a new data analysis pipeline that efficiently computes the core decomposition of the extended network and identifies dense subgraphs. We then evaluate the identified dense subgraphs and the generated hypotheses in three contexts: literature validation for the uncovered virus-targeting genes and proteins, gene function enrichment analysis on subgraphs, and literature support for drug repurposing for identified tissues and diseases related to COVID-19. Most of the generated hypotheses concern proteins and their encoding genes, and we rank them by their connections to known PPI network nodes. In addition, we compile a comprehensive list of novel genes and proteins potentially related to COVID-19, as well as novel diseases that might be comorbidities. Together with the generated hypotheses, our results provide novel knowledge relevant to COVID-19 for further validation. / Graduate
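Core decomposition assigns each node the largest k such that the node belongs to a subgraph in which every node has at least k neighbours; high-core nodes mark dense regions. The sketch below shows the standard peeling procedure on a plain undirected graph. It is only an illustration under simplifying assumptions: it ignores the edge probabilities that a probabilistic database such as Biomine attaches to links, it is not tuned for large networks, and the function name and toy edge list are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: (degree[n], str(n)))
        k = max(k, degree[node])  # core numbers never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Toy graph: a triangle (a, b, c) with one pendant node d attached to a.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
cores = core_numbers(toy_edges)
densest = sorted(n for n, c in cores.items() if c == max(cores.values()))
print(cores)
print(densest)
```

Selecting the nodes with the maximum core number, as in the last lines, is one simple way to extract a dense subgraph from the decomposition.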
|
324 |
Emerging computational methods to support the design and analysis of high performance buildings
Cant, Kevin 21 April 2022
This thesis presents three emerging computational methods: machine learning,
gradient-free optimization, and Bayesian modelling. Each method is showcased in
its ability to enable energy savings in new and existing buildings when paired with
dynamic energy models. Machine learning algorithms provide rapid computational
speed increases when used as surrogate models, supporting early-stage designs of
buildings. Genetic algorithms support the design of complex interacting systems in a
reduced amount of effort. Finally, Bayesian modelling can be leveraged to incorporate
uncertainty in building energy model calibration. These methods are all readily available
and user-friendly, and can be incorporated into current engineering workflows. / Graduate
|
325 |
Latent Feature Models for Uncovering Human Mobility Patterns from Anonymized User Location Traces with Metadata
Alharbi, Basma Mohammed 10 April 2017
In the mobile era, data capturing individuals’ locations have become unprecedentedly available. Data from Location-Based Social Networks is one example of large-scale user-location data. Such data provide a valuable source for understanding patterns governing human mobility, and thus enable a wide range of research. However, mining and utilizing raw user-location data is a challenging task. This is mainly due to the sparsity of the data (at the user level), the imbalance of the data, in which the check-in counts of users and locations follow a power law (at the global level), and, more importantly, the lack of a uniform low-dimensional feature space describing users.
Three latent feature models are proposed in this dissertation. Each proposed model takes as input a collection of user-location check-ins, and outputs a new representation space for users and locations respectively. To avoid invading users’ privacy, the proposed models are designed to learn from anonymized location data where only the IDs of locations, not their geophysical position or category, are utilized. To enrich the inferred mobility patterns, the proposed models incorporate metadata, often associated with user-location data, into the inference process.
In this dissertation, two types of metadata are utilized to enrich the inferred patterns: timestamps and social ties. Time adds context to the inferred patterns, while social ties supplement incomplete user-location check-ins. The first proposed model incorporates timestamps by learning from collections of users’ locations sharing the same discretized time. The second proposed model also incorporates time into the learning model, yet takes a further step by considering time at different scales (hour of a day, day of a week, month, and so on). This change in modeling time allows for capturing meaningful patterns over different time scales. The last proposed model incorporates social ties into the learning process to compensate for inactive users who contribute a large volume of incomplete user-location check-ins. To assess the quality of the new representation spaces for each model, evaluation is done using an external application, social link prediction, in addition to case studies and analysis of inferred patterns. Each proposed model is compared to baseline models, where results show significant improvements.
|
326 |
Probabilistic logic as a unified framework for inference
Kane, Jonathan 12 March 2016
I argue that a probabilistic logical language incorporates all the features of deductive, inductive, and abductive inference with the exception of how to generate hypotheses ex nihilo. In the context of abduction, it leads to Bayes' theorem for confirming hypotheses, and naturally captures the theoretical virtue of quantitative parsimony. I address common criticisms against this approach, including how to assign probabilities to sentences, the problem of the catch-all hypothesis, and the problem of auxiliary hypotheses. Finally, I make a tentative argument that mathematical deduction fits in the same probabilistic framework as a deterministic limiting case.
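For reference, the textbook form of Bayes' theorem for a hypothesis H and evidence E is given below. The notation is an assumption of this note rather than the thesis's own; the term involving the negation of H is where the catch-all hypothesis enters, since it requires a likelihood for everything other than H.

```latex
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) \;+\; P(E \mid \neg H)\,P(\neg H)}
```

In the limiting case where every probability involved is 0 or 1, conditioning of this kind reduces to ordinary deductive entailment, which is one way to read the abstract's closing remark.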
|
327 |
An Exposition on Bayesian Inference
Laffoon, John 01 May 1967
The Bayesian approach to probability and statistics is described, a brief history of Bayesianism is related, differences between Bayesian and Frequentist schools of statistics are defined, potential applications are investigated, and a literature survey is presented in the form of a machine-sort card file.
Bayesian thought is increasing in favor among statisticians because of its ability to attack problems that are unassailable from the Frequentist approach. It should become more popular among practitioners because of the flexibility it allows experimenters and the ease with which prior knowledge can be combined with experimental data. (82 pages)
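The ease of combining prior knowledge with experimental data can be seen in the conjugate Beta-Binomial model, a standard textbook case. The sketch below is illustrative only and is not taken from the thesis; the function name and the example numbers are hypothetical.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    Returns the posterior Beta parameters and the posterior mean.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a, post_b, post_a / (post_a + post_b)


# Hypothetical example: a Beta(2, 2) prior, then 7 successes in 10 trials.
post_a, post_b, post_mean = beta_binomial_update(2, 2, successes=7, trials=10)
print(post_a, post_b, round(post_mean, 3))
```

The posterior mean falls between the prior mean (0.5) and the observed proportion (0.7), which is the sense in which the prior and the data are combined.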
|
328 |
Evaluating Person-Oriented Methods for Mediation
January 2019
Statistical inference from mediation analysis applies to populations; however, researchers and clinicians may be interested in making inferences about individual clients or small, localized groups of people. Person-oriented approaches focus on the differences between people, or latent groups of people, to ask how individuals differ across variables, and can help researchers avoid ecological fallacies when making inferences about individuals. Traditional variable-oriented mediation assumes the population undergoes a homogeneous reaction to the mediating process. However, mediation is also described as an intra-individual process where each person passes from a predictor, through a mediator, to an outcome (Collins, Graham, & Flaherty, 1998). Configural frequency mediation (CFM) is a person-oriented analysis of contingency tables that has not been well-studied or implemented since its introduction in the literature (von Eye, Mair, & Mun, 2010; von Eye, Mun, & Mair, 2009). The purpose of this study is to describe CFM and investigate its statistical properties while comparing it to traditional and causal inference mediation methods. The results of this study show that joint significance mediation tests result in better Type I error rates but limit the person-oriented interpretations of CFM. Although the estimators for logistic regression and causal mediation are different, they both perform well in terms of Type I error and power; the causal estimator, however, had higher bias than expected, which is discussed in the limitations section. / Dissertation/Thesis / Masters Thesis Psychology 2019
|
329 |
An OLS-Based Method for Causal Inference in Observational Studies
Xu, Yuanfang 07 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Observational data are frequently used for causal inference of treatment effects
on prespecified outcomes. Several widely used causal inference methods have adopted
the method of inverse propensity score weighting (IPW) to alleviate the influence of
confounding. However, the IPW-type methods, including the doubly robust methods,
are prone to large variation in the estimation of causal effects due to possible extreme
weights. In this research, we developed an ordinary least-squares (OLS)-based causal
inference method, which does not involve the inverse weighting of the individual
propensity scores.
We first considered the scenario of homogeneous treatment effect. We proposed
a two-stage estimation procedure, which leads to a model-free estimator of
average treatment effect (ATE). At the first stage, two summary scores, the propensity
and mean scores, are estimated nonparametrically using regression splines. The
targeted ATE is obtained as a plug-in estimator that has a closed form expression.
Our simulation studies showed that this model-free estimator of ATE is consistent,
asymptotically normal and has superior operational characteristics in comparison to
the widely used IPW-type methods. We then extended our method to the scenario
of heterogeneous treatment effects, by adding in an additional stage of modeling
the covariate-specific treatment effect function nonparametrically while maintaining
the model-free feature, and the simplicity of OLS-based estimation. The estimated covariate-specific function serves as an intermediate step in the estimation of ATE
and thus can be utilized to study the treatment effect heterogeneity.
We discussed ways of using advanced machine learning techniques in the proposed
method to accommodate high dimensional covariates. We applied the proposed
method to a case study evaluating the effect of early combination of biologic &
non-biologic disease-modifying antirheumatic drugs (DMARDs) compared with a step-up
treatment plan in children with new-onset juvenile idiopathic arthritis
(JIA). The proposed method gives strong evidence of a significant effect of early combination
at the 0.05 level. On average, early aggressive use of biologic DMARDs leads to
around 1.2 to 1.7 more reduction in clinical juvenile disease activity score at 6 months
than the step-up plan for treating JIA.
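The sensitivity of IPW to extreme weights can be seen in a small simulation. The sketch below is not the two-stage estimator developed in the thesis: it compares a basic IPW estimate, computed with the true propensity score, against the coefficient on treatment from an ordinary least-squares regression of the outcome on treatment and one covariate, in a setting where that linear model happens to be correct. The data-generating process, the true effect of 1.0, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate(n=5000, seed=1, true_effect=1.0):
    """One confounder x drives both treatment assignment and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-2.0 * x))  # true propensity score
        t = 1 if rng.random() < p else 0
        y = true_effect * t + x + rng.gauss(0.0, 1.0)
        rows.append((x, p, t, y))
    return rows


def ipw_ate(rows):
    """Inverse-propensity-weighted ATE using the true propensity."""
    n = len(rows)
    return sum(t * y / p - (1 - t) * y / (1.0 - p) for _, p, t, y in rows) / n


def residualize(v, x):
    """Residuals from a simple regression of v on x with an intercept."""
    mv, mx = sum(v) / len(v), sum(x) / len(x)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mv) for a, b in zip(x, v)) / sxx
    return [(b - mv) - slope * (a - mx) for a, b in zip(x, v)]


def ols_ate(rows):
    """Coefficient on t in y ~ 1 + t + x, via partialling out x."""
    x = [r[0] for r in rows]
    t = [r[2] for r in rows]
    y = [r[3] for r in rows]
    rt, ry = residualize(t, x), residualize(y, x)
    return sum(a * b for a, b in zip(rt, ry)) / sum(a * a for a in rt)


rows = simulate()
max_weight = max(1.0 / p if t else 1.0 / (1.0 - p) for _, p, t, _ in rows)
ipw_estimate = ipw_ate(rows)
ols_estimate = ols_ate(rows)
print(f"largest inverse-propensity weight: {max_weight:.1f}")
print(f"IPW estimate: {ipw_estimate:.3f}   OLS estimate: {ols_estimate:.3f}")
```

Rerunning with different seeds gives a rough sense of how much each estimate moves; a handful of units with very small or very large propensity scores can dominate the weighted sum.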
|
330 |
A Switching Regressions Framework for Models with Count-Valued Omni-Dispersed Outcomes: Specification, Estimation and Causal Inference
Manalew, Wondimu Samuel 02 1900
Indiana University-Purdue University Indianapolis (IUPUI) / In this dissertation, I develop a regression-based approach to the specification and
estimation of the effect of a presumed causal variable on a count-valued outcome of
interest. Statistics for relevant causal inference are also derived. As an illustration and as
a basis for comparing alternative parametric specifications with respect to ease of
implementation, computational efficiency and statistical performance, the proposed
models and estimation methods are used to analyze household fertility decisions. I
estimate the effect of a counterfactually imposed additional year of wife’s education on
actual family size (AFS) and desired family size (DFS) [count-valued variables]. In order
to ensure the causal interpretability of the effect parameter as I define it, the underlying
regression model is cast in a potential outcomes (PO) framework. The specification of the
relevant data generating process (DGP) is also derived. The regression-based approach
developed in the dissertation, in addition to taking explicit account of the fact that the
outcome of interest is count-valued, is designed to account for potential sample selection
bias due to a particular data deficiency in the count data context and to accommodate the
possibility that some structural aspects of the model may vary with the value of a binary
switching variable. Moreover, my approach loosens the equi-dispersion constraint
[conditional mean (CM) equals conditional variance (CV)] that plagues conventional
(Poisson) count-outcome regression models. This is a particularly important feature of
my model and method because in most contexts in empirical economics the data are either over-dispersed (CM < CV) or under-dispersed (CM > CV) – fertility models are
usually characterized by the latter. Alternative count data models were discussed and
compared using simulated and real data. The simulation results and estimation results
using real data suggest that the estimated effects from my proposed models (models that
loosen the equi-dispersion constraint, account for the sample selection, and
accommodate variability in structural aspects of the models due to a switching variable)
differ substantively from estimates from conventional linear and count regression
specifications.
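A quick way to see whether count data depart from equi-dispersion is to compare the sample variance with the sample mean. The sketch below computes this ratio for raw counts; it is a marginal diagnostic only, whereas the dissertation's constraint concerns the conditional mean and conditional variance given covariates. The function names, the tolerance, and both data sets are hypothetical.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean (equals 1 under a Poisson model)."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return statistics.variance(counts) / mean


def classify_dispersion(counts, tolerance=0.1):
    """Label the data relative to the equi-dispersion benchmark of 1."""
    ratio = dispersion_index(counts)
    if ratio > 1 + tolerance:
        return "over-dispersed"
    if ratio < 1 - tolerance:
        return "under-dispersed"
    return "approximately equi-dispersed"


# Hypothetical family-size counts: tightly clustered around 2.
family_sizes = [2, 2, 3, 2, 3, 2, 1, 2]
# Hypothetical counts with many zeros and a few large values.
clustered = [0, 0, 0, 10, 0, 8, 0, 1]

family_label = classify_dispersion(family_sizes)
clustered_label = classify_dispersion(clustered)
print(family_label, round(dispersion_index(family_sizes), 3))
print(clustered_label, round(dispersion_index(clustered), 3))
```

A ratio well below 1, as in the first data set, is the pattern the abstract describes as typical of fertility outcomes.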
|