541. Uncertainty Quantification for Micro-Scale Simulations of Flow in Plant Canopies. Giacomini, Beatrice. January 2023.
Recent decades have seen a remarkable increase in the fidelity of computational fluid dynamics (CFD) models for simulating exchange processes between plant canopies and the atmosphere. However, no matter how accurate the selected CFD solver is, model results are affected by an irreducible level of uncertainty that originates from the inability to exactly measure vegetation features (leaf orientation, foliage density, plant reconfiguration) and flow features (incoming wind direction, solar radiation, stratification effects).
Motivated by this consideration, the present PhD thesis proposes a Bayesian uncertainty quantification (UQ) framework for evaluating uncertainty in model parameters and its impact on model results, in the context of CFD for idealized and realistic plant canopy flow. Two problems are considered. First, for the one-dimensional flow within and above the Duke Forest near Durham, NC, a one-dimensional Reynolds-averaged Navier-Stokes model is employed. In-situ measurements of turbulence statistics are used to inform the UQ framework in order to evaluate uncertainty in plant geometry and its impact on turbulence statistics and aerodynamic coefficients.
The second problem has a more realistic setup, with three-dimensional simulations aiming to replicate the flow over a walnut block in Dixon, CA. Due to the substantial computational cost of large-eddy simulation (LES), a surrogate model is used for the flow simulations. The surrogate is built on top of a small number of LESs over a realistic plant canopy, with plant area density derived from LiDAR measurements. Here, the goal is to investigate uncertainty in the incoming wind direction and its potential repercussions for turbulence statistics. Synthetic data are used to inform the framework.
In both cases, uncertainty in model parameters is characterized via a Markov chain Monte Carlo procedure (inverse problem) and propagated to model results through Monte Carlo sampling (forward problem). In the validation phase, profiles of turbulence statistics with associated uncertainty are compared with the measurements used to inform the framework. By providing an enriched solution for the simulation of flow over idealized and realistic plant canopies, this PhD thesis highlights the potential of UQ to enhance prediction of micro-scale exchange processes between vegetation and the atmosphere.
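To make the inverse/forward structure concrete, the following is a minimal sketch of that two-step procedure on a toy problem: random-walk Metropolis calibrates a single drag-like parameter, and the posterior samples are then pushed through the model. The log-profile model, noise level, prior, and data are hypothetical stand-ins, not the thesis's canopy-flow solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(theta, z):
    # Toy stand-in for a canopy-flow solver: a mean-wind profile
    # controlled by a single drag-like parameter theta (hypothetical).
    return np.log1p(z) / theta

z = np.linspace(1.0, 10.0, 8)                        # measurement heights
data = model(2.0, z) + rng.normal(0, 0.05, z.size)   # synthetic observations

def log_post(theta):
    if theta <= 0:
        return -np.inf
    resid = data - model(theta, z)
    # Gaussian likelihood (sigma = 0.05) plus a Gaussian prior on theta
    return -0.5 * np.sum(resid**2) / 0.05**2 - 0.5 * (theta - 2.5)**2

# Inverse problem: random-walk Metropolis
samples, theta, lp = [], 2.5, log_post(2.5)
for _ in range(5000):
    prop = theta + 0.1 * rng.normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)
post = np.array(samples[1000:])                      # discard burn-in

# Forward problem: propagate posterior samples through the model
profiles = np.array([model(t, z) for t in post[::10]])
lo, hi = np.percentile(profiles, [2.5, 97.5], axis=0)
print("posterior mean of theta:", post.mean())
print("95% band at top height:", lo[-1], hi[-1])
```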
542. General Bayesian Calibration Framework for Model Contamination and Measurement Error. Wang, Siquan. January 2023.
Many applied statistical analyses face the potential problems of model contamination and measurement error. The form and degree of contamination, as well as the measurement error, are usually unknown and sample-specific, which brings additional challenges for researchers. In this thesis, we propose several Bayesian inference models to address these issues, with application to a special type of data for allergen concentration measurement, serial dilution data, which is self-calibrating.
In our first chapter, we address the problem of model contamination by using a multilevel model to simultaneously flag problematic observations and estimate unknown concentrations in serial dilution data, a setting in which the current approach can lead to noisy estimates and difficulty in estimating very low or very high concentrations.
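For context, concentration estimation from serial dilution data is commonly based on a four-parameter logistic calibration curve. The sketch below fits such a curve and inverts it for an unknown sample; the parameter values and data are hypothetical, and the multilevel contamination-flagging machinery of the chapter is omitted.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

def four_pl(x, b1, b2, b3, b4):
    # Four-parameter logistic curve: measured intensity as a
    # function of concentration x.
    return b1 + b2 / (1.0 + (x / b3) ** (-b4))

true = (0.1, 2.0, 5.0, 1.2)                     # hypothetical plate parameters
conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0])
dilutions = np.concatenate([conc, conc])        # duplicate wells per dilution
y = four_pl(dilutions, *true) * rng.lognormal(0.0, 0.05, dilutions.size)

popt, _ = curve_fit(four_pl, dilutions, y, p0=(0.0, 1.0, 1.0, 1.0),
                    maxfev=10000)

def invert(y_obs, b1, b2, b3, b4):
    # Solve y = four_pl(x) for x to estimate an unknown concentration
    return b3 * ((b2 / (y_obs - b1)) - 1.0) ** (-1.0 / b4)

print("estimated concentration for y=1.0:", invert(1.0, *popt))
```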
In our second chapter, we propose a Bayesian joint contamination model for modeling multiple measurement units at the same time, adjusting for differences between experiments through the idea of global calibration and accounting for uncertainty in both predictors and response variables in Bayesian regression. We obtain efficiency gains by analyzing multiple experiments together while maintaining robustness through the use of hierarchical models.
In our third chapter, we develop a Bayesian two-step inference model to account for measurement uncertainty propagation in regression analysis when a joint inference model is infeasible. We aim to increase the reliability of model inference while providing flexibility to users by not restricting the type of inference model used in the first step. For each of the proposed methods, we also demonstrate how to integrate multiple model building blocks through the idea of a Bayesian workflow.
In extensive simulation studies, we show that our proposed methods outperform other commonly used approaches. For the data applications, we apply the proposed methods to the New York City Neighborhood Asthma and Allergy Study (NYC NAAS) data to estimate indoor allergen concentrations more accurately and to reveal the underlying associations between dust mite allergen concentrations and exhaled nitric oxide (NO) measurements in asthmatic children. The methods and tools developed here have a wide range of applications and can be used to improve lab analyses, which are crucial for quantifying exposures to assess disease risk and for evaluating interventions.
543. A Force Directed Graph for Visualization of Voters' Preferences Relative to Political Parties. Neppare, Christoffer. January 2018.
As conversations in society increasingly take place on the internet, so do the civic conversations that underpin the democratic process. To help citizens navigate that process, several news agencies in Sweden provide a version of Valkompassen, the election compass. The intent behind Valkompassen is to give the ordinary reader an easily understood answer as to which party they are most aligned with, based on 25 questions relevant to the election. This paper proposes an alternative information visualization based on a force-directed graph of the results from Valkompassen (developed by TT Nyhetsbyrån). The affordances of a force-directed graph make it an interesting option: it can display individual questions in an intuitive and aesthetically pleasing way on a two- or three-dimensional plane, without being restricted to a political left-right axis. The results of this study suggest that the average citizen may not be familiar with the force-directed graph as an information visualization tool but feels confident using it after a few minutes of interaction. Most participants saw their political sympathies spread across the political left-right divide and found the graph informative for exploring individual questions. The study did not, however, find that the alternative graph matched the user experience of the original Valkompassen. The discussion offers recommendations for easing the learning curve and proposals for how the artifact could better exploit the affordances of the force-directed graph in the future.
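To make the visualization idea concrete, here is a minimal sketch, not the thesis artifact, that places voters and parties in a force-directed (spring) layout, with edge weights drawn from hypothetical agreement scores between a voter's answers and each party's positions. Higher agreement pulls a voter closer to a party, and the resulting embedding is not tied to a left-right axis.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(2)
parties = ["S", "M", "SD", "C", "V", "KD", "L", "MP"]
voters = [f"voter{i}" for i in range(20)]

G = nx.Graph()
for v in voters:
    for p in parties:
        # Hypothetical agreement score, e.g. the fraction of the
        # 25 Valkompassen questions on which voter and party agree.
        agreement = rng.uniform(0.2, 1.0)
        G.add_edge(v, p, weight=agreement)

# Spring layout: heavier edges act as stronger springs, so voters
# settle near the parties they agree with most.
pos = nx.spring_layout(G, weight="weight", seed=42)
print({p: np.round(pos[p], 2) for p in parties})
```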
544. Exploring Confidence Intervals in the Case of Binomial and Hypergeometric Distributions. Mojica, Irene. 01 January 2011.
The objective of this thesis is to examine one of the most fundamental and important methodologies in statistical practice: interval estimation of the probability of success in a binomial distribution. The textbook confidence interval for this problem is known as the Wald interval, as it comes from the Wald large-sample test for the binomial case. It is generally acknowledged that the actual coverage probability of this standard interval is poor for values of p near 0 or 1. Moreover, it has recently been documented that the coverage properties of the standard interval can be inconsistent even when p is not near the boundaries. For this reason, one would like to study the variety of methods for constructing confidence intervals for the unknown probability p in the binomial case, and the present thesis accomplishes this task by presenting several such methods. It is well known that the hypergeometric distribution is related to the binomial distribution: in particular, if the size of the population, N, is large and the number of items of interest k is such that k/N tends to p as N grows, then the hypergeometric distribution can be approximated by the binomial distribution. Therefore, in this case, one can use the confidence intervals constructed for p in the binomial case as a basis for constructing confidence intervals for the unknown value k = pN. The goal of this thesis is to study this approximation and to point out several confidence intervals designed specifically for the hypergeometric distribution. In particular, it considers several confidence intervals based on estimation of a binomial proportion, as well as Bayesian credible sets based on various priors.
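For reference, the Wald interval discussed above and the Wilson (score) interval, a standard alternative with better coverage near the boundaries, take the following forms:

```latex
\[
\text{Wald:}\quad \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},
\qquad \hat{p} = \frac{X}{n},
\]
\[
\text{Wilson:}\quad
\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n}}{1 + \frac{z_{\alpha/2}^2}{n}}
\;\pm\;
\frac{z_{\alpha/2}}{1 + \frac{z_{\alpha/2}^2}{n}}
\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}.
\]
```

The Wilson interval's center is pulled toward 1/2, which is one reason its coverage does not collapse when p is near 0 or 1.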
545. Decision Theory Classification of High-Dimensional Vectors Based on Small Samples. Bradshaw, David. 01 January 2005.
In this paper, we review existing classification techniques and suggest an entirely new procedure for the classification of high-dimensional vectors on the basis of a few training samples. The proposed method is based on the Bayesian paradigm and provides posterior probabilities that a new vector belongs to each of the classes; it therefore adapts naturally to any number of classes. Our classification technique is based on a small vector related to the projection of the observation onto the space spanned by the training samples. This is achieved by employing matrix-variate distributions in classification, which is an entirely new idea. In addition, our method mimics time-tested classification techniques based on the assumption of normally distributed samples. By assuming that the samples have a matrix-variate normal distribution, we are able to replace classification on the basis of a large covariance matrix with classification on the basis of a smaller matrix that describes the relationship of the sample vectors to each other.
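The projection idea can be illustrated with a much simpler stand-in: a nearest-subspace rule that classifies a new vector by how well the span of each class's few training samples explains it. This is not the matrix-variate Bayesian procedure of the thesis, and all data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_per_class = 500, 5                     # high dimension, few samples

# Hypothetical training data: two classes with different mean directions
mean0, mean1 = np.zeros(d), np.zeros(d)
mean1[:50] = 1.5
X0 = mean0[:, None] + rng.normal(0, 1, (d, n_per_class))
X1 = mean1[:, None] + rng.normal(0, 1, (d, n_per_class))

def residual(x, basis):
    # Least-squares coordinates of x in the span of the training
    # samples, then the norm of what that span fails to explain.
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return np.linalg.norm(x - basis @ coef)

x_new = mean1 + rng.normal(0, 1, d)         # a new vector from class 1
scores = [residual(x_new, X0), residual(x_new, X1)]
print("predicted class:", int(np.argmin(scores)))
```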
546. Exploiting Structure in Coordinating Multiple Decision Makers. Mostafa, Hala. 01 September 2011.
This thesis is concerned with sequential decision making by multiple agents, whether they are acting cooperatively to maximize team reward or selfishly trying to maximize their individual rewards. The practical intractability of this general problem has led to efforts to identify special cases that admit efficient computation yet still represent a wide enough range of problems. In our work, we identify the class of problems with structured interactions, where actions of one agent can have non-local effects on the transitions and/or rewards of another agent. We address the following research questions: 1) How can we compactly represent this class of problems? 2) How can we efficiently calculate agent policies that maximize team reward (for cooperative agents) or achieve equilibrium (for self-interested agents)? 3) How can we exploit structured interactions to make offline reasoning about communication tractable?

For representing our class of problems, we developed a new decision-theoretic model, Event-Driven Interactions with Complex Rewards (EDI-CR), that explicitly represents structured interactions. EDI-CR is a compact yet general representation capable of capturing problems where the degree of coupling among agents ranges from complete independence to complete dependence.

For calculating agent policies, we draw on several techniques from mathematical optimization and adapt them to exploit the special structure in EDI-CR. We developed a mixed integer linear program (MILP) formulation of EDI-CR with cooperative agents that results in programs much more compact and faster to solve than formulations that ignore structure (see the sketch after this abstract). We also investigated the use of homotopy methods as an optimization technique, as well as the formulation of self-interested EDI-CR as a system of non-linear equations.

We looked at the issue of communication in both cooperative and self-interested settings. For the cooperative setting, we developed heuristics that assess the impact of potential communication points and add the ones with highest impact to the agents' decision problems. Our heuristics successfully pick communication points that improve team reward while keeping problem size manageable. Also, by controlling the amount of communication introduced by a heuristic, our approach allows us to control the tradeoff between solution quality and problem size. For self-interested agents, we look at an example setting where communication is an integral part of problem solving, but where the self-interested agents have a reason to be reticent (e.g., privacy concerns). We formulate this problem as a game of incomplete information and present a general algorithm for calculating an approximate equilibrium profile in this class of games.
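As a hedged illustration of how a coupling between agents can be encoded in a MILP, the toy below rewards a joint event (both agents playing action A) and linearizes the product of the two binary choices with the standard AND constraints; it is a two-variable caricature, not the EDI-CR formulation.

```python
import pulp

# Each agent chooses action A (x=1) or B (x=0); a structured interaction
# pays a bonus only when both agents play A.
prob = pulp.LpProblem("toy_structured_interaction", pulp.LpMaximize)
x1 = pulp.LpVariable("agent1_plays_A", cat="Binary")
x2 = pulp.LpVariable("agent2_plays_A", cat="Binary")
y = pulp.LpVariable("joint_event", cat="Binary")  # y = x1 AND x2

# Standard MILP linearization of the joint event
prob += y <= x1
prob += y <= x2
prob += y >= x1 + x2 - 1

# Individual rewards plus a coupling reward on the joint event
prob += 2 * x1 + 1 * x2 + 5 * y

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print({v.name: v.value() for v in prob.variables()})
```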
547. Increasing Scalability in Algorithms for Centralized and Decentralized Partially Observable Markov Decision Processes: Efficient Decision-Making and Coordination in Uncertain Environments. Amato, Christopher. 01 September 2010.
As agents are built for ever more complex environments, methods that consider the uncertainty in the system have strong advantages. This uncertainty is common in domains such as robot navigation, medical diagnosis and treatment, inventory management, sensor networks, and e-commerce. When a single decision maker is present, the partially observable Markov decision process (POMDP) model is a popular and powerful choice. When choices are made in a decentralized manner by a set of decision makers, the problem can be modeled as a decentralized partially observable Markov decision process (DEC-POMDP). While POMDPs and DEC-POMDPs offer rich frameworks for sequential decision making under uncertainty, the computational complexity of each model presents an important research challenge. As a way to address this high complexity, this thesis develops several solution methods based on domain structure, memory-bounded representations, and sampling. These approaches address some of the major bottlenecks for decision-making in real-world uncertain systems. The methods include a more efficient optimal algorithm for DEC-POMDPs as well as scalable approximate algorithms for POMDPs and DEC-POMDPs. Key contributions include optimizing compact representations as well as automatic structure extraction and exploitation. These approaches increase the scalability of algorithms while also increasing their solution quality.
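A step common to essentially all POMDP algorithms is the Bayesian belief update after taking action a and observing o. A minimal sketch follows, with a hypothetical two-state, two-action, two-observation model; the update rule itself is the standard one, b'(s') proportional to O(o | s', a) times the predicted state distribution.

```python
import numpy as np

# T[a, s, s'] = P(s' | s, a); O[a, s', o] = P(o | s', a); values hypothetical.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.5, 0.5]]])

def belief_update(b, a, o):
    # b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) b(s)
    pred = b @ T[a]               # predictive distribution over s'
    new_b = O[a][:, o] * pred
    return new_b / new_b.sum()

b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1))
```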
548. Computational Psychometrics for Item-based Computerized Adaptive Learning. Chen, Yi. January 2023.
With advances in computer technology and expanded access to educational data, psychometrics faces new opportunities and challenges for enhancing pattern discovery and decision-making in testing and learning. In this dissertation, I introduce three computational psychometrics studies that solve technical problems in item-based computerized adaptive learning (CAL) systems related to dynamic measurement, diagnosis, and recommendation based on Bayesian item response theory (IRT).
In the first study, I introduce a new knowledge tracing (KT) model, dynamic IRT (DIRT), which can iteratively update the posterior distribution of latent ability based on moment-matching approximation and capture the uncertainty of ability change during the learning process. For dynamic measurement, DIRT has advantages in interpretation, flexibility, computational cost, and implementability. In the second study, a new measurement model, multilevel and multidimensional item response theory with Q matrix (MMIRT-Q), is proposed to provide fine-grained diagnostic feedback, and sequential Monte Carlo (SMC) is introduced for online estimation of latent abilities.
In the third study, I propose the maximum expected ratio of posterior variance reduction (MERPV) criterion for testing purposes and the maximum expected improvement in posterior mean (MEIPM) criterion for learning purposes, under the unified framework of IRT. With these computational psychometrics solutions, we can improve students' learning and testing experience through accurate psychometric measurement, timely diagnostic feedback, and efficient item selection.
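The flavor of Bayesian IRT item selection can be seen in a small sketch: a 2PL response model, a grid approximation to the ability posterior (a stand-in for the moment matching and SMC used in the dissertation), and selection of the next item by minimizing expected posterior variance. The item parameters are hypothetical.

```python
import numpy as np

# 2PL IRT: P(correct | theta) = sigmoid(a * (theta - b))
items = np.array([[1.0, -1.0], [1.5, 0.0], [0.8, 1.0]])  # (a, b) per item

grid = np.linspace(-4, 4, 201)
prior = np.exp(-0.5 * grid**2)
prior /= prior.sum()                  # standard-normal prior on ability

def p_correct(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def update(post, item, response):
    a, b = item
    like = p_correct(grid, a, b) if response else 1 - p_correct(grid, a, b)
    post = post * like
    return post / post.sum()

def expected_posterior_variance(post, item):
    # Average the posterior variance over both possible responses,
    # weighted by their predictive probabilities.
    a, b = item
    p1 = np.sum(post * p_correct(grid, a, b))
    total = 0.0
    for resp, w in [(1, p1), (0, 1 - p1)]:
        q = update(post, item, resp)
        mean = np.sum(grid * q)
        total += w * np.sum(q * (grid - mean) ** 2)
    return total

post = update(prior, items[0], response=1)   # learner gets item 0 right
best = min(range(1, 3), key=lambda i: expected_posterior_variance(post, items[i]))
print("next item:", best)
```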
549. Correcting for Measurement Error and Misclassification using General Location Models. Kwizera, Muhire Honorine. January 2023.
Measurement error is common in epidemiologic studies and can lead to biased statistical inference. It is well known, for example, that regression analyses involving measurement error in predictors often produce biased model coefficient estimates. The work in this dissertation adds to the vast existing literature on measurement error by proposing a missing-data treatment of measurement error through general location models.
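The bias just mentioned is easy to demonstrate. The minimal simulation below (hypothetical numbers) shows classical measurement error in a predictor attenuating a regression slope, with a regression-calibration correction shown for contrast; the dissertation's general-location-model approach is more general than this stand-in.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x = rng.normal(0, 1, n)                 # true predictor
w = x + rng.normal(0, 1, n)             # error-prone measurement
y = 2.0 * x + rng.normal(0, 1, n)       # outcome depends on the true x

naive = np.polyfit(w, y, 1)[0]          # slope from regressing y on w
# Classical attenuation: E[naive] = beta * var(x) / (var(x) + var(error))
reliability = np.var(x) / np.var(w)
corrected = naive / reliability         # regression-calibration fix
print(f"naive {naive:.2f}, corrected {corrected:.2f} (true 2.0)")
```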
The focus is on the case in which information about the measurement error model is obtained not from a subsample of the main study data but from separate, external information, namely external calibration. Methods for handling measurement error in the external calibration setting are needed, given the increasing availability of external data sources and the popularity of data integration in epidemiologic studies. General location models are well suited for the joint analysis of continuous and discrete variables. They offer direct relationships with the linear and logistic regression models and can readily be implemented using frequentist and Bayesian approaches. We use general location models to correct for measurement error and misclassification in the context of three practical problems.
The first problem concerns measurement error in a continuous variable from a dataset containing both continuous and categorical variables. In the second problem, measurement error in the continuous variable is further complicated by the limit of detection (LOD) of the measurement instrument, leaving some measurements of the error-prone continuous variable undetected when they fall below the LOD. The third problem deals with misclassification in a binary treatment variable. We implement the proposed methods using Bayesian approaches for the first two problems and using the expectation-maximization (EM) algorithm for the third problem.
For the first problem, we propose a Bayesian approach, based on the general location model, to correct measurement error in a continuous variable in a dataset with both continuous and categorical variables. We consider the external calibration setting where, in addition to the main study data of interest, calibration data are available that provide information on the measurement error but not on the error-free variables.
The proposed method uses observed data from both the calibration and main study samples and incorporates relationships among all variables in the measurement error adjustment, unlike existing methods that use only the calibration data for model estimation. We make the strong nondifferential measurement error (sNDME) assumption: the measurement error is independent of all error-free variables given the true value of the error-prone variable. The sNDME assumption allows us to identify our model parameters. We show through simulations that the proposed method yields reduced bias, smaller mean squared error, and interval coverage closer to the nominal level compared to existing methods in regression settings. This improvement is more pronounced with larger measurement error, higher correlation between covariates, and stronger covariate effects. We apply the new method to the New York City Neighborhood Asthma and Allergy Study to examine the association between indoor allergen concentrations and asthma morbidity among urban asthmatic children.
The simultaneous occurrence of measurement error and LOD is particularly common for environmental exposures such as the indoor allergen concentrations mentioned in the first problem, and statistical analyses that do not address the two problems simultaneously can lead to wrong scientific conclusions. To address this second problem, we extend the Bayesian general location models for measurement error adjustment to handle both measurement error and values below the LOD in a continuous environmental exposure, in a regression setting with mixed continuous and discrete variables. We treat values below the LOD as censored. Simulations show that our method yields smaller bias and root mean squared error, and that its posterior credible interval has coverage closer to the nominal level, compared to alternative methods, even when the proportion of data below the LOD is moderate. We revisit data from the New York City Neighborhood Asthma and Allergy Study and quantify the effect of indoor allergen concentrations on childhood asthma when over 50% of the measured concentrations are below the LOD.
Finally, we look at the third problem, comparison of group means when treatment groups are misclassified. Our motivation comes from the Frequent User Services Engagement (FUSE) study. Researchers wish to compare quantitative health and social outcome measures for frequent jail-and-shelter users who were assigned housing and those who were not housed; misclassification occurs as a result of noncompliance. The recommended intent-to-treat analysis, which is based on initial group assignment, is known to underestimate group-mean differences. We use the general location model to estimate differences in group means after adjusting for misclassification in the binary grouping variable. Information on the misclassification is available through the sensitivity and specificity. We assume nondifferential misclassification, so that misclassification does not depend on the outcome. We use the expectation-maximization algorithm to obtain estimates of the general location model parameters and the group-mean difference. Simulations show the bias reduction in the estimates of the group-mean difference.
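The EM idea for this setting can be sketched in a few lines: treat the true group as missing data, use the known sensitivity and specificity to weight each observation's posterior membership (E-step), and update the group means from those weights (M-step). The data, error rates, and Gaussian outcome model below are hypothetical stand-ins, not the FUSE analysis.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, se, sp = 2000, 0.85, 0.90              # hypothetical sensitivity/specificity
G = rng.binomial(1, 0.4, n)               # true (unobserved) group
y = np.where(G == 1, 1.0, 0.0) + rng.normal(0, 1, n)
flip = rng.uniform(size=n)
G_star = np.where(G == 1, flip < se, flip > sp).astype(int)  # noisy label

# EM for the group means, treating the true group as missing data
pi, mu0, mu1, sd = 0.5, y.mean() - 0.5, y.mean() + 0.5, y.std()
for _ in range(200):
    # E-step: posterior probability of true membership in group 1
    p_lab1 = np.where(G_star == 1, se, 1 - se)        # P(G* | G=1)
    p_lab0 = np.where(G_star == 1, 1 - sp, sp)        # P(G* | G=0)
    w1 = pi * p_lab1 * norm.pdf(y, mu1, sd)
    w0 = (1 - pi) * p_lab0 * norm.pdf(y, mu0, sd)
    w = w1 / (w1 + w0)
    # M-step: update parameters given the posterior weights
    pi = w.mean()
    mu1 = np.sum(w * y) / w.sum()
    mu0 = np.sum((1 - w) * y) / (1 - w).sum()
    sd = np.sqrt(np.mean(w * (y - mu1) ** 2 + (1 - w) * (y - mu0) ** 2))

naive = y[G_star == 1].mean() - y[G_star == 0].mean()
print(f"naive difference {naive:.2f}, EM-adjusted {mu1 - mu0:.2f} (true 1.0)")
```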
550. The Functional Mechanism of the Bacterial Ribosome, an Archetypal Biomolecular Machine. Ray, Korak Kumar. January 2023.
Biomolecular machines are responsible for carrying out a host of essential cellular processes. In accordance with the wide range of functions they execute, their architectures also vary greatly. Yet, despite this diversity in both structure and function, they share some common characteristics. They are all large macromolecular complexes that enact multiple steps during the course of their functions. They are also 'Brownian' in nature, i.e., they rectify the thermal motions of their surroundings into work. Yet how these machines can utilise the thermal energy of their surroundings in a directional manner, and do so in a cycle over and over again, is still not well understood.
The work I present in this thesis spans the development, evaluation, and use of biophysical, in particular single-molecule, tools in the study of the functional mechanisms of biomolecular machines. In Chapter 2, I describe a mathematical framework that uses Bayesian inference to relate experimental data to an ideal template, irrespective of the scale, background, and noise in the data. This framework may be used for the analysis of data generated by multiple experimental techniques in an accurate, fast, and human-independent manner.
One such application is described in Chapter 3, where this framework is used to evaluate the extent of spatial information present in experimental data generated using cryogenic electron microscopy (cryoEM). This application will not only aid structural biologists studying biomolecular structure with cryoEM, but will also enable biophysicists and biochemists who use structural models to interpret and design their experiments to evaluate the cryoEM data on which their investigations rely.
In Chapter 4, I describe an investigation into the use of one class of analytical models, hidden Markov models (HMMs), to accurately extract kinetic information from single-molecule experimental data, such as the data generated by single-molecule fluorescence resonance energy transfer (smFRET) experiments.
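As a hedged sketch of this kind of analysis, the snippet below simulates a two-state smFRET-like trace and fits a Gaussian HMM with the hmmlearn library, recovering state means and the transition matrix from which dwell times and rates follow. The trace, transition rates, and noise are synthetic, and the chapter's own analysis is not tied to this particular library.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(6)

# Simulate a two-state smFRET-like trace: the molecule hops between a
# low-FRET and a high-FRET state with hypothetical rates and noise.
trans = np.array([[0.95, 0.05], [0.10, 0.90]])
means, noise = [0.2, 0.8], 0.08
states = [0]
for _ in range(1999):
    states.append(rng.choice(2, p=trans[states[-1]]))
states = np.array(states)
efret = rng.normal(np.take(means, states), noise)[:, None]

# Fit a 2-state Gaussian HMM and read off the kinetics
model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=200,
                    random_state=0)
model.fit(efret)
print("state means:", model.means_.ravel())
print("transition matrix:\n", model.transmat_.round(3))
```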
Finally, in Chapter 5, I describe how single-molecule experiments have led to the discovery of a mechanism by which ligands can modulate and drive the conformational dynamics of the ribosome in a manner that facilitates ribosome-catalysed protein synthesis. This mechanism has implications for our understanding of the functional mechanisms of the ribosome in particular, and of biomolecular machines in general.