281 |
Addressing the Uncertainty Due to Random Measurement Errors in Quantitative Analysis of Microorganism and Discrete Particle Enumeration Data. Schmidt, Philip J.
Parameters associated with the detection and quantification of microorganisms (or discrete particles) in water such as the analytical recovery of an enumeration method, the concentration of the microorganisms or particles in the water, the log-reduction achieved using a treatment process, and the sensitivity of a detection method cannot be measured exactly. There are unavoidable random errors in the enumeration process that make estimates of these parameters imprecise and possibly also inaccurate. For example, the number of microorganisms observed divided by the volume of water analyzed is commonly used as an estimate of concentration, but there are random errors in sample collection and sample processing that make these estimates imprecise. Moreover, this estimate is inaccurate if poor analytical recovery results in observation of a different number of microorganisms than what was actually present in the sample. In this thesis, a statistical framework (using probabilistic modelling and Bayes’ theorem) is developed to enable appropriate analysis of microorganism concentration estimates given information about analytical recovery and knowledge of how various random errors in the enumeration process affect count data. Similar models are developed to enable analysis of recovery data given information about the seed dose. This statistical framework is used to address several problems: (1) estimation of parameters that describe random sample-to-sample variability in the analytical recovery of an enumeration method, (2) estimation of concentration, and quantification of the uncertainty therein, from single or replicate data (which may include non-detect samples), (3) estimation of the log-reduction of a treatment process (and the uncertainty therein) from pre- and post-treatment concentration estimates, (4) quantification of random concentration variability over time, and (5) estimation of the sensitivity of enumeration processes given knowledge about analytical recovery. The developed models are also used to investigate alternative strategies that may enable collection of more precise data. The concepts presented in this thesis are used to enhance analysis of pathogen concentration data in Quantitative Microbial Risk Assessment so that computed risk estimates are more predictive. Drinking water research and prudent management of treatment systems depend upon collection of reliable data and appropriate interpretation of the data that are available.
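The core move of the framework, treating the observed count as a Poisson draw whose mean is discounted by analytical recovery and inverting via Bayes' theorem, can be sketched in a few lines. Below is a minimal grid-based illustration assuming a Poisson count model, binomial analytical recovery, and a flat prior; it is not the thesis's full framework, and the numbers are hypothetical.

```python
import numpy as np
from scipy import stats

def concentration_posterior(count, volume, recovery, c_grid):
    """Posterior over concentration c, assuming the true number of organisms in
    the sample is Poisson(c * volume) and each one is observed with probability
    `recovery` (binomial thinning), so the count is Poisson(recovery * c * volume)."""
    likelihood = stats.poisson.pmf(count, recovery * c_grid * volume)
    posterior = likelihood * np.ones_like(c_grid)   # flat prior, for illustration
    return posterior / np.trapz(posterior, c_grid)  # normalize over the grid

c_grid = np.linspace(0.01, 150.0, 4000)   # candidate concentrations (organisms/L)
post = concentration_posterior(count=12, volume=1.0, recovery=0.4, c_grid=c_grid)
print(np.trapz(c_grid * post, c_grid))    # posterior mean, near the naive 12 / 0.4 = 30
```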
|
282 |
Scalable Nonparametric Bayes Learning. Banerjee, Anjishnu, January 2013.
Capturing high dimensional complex ensembles of data is becoming commonplace in a variety of application areas. Examples include biological studies exploring relationships between genetic mutations and diseases, atmospheric and spatial data, and internet usage and online behavioral data. These large complex data present many challenges in their modeling and statistical analysis. Motivated by high dimensional data applications, this thesis focuses on building scalable Bayesian nonparametric regression algorithms and on developing models for joint distributions of complex object ensembles.

We begin with a scalable method for Gaussian process regression, a commonly used tool for nonparametric regression, prediction, and spatial modeling. A common bottleneck for large data sets is the need for repeated inversions of a large covariance matrix, which is required for likelihood evaluation and inference. Such inversion can be practically infeasible and, even when implemented, highly numerically unstable. We propose an algorithm using random projection ideas to construct flexible, computationally efficient, and easy-to-implement approaches for generic scenarios. We then further improve the algorithm by incorporating structure and blocking ideas into our random projections, and demonstrate their applicability in other contexts requiring inversion of large covariance matrices. We show theoretical guarantees for performance as well as substantial improvements over existing methods on simulated and real data. A by-product of this work is the discovery of hitherto unknown equivalences between approaches in machine learning, randomized linear algebra, and Bayesian statistics. Finally, we connect random projection methods for large dimensional predictors and large sample sizes under a unifying theoretical framework.

The other focus of this thesis is joint modeling of complex ensembles of data from different domains. This goes beyond traditional relational modeling of ensembles of one type of data and relies on probability mixing measures over tensors. These models have added flexibility over some existing product mixture model approaches in letting each component of the ensemble have its own dependent cluster structure. We further investigate the question of measuring dependence between variables of different types and propose a general novel scaled measure based on divergences between the joint and marginal distributions of the objects. Once again, we show excellent performance in both simulated and real data scenarios.
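As one concrete instance of the random-projection idea (a generic stand-in, not necessarily the scheme developed in the thesis), a random Fourier feature map replaces the O(n^3) inversion of the n x n covariance matrix with an m x m solve, for m far smaller than n:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rff(d, n_features, lengthscale, rng):
    """Random Fourier feature map phi with k(x, x') ~= phi(x) @ phi(x')
    for an RBF kernel (Rahimi & Recht)."""
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

n, m = 5000, 200                        # n samples, m << n random features
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=n)

phi = make_rff(d=1, n_features=m, lengthscale=0.5, rng=rng)
Phi = phi(X)                            # n x m projected design matrix
noise_var = 0.1 ** 2
# GP posterior mean via the Woodbury identity: solve an m x m system
# instead of inverting the n x n covariance matrix.
A = Phi.T @ Phi + noise_var * np.eye(m)
w = np.linalg.solve(A, Phi.T @ y)       # posterior mean weights
print(float(phi(np.array([[0.5]])) @ w))   # close to sin(1.0) ~= 0.84
```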
|
283 |
The role of forensic epidemiology in evidence-based forensic medical practice. Freeman, Michael, January 2013.
Objectives: This thesis is based on four papers written with the same intent: to describe and demonstrate how epidemiologic concepts and data can serve as a basis for improved validity of probabilistic conclusions in forensic medicine (FM). Conclusions based on probability are common in FM, and the validity of probabilistic conclusions is dependent on their foundation, which is often no more than personal experience. Forensic epidemiology (FE) describes the use and application of epidemiologic methods and data to questions encountered in the practice of FM, as a means of providing an evidence-based foundation, and thus increased validity, for certain types of opinions. The four papers comprising this thesis describe four unique applications of FE that share the common goal of assessing probabilities associated with evidence gathered during the investigation of traumatic injury and death.

Materials and Methods: Paper I used a case study of a fatal traffic crash, in which the seat position of the surviving occupant was uncertain, as an example for describing a probabilistic approach to the investigation of occupant position in a fatal crash. The occupants' injuries were matched to the vehicular and crash evidence in order to assess the probability that the surviving occupant was either the driver or the passenger of the vehicle at the time of the crash. In the second and third papers, epidemiologic data pertaining to traffic crash-related injuries from the National Automotive Sampling System-Crashworthiness Data System (NASS-CDS) were used to assess the utility and strength of evidence, such as vehicle deformation and occupant injury of a particular severity and pattern, as a means of assessing the probability of an uncertain issue of interest. The issue of interest in Paper II was the seat position of the occupant at the time of a rollover crash (similar to Paper I), and the association investigated was the relationship between the degree of downward roof deformation and the likelihood of a serious head or neck injury. The analysis was directed at the circumstance in which a vehicle has sustained roof deformation on one side but not the other, and only one of the occupants has sustained a serious head or neck injury. In Paper III the issue of interest was whether an occupant was using a seat belt prior to being ejected from a passenger vehicle, when there was evidence that the seat belt could have unlatched during the crash, and it was thus uncertain whether the occupant was restrained and then ejected after the seat belt unlatched, or unrestrained. Of particular interest was the relative frequency of injury to the upper extremity closest to the side window (the outboard upper extremity [OUE]), as several prior authors have postulated that, during ejection after a seat belt has become unlatched, the retracting seat belt would invariably cinch around the OUE and cause serious injury. In Paper IV the focus of the analysis was the predictability of the distribution of skull and cervical spine fractures associated with fatal falls as a function of the fall circumstances. Swedish autopsy data were used as the source material for this study.

Results: In Paper I the indifferent pre-crash probability that the survivor was the driver (0.5) was modified by the evidence to arrive at post-test odds of 19 to 1 that he was driving. In Paper II, NASS-CDS data for 960 (unweighted) occupants of rollover crashes were included in the analysis. The association between downward roof deformation and head and neck injury severity (represented by a composite numerical value [HNISS] ranging from 1 to 75) was as follows: each unit increase of the HNISS was associated with 4% greater odds that the occupant had been exposed to >8 cm of roof crush versus <8 cm, 6% greater odds for >15 cm versus <8 cm, and 11% greater odds for >30 cm versus <8 cm. In Paper III, NASS-CDS data for 232,931 (weighted) ejected occupants were included in the analysis, with 497 coded as seat belt failures and 232,434 coded as unbelted. Of the 7 injury types included in the analysis, only OUE injury and serious head injury had a significant adjusted association with seat belt failure (OR = 3.87 [95% CI 1.2, 13.0] and OR = 3.1 [95% CI 1.0, 9.7], respectively). The results were used to construct a table of post-test probabilities that combined the derived sensitivity and (1 - specificity) rates with a range of pre-crash seat belt use rates, so that the results could be applied in an investigation of a suspected case of belt latch failure. In Paper IV, the circumstances of 1,008 fatal falls were grouped into 3 categories of increasing fall height: falls occurring at ground level, falls from a height of <3 meters or down stairs, and falls from ≥3 meters. Logistic regression modeling revealed significantly increased odds of skull base and lower cervical fracture in the middle (<3 m) and upper (≥3 m) fall height groups relative to ground-level falls, as follows: lower cervical, <3 m falls, OR = 2.55 [1.32, 4.92]; lower cervical, ≥3 m falls, OR = 2.23 [0.98, 5.08]; skull base, <3 m falls, OR = 1.82 [1.32, 2.50]; skull base, ≥3 m falls, OR = 2.30 [1.55, 3.40]. Additionally, C0-C1 dislocations were strongly related to fall height, with an OR of 8.3 for the injury in a ≥3 m fall versus a ground-level fall.

Conclusions: In this thesis four applications of FE methodology were described. In all of them, epidemiologic data resulting from prior FM investigations were analyzed in order to draw probabilistic conclusions that could be reliably applied to the circumstances of a specific investigation. It is hoped that this thesis will serve to demonstrate the utility of FE in enhancing evidence-based practice in FM.
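The machinery behind these post-test odds is Bayes' rule in odds form: posterior odds equal prior odds times the likelihood ratio of the evidence. A minimal sketch, using a hypothetical likelihood ratio chosen only to reproduce the 19-to-1 figure from Paper I:

```python
def post_test_odds(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio,
    where LR+ = sensitivity / (1 - specificity) for a positive finding."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    return prior_odds * likelihood_ratio

# Hypothetical: an indifferent prior of 0.5 (odds 1:1) combined with evidence
# carrying a likelihood ratio of 19 (e.g., sensitivity 0.95 against
# 1 - specificity = 0.05) yields the 19-to-1 post-test odds of Paper I.
print(post_test_odds(prior_prob=0.5, likelihood_ratio=19.0))   # 19.0
```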
|
284 |
Fonctions de perte en actuariat [Loss functions in actuarial science]. Craciun, Geanina, January 2009.
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal.
|
285 |
Investigations into the design and dissection of genetic networks. Libby, Eric, January 2007.
The sequencing of the human genome revealed that the number of genes does not explain why humans are different from other organisms like mice and dogs. Instead, it is how genes interact with each other and the environment that separates us from other organisms. This motivates the study of genetic networks and, consequently, my research. My work delves into the roles that simple genetic networks play in a cell and explores the biotechnological aspects of how to uncover such genes and their interactions in experimental models.

Cells must respond to the extracellular environment to contract, migrate, and live. Cells, however, are subject to stochastic fluctuations in protein concentrations. I investigate how cells make important decisions, such as whether to transcribe a gene, based on noisy measurements of the extracellular environment. I propose that genetic networks perform Bayesian inference as a way to account for the probabilistic nature of these measurements and make the best decision. With mathematical models, I show that allosteric repressors and activators can correctly infer the state of the environment despite fluctuating concentrations of molecules. Viewing transcriptional networks as inference modules explains previous experimental data. I also discover that the particular inference problem determines whether repressors or activators are better.

Next, I explore the genetic underpinnings of two canine models of atrial fibrillation: atrial tachypacing and ventricular tachypacing. Using Affymetrix microarrays, I find that the genetic signatures of these two models differ significantly both in magnitude and in the class of genes expressed. The ventricular tachypacing model has thousands of transcripts differentially expressed, with little overlap between 24 hours and 2 weeks, suggesting independent mechanisms. The atrial tachypacing model demonstrates an adaptation: the number of differentially expressed genes decreases over time, to the point that no genes are changed at 6 weeks. Higher-level analysis shows that extracellular matrix components are among the most changed in ventricular tachypacing and that genes like connective tissue growth factor may be responsible.

Finally, I generalize the main problem of microarray analysis into an evaluation problem of choosing between two competing options based on the scores of many independent judges. In this context, I rediscover the voting paradox and compare two different solutions to this problem: the sum rule and the majority rule. I find that the accuracy of a decision depends on the distribution of the judges' scores: narrow distributions are better served by the sum rule, while broad distributions favor the majority rule. This finding motivates a new algorithm for microarray analysis, which outperforms popular existing algorithms on a sample data set and on the canine data set examined earlier. A cost analysis reveals that the optimal number of judges depends on the ratio of the cost of a wrong decision to the cost of a judge.
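The sum-versus-majority finding can be reproduced in a toy simulation. The sketch below is not the thesis's algorithm: it models a "narrow" score distribution as Gaussian noise and a "broad" one as heavy-tailed Cauchy noise, one setting in which the reported reversal shows up.

```python
import numpy as np

rng = np.random.default_rng(1)

def sum_rule(diffs):
    return diffs.sum() > 0                       # pool raw scores, then decide

def majority_rule(diffs):
    return (diffs > 0).sum() > len(diffs) / 2    # each judge votes; majority wins

def accuracy(sampler, n_judges=15, trials=20000):
    """Fraction of trials in which each rule picks the truly better option.
    Option A is better by a margin of 0.1; `sampler` draws the judges' noise."""
    correct = np.zeros(2)
    for _ in range(trials):
        diffs = 0.1 + sampler(n_judges)          # each judge's (score_A - score_B)
        correct += [sum_rule(diffs), majority_rule(diffs)]
    return dict(zip(["sum", "majority"], correct / trials))

print(accuracy(lambda n: rng.normal(0.0, 1.0, n)))   # narrow noise: sum rule wins
print(accuracy(lambda n: rng.standard_cauchy(n)))    # broad noise: majority wins
```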
|
286 |
Real world performance of choice-based conjoint models. Natter, Martin; Feurstein, Markus, January 2001.
Conjoint analysis is one of the most important tools to support product development, pricing, and positioning decisions in management practice, and various models have been developed for this purpose. It is widely accepted that models that take consumer heterogeneity into account outperform aggregate models on hold-out tasks. The aim of our study is to investigate empirically whether the predictions of choice-based conjoint models that incorporate heterogeneity can successfully be generalized to a whole market. To date, no studies have examined the real world performance of choice-based conjoint models using aggregate scanner panel data. Our analysis is based on four commercial choice-based conjoint pricing studies covering a total of 43 stock keeping units (SKUs) and the corresponding weekly scanning data for approximately two years. An aggregate model serves as a benchmark for the performance of two models that take heterogeneity into account: hierarchical Bayes and latent class. Our empirical analysis demonstrates that, in contrast to the performance on hold-out tasks, the real world performance of hierarchical Bayes and latent class is similar to that of the aggregate model. Our results indicate that estimated heterogeneity cannot be generalized to a whole market and suggest that aggregate models are sufficient to predict market shares. (author's abstract) / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
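For context, the aggregate benchmark in choice-based conjoint is a multinomial logit model, in which predicted market shares are the softmax of SKU utilities assembled from estimated part-worths. A minimal sketch with invented part-worths (not the studies' estimates):

```python
import numpy as np

def mnl_shares(utilities):
    """Aggregate multinomial logit: each SKU's predicted market share is the
    softmax of its deterministic utility."""
    e = np.exp(utilities - utilities.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical part-worths: utility = brand intercept + beta_price * price.
brand = np.array([1.2, 0.8, 0.5])         # brand intercepts for three SKUs
beta_price = -1.5                         # price sensitivity
prices = np.array([1.99, 1.49, 0.99])
print(mnl_shares(brand + beta_price * prices).round(3))   # shares sum to 1.0
```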
|
287 |
Applying Data Mining Techniques on Continuous Sensed Data: For daily living activity recognition. Li, Yunjie, January 2014.
Nowadays, with the rapid development of the Internet of Things, the application field of wearable sensors has continuously expanded, especially in areas such as remote electronic medical treatment and smart homes. Recognizing human daily activities from the sensed data is one of the challenges. With a variety of data mining techniques, the activities can be recognized automatically, but due to the diversity and complexity of the sensor data, not every data mining technique performs well without systematic analysis and improvement. In this thesis, several data mining techniques were applied to a continuous sensing dataset in order to recognize human daily activities. This work studied several data mining techniques and focused on three of them: Decision Tree, Naive Bayes, and neural network. These techniques were analyzed and compared according to their classification results, and some improvements to them were proposed for this specific dataset. The comparison of the three classification results showed that each classifier has its own limitations and advantages. The proposed idea of combining the Decision Tree model with the neural network model significantly increased the classification accuracy in this experiment.
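A minimal version of such a comparison, using scikit-learn stand-ins for the three classifiers on synthetic data (the thesis's features come from wearable-sensor streams, and its tree-plus-network combination is its own scheme):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for sensor features with four activity classes.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "naive bayes": GaussianNB(),
    "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                    random_state=0),
}
for name, model in models.items():
    print(name, round(model.fit(X_tr, y_tr).score(X_te, y_te), 3))
```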
|
288 |
Analysis and application of empirical Bayes methods in data mining / Duomenų tyrybos empirinių Bajeso metodų tyrimas ir taikymas. Jakimauskas, Gintautas, 23 April 2014.
The research object is data mining empirical Bayes methods and algorithms applied in the analysis of large populations of large dimensions. The aim of the research is to create methods and algorithms for testing nonparametric hypotheses for large populations and for estimating the parameters of data models. The following problems are solved to reach these objectives: 1. To create an efficient data partitioning algorithm for large dimensional data. 2. To apply the data partitioning algorithm of large dimensional data in testing nonparametric hypotheses. 3. To apply the empirical Bayes method in testing the independence of components of large dimensional data vectors. 4. To develop an algorithm for estimating probabilities of rare events in large populations, using the empirical Bayes method and comparing Poisson-gamma and Poisson-Gaussian mathematical models, by selecting an optimal model and a respective empirical Bayes estimator. 5. To create an algorithm for logistic regression of rare events using the empirical Bayes method. The results obtained enable very fast and efficient partitioning of large dimensional data, testing of the independence of selected components of large dimensional data, and selection of the optimal model in the estimation of probabilities of rare events, using the Poisson-gamma and Poisson-Gaussian mathematical models and empirical Bayes estimators. The nonsingularity condition in the case of the Poisson-gamma model is presented.
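Task 4, rare-event rate estimation under the Poisson-gamma model, admits a compact empirical Bayes sketch. The version below uses generic moment matching and invented data, not necessarily the dissertation's estimator; the guard clause echoes the nonsingularity condition mentioned above.

```python
import numpy as np

def eb_poisson_gamma(counts, exposures):
    """Empirical Bayes rates under k_i ~ Poisson(n_i * lam_i), lam_i ~ Gamma(a, b).
    Hyperparameters are fitted by moment matching on the raw rates; the posterior
    mean shrinks each raw rate toward the population mean a / b."""
    rates = counts / exposures
    m, v = rates.mean(), rates.var()
    excess = v - m * (1.0 / exposures).mean()   # between-group rate variance
    if excess <= 0:   # no real heterogeneity: the gamma moment fit degenerates
        raise ValueError("moment estimate is singular")
    b = m / excess
    a = m * b
    return (a + counts) / (b + exposures)       # posterior mean rate per group

counts = np.array([2, 40, 3, 9, 0, 12])                   # rare-event counts
exposures = np.array([1e5, 2e5, 1e5, 1.5e5, 5e4, 1e5])    # group sizes
print(eb_poisson_gamma(counts, exposures))   # rates shrunk toward the pooled mean
```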
|
290 |
Myoelectric Signal Processing for Prosthesis Control. Hofmann, David, 5 February 2014.
No description available.
|