91

Statistical model selection techniques for data analysis

Stark, J. Alex January 1995 (has links)
No description available.
92

Inferring Demographic History of Admixed Human Populations with SNP Array Data

Quinto Cortes, Consuelo Dayzu January 2016 (has links)
The demographic history of human populations, both archaic and modern, has been the focus of extensive research. Earlier studies were based on a small number of genetic markers, but technological advances have made it possible to examine data at the genome scale to answer important questions about the history of our species. A widely used application of single nucleotide polymorphisms (SNPs) is the genotyping array, which allows several hundred thousand of these sites to be studied at the same time. However, most of the SNPs present in commercial genotyping arrays were discovered by sampling a small number of chromosomes from a group of selected populations. This form of non-random discovery skews patterns of nucleotide diversity and can affect population genetic inferences. Although different methods have been proposed to account for this ascertainment bias, the challenge remains because the exact discovery protocols are not known for most commercial arrays. In this dissertation, I propose a demographic inference pipeline that explicitly models the underlying SNP discovery, and I apply this methodology to specific examples of admixture in human populations where only SNP array data are available. In the first chapter, I describe the pipeline and apply it to a known example of recent population admixture in Mexico. The inferred time of admixture between Iberian and Native American populations that gave rise to admixed Mexicans was in line with historical records, as opposed to previously published underestimates. Next, I examined different demographic models of the first human settlement of Easter Island and determined that the island of Mangareva is the most likely point of origin for this migration. Finally, I investigated the dynamics of the admixture process between the ancestral Jomon and Yayoi populations in different locations across Japan. The estimates of the time of this encounter were closer to dates inferred from anthropological data than those of past works. The results show that the proposed framework corrects ascertainment bias to improve inference when only SNP chip data are available, including genotype data originating from different platforms.
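To make the ascertainment problem concrete, here is a minimal, hypothetical sketch (not the dissertation's pipeline; panel sizes, the frequency distribution, and all variable names are invented) of how conditioning on polymorphism in a small discovery panel shifts the allele-frequency spectrum that an array would see:

```python
import numpy as np

rng = np.random.default_rng(0)

n_sites = 100_000          # candidate variable sites in the population
n_chrom_pop = 200          # chromosomes in the full study sample
n_chrom_panel = 4          # chromosomes in the small discovery panel

# Draw "true" population allele frequencies from a U-shaped spectrum.
freqs = rng.beta(0.2, 0.2, size=n_sites)

# Genotype the full sample and the small discovery panel at every site.
panel_counts = rng.binomial(n_chrom_panel, freqs)
sample_counts = rng.binomial(n_chrom_pop, freqs)

# Ascertainment: a site enters the array only if it is polymorphic in the panel.
ascertained = (panel_counts > 0) & (panel_counts < n_chrom_panel)

# Compare the mean minor-allele frequency before and after ascertainment:
# rare variants are systematically under-represented on the "array".
maf = np.minimum(sample_counts, n_chrom_pop - sample_counts) / n_chrom_pop
print("mean MAF, all sites:        ", maf.mean().round(4))
print("mean MAF, ascertained sites:", maf[ascertained].mean().round(4))
```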
93

Propensity Score for Causal Inference of Multiple and Multivalued Treatments

Gu, Zirui 01 January 2016 (has links)
Propensity score methods (PSM), which have been widely used to reduce selection bias in observational studies, are restricted to a binary treatment. Imai and van Dyk extended PSM to estimate non-binary treatment effects using stratification on the P-Function and generalized inverse treatment probability weighting (GIPTW). However, propensity score (PS) matching methods for multiple treatments have received little attention, and existing generalized PSMs focus on estimates of main treatment effects while omitting potential interaction effects that are of essential interest in many studies. In this dissertation, I extend Rubin's PS matching theory to general treatment regimens under the P-Function framework. From theory to practice, I propose an innovative distance measure that can summarize similarities among subjects in multiple treatment groups. Based on this distance measure I propose four generalized propensity score matching methodologies. The first two methods are extensions of nearest neighbor matching; I conduct Monte Carlo simulation studies to compare them with GIPTW and stratification on the P-Function. The next two methods are extensions of nearest neighbor caliper width matching and variable matching. I define the caliper width as the product of a weighted standard deviation of all possible pairwise distances between two treatment groups. I conduct a series of simulation studies to determine an optimal caliper width by searching for the lowest mean squared error of the average causal interaction effect, and I further compare the methods using the optimal caliper width with the other methods in simulations. Finally, I apply these methods to the National Medical Expenditure Survey data to examine the average causal main effects of duration and frequency of smoking as well as their interaction effect on annual medical expenditures. Using the proposed methods, researchers can apply regression models with specified interaction terms to the matched data and simultaneously obtain both main and interaction effect estimates with improved statistical properties.
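As a point of reference for the matching step, the sketch below implements standard greedy nearest-neighbour caliper matching on a propensity score for the simple binary-treatment case; it is an illustration under stated assumptions only and does not reproduce the dissertation's generalized distance measure or its caliper definition for multiple treatments. The 0.2-standard-deviation caliper is a commonly used convention, not the thesis's tuned value.

```python
import numpy as np

def caliper_match(ps_treated, ps_control, caliper_sd=0.2, rng=None):
    """Greedy 1:1 nearest-neighbour matching on the propensity score with a
    caliper. Simplified binary-treatment illustration only."""
    rng = rng or np.random.default_rng(0)
    # Caliper expressed as a multiple of the pooled SD of the score.
    pooled_sd = np.std(np.concatenate([ps_treated, ps_control]))
    caliper = caliper_sd * pooled_sd

    available = np.ones(len(ps_control), dtype=bool)
    pairs = []
    for i in rng.permutation(len(ps_treated)):      # random matching order
        dist = np.abs(ps_control - ps_treated[i])
        dist[~available] = np.inf                   # controls already used
        j = int(np.argmin(dist))
        if dist[j] <= caliper:                      # match only within caliper
            pairs.append((i, j))
            available[j] = False
    return pairs

# Toy usage with simulated propensity scores.
rng = np.random.default_rng(1)
pairs = caliper_match(rng.beta(3, 2, 50), rng.beta(2, 3, 200), rng=rng)
print(f"{len(pairs)} matched pairs")
```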
94

Učení jazykových obrázků pomocí restartovacích automatů / Learning picture languages using restarting automata

Krtek, Lukáš January 2014 (has links)
There are many existing models of automata working on two-dimensional inputs (pictures), though very little work has been done on learning these automata. In this thesis, we introduce a new model called the two-dimensional limited context restarting automaton. Our model works similarly to the two-dimensional restarting tiling automaton, yet we show that it has the same power as the two-dimensional sgraffito automaton. We propose an algorithm for learning such automata from positive and negative samples of pictures. The algorithm is implemented and subsequently tested with several basic picture languages.
95

Inference of XML Integrity Constraints

Vitásek, Matej January 2012 (has links)
In this work we expand upon previous efforts to infer schema information from existing XML documents. We find the inference of structure to be sufficiently researched and focus further on integrity constraints. After briefly introducing some of them, we turn our attention to ID/IDREF/IDREFS attributes in DTDs. Building on the research by Barbosa and Mendelzon (2003), we introduce a heuristic approach to the problem of finding an optimal ID set. The approach is evaluated and tuned in a wide range of experiments.
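For orientation, a much-simplified heuristic in this spirit can be sketched as follows; it is not the thesis's algorithm, and the uniqueness and subset rules below are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def id_candidates(xml_text):
    """Very simplified illustration: an attribute is an ID candidate if its
    values are unique document-wide, and an IDREF candidate if every one of
    its values matches some value of an ID candidate."""
    root = ET.fromstring(xml_text)
    values = defaultdict(list)                 # attribute name -> values seen
    for elem in root.iter():
        for name, value in elem.attrib.items():
            values[name].append(value)

    ids = {a for a, vs in values.items() if len(vs) == len(set(vs))}
    id_values = {v for a in ids for v in values[a]}
    idrefs = {a for a, vs in values.items()
              if a not in ids and set(vs) <= id_values}
    return ids, idrefs

doc = '<lib><book id="b1" ref="a1"/><book id="b2" ref="a1"/><author id="a1"/></lib>'
print(id_candidates(doc))   # ({'id'}, {'ref'})
```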
96

The statistical analysis of complex sampling data

Paulse, Bradley January 2018 (has links)
Magister Scientiae - MSc / Most standard statistical techniques illustrated in textbooks assume that the data are collected from a simple random sample (SRS) and hence are independently and identically distributed (i.i.d.). In reality, data are often sourced through complex sampling (CS) designs, with a combination of stratification and clustering at different levels of the design. Consequently, the CS data are not i.i.d., and sampling weights, developed over the different stages of the design, are calculated and included in the analysis to account for the sampling design. Logistic regression is often employed in the modelling of survey data since the response under investigation typically has a dichotomous outcome. Furthermore, since the logistic regression model has no homogeneity or normality assumptions, it is appealing when modelling a dichotomous response from survey data. This research compares the estimates of the logistic regression model parameters when the CS design is accounted for, i.e. when weighting is present, with those obtained when the data are modelled under an SRS design, i.e. without weighting. In addition, the standard errors of the estimators are obtained using three different variance estimation techniques, viz. Taylor series linearization, the jackknife and the bootstrap. The different estimated standard errors are used in the calculation of the standard (asymptotic) confidence interval, which is compared to the bootstrap percentile interval in terms of coverage probability. A further level of comparison is made between results obtained using only design weights and those obtained using calibrated and integrated sampling weights. This simulation study is based on the Income and Expenditure Survey (IES) of 2005/2006. The results showed that the estimators generally performed better when weighting was used than when the design was ignored, i.e. under the assumption of SRS, with the Taylor series linearization results being the most stable.
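As a rough illustration of the weighted analysis and the bootstrap percentile interval discussed above, the following sketch fits a logistic regression with design weights by iteratively reweighted least squares and bootstraps the coefficients. It resamples individual observations rather than strata or clusters, so it does not reproduce the design-based variance estimators compared in the thesis; all data and weights are simulated.

```python
import numpy as np

def weighted_logit(X, y, w, n_iter=25):
    """Weighted logistic regression via iteratively reweighted least squares.
    X includes an intercept column; w are the survey (design) weights."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = w * p * (1 - p)                       # IRLS working weights
        H = X.T @ (W[:, None] * X)                # weighted information matrix
        g = X.T @ (w * (y - p))                   # weighted score
        beta = beta + np.linalg.solve(H, g)
    return beta

def bootstrap_ci(X, y, w, b=500, alpha=0.05, rng=None):
    """Bootstrap percentile interval for the coefficients, resampling whole
    observations with replacement."""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    draws = []
    for _ in range(b):
        idx = rng.integers(0, n, n)
        draws.append(weighted_logit(X[idx], y[idx], w[idx]))
    return np.quantile(np.array(draws), [alpha / 2, 1 - alpha / 2], axis=0)

# Toy example: one covariate plus intercept, stand-in design weights.
rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))
X = np.column_stack([np.ones(n), x])
w = rng.uniform(0.5, 2.0, size=n)
print("estimate:", weighted_logit(X, y, w).round(3))
print("95% percentile CI:\n", bootstrap_ci(X, y, w, b=200).round(3))
```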
97

Metanálise para Modelos de Regressão / Meta-analysis for Regression Models

Santos, Laryssa Vieira dos 28 October 2016 (has links)
Meta-analysis has been widely used in medical studies, especially in systematic reviews of randomized clinical trials. For regression models the technique is still very scarce and limited: usually it is just a measure based on the average point estimates of the different studies, losing much of the information in the original data. It is becoming increasingly important to use meta-analysis to summarize studies with the same objective, owing to the advancement of science and the desire to use the smallest possible number of human subjects in clinical trials. Using a meta-analytic Bayesian measure, the objective is to propose a generic and efficient method for performing meta-analysis in regression models.
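The dissertation's specific Bayesian meta-analytic measure is not described in this abstract; purely as a generic point of comparison, the sketch below pools a single regression coefficient across studies with a standard normal-normal random-effects model evaluated on a grid. The per-study estimates, standard errors, and grid ranges are made up.

```python
import numpy as np

# Hypothetical per-study estimates of the same regression slope and their SEs.
beta_hat = np.array([0.42, 0.55, 0.31, 0.60])
se = np.array([0.10, 0.15, 0.12, 0.20])

# Random-effects model: beta_hat_i ~ N(mu, se_i^2 + tau^2).
# Posterior over (mu, tau) on a grid with flat priors.
mu_grid = np.linspace(-1, 2, 301)
tau_grid = np.linspace(0.001, 1, 200)
M, T = np.meshgrid(mu_grid, tau_grid, indexing="ij")

var = se[None, None, :] ** 2 + T[..., None] ** 2
loglik = -0.5 * (np.log(2 * np.pi * var)
                 + (beta_hat[None, None, :] - M[..., None]) ** 2 / var).sum(axis=-1)

post = np.exp(loglik - loglik.max())
post /= post.sum()

mu_post = (post * M).sum()                     # posterior mean of the pooled slope
mu_sd = np.sqrt((post * (M - mu_post) ** 2).sum())
print(f"pooled slope: {mu_post:.3f} +/- {mu_sd:.3f}")
```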
98

Supervised learning for back analysis of excavations in the observational method

Jin, Yingyan January 2018 (has links)
In the past few decades, demand for construction in underground spaces has increased dramatically in urban areas with high population densities. However, the impact of the construction of underground structures on surrounding infrastructure raises concerns, since movements caused by deep excavations might damage adjacent buildings. Unfortunately, the prediction of geotechnical behaviour is difficult due to uncertainties and a lack of information on the underground environment. Therefore, to ensure safety, engineers tend to choose very conservative designs that require unnecessary material and longer construction times. The observational method, proposed by Peck in 1969 and formalised in Eurocode 7 in 1987, provides a way to avoid such redundancy by modifying the design based on the knowledge gathered during construction. The review process within the observational method is known as back analysis. Supervised learning can aid in this process, providing a systematic procedure to assess soil parameters based on monitoring data and to predict the ground response. A probabilistic model is developed in this research to account for the uncertainties in the problem. Sequential Bayesian inference is used to update the soil parameters at each excavation stage when observations become available. The accuracy of the prediction for future stages improves at each stage, while the uncertainty contained in the prediction decreases, and therefore the confidence in the corresponding design increases. Moreover, the Bayesian method integrates subjective engineering experience and objective observations in a rational and quantitative way, which enables the model to update soil parameters even when the amount of data is very limited. It also allows the use of knowledge learnt from comparable ground conditions, which is particularly useful in the absence of site-specific information. Four probabilistic models are developed in this research. The first two incorporate empirical excavation design methods; these simple models are used to examine the practicality of the approach in several cases. The next two are coupled with a program called FREW, which is able to simulate the excavation process, still in a relatively simplistic way: one is a baseline model with simple assumptions on the model error, and the other is a more sophisticated model that considers measurement error and spatial relationships among the observations. Their efficiency and accuracy are verified using a synthetic case and tested on a case history from the London Crossrail project. Finally, the models are compared and their flexibility in different cases is discussed.
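A minimal sketch of the stage-by-stage Bayesian update is given below. The one-parameter forward model standing in for FREW, the prior, the noise level, and the monitoring values are all invented for illustration; only the sequential-updating pattern reflects the approach described above.

```python
import numpy as np

def predicted_movement(stiffness, stage):
    """Hypothetical ground movement (mm) at a given excavation stage."""
    return 60.0 * stage / stiffness

stiffness_grid = np.linspace(5, 50, 500)       # candidate soil stiffness values
prior = np.exp(-0.5 * ((stiffness_grid - 20) / 10) ** 2)   # engineering judgement
prior /= prior.sum()

sigma_obs = 1.5                                # assumed measurement noise (mm)
observations = {1: 3.2, 2: 6.1, 3: 8.8}        # monitoring data per stage (made up)

posterior = prior.copy()
for stage, obs in observations.items():
    pred = predicted_movement(stiffness_grid, stage)
    likelihood = np.exp(-0.5 * ((obs - pred) / sigma_obs) ** 2)
    posterior *= likelihood                    # Bayes' rule, stage by stage
    posterior /= posterior.sum()
    mean = (stiffness_grid * posterior).sum()
    sd = np.sqrt(((stiffness_grid - mean) ** 2 * posterior).sum())
    print(f"stage {stage}: stiffness = {mean:.1f} +/- {sd:.1f}")
```

At each stage the posterior tightens around the stiffness value consistent with the monitoring data, mirroring the increase in design confidence described in the abstract.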
99

Approximation methods and inference for stochastic biochemical kinetics

Schnoerr, David Benjamin January 2016 (has links)
Recent experiments have shown the fundamental role that random fluctuations play in many chemical systems in living cells, such as gene regulatory networks. Mathematical models are thus indispensable to describe such systems and to extract relevant biological information from experimental data. Recent decades have seen a considerable amount of modelling effort devoted to this task. However, current methodologies still present outstanding mathematical and computational hurdles. In particular, models which retain the discrete nature of particle numbers necessarily incur severe computational overheads, greatly complicating the tasks of characterising the noise in cells statistically and inferring parameters from data. In this thesis we study analytical approximations and inference methods for stochastic reaction dynamics. The chemical master equation is the accepted description of stochastic chemical reaction networks whenever spatial effects can be ignored. Unfortunately, for most systems no analytic solutions are known and stochastic simulations are computationally expensive, making analytic approximations appealing alternatives. In the case where spatial effects cannot be ignored, such systems are typically modelled by means of stochastic reaction-diffusion processes. As in the non-spatial case, an analytic treatment is rarely possible and simulations quickly become infeasible. In particular, the calibration of models to data constitutes a fundamental unsolved problem.

In the first part of this thesis we study two approximation methods for the chemical master equation: the chemical Langevin equation and moment closure approximations. The chemical Langevin equation approximates the discrete-valued process described by the chemical master equation by a continuous diffusion process. Despite being frequently used in the literature, it remains unclear how the boundary conditions behave under this transition from discrete to continuous variables. We show that this boundary problem renders the chemical Langevin equation mathematically ill-defined if it is defined in real space, due to the occurrence of square roots of negative expressions. We show that this problem can be avoided by extending the state space from real to complex variables, and we prove that this approach gives rise to real-valued moments and thus admits a probabilistic interpretation. Numerical examples demonstrate better accuracy of the developed complex chemical Langevin equation than various real-valued implementations proposed in the literature. Moment closure approximations aim at directly approximating the moments of a process, rather than its distribution. The chemical master equation gives rise to an infinite system of ordinary differential equations for the moments of a process; moment closure approximations close this infinite hierarchy by expressing moments above a certain order in terms of lower order moments. This is an ad hoc approximation without any systematic justification, and the question arises whether the resulting equations always lead to physically meaningful results. We find that this is indeed not always the case: moment closure approximations may give rise to diverging time trajectories or otherwise unphysical behaviour, such as negative mean values or unphysical oscillations. They thus fail to admit a probabilistic interpretation in these cases, and care is needed when using them so as not to draw wrong conclusions.
In the second part of this work we consider systems where spatial effects have to be taken into account. In general, such stochastic reaction-diffusion processes are only defined in an algorithmic sense without any analytic description, and it is hence not even conceptually clear how to define likelihoods of experimental data for such processes. Calibration of such models to experimental data thus constitutes a highly non-trivial task. We derive a novel inference method by establishing a basic relationship between stochastic reaction-diffusion processes and spatio-temporal Cox processes, two classes of models that had previously been considered distinct from each other. This novel connection naturally allows approximate likelihoods to be computed, and thus inference tasks to be performed, for stochastic reaction-diffusion processes. The accuracy and efficiency of this approach are demonstrated by means of several examples. Overall, this thesis advances the state of the art of modelling methods for stochastic reaction systems: it improves the understanding of several existing methods by elucidating their fundamental limitations, and several novel approximation and inference methods are developed.
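For context, the sketch below simulates the standard real-valued chemical Langevin equation for a simple birth-death system with an Euler-Maruyama scheme; the max(x, 0) guard is the naive work-around for the square-root-of-a-negative problem discussed above, and the complex-valued formulation developed in the thesis is not implemented here. Rate constants and step sizes are illustrative.

```python
import numpy as np

# Birth-death process X: 0 -> X at rate k, X -> 0 at rate g*X.
rng = np.random.default_rng(0)
k, g = 10.0, 0.1            # production and degradation rate constants
dt, n_steps = 0.01, 5000
x = 0.0
trajectory = np.empty(n_steps)
for i in range(n_steps):
    a1, a2 = k, g * max(x, 0.0)                 # propensities of the two reactions
    drift = a1 - a2
    noise = np.sqrt(a1) * rng.normal() - np.sqrt(a2) * rng.normal()
    x = x + drift * dt + noise * np.sqrt(dt)    # Euler-Maruyama step of the CLE
    trajectory[i] = x

print("stationary mean k/g =", k / g,
      "; simulated mean:", trajectory[2000:].mean().round(1))
```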
100

Some results on familywise robustness for multiple comparison procedures.

January 2005 (has links)
Chan Ka Man.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 46-48). Abstracts in English and Chinese.
Contents: Abstract (p. i); Acknowledgement (p. iii). Chapter 1, Introduction (p. 1): 1.1 Multiple comparison procedures and their applications (p. 1); 1.2 Different types of error control (p. 3); 1.3 Single-step and stepwise procedures (p. 5); 1.4 From familywise error rate control to false discovery rate control (p. 8); 1.5 The FDR procedure of BH (p. 10); 1.6 Application of the FDR procedure (p. 11); 1.7 Family size and family size robustness (p. 16); 1.8 Objectives of the thesis (p. 17). Chapter 2, The Familywise Robustness Criteria (p. 18): 2.1 The basic idea of familywise robustness (p. 18); 2.2 Definitions and notations (p. 19); 2.3 The measurement of robustness to changing family size (p. 21); 2.4 Main theorems (p. 21); 2.5 Example (p. 23); 2.6 Summary (p. 24). Chapter 3, FDR and FWR (p. 26): 3.1 Positive false discovery rate (p. 26); 3.2 A unified approach to FDR (p. 29); 3.3 The S procedure (p. 30); 3.4 Familywise robustness criteria and the S procedure (p. 32). Chapter 4, Simulation Study (p. 41): 4.1 The setup (p. 41); 4.2 Simulation result (p. 43); 4.3 Conclusions (p. 44). Bibliography (p. 46).
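Since the thesis builds on the Benjamini-Hochberg (BH) false discovery rate procedure listed in its contents, a short sketch of that standard step-up procedure (not the thesis's S procedure or its familywise robustness criteria) may be useful; the p-values are made up.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with the
    k smallest p-values, where k is the largest i with p_(i) <= i*q/m."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.09, 0.20, 0.61, 0.78]
print(benjamini_hochberg(pvals, q=0.05))   # first two hypotheses rejected
```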
