  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Testing For Normality of Censored Data

Andersson, Johan, Burberg, Mats January 2015 (has links)
In order to make statistical inference, that is, to draw conclusions from a sample in order to describe a population, it is crucial to know the correct distribution of the data. This paper focused on censored data from the normal distribution. Its purpose was to answer whether we can test if data come from a censored normal distribution, by applying both standard normality tests and tests designed for censored data and investigating whether these tests attain their nominal size. This was carried out with simulations in the program R for left-censored data. The results indicated that, with increasing censoring, the standard normality tests failed to accept normality in a sample, whereas the tests designed for censored data maintained the required size as the censoring level increased, which was the most important conclusion of the paper.
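The kind of size simulation described above can be sketched in a few lines. The thesis used R; the Python version below is only an illustrative assumption (the choice of the Shapiro-Wilk test, the scheme of recording censored observations at the detection limit, and the sample sizes are choices made here, not taken from the paper):

```python
import numpy as np
from scipy import stats

def rejection_rate(cens_level, n=100, reps=2000, alpha=0.05, seed=0):
    """Estimated rejection rate of a normality test on left-censored normal samples."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        limit = np.quantile(x, cens_level)   # censor the lowest cens_level fraction
        x = np.maximum(x, limit)             # left-censored values recorded at the limit
        _, p = stats.shapiro(x)
        if p < alpha:
            rejections += 1
    return rejections / reps

# With no censoring the rate should sit near the nominal 5% level; as the
# censoring level grows, the standard test rejects normal data far too often.
for level in [0.0, 0.1, 0.2, 0.4]:
    print(f"censoring {level:.0%}: estimated rejection rate {rejection_rate(level):.3f}")
```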
202

Ancient DNA studies : of the Asiatic Eskimo site Ekven

Homeister, Anne January 2012 (has links)
This thesis deals with ancient DNA from 32 individuals from the prehistoric village of Ekven, located in northeastern Asia. The samples were amplified by PCR and sequenced with FLX pyrosequencing. Authentic sequences were assessed using PhyloNet and C-statistics and subsequently aligned to and compared with a reference sequence (CRS). Clear C-T, T-C and A-G damage was detected at nucleotide positions, which proves to be characteristic of this particular population.
203

Chi-Square Orthogonal Components for Assessing Goodness-of-fit of Multidimensional Multinomial Data

January 2011 (has links)
abstract: It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered to produce an omnibus test as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components are used to develop goodness-of-fit tests for models fit to the counts obtained from the cross-classification of multi-category dependent variables. Ordinal categories are assumed. Orthogonal components defined on marginals are obtained when analyzing multi-dimensional contingency tables through the use of the QR decomposition. A subset of these orthogonal components can be used to construct limited-information tests that allow one to identify the source of lack-of-fit and provide an increase in power compared to Pearson's test. These tests can address the adverse effects that arise when data are sparse. The tests rely on the set of first- and second-order marginals jointly, the set of second-order marginals only, and the random forest method, a popular algorithm for modeling large complex data sets. The performance of these tests is compared to the likelihood ratio test as well as to tests based on orthogonal polynomial components. The derived goodness-of-fit tests are evaluated with studies for detecting two- and three-way associations that are not accounted for by a categorical variable factor model with a single latent variable. In addition, the tests are used to investigate the case when the model misspecification involves parameter constraints for large and sparse contingency tables. The methodology proposed here is applied to data from the 38th round of the State Survey conducted by the Institute for Public Policy and Social Research at Michigan State University (2005). The results illustrate the use of the proposed techniques in the context of a sparse data set. / Dissertation/Thesis / Ph.D. Mathematics 2011
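As a rough single-table illustration of the idea of orthogonal chi-squared components obtained via a QR decomposition (a sketch only, not the dissertation's construction on the marginals of a multidimensional table; the ordinal scores, uniform null, and toy counts are assumptions), one can build an orthonormal basis and recover Pearson's statistic as a sum of squared, individually interpretable components:

```python
import numpy as np
from scipy.stats import chi2

def orthogonal_components(counts, probs):
    """Return components c_1..c_{k-1} whose squares sum to Pearson's X^2."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = counts.sum()
    k = len(counts)
    # Standardized cell residuals: X^2 = sum(r^2), and r is orthogonal to sqrt(probs).
    r = (counts - n * probs) / np.sqrt(n * probs)
    # Basis: sqrt(probs) first, then powers of the ordinal scores 1..k.
    scores = np.arange(1, k + 1, dtype=float)
    M = np.column_stack([np.sqrt(probs)] + [scores ** j for j in range(1, k)])
    Q, _ = np.linalg.qr(M)        # orthonormal columns spanning the same space
    V = Q[:, 1:]                  # drop the sqrt(probs) direction
    return V.T @ r                # ordered components (linear, quadratic, ...)

# Toy example: 5 ordinal categories, uniform null hypothesis.
counts = np.array([18, 25, 30, 22, 5])
probs = np.full(5, 0.2)
c = orthogonal_components(counts, probs)
pearson = np.sum((counts - counts.sum() * probs) ** 2 / (counts.sum() * probs))
print("Pearson X^2:", pearson, " sum of squared components:", np.sum(c ** 2))
# A limited-information test keeps only a subset of components, e.g. the linear
# one, which is approximately chi^2 with 1 degree of freedom under the null:
print("p-value of linear component:", chi2.sf(c[0] ** 2, df=1))
```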
204

Spatial Statistics and Its Applications in Biostatistics and Environmental Statistics

Unknown Date (has links)
This dissertation presents some topics in spatial statistics and their application in biostatistics and environmental statistics. The field of spatial statistics is an energetic area in statistics. In Chapter 2 and Chapter 3, the goal is to build subregion models under the assumption that the responses or the parameters are spatially correlated. For regression models, considering spatially varying coefficients is a reasonable way to build subregion models. There are two different techniques for exploring spatially varying coefficients. One is geographically weighted regression (Brunsdon et al. 1998). The other is a spatially varying coefficients model which assumes a stationary Gaussian process for the regression coefficients (Gelfand et al. 2003). Based on the ideas of these two techniques, we introduce techniques for exploring subregion models in survival analysis which is an important area of biostatistics. In Chapter 2, we introduce modified versions of the Kaplan-Meier and Nelson-Aalen estimators which incorporate geographical weighting. We use ideas from counting process theory to obtain these modified estimators, to derive variance estimates, and to develop associated hypothesis tests. In Chapter 3, we introduce a Bayesian parametric accelerated failure time model with spatially varying coefficients. These two techniques can explore subregion models in survival analysis using both nonparametric and parametric approaches. In Chapter 4, we introduce Bayesian parametric covariance regression analysis for a response vector. The proposed method defines a regression model between the covariance matrix of a p-dimensional response vector and auxiliary variables. We propose a constrained Metropolis-Hastings algorithm to get the estimates. Simulation results are presented to show performance of both regression and covariance matrix estimates. Furthermore, we have a more realistic simulation experiment in which our Bayesian approach has better performance than the MLE. Finally, we illustrate the usefulness of our model by applying it to the Google Flu data. In Chapter 5, we give a brief summary of future work. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2017. / November 9, 2017. / Biostatistics, Environment Statistics, Spatial Statistics / Includes bibliographical references. / Fred Huffer, Professor Directing Dissertation; Insu Paek, University Representative; Debajyoti Sinha, Committee Member; Elizabeth Slate, Committee Member; Jonathan Bradley, Committee Member.
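A minimal sketch of the geographical-weighting idea behind a modified Kaplan-Meier estimator (the Gaussian kernel weights, bandwidth, and simulated data below are assumptions for illustration, not the dissertation's estimator or its counting-process derivation):

```python
import numpy as np

def gw_kaplan_meier(times, events, coords, target, bandwidth):
    """Geographically weighted Kaplan-Meier survival estimate at one target location.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    coords : (n, 2) subject locations
    target : (2,) location at which the local survival curve is wanted
    """
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    d = np.linalg.norm(np.asarray(coords, float) - np.asarray(target, float), axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)              # geographical weights

    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = w[times >= t].sum()                    # weighted risk set at t
        failed = w[(times == t) & (events == 1)].sum()   # weighted events at t
        if at_risk > 0:
            s *= 1.0 - failed / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Toy usage: survival depends on the x-coordinate, so the local curve varies by site.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
true_t = rng.exponential(scale=1 + 0.3 * coords[:, 0])
cens_t = rng.exponential(scale=3.0, size=n)
obs = np.minimum(true_t, cens_t)
events = (true_t <= cens_t).astype(int)
t_grid, s_hat = gw_kaplan_meier(obs, events, coords, target=(2.0, 5.0), bandwidth=2.0)
print("local survival at the 10 earliest event times:", np.round(s_hat[:10], 3))
```

With a very large bandwidth all weights approach one and the estimator reduces to the ordinary Kaplan-Meier curve.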
205

Statistical modeling and statistical learning for disease prediction and classification

Chen, Tianle January 2014 (has links)
This dissertation studies prediction and classification models for disease risk through semiparametric modeling and statistical learning. It consists of three parts. In the first part, we propose several survival models to analyze the Cooperative Huntington's Observational Research Trial (COHORT) study data accounting for the missing mutation status in relative participants (Kieburtz and Huntington Study Group, 1996a). Huntington's disease (HD) is a progressive neurodegenerative disorder caused by an expansion of cytosine-adenine-guanine (CAG) repeats at the IT15 gene. A CAG repeat number greater than or equal to 36 is defined as carrying the mutation and carriers will eventually show symptoms if not censored by other events. There is an inverse relationship between the age-at-onset of HD and the CAG repeat length; the greater the CAG expansion, the earlier the age-at-onset. Accurate estimation of age-at-onset based on CAG repeat length is important for genetic counseling and the design of clinical trials for HD. Participants in COHORT (denoted as probands) undergo a genetic test and their CAG repeat number is determined. Family members of the probands do not undergo the genetic test and their HD onset information is provided by probands. Several methods are proposed in the literature to model the age specific cumulative distribution function (CDF) of HD onset as a function of the CAG repeat length. However, none of the existing methods can be directly used to analyze COHORT proband and family data because family members' mutation status is not always known. In this work, we treat the presence or absence of an expanded CAG repeat in first-degree family members as missing data and use the expectation-maximization (EM) algorithm to carry out the maximum likelihood estimation of the COHORT proband and family data jointly. We perform simulation studies to examine finite sample performance of the proposed methods and apply these methods to estimate the CDF of HD age-at-onset from the COHORT proband and family combined data. Our results show a slightly lower estimated cumulative risk of HD with the combined data compared to using proband data alone. We then extend the approach to predict the cumulative risk of disease accommodating predictors with time-varying effects and outcomes subject to censoring. We model the time-specific effect through a nonparametric varying-coefficient function and handle censoring through self-consistency equations that redistribute the probability mass of censored outcomes to the right. The computational procedure is extremely convenient and can be implemented by standard software. We prove large sample properties of the proposed estimator and evaluate its finite sample performance through simulation studies. We apply the method to estimate the cumulative risk of developing HD from the mutation carriers in COHORT data and illustrate an inverse relationship between the cumulative risk of HD and the length of CAG repeats at the IT15 gene. In the second part of the dissertation, we develop methods to accurately predict whether pre-symptomatic individuals are at risk of a disease based on their various marker profiles, which offers an opportunity for early intervention well before definitive clinical diagnosis. For many diseases, existing clinical literature may suggest the risk of disease varies with some markers of biological and etiological importance, for example age. 
To identify effective prediction rules using nonparametric decision functions, standard statistical learning approaches treat markers with clear biological importance (e.g., age) and other markers without prior knowledge on disease etiology interchangeably as input variables. Therefore, these approaches may be inadequate in singling out and preserving the effects from the biologically important variables, especially in the presence of potential noise markers. Using age as an example of a salient marker to receive special care in the analysis, we propose a local smoothing large margin classifier implemented with a support vector machine to construct effective age-dependent classification rules. The method adaptively adjusts the age effect and separately tunes age and other markers to achieve optimal performance. We derive the asymptotic risk bound of the local smoothing support vector machine, and perform extensive simulation studies to compare with standard approaches. We apply the proposed method to two studies of premanifest HD subjects and controls to construct age-sensitive predictive scores for the risk of HD and risk of receiving HD diagnosis during the study period. In the third part of the dissertation, we develop a novel statistical learning method for longitudinal data. Predicting disease risk and progression is one of the main goals in many clinical studies. Cohort studies on the natural history and etiology of chronic diseases span years and data are collected at multiple visits. Although kernel-based statistical learning methods are proven to be powerful for a wide range of disease prediction problems, these methods are only well studied for independent data but not for longitudinal data. It is thus important to develop time-sensitive prediction rules that make use of the longitudinal nature of the data. We develop a statistical learning method for longitudinal data by introducing subject-specific long-term and short-term latent effects through designed kernels to account for within-subject correlation of longitudinal measurements. Since the presence of multiple sources of data is increasingly common, we embed our method in a multiple kernel learning framework and propose regularized multiple kernel statistical learning with random effects to construct effective nonparametric prediction rules. Our method allows easy integration of various heterogeneous data sources and takes advantage of correlation among longitudinal measures to increase prediction power. We use different kernels for each data source, taking advantage of the distinctive features of each data modality, and then optimally combine data across modalities. We apply the developed methods to two large epidemiological studies, one on Huntington's disease and the other on Alzheimer's Disease (Alzheimer's Disease Neuroimaging Initiative, ADNI), where we explore a unique opportunity to combine imaging and genetic data to predict the conversion from mild cognitive impairment to dementia, and show a substantial gain in performance while accounting for the longitudinal nature of the data.
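A hedged sketch of the designed-kernel idea for longitudinal data: an ordinary covariate kernel plus a subject-indicator term that induces within-subject correlation, combined in a simple kernel ridge fit. The kernel form, hyperparameters, and toy data are illustrative assumptions, not the dissertation's formulation:

```python
import numpy as np

def longitudinal_kernel(X1, id1, X2, id2, length_scale=1.0, subject_var=0.5):
    """RBF kernel over covariates plus a subject-specific random-effect block term."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    rbf = np.exp(-0.5 * sq / length_scale**2)
    same_subject = (np.asarray(id1)[:, None] == np.asarray(id2)[None, :]).astype(float)
    return rbf + subject_var * same_subject

# Toy longitudinal data: 30 subjects, 4 visits each, with a subject-level effect.
rng = np.random.default_rng(1)
n_subj, n_visits = 30, 4
ids = np.repeat(np.arange(n_subj), n_visits)
X = rng.normal(size=(n_subj * n_visits, 3))
subj_effect = np.repeat(rng.normal(scale=0.7, size=n_subj), n_visits)
y = X[:, 0] - 0.5 * X[:, 1] + subj_effect + rng.normal(scale=0.3, size=len(ids))

# Kernel ridge regression with the combined kernel.
K = longitudinal_kernel(X, ids, X, ids)
alpha = np.linalg.solve(K + 0.1 * np.eye(len(y)), y)
print("in-sample RMSE:", np.sqrt(np.mean((K @ alpha - y) ** 2)))
```

The same-subject term plays the role of a random intercept; a multiple kernel learning step would additionally weight separate kernels built from each data source.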
206

Contributions to statistical learning and statistical quantification in nanomaterials

Deng, Xinwei. January 2009 (has links)
Thesis (Ph.D)--Industrial and Systems Engineering, Georgia Institute of Technology, 2009. / Committee Chair: Wu, C. F. Jeff; Committee Co-Chair: Yuan, Ming; Committee Member: Huo, Xiaoming; Committee Member: Vengazhiyil, Roshan Joseph; Committee Member: Wang, Zhonglin. Part of the SMARTech Electronic Thesis and Dissertation Collection.
207

Accurate statistical circuit simulation in the presence of statistical variability

Asenov, Plamen January 2013 (has links)
Semiconductor device performance variation due to the granular nature of charge and matter has become a key problem in the semiconductor industry. The main sources of this ‘statistical’ variability include random discrete dopants (RDD), line edge roughness (LER) and metal gate granularity (MGG). These variability sources have been studied extensively; however, a methodology has not been developed to accurately represent this variability at a circuit and system level. In order to accurately represent statistical variability in real devices, the GSS simulation toolchain was utilised to simulate 10,000 20/22nm n- and p-channel transistors including RDD, LER and MGG variability sources. A statistical compact modelling methodology was developed which accurately captured the behaviour of the simulated transistors, and produced compact model parameter distributions suitable for advanced compact model generation strategies like PCA and NPM. The resultant compact model libraries were then utilised to evaluate the impact of statistical variability on SRAM design, and to quantitatively evaluate the difference between accurate compact model generation using NPM and the Gaussian VT methodology. Over 5 million dynamic write simulations were performed and showed that at advanced technology nodes, statistical variability cannot be accurately represented using Gaussian VT. The results also show that accurate modelling techniques can help reduce design margins by eliminating some of the pessimism of standard variability modelling approaches.
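The contrast drawn above between correlation-preserving compact-model generation and the Gaussian VT approach can be illustrated with a toy parameter set. All parameter names, values, and correlations below are hypothetical placeholders, not the simulated 20/22nm devices, and the PCA sampling shown is a generic stand-in rather than the thesis's PCA/NPM flow:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "extracted" parameters for 10,000 devices: threshold voltage,
# subthreshold slope and a mobility factor, with built-in correlations.
n_dev = 10_000
vt = rng.normal(0.30, 0.03, n_dev)
ss = 0.065 + 0.15 * (vt - 0.30) + rng.normal(0.0, 0.002, n_dev)
mob = 1.0 - 2.0 * (vt - 0.30) + rng.normal(0.0, 0.05, n_dev)
params = np.column_stack([vt, ss, mob])

# PCA-style generation: sample in the decorrelated principal-component space,
# reproducing the joint spread and correlation of all three parameters.
mean = params.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(params, rowvar=False))
z = rng.normal(size=(n_dev, 3))
pca_samples = mean + (z * np.sqrt(eigval)) @ eigvec.T

# "Gaussian VT" generation: vary only the threshold voltage, freeze the rest.
gvt_samples = np.tile(mean, (n_dev, 1))
gvt_samples[:, 0] = rng.normal(mean[0], params[:, 0].std(), n_dev)

print("corr(VT, mobility), reference :", round(np.corrcoef(params[:, 0], params[:, 2])[0, 1], 3))
print("corr(VT, mobility), PCA       :", round(np.corrcoef(pca_samples[:, 0], pca_samples[:, 2])[0, 1], 3))
print("std(mobility),      GaussianVT:", gvt_samples[:, 2].std(), "(co-variation lost entirely)")
```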
208

Composite strength statistics from fiber strength statistics.

Johnson, Eric P. January 1991 (has links)
Utilization of composites in critical design applications requires an extensive engineering experience database which is generally lacking, especially for rapidly developing constituent fibers. As a supplement, an accurate reliability theory can be applied in design. This investigation is a part of a research effort to develop a probabilistic model of composite reliability capable of using data produced in small laboratory test samples to predict the behavior of large structures with respect to their actual dimensions. This work included testing of composite strength, which was then used in exploring the methodology of predicting composite reliability from the parent single-filament fiber strength statistics. This required testing of a coordinate set of test samples which consisted of a composite and its parent fibers. Previously collected fiber strength statistics from two different production spools were used in conjunction with the current effort. This investigation established that, for a well-made composite, the Local Load Sharing Model of reliability prediction exhibited outstanding correlation with experimental data and was sufficiently sensitive to predict deficient composite strength due to a specific fiber spool with an abnormally weak lower tail. In addition, it provided an upper bound on the composite reliability. This investigation is unique in that it used a coordinate set of data with an unambiguous genesis of parent fiber and subsequent composite. The findings of this investigation are also definitive in that six orders of magnitude of size extrapolation in reliability prediction have been verified.
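For illustration only, the sketch below shows the weakest-link Weibull size scaling that underlies extrapolating fiber-strength statistics over many orders of magnitude in size; it is not the Local Load Sharing Model used in the investigation, and the Weibull parameters are hypothetical:

```python
import numpy as np

# Weibull weakest-link model: P(strength > s | length L) = exp(-(L/L0) * (s/s0)^m)
m, s0, L0 = 5.0, 3.0, 1.0   # hypothetical Weibull modulus, scale (GPa), gauge length (cm)

def median_strength(L):
    """Median strength of a fiber of length L under weakest-link scaling."""
    return s0 * (np.log(2.0) * L0 / L) ** (1.0 / m)

# Extrapolating from the test gauge length over six orders of magnitude in size.
for L in [1.0, 1e2, 1e4, 1e6]:
    print(f"L = {L:>9.0f} cm  ->  median strength {median_strength(L):.3f} GPa")
```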
209

Contributions to statistical learning and statistical quantification in nanomaterials

Deng, Xinwei 22 June 2009 (has links)
This research focuses on developing new techniques in statistical learning, including methodology, computation and application, as well as statistical quantification in nanomaterials. For a large number of random variables with temporal or spatial structure, we proposed shrinkage estimates of the covariance matrix that account for their Markov structure. The proposed method exploits the sparsity in the inverse covariance matrix in a systematic fashion. To deal with high-dimensional data, we proposed a robust kernel principal component analysis for dimension reduction, which can extract the nonlinear structure of high-dimensional data more robustly. To build a prediction model more efficiently, we developed active learning via sequential design to actively select the data points admitted into the training set. By combining stochastic approximation and D-optimal designs, the proposed method can build the model with minimal time and effort. We also proposed factor logit-models with a large number of categories for classification. We show that the convergence rate of the classifier functions estimated from the proposed factor model does not rely on the number of categories, but only on the number of factors; it can therefore achieve better classification accuracy. For the statistical nano-quantification, a statistical approach is presented to quantify the elastic deformation of nanomaterials. We proposed a new statistical modeling technique, called sequential profile adjustment by regression (SPAR), to account for and eliminate various experimental errors and artifacts. SPAR can automatically detect and remove systematic errors and therefore gives a more precise estimate of the elastic modulus.
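As a loose illustration of exploiting sparsity in the inverse covariance matrix for Markov-structured variables (using the graphical lasso as a stand-in; this is not the shrinkage estimator proposed in the thesis, and the dimensions, penalty, and AR(1) structure below are assumptions):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p, n, rho = 20, 80, 0.6

# AR(1)-type covariance: the precision matrix is tridiagonal (first-order Markov).
true_cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

sample_prec = np.linalg.inv(np.cov(X, rowvar=False))   # dense, noisy with modest n
gl = GraphicalLasso(alpha=0.1).fit(X)                   # sparsity-exploiting estimate

off_diag = ~np.eye(p, dtype=bool)
print("nonzero off-diagonal precision entries (|.| > 1e-3):")
print("  sample estimate :", np.sum(np.abs(sample_prec[off_diag]) > 1e-3))
print("  graphical lasso :", np.sum(np.abs(gl.precision_[off_diag]) > 1e-3))
print("  true (tridiag)  :", np.sum(np.abs(np.linalg.inv(true_cov)[off_diag]) > 1e-3))
```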
210

Topics in the foundations of statistical inference and statistical mechanics /

Guszcza, James. January 2000 (has links)
Thesis (Ph. D.)--University of Chicago, Dept. of Philosophy. / Includes bibliographical references. Also available on the Internet.
