  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
541

Elastic Shape Analysis of RNAs and Proteins

Unknown Date (has links)
Proteins and RNAs are molecular machines performing biological functions in the cells of all organisms. Automatic comparison and classification of these biomolecules are fundamental yet open problems in the field of Structural Bioinformatics. An outstanding unsolved issue is the definition and efficient computation of a formal distance between any two biomolecules. Current methods use alignment scores, which are not proper distances, to derive statistical tests for comparison and classification. This work applies Elastic Shape Analysis (ESA), a method recently developed in computer vision, to construct rigorous mathematical and statistical frameworks for the comparison, clustering and classification of proteins and RNAs. ESA treats biomolecular structures as 3D parameterized curves, which are represented with a special map called the square root velocity function (SRVF). In the resulting shape space of elastic curves, one can perform statistical analysis of curves as if they were random variables. One can compare, match and deform one curve into another, as well as compute averages and covariances of curve populations, and perform hypothesis testing and classification of curves according to their shapes. We have successfully applied ESA to the comparison and classification of protein and RNA structures. We further extend the ESA framework to incorporate additional non-geometric information that tags the shape of the molecules (namely, the sequence of nucleotide/amino-acid letters for RNAs/proteins and, in the latter case, also the labels for the so-called secondary structure). The biological representation is chosen such that the ESA framework continues to be mathematically formal. We have achieved superior classification of RNA functions compared to state-of-the-art methods on benchmark RNA datasets, which has led to the publication of this work in the journal Nucleic Acids Research (NAR). 
Based on the ESA distances, we have also developed a fast method to classify protein domains by using a representative set of protein structures generated by a clustering-based technique we call Multiple Centroid Class Partitioning (MCCP). Comparison with other standard approaches showed that MCCP significantly improves the accuracy while keeping the representative set smaller than the other methods. The current schemes for the classification and organization of proteins (such as SCOP and CATH) assume a discrete space of their structures, where a protein is classified into one and only one class in a hierarchical tree structure. Our recent study, and studies by other researchers, showed that the protein structure space is more continuous than discrete. To capture the complex but quantifiable continuous nature of protein structures, we propose to organize these molecules using a network model, where individual proteins are mapped to possibly multiple nodes of classes, each associated with a probability. Structural classes will then be connected to form a network based on overlaps of corresponding probability distributions in the structural space. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2013. / November 1, 2013. / Clustering of biomolecules, Elastic Shape Analysis, Geodesic Distance, Protein classification, RNA function prediction / Includes bibliographical references. / Anuj Srivastava, Professor Directing Dissertation; Jinfeng Zhang, Professor Directing Dissertation; Eric Klassen, University Representative; Daniel McGee, Committee Member.
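The SRVF representation mentioned in the abstract can be sketched numerically. The snippet below is only an illustration under stated assumptions, not the dissertation's implementation: it assumes a curve sampled at uniformly spaced parameter values on [0, 1], uses the standard definition q(t) = f'(t) / sqrt(|f'(t)|), and computes a plain L2 distance between two SRVFs with no optimization over rotations or reparameterizations (the full elastic distance requires both). The function names `srvf` and `srvf_l2_distance` are hypothetical.

```python
import numpy as np

def srvf(curve):
    """Square root velocity function of a discretized curve.

    curve: array of shape (n_points, dim), assumed to be sampled at
    uniformly spaced parameter values on [0, 1].
    Returns q with q[i] = f'(t_i) / sqrt(|f'(t_i)|).
    """
    curve = np.asarray(curve, dtype=float)
    n = curve.shape[0]
    # Finite-difference estimate of the velocity f'(t)
    velocity = np.gradient(curve, 1.0 / (n - 1), axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    # Guard against division by zero where the curve is stationary
    return velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]

def srvf_l2_distance(curve_a, curve_b):
    """L2 distance between the SRVFs of two curves, WITHOUT alignment.

    This is only the starting point of an elastic comparison: no rotation
    or reparameterization is optimized here.
    """
    qa, qb = srvf(curve_a), srvf(curve_b)
    n = qa.shape[0]
    dt = 1.0 / (n - 1)
    sq = np.sum((qa - qb) ** 2, axis=1)
    # Trapezoidal rule for the integral of |qa - qb|^2 over [0, 1]
    integral = 0.5 * dt * np.sum(sq[:-1] + sq[1:])
    return float(np.sqrt(integral))

# Two straight segments of length 1 and 4 in 3D
t = np.linspace(0.0, 1.0, 50)
short_line = np.stack([t, 0 * t, 0 * t], axis=1)
long_line = np.stack([4 * t, 0 * t, 0 * t], axis=1)
d = srvf_l2_distance(short_line, long_line)
```

For these two segments the SRVFs are the constant vectors (1, 0, 0) and (2, 0, 0), so the unaligned distance is 1; the squared L2 norm of an SRVF equals the length of its curve.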
542

Failure Time Regression Models for Thinned Point Processes

Unknown Date (has links)
In survival analysis, data on the time until a specific criterion event (or "endpoint") occurs are analyzed, often with regard to the effects of various predictors. In the classic applications, the criterion event is in some sense a terminal event, e.g., death of a person or failure of a machine or machine component. In these situations, the analysis requires assumptions only about the distribution of waiting times until the criterion event occurs and the nature of the effects of the predictors on that distribution. Suppose that the criterion event isn't a terminal event that can only occur once, but is a repeatable event. The sequence of events forms a stochastic point process. Further suppose that only some of the events are detected (observed); the detected events form a thinned point process. Any failure time model based on the data will be based not on the time until the first occurrence, but on the time until the first detected occurrence of the event. The implications of estimating survival regression models from such incomplete data will be analyzed. It will be shown that the effect of thinning on regression parameters depends on the combination of the type of regression model, the type of point process that generates the events, and the thinning mechanism. For some combinations, the effect of a predictor will be the same for time to the first event and the time to the first detected event. For other combinations, the regression effect will be changed as a result of the incomplete detection. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2013. / September 27, 2013. / Regression Models, Survival Analysis, Thinned Point Processes / Includes bibliographical references. / Fred G. Huffer, Professor Directing Dissertation; Warren Nichols, University Representative; Dan McGee, Committee Member; Debajyoti Sinha, Committee Member.
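One simple combination in which thinning leaves the regression effect unchanged can be checked by simulation. The sketch below is an illustration under assumptions that are not taken from the dissertation: events follow a homogeneous Poisson process whose rate is `base_rate * exp(beta * x)` for a binary predictor x, and each event is detected independently with a constant probability `detect_prob`. Under those assumptions the detected events again form a Poisson process with rate multiplied by `detect_prob`, so the log hazard ratio for the first detected event is still `beta`. All function and parameter names are hypothetical.

```python
import numpy as np

def first_detected_times(rate, detect_prob, size, rng):
    """Time to the first DETECTED event of a thinned homogeneous Poisson process."""
    # Number of events up to and including the first detected one is geometric
    n_events = rng.geometric(detect_prob, size=size)
    # The sum of n independent Exp(rate) gaps follows Gamma(shape=n, scale=1/rate)
    return rng.gamma(shape=n_events, scale=1.0 / rate)

def log_hazard_ratio(beta, base_rate, detect_prob, size=200_000, seed=0):
    """Estimate the log hazard ratio between groups x = 1 and x = 0."""
    rng = np.random.default_rng(seed)
    t0 = first_detected_times(base_rate, detect_prob, size, rng)
    t1 = first_detected_times(base_rate * np.exp(beta), detect_prob, size, rng)
    # For exponential waiting times the hazard is 1 / mean waiting time
    return float(np.log(t0.mean() / t1.mean()))

full_detection = log_hazard_ratio(beta=0.7, base_rate=2.0, detect_prob=1.0)
partial_detection = log_hazard_ratio(beta=0.7, base_rate=2.0, detect_prob=0.3)
```

Both estimates should be close to 0.7 up to Monte Carlo error. If the detection probability depended on the predictor, or the process were not Poisson, this invariance would not be expected to hold in general.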
543

The Studies of Joint Structure Sparsity Pursuit in the Applications of Hierarchical Variable Selection and Fused Lasso

Unknown Date (has links)
In this dissertation, we study joint sparsity pursuit and its applications in variable selection in high dimensional data. The first part of the dissertation focuses on hierarchical variable selection and its application in a two-way interaction model. In high-dimensional models that involve interactions, statisticians usually favor variable selection obeying certain logical hierarchical constraints. The first part focuses on structural hierarchy, which means that the existence of an interaction term implies that at least one or both of the associated main effects must be present. Lately this problem has attracted a lot of attention from statisticians, but existing computational algorithms converge slowly and cannot meet the challenge of big data computation. More importantly, theoretical studies of hierarchical variable selection are extremely scarce, largely due to the difficulty that multiple sparsity-promoting penalties are enforced on the same subject. This work investigates a new type of estimator based on group multi-regularization to capture various types of structural parsimony simultaneously. In this work, we present non-asymptotic results based on combined statistical and computational analysis, and reveal the minimax optimal rate. A general-purpose algorithm is developed with a theoretical guarantee of strict iterate convergence and global optimality. Simulations and real data experiments demonstrate the efficiency and efficacy of the proposed approach. The second topic studies Fused Lasso, which pursues joint sparsity of both variables and their consecutive differences simultaneously. The overlapping penalties of Fused Lasso pose critical challenges to computational studies and theoretical analysis. Existing theoretical analysis of Fused Lasso, however, has only been performed under an orthogonal design, and there is hardly any nonasymptotic study in the past literature. 
In this work, we study Fused Lasso and its application in a classification problem to achieve exact clustering. Computationally, we derive a simple-to-implement algorithm which scales well to big data computation; in theory, we propose a brand new technique and perform some nonasymptotic analysis. To evaluate the prediction performance theoretically, we derive an oracle inequality for the Fused Lasso estimator to show the $\ell_2$ prediction error rate. The minimax optimal rate is also revealed. For estimation accuracy, an $\ell_q$ $(1 \leq q \leq \infty)$ norm error bound for the Fused Lasso estimator is derived. The simulation studies show that exact clustering can be achieved using a post-thresholding technique. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2015. / April 17, 2015. / Fused Lasso, Hierarchical variable selection, Interaction, Minimax optimal rate, Non-asymptotic / Includes bibliographical references. / Yiyuan She, Professor Directing Dissertation; Giray Okten, University Representative; Adrian Barbu, Committee Member; Qing Mai, Committee Member.
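For readers unfamiliar with the penalty described in this abstract, the Fused Lasso criterion in its commonly cited form can be written as follows. This is a sketch with a squared-error loss chosen for illustration; the dissertation applies the method to classification, so its loss may differ, and the symbols $y$, $X$, $\beta$, $p$, $\lambda_1$ and $\lambda_2$ are assumed notation rather than the author's.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \frac{1}{2}\,\lVert y - X\beta \rVert_2^{2}
  \;+\; \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  \;+\; \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert
```

The first penalty encourages sparsity in the coefficients themselves and the second encourages sparsity in their consecutive differences. Each $\beta_j$ with $1 < j < p$ appears in three penalty terms, which is the overlap the abstract refers to.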
544

Nonlinear Multivariate Tests for High-Dimensional Data Using Wavelets with Applications in Genomics and Engineering

Unknown Date (has links)
Gaussian processes are not uncommon in various fields of science such as engineering, genomics, quantitative finance and astronomy, to name a few. In fact, such processes are special cases in a broader class of data known as functional data. When the underlying mean response of a process is a function, the resulting data from these processes are functional responses and specialized statistical tools are required in their analysis. The methodology discussed in this work offers non-parametric tests that can detect differences in such data with greater power than existing methods while maintaining good control of Type-I error. The incorporation of Wavelet Transforms makes the test an efficient approach due to its de-correlation properties. These tests are designed primarily to handle functional responses from multiple treatments simultaneously and generally are extensible to high dimensional data. The sparseness introduced by Wavelet Transforms is another advantage of this test when compared to traditional tests. In addition to offering a theoretical framework, several applications of such tests in the fields of engineering, genomics and quantitative finance are also discussed. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2014. / February 5, 2014. / array CGH, Fourier transforms, NGS, Quality control, Quantitative finance, Wavelet transforms / Includes bibliographical references. / Eric Chicken, Professor Directing Dissertation; Jinfeng Zhang, Professor Directing Dissertation; Jon Ahlquist, University Representative; Minjing Tao, Committee Member.
545

Practical Methods for Equivalence and Non-Inferiority Studies with Survival Response

Unknown Date (has links)
Determining the equivalence or non-inferiority of a new drug (test drug) with an existing treatment (reference drug) is an important topic of statistical interest. Wellek (1993) pioneered the way for log-rank based equivalence and non-inferiority testing by formulating a testing procedure using the proportional hazards model (PHM) of Cox (1972). In many equivalence and non-inferiority trials, two hazard functions may converge to one rather than being proportional for all time-points. In this case, the proportional odds survival model (POSM) of Bennett (1983) will be more suitable than Cox's PHM assumption. We show in both cases, when the wrong modeling assumption is made and Cox's PH assumption is violated, the popular procedure of Wellek (1993) has an inflated type I error. In contrast, our proposed POS model based equivalence and non-inferiority tests maintain the practitioner's desired 5% level of significance regardless of the underlying modeling assumption (e.g. Cox, 1972; Wellek, 1993). Furthermore, for non-inferiority trials, we introduce a method to determine the optimal sample size required when a desired power and type I error are specified and the data follow the POSM of Bennett (1983). For both of the above trials, we present simulation studies showing the finite approximation of powers and type I error rates, when the underlying modeling assumptions are correctly specified and when the assumptions are misspecified. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2014. / November 07, 2014. / Bennett's Proportional Odds Model, Clinical trials, Cox's Proportional Hazards Model, Equivalence Studies, Non-Inferiority Studies, Survival Analysis / Includes bibliographical references. 
/ Debajyoti Sinha, Professor Directing Dissertation; Cathy Levenson, University Representative; Eric Chicken, Committee Member; Stuart Lipsitz, Committee Member; Dan McGee, Committee Member.
546

Age Effects in the Extinction of Planktonic Foraminifera: A New Look at Van Valen's Red Queen Hypothesis

Unknown Date (has links)
Van Valen's Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis being addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon duration (age) and the rate of extinction. Some of the more recent approaches to this problem using Planktonic Foraminifera (Foram) extinction data include Weibull and Exponential modeling (Parker and Arnold, 1997), and Cox proportional hazards modeling (Doran et al., 2004, 2006). I propose a general class of test statistics that can be used to test for the effect of age on extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead I control for covariate effects by pairing or grouping together similar species. I use simulated data sets to compare the power of the statistics. In applying the test statistics to the Foram data, I have found age to have a positive effect on extinction. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Degree Awarded: Fall Semester, 2010. / Date of Defense: October 19, 2010. / Survival Analysis, Statistics, Red Queen Hypothesis, Planktonic Foraminifera, Extinction / Includes bibliographical references. / Fred Huffer, Professor Directing Dissertation; William Parker, University Representative; Eric Chicken, Committee Member; Debajyoti Sinha, Committee Member.
547

Inference for Semiparametric Time-Varying Covariate Effect Relative Risk Regression Models

Unknown Date (has links)
A major interest of survival analysis is to assess covariate effects on survival via appropriate conditional hazard function regression models. The Cox proportional hazards model, which assumes an exponential form for the relative risk, has been a popular choice. However, other regression forms such as Aalen's additive risk model may be more appropriate in some applications. In addition, covariate effects may depend on time, which cannot be reflected by a Cox proportional hazards model. In this dissertation, we study a class of time-varying covariate effect regression models in which the link function (relative risk function) is twice continuously differentiable and prespecified, but otherwise general. This is a natural extension of the Prentice-Self model, in which the link function is general but covariate effects are modelled to be time invariant. In the first part of the dissertation, we focus on estimating the cumulative or integrated covariate effects. The standard martingale approach based on counting processes is utilized to derive a likelihood-based iterating equation. An estimator for the cumulative covariate effect that is generated from the iterating equation is shown to be √n-consistent. Asymptotic normality of the estimator is also demonstrated. Another aspect of the dissertation is to investigate a new test for the above time-varying covariate effect regression model and study consistency of the test based on martingale residuals. For Aalen's additive risk model, we introduce a test statistic based on the Huffer-McKeague weighted-least-squares estimator and show its consistency against some alternatives. An alternative way to construct a test statistic based on Bayesian Bootstrap simulation is introduced. An application to real lifetime data will be presented. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. 
/ Degree Awarded: Spring Semester, 2005. / Date of Defense: December 3, 2004. / Time-varying Hazard Function Regression Model, Counting Process Method, Cox Model / Includes bibliographical references. / Ian W. McKeague, Professor Directing Dissertation; Xiaoming Wang, Outside Committee Member; Fred W. Huffer, Committee Member; Kai-Sheng Song, Committee Member.
548

Estimating distributions in the presence of measurement error with replicate values

Chen, Hsing-Me 01 January 1996 (has links)
In many applications, observations from some distribution of interest are "contaminated" with errors. In this thesis we examine estimation of the underlying distribution in the presence of such errors. The nonparametric maximum likelihood estimator (NPMLE) of the mixing distribution of interest has been studied by various authors when there are no nuisance parameters in the model. This thesis examines the existence, finite support, and weak convergence of the NPMLE under more general conditions than those previously studied. Attention is then given to a particular model in which, on a unit, either replicates or a mean value are assumed normally distributed with expected value equal to the true value of interest and some variance. A variety of models are allowed for the variances, which may be fixed or random. A full maximum likelihood method and various pseudo methods are examined for estimating the distribution when nuisance parameters are in the model. Asymptotic properties are developed for some special cases using finite mixtures and some approaches outlined for handling more general cases. Two bootstrap methods are discussed for making inferences on cumulative probabilities and percentiles from the distribution of interest. The methods are demonstrated with an example in which the distribution of beta carotene intake is estimated.
549

Non-Linear diffusion processes and applications

Pienaar, Etienne A D January 2016 (has links)
Diffusion models are useful tools for quantifying the dynamics of continuously evolving processes. Using diffusion models it is possible to formulate compact descriptions for the dynamics of real-world processes in terms of stochastic differential equations. Despite the flexibility of these models, they can often be extremely difficult to work with. This is especially true for non-linear and/or time-inhomogeneous diffusion models where even basic statistical properties of the process can be elusive. As such, we explore various techniques for analysing non-linear diffusion models in contexts ranging from conducting inference under discrete observation and solving first passage time problems, to the analysis of jump diffusion processes and highly non-linear diffusion processes. We apply the methodology to a number of real-world ecological and financial problems of interest and demonstrate how non-linear diffusion models can be used to better understand such phenomena. In conjunction with the methodology, we develop a series of software packages that can be used to accurately and efficiently analyse various classes of non-linear diffusion models.
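As a concrete picture of what a non-linear, time-inhomogeneous diffusion model looks like, the sketch below simulates one path of a scalar stochastic differential equation dX = mu(X, t) dt + sigma(X, t) dW with the Euler-Maruyama scheme. It is a generic illustration and not one of the thesis's software packages or methods; the function `euler_maruyama` and the example drift and diffusion coefficients are hypothetical choices made here.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng):
    """Simulate one path of dX = drift(X, t) dt + diffusion(X, t) dW on [0, t_end]."""
    dt = t_end / n_steps
    path = np.empty(n_steps + 1)
    path[0] = x0
    t = 0.0
    for i in range(n_steps):
        # Brownian increment over a step of length dt
        dw = rng.normal(0.0, np.sqrt(dt))
        path[i + 1] = path[i] + drift(path[i], t) * dt + diffusion(path[i], t) * dw
        t += dt
    return path

# Hypothetical example: logistic-type drift with a seasonal (time-dependent)
# growth rate and state-dependent noise
rng = np.random.default_rng(42)
path = euler_maruyama(
    drift=lambda x, t: (1.0 + 0.5 * np.sin(2.0 * np.pi * t)) * x * (1.0 - x),
    diffusion=lambda x, t: 0.2 * x,
    x0=0.1,
    t_end=5.0,
    n_steps=5000,
    rng=rng,
)

# Sanity check with the noise switched off: dX = -X dt, X(0) = 1
decay = euler_maruyama(lambda x, t: -x, lambda x, t: 0.0, 1.0, 1.0, 10_000, rng)
```

With the diffusion term set to zero the scheme reduces to the forward Euler method, so the final value of `decay` should be close to exp(-1). Euler-Maruyama is a simple simulation scheme with low-order accuracy; it does not provide the transition densities or first passage time distributions that the abstract describes as difficult.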
550

Parametric and Nonparametric Spherical Regression with Diffeomorphisms

Unknown Date (has links)
Spherical regression explores relationships between pairs of variables on spherical domains. Spherical data has become more prevalent in biological, gaming, geographical, and meteorological investigations, creating a need for tools that analyze such data. Previous works on spherical regression have focused on rigid parametric models or nonparametric kernel smoothing methods. This leaves a huge gap in the available tools, with no intermediate options currently available. This work will develop two such intermediate models, one parametric using a projective linear transformation and one nonparametric using diffeomorphic maps from a sphere to itself. The models are estimated in a maximum-likelihood framework using gradient-based optimizations. For the parametric model, an efficient Newton-Raphson algorithm is derived and asymptotic analysis is developed. A first-order roughness penalty is specified for the nonparametric model using the Jacobian of diffeomorphisms. The prediction performance of the proposed models is compared with state-of-the-art methods using simulated and real data involving plate tectonics, cloud deformations, wind, accelerometer, bird migration, and vector-cardiogram data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester, 2014. / June 5, 2014. / Diffeomorphism, Manifold, Nonlinear Regression, Projective Linear Transformation, Riemannian Geometry, Spherical Regression / Includes bibliographical references. / Anuj Srivastava, Professor Co-Directing Dissertation; Wei Wu, Professor Co-Directing Dissertation; Eric Klassen, University Representative; Debdeep Pati, Committee Member.
