71

Finite horizon singular control and a related two-person game

Unknown Date (has links)
We consider the finite horizon problem of tracking a Brownian motion, with possibly nonzero drift, by a process of bounded variation, in such a way as to minimize the total expected cost of "action" and "deviation from a target state." The cost of "action" is given by two functions of time, which represent the price per unit of increase and decrease in the state process; the cost of "deviation" is incurred continuously at a rate given by a function convex in the state variable, together with a terminal cost function. We obtain the optimal cost function for this problem, as well as an $\varepsilon$-optimal strategy, through the solution of a system of variational inequalities, which has a stochastic representation as the value function of an appropriate two-person game. / Source: Dissertation Abstracts International, Volume: 49-06, Section: B, page: 2256. / Major Professor: Michael Taksar. / Thesis (Ph.D.)--The Florida State University, 1988.
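As a rough sketch of the kind of objective described in this abstract (the notation below is illustrative and not taken from the thesis), the controller chooses a pair of nondecreasing processes $\xi^+,\xi^-$, whose difference is the bounded-variation control, and pays for both the control effort and the deviation of the state:

$$
X_t = x + \mu t + \sigma W_t + \xi^+_t - \xi^-_t,
\qquad
J(\xi^+,\xi^-) = \mathbb{E}\!\left[\int_0^T h(t, X_t)\,dt
+ \int_0^T \gamma^+(t)\,d\xi^+_t
+ \int_0^T \gamma^-(t)\,d\xi^-_t
+ g(X_T)\right],
$$

where $\gamma^\pm$ are the per-unit prices of pushing the state up or down, $h(t,\cdot)$ is convex in the state, and $g$ is the terminal cost; minimizing $J$ over such controls is the tracking problem the abstract refers to.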
72

Semi-Parametric Generalized Estimating Equations with Kernel Smoother: A Longitudinal Study in Financial Data Analysis

Unknown Date (has links)
Longitudinal studies are widely used in various fields, such as public health, clinical trials, and financial data analysis. A major challenge in longitudinal studies is that repeated measurements on each subject induce time-dependent correlation within subjects. Generalized Estimating Equations can handle correlated outcomes in longitudinal data through marginal effects. Our model is based on Generalized Estimating Equations with a semi-parametric approach, providing a flexible structure for regression models: coefficients for the parametric covariates are estimated, while nuisance covariates are fitted with kernel smoothers in the non-parametric part. The profile kernel estimator and the seemingly unrelated (SUR) kernel estimator are used to deliver consistent and efficient semi-parametric estimators relative to parametric models. We provide simulation results for estimating semi-parametric models with one or multiple non-parametric terms. In the application part, we focus on the financial market: a credit card loan data set with payment information for each customer across six months is used to investigate whether gender, income, age, or other factors significantly influence payment status. Furthermore, we propose model comparisons to evaluate whether the model should be fitted separately for different levels of a factor, such as male and female, or with different estimation methods, such as parametric or semi-parametric estimation. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2017. / November 15, 2017. / Includes bibliographical references. / Xufeng Niu, Professor Directing Dissertation; Yingmei Cheng, University Representative; Fred Huffer, Committee Member; Minjing Tao, Committee Member.
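A minimal sketch of the profile-kernel idea for a partially linear longitudinal model, assuming an identity link and a working-independence correlation structure (the Gaussian kernel, bandwidth, and function names are illustrative, not the estimators tuned in the dissertation):

```python
import numpy as np

def nw_smooth(t_eval, t_obs, resid, h):
    """Nadaraya-Watson (Gaussian-kernel) smoother of residuals over time."""
    w = np.exp(-0.5 * ((t_eval[:, None] - t_obs[None, :]) / h) ** 2)
    return (w @ resid) / w.sum(axis=1)

def profile_kernel_fit(y, X, t, h=0.5, iters=25):
    """Partially linear fit y = X @ beta + f(t) + error.

    Alternates between kernel-smoothing the nonparametric part f and solving
    a working-independence estimating equation (least squares) for beta.
    """
    beta = np.zeros(X.shape[1])
    f_hat = np.zeros_like(y)
    for _ in range(iters):
        f_hat = nw_smooth(t, t, y - X @ beta, h)              # nonparametric step
        beta, *_ = np.linalg.lstsq(X, y - f_hat, rcond=None)  # parametric step
    return beta, f_hat
```

A full treatment would replace the least-squares step with estimating equations under a non-trivial working correlation and the plain smoother with the SUR kernel estimator described in the abstract.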
73

Bayesian Modeling and Variable Selection for Complex Data

Unknown Date (has links)
As we routinely encounter high-throughput datasets in complex biological and environmental research, developing novel models and methods for variable selection has received widespread attention. In this dissertation, we address a few key challenges in Bayesian modeling and variable selection for high-dimensional data with complex spatial structures. a) Most Bayesian variable selection methods are restricted to mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in high dimensions. This has motivated continuous shrinkage priors, which resemble the two-component priors while facilitating computation and interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a daunting task. b) Spatial and spatio-temporal data sets with complex structures are now commonly encountered in scientific fields ranging from atmospheric science and forestry to environmental, biological, and social science. Selecting important spatial variables that have significant influence on the occurrence of events is necessary and essential for providing insights to researchers. Self-excitation, the feature that the occurrence of an event increases the likelihood of further occurrences of the same type of event nearby in time and space, can be found in many natural and social processes. Research on modeling data with self-excitation has drawn increasing interest recently. However, the existing literature on self-exciting models that include high-dimensional spatial covariates is still underdeveloped. c) The Gaussian process is among the most powerful modeling frameworks for spatial data. Its major bottleneck is the computational complexity that stems from inverting the dense matrices associated with a Gaussian process covariance. Hierarchical divide-and-conquer Gaussian process models have been investigated for ultra-large data sets. However, scaling the distributed computing algorithm to handle a large number of sub-groups poses a serious bottleneck. In Chapter 2 of this dissertation, we propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method attractive in comparison to ad hoc thresholding approaches. The applicability of the approach is not limited to continuous shrinkage priors; it can be used along with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated, and the method is shown to have good performance in a wide range of synthetic data examples and in a real data example on selecting genes affecting survival due to lymphoma. In Chapter 3 of this dissertation, we propose a new self-exciting model that allows the inclusion of spatial covariates. We develop algorithms which are effective in obtaining accurate estimation and variable selection results in a variety of synthetic data examples. Our proposed model is applied to Chicago crime data, where the influence of various spatial features is investigated. In Chapter 4, we focus on a hierarchical Gaussian process regression model for ultra-high dimensional spatial datasets. By evaluating the latent Gaussian process on a regular grid, we propose an efficient computational algorithm based on circulant embedding. The latent Gaussian process borrows information across multiple sub-groups, thereby obtaining more accurate predictions.
The hierarchical model and our proposed algorithm are studied through simulation examples. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2017. / October 23, 2017. / Includes bibliographical references. / Debdeep Pati, Professor Co-Directing Dissertation; Fred Huffer, Professor Co-Directing Dissertation; Alec Kercheval, University Representative; Debajyoti Sinha, Committee Member; Jonathan Bradley, Committee Member.
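A small sketch of the circulant-embedding trick mentioned for Chapter 4: on a regular grid a stationary covariance is Toeplitz, can be embedded in a circulant matrix, and an exact Gaussian-process draw then costs only FFTs. The exponential correlation, grid size, and function names below are placeholders, not the models used in the dissertation.

```python
import numpy as np

def sample_gp_circulant(n, dx, corr, seed=None):
    """One exact draw of a stationary Gaussian process on a regular 1-D grid.

    The first covariance row is embedded in a circulant matrix of size 2n-2,
    whose eigenvalues come from a single FFT (Dietrich-Newsam construction).
    """
    rng = np.random.default_rng(seed)
    row = corr(np.arange(n) * dx)                       # covariance at lags 0..(n-1)*dx
    circ = np.concatenate([row, row[-2:0:-1]])          # symmetric circulant embedding
    lam = np.clip(np.fft.fft(circ).real, 0.0, None)     # eigenvalues (clip tiny negatives)
    m = circ.size
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    draw = np.fft.fft(np.sqrt(lam / m) * z)
    return draw.real[:n]                                # first n points have the target covariance

# usage: 10,000 grid points with an exponential correlation of range 0.2
field = sample_gp_circulant(10_000, dx=0.01, corr=lambda d: np.exp(-d / 0.2))
```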
74

Spatial Statistics and Its Applications in Biostatistics and Environmental Statistics

Unknown Date (has links)
This dissertation presents some topics in spatial statistics and their applications in biostatistics and environmental statistics. The field of spatial statistics is an active area of statistics. In Chapter 2 and Chapter 3, the goal is to build subregion models under the assumption that the responses or the parameters are spatially correlated. For regression models, considering spatially varying coefficients is a reasonable way to build subregion models. There are two different techniques for exploring spatially varying coefficients. One is geographically weighted regression (Brunsdon et al. 1998). The other is a spatially varying coefficients model which assumes a stationary Gaussian process for the regression coefficients (Gelfand et al. 2003). Based on the ideas of these two techniques, we introduce techniques for exploring subregion models in survival analysis, which is an important area of biostatistics. In Chapter 2, we introduce modified versions of the Kaplan-Meier and Nelson-Aalen estimators which incorporate geographical weighting. We use ideas from counting process theory to obtain these modified estimators, to derive variance estimates, and to develop associated hypothesis tests. In Chapter 3, we introduce a Bayesian parametric accelerated failure time model with spatially varying coefficients. These two techniques allow subregion models in survival analysis to be explored using both nonparametric and parametric approaches. In Chapter 4, we introduce Bayesian parametric covariance regression analysis for a response vector. The proposed method defines a regression model between the covariance matrix of a p-dimensional response vector and auxiliary variables. We propose a constrained Metropolis-Hastings algorithm to obtain the estimates. Simulation results are presented to show the performance of both the regression and covariance matrix estimates. Furthermore, we present a more realistic simulation experiment in which our Bayesian approach performs better than the MLE. Finally, we illustrate the usefulness of our model by applying it to the Google Flu data. In Chapter 5, we give a brief summary of future work. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2017. / November 9, 2017. / Biostatistics, Environmental Statistics, Spatial Statistics / Includes bibliographical references. / Fred Huffer, Professor Directing Dissertation; Insu Paek, University Representative; Debajyoti Sinha, Committee Member; Elizabeth Slate, Committee Member; Jonathan Bradley, Committee Member.
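A minimal sketch of how geographical weighting might enter the Nelson-Aalen estimator described for Chapter 2, assuming a Gaussian spatial kernel; the kernel, bandwidth, and variable names are illustrative assumptions, not the thesis's exact construction.

```python
import numpy as np

def gw_nelson_aalen(time, event, coords, s0, h):
    """Geographically weighted Nelson-Aalen cumulative hazard at location s0.

    Each subject contributes to the counting-process numerator and the risk-set
    denominator with a kernel weight that decays in its distance to s0; equal
    weights recover the ordinary Nelson-Aalen estimator.
    """
    w = np.exp(-0.5 * (np.linalg.norm(coords - s0, axis=1) / h) ** 2)
    event_times = np.unique(time[event == 1])
    cumhaz, running = np.empty(event_times.size), 0.0
    for k, t in enumerate(event_times):
        d = w[(time == t) & (event == 1)].sum()   # weighted events dN(t)
        y = w[time >= t].sum()                    # weighted risk set Y(t)
        running += d / y
        cumhaz[k] = running
    return event_times, cumhaz
```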
75

Statistical Methods for High Dimensional Variable Selection

Li, Kaiqiao 19 April 2019 (has links)
This thesis focuses on high dimensional variable selection and addresses the limitations of existing penalized likelihood-based prediction models, as well as multiple hypothesis testing issues in jump detection. In the first project, we proposed a weighted sparse network learning method which allows users to first estimate a data-driven network with a sparsity property. The estimated network is then optimally combined, using a weighted approach, with a known or partially known network structure. We adapted the $\ell_1$ penalties and proved the oracle property of our proposed model, which aims to improve the accuracy of parameter estimation and achieve a parsimonious model in the high dimensional setting. We further implemented a stability selection method for tuning the parameters and compared its performance to the cross-validation approach. We implemented our proposed framework for several generalized linear models, including the Gaussian, logistic, and Cox proportional hazards (partial) models. We carried out extensive Monte Carlo simulations and compared the performance of our proposed model to existing methods. Results showed that, in the absence of prior information for constructing a known network, our approach showed significant improvement over elastic net models when using the data-driven estimated network structure. On the other hand, if the prior network is correctly specified in advance, our prediction model significantly outperformed other methods. Results further showed that our proposed method is robust to network misspecification and that the $\ell_1$ penalty improves prediction and variable selection regardless of the magnitude of the effect size. We also found that the stability selection method achieved more robust parameter tuning than the cross-validation approach for all three phenotypes (continuous, binary, and survival) considered in our simulation studies. Case studies on proteomic ovarian cancer and gene expression skin cutaneous melanoma data further demonstrated that our proposed model achieves good operating characteristics in predicting response to platinum-based chemotherapy and survival risk. We further extended our work in statistical predictive learning to nonlinear prediction, where traditional generalized linear models are insufficient. Nonlinear methods such as kernel methods show great power in mapping a nonlinear space to a linear space, which can easily be incorporated into generalized linear models. This thesis demonstrates how to apply multiple kernel tricks to the generalized linear model. Results from simulation show that our proposed multiple kernel learning method can successfully identify the nonlinear likelihood functions under various scenarios.

The second project concerns jump detection in high frequency financial data. Nonparametric tests are popular and efficient methods for detecting jumps in high frequency financial data. Each method has its own advantages and disadvantages, and their performance can be affected by the underlying noise and dynamic structure. To address this, we proposed a robust p-value pooling method which aims to combine the advantages of each method. We focus on model validation within a Monte Carlo framework to assess reproducibility and the false discovery rate. Reproducibility analysis via the correspondence curve and the irreproducible discovery rate was carried out with replicates to study local dependency and robustness across replicates. Extensive simulation studies of high frequency trading data at the minute level were carried out, and the operating characteristics of these methods were compared via the false discovery rate (FDR) control framework. Our proposed method was robust across all scenarios under the reproducibility and FDR analysis. Finally, we applied the method to minute-level data from the Limit Order Book System: the Efficient Reconstruction System (LOBSTER). An R package, JumpTest, implementing these methods is made available on the Comprehensive R Archive Network (CRAN).
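For the stability-selection tuning compared against cross-validation in the first project, a bare-bones version with an $\ell_1$-penalized Gaussian model might look like the following; the penalty level, subsample fraction, and selection threshold are placeholders, not the dissertation's settings.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.05, n_sub=100, frac=0.5, seed=None):
    """Selection frequencies from repeated subsampling followed by a lasso fit."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        fit = Lasso(alpha=alpha, max_iter=10_000).fit(X[idx], y[idx])
        counts += (fit.coef_ != 0)
    return counts / n_sub

# variables selected in, say, at least 60% of subsamples are kept:
# freq = stability_selection(X, y); selected = np.where(freq >= 0.6)[0]
```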
76

A Study of Some Issues of Goodness-of-Fit Tests for Logistic Regression

Unknown Date (has links)
Goodness-of-fit tests are important for assessing how well a model fits a set of observations. The Hosmer-Lemeshow (HL) test is a popular and commonly used method for assessing goodness-of-fit in logistic regression. However, there are two issues with using the HL test. One is that we must specify the number of partition groups, and different groupings often suggest different decisions. In this study, we therefore propose several grouping tests that combine multiple HL tests with varying numbers of groups to reach a decision, instead of using one arbitrary grouping or searching for an optimal one; the best choice of groups is data-dependent and not easy to find. The other drawback of the HL test is that it has little power to detect violations due to missing interactions between continuous and dichotomous covariates. Therefore, we propose global and interaction tests in order to capture such violations. Simulation studies are carried out to assess the Type I error and power of all the proposed tests. The tests are illustrated with the bone mineral density data from NHANES III. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / July 17, 2018. / Includes bibliographical references. / Dan McGee, Professor Co-Directing Dissertation; Qing Mai, Professor Co-Directing Dissertation; Cathy Levenson, University Representative; Xufeng Niu, Committee Member.
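As a concrete, simplified illustration of combining HL tests across several partitions rather than committing to one arbitrary number of groups, one could pool the per-partition p-values, for example with a Bonferroni-corrected minimum; the group counts and the combination rule below are illustrative choices, not the grouping tests actually proposed in the dissertation.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p_hat, g):
    """Hosmer-Lemeshow statistic and p-value with g groups (df = g - 2)."""
    groups = np.array_split(np.argsort(p_hat), g)      # near-equal groups by fitted risk
    chi2 = 0.0
    for idx in groups:
        o, e = y[idx].sum(), p_hat[idx].sum()          # observed and expected events
        n_k, pi_k = idx.size, p_hat[idx].mean()
        chi2 += (o - e) ** 2 / (n_k * pi_k * (1.0 - pi_k))
    return chi2, stats.chi2.sf(chi2, df=g - 2)

def combined_hl_pvalue(y, p_hat, group_counts=(6, 8, 10, 12, 15)):
    """Min-p (Bonferroni-corrected) combination of HL tests over several partitions."""
    pvals = [hosmer_lemeshow(y, p_hat, g)[1] for g in group_counts]
    return min(1.0, len(pvals) * min(pvals))
```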
77

Statistical methods for indirectly observed network data

McCormick, Tyler H. January 2011 (has links)
Social networks have become an increasingly common framework for understanding and explaining social phenomena. Yet, despite an abundance of sophisticated models, social network research has yet to realize its full potential, in part because of the difficulty of collecting social network data. In many cases, particularly in the social sciences, collecting complete network data is logistically and financially challenging. In contrast, Aggregated Relational Data (ARD) measure network structure indirectly by asking respondents how many connections they have with members of a certain subpopulation (e.g., "How many individuals with HIV/AIDS do you know?"). These data require no special sampling procedure and are easily incorporated into existing surveys. This dissertation proposes statistical methods for estimating social network and population characteristics using this type of social network data collected through standard surveys. First, a method to estimate both individual social network size (i.e., degree) and the distribution of network sizes in a population is proposed. A second method estimates the demographic characteristics of hard-to-reach groups, or latent demographic profiles. These groups, such as those with HIV/AIDS, unlawful immigrants, or the homeless, are often excluded from the sampling frame of standard social science surveys. A third method develops a latent space model for ARD. This method is similar in spirit to previous latent space models for networks (see Hoff, Raftery and Handcock (2002), for example) in that the dependence structure of the network is represented parsimoniously in a multidimensional geometric space. The key distinction from the complete network case is that instead of conditioning on the (latent) distance between two members of the network, the latent space model for ARD conditions on the expected distance between a survey respondent and the center of a subpopulation in the latent space. A spherical latent space facilitates tractable computation of this expectation. This model estimates relative homogeneity between groups in the population and variation in the propensity for interaction between respondents and group members.
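The degree-estimation idea behind ARD can be sketched with the basic scale-up estimator of Killworth et al., which the methods in this dissertation refine: a respondent's reported acquaintances in probe subpopulations of known size are scaled up to the whole population. The sizes below are made-up placeholders.

```python
import numpy as np

def scale_up_degree(ard, subpop_sizes, total_pop):
    """Basic scale-up estimate of each respondent's network size (degree).

    ard[i, k] is how many members of probe subpopulation k respondent i reports
    knowing; subpop_sizes[k] is that subpopulation's known size in the population.
    """
    ard = np.asarray(ard, dtype=float)
    return total_pop * ard.sum(axis=1) / np.sum(subpop_sizes)

# usage: two respondents and three probe subpopulations
degrees = scale_up_degree([[2, 0, 1], [5, 3, 4]],
                          subpop_sizes=[50_000, 20_000, 80_000],
                          total_pop=300_000_000)
```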
78

Some Nonparametric Methods for Clinical Trials and High Dimensional Data

Wu, Xiaoru January 2011 (has links)
This dissertation addresses two problems from novel perspectives. In Chapter 2, I propose an empirical-likelihood-based method to nonparametrically adjust for baseline covariates in randomized clinical trials, and in Chapter 3, I develop a survival analysis framework for multivariate K-sample problems. (I): Covariate adjustment is an important tool in the analysis of randomized clinical trials and observational studies. It can be used to increase efficiency and thus power, and to reduce possible bias. While most statistical tests in randomized clinical trials are nonparametric in nature, approaches for covariate adjustment typically rely on specific regression models, such as the linear model for a continuous outcome, the logistic regression model for a dichotomous outcome, and the Cox model for survival time. Several recent efforts have focused on model-free covariate adjustment. This thesis makes use of the empirical likelihood method and proposes a nonparametric approach to covariate adjustment. A major advantage of the new approach is that it automatically utilizes covariate information in an optimal way without fitting a nonparametric regression. The usual asymptotic properties, including the Wilks-type result of convergence to a chi-square distribution for the empirical likelihood ratio test, and asymptotic normality for the corresponding maximum empirical likelihood estimator, are established. It is also shown that the resulting test is asymptotically most powerful and that the estimator for the treatment effect achieves the semiparametric efficiency bound. The new method is applied to the Global Use of Strategies to Open Occluded Coronary Arteries (GUSTO)-I trial. Extensive simulations are conducted, validating the theoretical findings. This work is not only useful for nonparametric covariate adjustment but also has theoretical value: it broadens the scope of traditional empirical likelihood inference by allowing the number of constraints to grow with the sample size. (II): Motivated by applications in high-dimensional settings, I propose a novel approach to testing equality of two or more populations by constructing a class of intensity-centered score processes. The resulting tests are analogous in spirit to the well-known class of weighted log-rank statistics widely used in survival analysis. The test statistics are nonparametric, computationally simple, and applicable to high-dimensional data. We establish the usual large-sample properties by showing that the underlying log-rank score process converges weakly to a Gaussian random field with zero mean under the null hypothesis, and with a drift under contiguous alternatives. For the Kolmogorov-Smirnov-type and the Cramér-von Mises-type statistics, we also establish consistency against any fixed alternative. As a practical means of obtaining approximate cutoff points for the test statistics, a simulation-based resampling method is proposed, with theoretical justification given by establishing weak convergence of the randomly weighted log-rank score process. The new approach is applied to a study of brain activation measured by functional magnetic resonance imaging during the performance of two linguistic tasks, and to a prostate cancer DNA microarray data set.
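The empirical-likelihood machinery underlying the covariate-adjustment method in Chapter 2 can be illustrated in its simplest form, Owen's test for a single mean; the Newton solver below is a sketch only (no step-size safeguards), and the single constraint is far simpler than the growing set of constraints used in the dissertation.

```python
import numpy as np
from scipy import stats

def el_test_mean(x, mu0, iters=50):
    """-2 log empirical likelihood ratio for H0: E[X] = mu0, with its chi-square(1) p-value."""
    z = np.asarray(x, dtype=float) - mu0
    lam = 0.0
    for _ in range(iters):                     # Newton iterations for the Lagrange multiplier
        w = 1.0 + lam * z
        grad = np.sum(z / w)
        hess = -np.sum(z ** 2 / w ** 2)
        lam -= grad / hess
    stat = 2.0 * np.sum(np.log(1.0 + lam * z))
    return stat, stats.chi2.sf(stat, df=1)
```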
79

On testing the change-point in the longitudinal bent line quantile regression model

Sha, Nanshi January 2011 (has links)
The problem of detecting changes has been receiving considerable attention in various fields. In general, the change-point problem is to identify the location(s) in an ordered sequence that divide the sequence into groups which follow different models. This dissertation considers the change-point problem in quantile regression for observational or clinical studies involving correlated data (e.g., longitudinal studies). Our research is motivated by the lack of ideal inference procedures for such models. Our contributions are two-fold. First, we extend the previously reported work on the bent line quantile regression model [Li et al. (2011)] to a longitudinal framework. Second, we propose a score-type test for hypothesis testing of the change-point problem using rank-based inference. The proposed test has several advantages over existing inferential approaches. Most importantly, it circumvents the difficulty of estimating nuisance parameters (e.g., the density function of the unspecified error) required by the Wald test in previous work, and is thus more reliable in finite-sample performance. Furthermore, we demonstrate, through a series of simulations, that the proposed methods also outperform the extensively used bootstrap methods by providing more accurate and computationally efficient confidence intervals. To illustrate the usage of our methods, we apply them to two datasets from real studies: the Finnish Longitudinal Growth Study and an AIDS clinical trial. In each case, the proposed approach sheds light on the response pattern by providing an estimated location of abrupt change, along with its 95% confidence interval, at any quantile of interest, a key parameter with clinical implications. The proposed methods allow for different change-points at different quantile levels of the outcome. In this way, they offer a more comprehensive picture of the covariate effects on the response variable than is provided by other change-point models targeted exclusively at the conditional mean. We conclude that our framework and proposed methodology are valuable for studying the change-point problem involving longitudinal data.
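The point-estimation side of a bent line quantile regression can be sketched by profiling the check-function loss over a grid of candidate change points; this ignores the within-subject correlation and the rank-based score test that are the actual contributions of the dissertation, and the grid and quantile level are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def bent_line_quantreg(y, x, tau=0.5, n_grid=50):
    """Fit Q_y(tau | x) = b0 + b1*x + b2*(x - t)_+ by profiling the change point t."""
    grid = np.linspace(np.quantile(x, 0.1), np.quantile(x, 0.9), n_grid)
    best = None
    for t in grid:
        X = sm.add_constant(np.column_stack([x, np.maximum(x - t, 0.0)]))
        res = QuantReg(y, X).fit(q=tau)
        u = y - res.fittedvalues
        loss = np.sum(np.where(u >= 0, tau * u, (tau - 1) * u))   # check-function loss
        if best is None or loss < best[0]:
            best = (loss, t, res.params)
    return best[1], best[2]          # estimated change point and coefficients
```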
80

Self-controlled methods for postmarketing drug safety surveillance in large-scale longitudinal data

Simpson, Shawn E. January 2011 (has links)
A primary objective in postmarketing drug safety surveillance is to ascertain the relationship between time-varying drug exposures and adverse events (AEs) related to health outcomes. Surveillance can be based on longitudinal observational databases (LODs), which contain time-stamped patient-level medical information including periods of drug exposure and dates of diagnoses. Due to its desirable properties, we focus on the self-controlled case series (SCCS) method for analysis in this context. SCCS implicitly controls for fixed multiplicative baseline covariates since each individual acts as their own control. In addition, only exposed cases are required for the analysis, which is computationally advantageous. In the first part of this work we present how the simple SCCS model can be applied to the surveillance problem, and compare the results of simple SCCS to those of existing methods. Many current surveillance methods are based on marginal associations between drug exposures and AEs. Such analyses ignore confounding drugs and interactions and have the potential to give misleading results. In order to avoid these difficulties, it is desirable for an analysis strategy to incorporate large numbers of time-varying potential confounders such as other drugs. In the second part of this work we propose the Bayesian multiple SCCS approach, which deals with high dimensionality and can provide a sparse solution via a Laplacian prior. We present details of the model and optimization procedure, as well as results of empirical investigations. SCCS is based on a conditional Poisson regression model, which assumes that events at different time points are conditionally independent given the covariate process. This requirement is problematic when the occurrence of an event can alter the future event risk. In a clinical setting, for example, patients who have a first myocardial infarction (MI) may be at higher subsequent risk for a second. In the third part of this work we propose the positive dependence self-controlled case series (PD-SCCS) method: a generalization of SCCS that allows the occurrence of an event to increase the future event risk, yet maintains the advantages of the original by controlling for fixed baseline covariates and relying solely on data from cases. We develop the model and compare the results of PD-SCCS and SCCS on example drug-AE pairs.
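A minimal sketch of the conditional (multinomial) likelihood that underlies the SCCS model: conditioning on each case's total event count cancels the fixed individual baseline, so only the relative incidence across that case's exposure intervals matters. The data layout and single exposure covariate below are illustrative assumptions.

```python
import numpy as np

def sccs_loglik(beta, cases):
    """Self-controlled case series log-likelihood (conditional Poisson / multinomial form).

    Each case is a dict with interval lengths `e`, a design matrix `X` of
    time-varying exposures per interval, and event counts `n` per interval.
    """
    ll = 0.0
    for c in cases:
        log_mu = np.log(c["e"]) + c["X"] @ beta            # log(length) + log relative incidence
        ll += c["n"] @ log_mu - c["n"].sum() * np.log(np.sum(np.exp(log_mu)))
    return ll

# usage: one case observed for 300 unexposed days and 60 exposed days, with one event while exposed
case = {"e": np.array([300.0, 60.0]),
        "X": np.array([[0.0], [1.0]]),
        "n": np.array([0.0, 1.0])}
print(sccs_loglik(np.array([0.5]), [case]))   # maximize over beta to estimate the log incidence ratio
```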
