About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
241

Measurement Error in Progress Monitoring Data: Comparing Methods Necessary for High-Stakes Decisions

Bruhl, Susan 2012 May 1900 (has links)
Support for using progress monitoring results in high-stakes decisions is emerging in the literature, but few studies support the reliability of the measures for this level of decision-making. What little research exists is limited to oral reading fluency measures, and their reliability for progress monitoring (PM) has not been established. This dissertation explored methods rarely applied in the literature for summarizing and analyzing progress monitoring results for medium- to high-stakes decisions. The study used extant data from 92 "low performing" third graders who were progress monitored with mathematics concept and application measures. The results identified 1) the number of weeks needed to reliably assess growth on the measure; 2) whether slopes differed when results were analyzed with parametric or nonparametric analyses; 3) the reliability of growth; and 4) the extent to which the group met the parametric assumptions inherent in the ordinary least squares regression model. The results indicate that reliable growth estimates can be obtained from static scores in as few as 10 weeks of progress monitoring. Within this dataset, growth estimated through parametric and nonparametric analyses was similar. These findings are limited to the dataset analyzed in this study but highlight promising methods not widely known among practitioners and rarely applied in the PM literature.
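The parametric-versus-nonparametric slope comparison described above can be illustrated with a minimal sketch. The data and the choice of the Theil-Sen (median-of-pairwise-slopes) estimator are hypothetical — the abstract does not name the nonparametric analysis it used — but the contrast between the two growth estimates under an outlying weekly score is the general idea:

```python
import statistics
from itertools import combinations

def ols_slope(weeks, scores):
    """Ordinary least squares slope of scores regressed on weeks."""
    n = len(weeks)
    mx, my = sum(weeks) / n, sum(scores) / n
    num = sum((x - mx) * (y - my) for x, y in zip(weeks, scores))
    den = sum((x - mx) ** 2 for x in weeks)
    return num / den

def theil_sen_slope(weeks, scores):
    """Median of all pairwise slopes -- a standard nonparametric
    alternative that is resistant to outlying observations."""
    slopes = [(scores[j] - scores[i]) / (weeks[j] - weeks[i])
              for i, j in combinations(range(len(weeks)), 2)]
    return statistics.median(slopes)

# Hypothetical 10 weeks of progress-monitoring scores, with one
# outlying score in week 6 pulling the OLS slope upward.
weeks = list(range(1, 11))
scores = [10, 12, 11, 14, 15, 30, 17, 18, 20, 21]
print(ols_slope(weeks, scores), theil_sen_slope(weeks, scores))
```

On this toy series the median-of-pairwise-slopes estimate is pulled far less by the single aberrant week than the least-squares slope is, which is the kind of difference such a comparison is designed to surface.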
242

Selection of smoothing parameters with application in causal inference

Häggström, Jenny January 2011 (has links)
This thesis is a contribution to the research area concerned with selection of smoothing parameters in the framework of nonparametric and semiparametric regression. Selection of smoothing parameters is one of the most important issues in this framework, and the choice can heavily influence subsequent results. A nonparametric or semiparametric approach is often desirable when large datasets are available, since it allows us to make fewer and weaker assumptions than a parametric approach requires. In the first paper we consider smoothing parameter selection in nonparametric regression when the purpose is to accurately predict future or unobserved data. We study the use of accumulated prediction errors and make comparisons to leave-one-out cross-validation, which is widely used by practitioners. In the second paper a general semiparametric additive model is considered, and the focus is on selection of smoothing parameters when optimal estimation of some specific parameter is of interest. We introduce a double smoothing estimator of a mean squared error and propose to select smoothing parameters by minimizing this estimator. Our approach is compared with existing methods. The third paper is concerned with the selection of smoothing parameters optimal for estimating average treatment effects defined within the potential outcome framework. For this estimation problem we propose double smoothing methods similar to the method proposed in the second paper. Theoretical properties of the proposed methods are derived and comparisons with existing methods are made by simulations. In the last paper we apply our results from the third paper by using a double smoothing method for selecting smoothing parameters when estimating average treatment effects on the treated. We estimate the effect on BMI of divorcing in middle age. Rich data on socioeconomic conditions, health and lifestyle from Swedish longitudinal registers are used.
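Leave-one-out cross-validation, the practitioners' baseline mentioned above, can be sketched in a few lines. This is a generic illustration (hypothetical noise-free data, a Nadaraya-Watson kernel smoother with a Gaussian kernel), not the thesis's accumulated-prediction-error method:

```python
import math

def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson kernel regression estimate at x0 with a
    Gaussian kernel of bandwidth h, optionally leaving out one point."""
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x0 - x) / h) ** 2)
        num += w * y
        den += w
    return num / den

def loocv_score(xs, ys, h):
    """Leave-one-out CV: mean squared error of predicting each point
    from a fit that excludes it."""
    return sum((ys[i] - nw_estimate(xs[i], xs, ys, h, skip=i)) ** 2
               for i in range(len(xs))) / len(xs)

# Hypothetical data: a smooth curve sampled on an even grid.
xs = [i / 10 for i in range(30)]
ys = [math.sin(x) for x in xs]
best_h = min([0.05, 0.1, 0.2, 0.5, 1.0], key=lambda h: loocv_score(xs, ys, h))
print(best_h)
```

The grid search over candidate bandwidths is the simplest possible selector; the point is that the CV criterion is computed from out-of-sample prediction errors, which is what the accumulated-prediction-error approach also targets.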
243

Essays in Industrial Organization and Econometrics

Blevins, Jason Ryan January 2010 (has links)
This dissertation consists of three chapters relating to identification and inference in dynamic microeconometric models, including dynamic discrete games with many players, dynamic games with discrete and continuous choices, and semiparametric binary choice and duration panel data models.

The first chapter provides a framework for estimating large-scale dynamic discrete choice models (both single- and multi-agent models) in continuous time. The advantage of working in continuous time is that state changes occur sequentially, rather than simultaneously, avoiding a substantial curse of dimensionality that arises in multi-agent settings. Eliminating this computational bottleneck is the key to providing a seamless link between estimating the model and performing post-estimation counterfactuals. While recently developed two-step estimation techniques have made it possible to estimate large-scale problems, solving for equilibria remains computationally challenging. In many cases, the models that applied researchers estimate do not match the models that are then used to perform counterfactuals. By modeling decisions in continuous time, we are able to take advantage of the recent advances in estimation while preserving a tight link between estimation and policy experiments. We also consider estimation in situations with imperfectly sampled data, such as when we do not observe the decision not to move, or when data are aggregated over time, such as when only discrete-time data are available at regularly spaced intervals. We illustrate the power of our framework using several large-scale Monte Carlo experiments.

The second chapter considers semiparametric panel data binary choice and duration models with fixed effects. Such models are point identified when at least one regressor has full support on the real line. It is common in practice, however, to have only discrete or continuous, but possibly bounded, regressors. We focus on identification, estimation, and inference for the identified set in such cases, when the parameters of interest may only be partially identified. We develop a set of general results for criterion-function-based estimation and inference in partially identified models which can be applied to both regular and irregular models. We apply our general results first to a fixed effects binary choice panel data model, where we obtain a sharp characterization of the identified set and propose a consistent set estimator, establishing its rate of convergence under different conditions. Rates arbitrarily close to n^{-1/3} are possible when a continuous, but possibly bounded, regressor is present. When all regressors are discrete, the estimates converge arbitrarily fast to the identified set. We also propose a subsampling-based procedure for constructing confidence regions in the models we consider. Finally, we carry out a series of Monte Carlo experiments to illustrate and evaluate the proposed procedures. We also consider extensions to other fixed effects panel data models such as binary choice models with lagged dependent variables and duration models.

The third chapter considers nonparametric identification of dynamic games of incomplete information in which players make both discrete and continuous choices. Such models are commonly used in applied work in industrial organization where, for example, firms make discrete entry and exit decisions followed by continuous investment decisions. We first review existing identification results for single-agent dynamic discrete choice models before turning to single-agent models with an additional continuous choice variable and finally to multi-agent models with both discrete and continuous choices. We provide conditions for nonparametric identification of the utility function in both cases.
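The continuous-time advantage claimed in the first chapter — that state changes arrive one agent at a time rather than simultaneously — can be seen in a minimal simulation of competing exponential clocks. The rates and horizon here are hypothetical, and this sketches only the event mechanics, not the economic model:

```python
import random

def simulate_ctmc_moves(rates, horizon, rng):
    """Competing exponential clocks: each agent moves at its own rate,
    so in continuous time every event belongs to exactly one agent and
    states change sequentially, never simultaneously."""
    t, events = 0.0, []
    total = sum(rates)
    while True:
        t += rng.expovariate(total)          # waiting time to next event
        if t > horizon:
            break
        # which agent moves: chosen with probability proportional to rate
        u, cum = rng.random() * total, 0.0
        for agent, r in enumerate(rates):
            cum += r
            if u < cum:
                events.append((t, agent))
                break
    return events

rng = random.Random(0)
events = simulate_ctmc_moves([1.0, 2.0, 3.0], horizon=100.0, rng=rng)
print(len(events))
```

Because each event names a single mover, the state space a solver must consider at each instant grows linearly in the number of agents instead of combinatorially, which is the curse-of-dimensionality point made in the abstract.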
244

Modeling Temporal and Spatial Data Dependence with Bayesian Nonparametrics

Ren, Lu January 2010 (has links)
In this thesis, temporal and spatial dependence are incorporated into nonparametric priors to help infer patterns, clusters or segments in data. In traditional nonparametric mixture models, observations are usually assumed exchangeable, even though dependence often exists associated with the space or time at which data are generated.

Focused on model-based clustering and segmentation, this thesis addresses the issue in different ways for temporal and spatial dependence.

For sequential data analysis, the dynamic hierarchical Dirichlet process is proposed to capture the temporal dependence across different groups. The data collected at any time point are represented via a mixture associated with an appropriate underlying model; the statistical properties of data collected at consecutive time points are linked via a random parameter that controls their probabilistic similarity. The new model favors smooth evolutionary clustering while allowing innovative patterns to be inferred. Experimental analysis is performed on music, and the model may also be employed on text data for learning topics.

Spatially dependent data are more challenging to model due to their spatial grid structure and the often large computational cost of analysis. As a nonparametric clustering prior, the logistic stick-breaking process introduced here imposes the belief that proximate data are more likely to be clustered together. Multiple logistic regression functions generate a set of sticks, each dominating a spatially localized segment. The proposed model is employed on image segmentation and speaker diarization, yielding generally homogeneous segments with sharp boundaries.

In addition, we consider multi-task learning in which each task exhibits spatial dependence. For the specific application of co-segmentation of multiple images, a hierarchical Bayesian model called H-LSBP is proposed. By sharing the same mixture atoms across images, the model infers the similarity between each pair of images and hence can be employed for image sorting.
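The stick-breaking idea underlying the priors above can be sketched in its plain, non-spatial form (the logistic stick-breaking process replaces the Beta draws below with spatially varying logistic regressions; this simplification and its parameter values are illustrative only):

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking construction: repeatedly break off a
    Beta(1, alpha)-distributed fraction of the remaining stick, so the
    pieces form mixture weights summing to (nearly) one."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= (1.0 - v)
    return weights

rng = random.Random(0)
w = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)
print(sum(w))  # close to 1; the leftover mass is the truncation error
```

In the spatial version each "stick" is modulated by a logistic regression function of location, so a component can dominate in one localized region, which is how proximate observations come to share a segment.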
245

Nonparametric Bayesian Methods for Multiple Imputation of Large Scale Incomplete Categorical Data in Panel Studies

Si, Yajuan January 2012 (has links)
This thesis develops nonparametric Bayesian models to handle incomplete categorical variables in high-dimensional data sets using the framework of multiple imputation. It presents methods for ignorable missing data in cross-sectional studies, and for potentially non-ignorable missing data in panel studies with refreshment samples.

The first contribution is a fully Bayesian, joint modeling approach to multiple imputation for categorical data based on Dirichlet process mixtures of multinomial distributions. The approach automatically models complex dependencies while being computationally expedient. I illustrate repeated sampling properties of the approach using simulated data. This approach offers better performance than default chained equations methods, which are often used in such settings. I apply the methodology to impute missing background data in the 2007 Trends in International Mathematics and Science Study.

For the second contribution, I extend the nonparametric Bayesian imputation engine to handle a mix of potentially non-ignorable attrition and ignorable item nonresponse in multiple-wave panel studies. Ignoring attrition in models for panel data can result in biased inference if the reason for attrition is systematic and related to the missing values. Panel data alone cannot estimate the attrition effect without untestable assumptions about the missing data mechanism. Refreshment samples offer an extra data source that can be utilized to estimate the attrition effect while reducing reliance on strong assumptions about the missing data mechanism.

I consider two novel Bayesian approaches to handle attrition and item nonresponse simultaneously under multiple imputation in a two-wave panel with one refreshment sample, when the variables involved are categorical and high dimensional.

First, I present a semiparametric selection model that includes an additive non-ignorable attrition model with main effects of all variables, including demographic variables and outcome measures in wave 1 and wave 2. The survey variables are modeled jointly using a Bayesian mixture of multinomial distributions. I develop posterior computation algorithms for the semiparametric selection model under different prior choices for the regression coefficients in the attrition model.

Second, I propose two Bayesian pattern mixture models for this scenario that use latent classes to model the dependency among the variables and the attrition: a dependent Bayesian latent pattern mixture model, in which variables are modeled via latent classes and attrition is treated as a covariate in the class allocation weights, and a joint Bayesian latent pattern mixture model, in which attrition and variables are modeled jointly via latent classes.

I show via simulation studies that the pattern mixture models can recover true parameter estimates, even when inferences based on the panel alone are biased by attrition. I apply both the selection and pattern mixture models to data from the 2007-2008 Associated Press/Yahoo News election panel study.
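The generative core of the imputation engine above — a mixture in which each latent class has its own multinomial distribution per variable — can be sketched with a small, fully hypothetical two-class example (the real model uses a Dirichlet process over the classes rather than fixed weights):

```python
import random

def sample_record(weights, class_probs, rng):
    """Draw one categorical record from a (truncated) mixture of
    product-multinomials: pick a latent class, then draw each variable
    independently from that class's category probabilities."""
    z = rng.choices(range(len(weights)), weights=weights)[0]
    record = [rng.choices(range(len(probs)), weights=probs)[0]
              for probs in class_probs[z]]  # one prob. vector per variable
    return z, record

# Hypothetical two-class mixture over two ternary survey variables.
weights = [0.7, 0.3]
class_probs = [
    [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],  # class 0: favors low categories
    [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]],  # class 1: favors high categories
]
rng = random.Random(1)
z, rec = sample_record(weights, class_probs, rng)
print(z, rec)
```

Imputation works by running this machinery in reverse: given a record's observed entries, the posterior over its latent class concentrates on classes consistent with those entries, and missing entries are drawn from that class's multinomials, capturing dependence between variables through the shared class.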
246

Choosing a Kernel for Cross-Validation

Savchuk, Olga 14 January 2010 (has links)
The statistical properties of cross-validation bandwidths can be improved by choosing an appropriate kernel, different from the kernels traditionally used for cross-validation purposes. In light of this idea, we developed two new methods of bandwidth selection, termed indirect cross-validation and robust one-sided cross-validation. The kernels used in the indirect cross-validation method yield an improvement in the relative bandwidth rate to n^{-1/4}, which is substantially better than the n^{-1/10} rate of the least squares cross-validation method. The robust kernels used in the robust one-sided cross-validation method eliminate the bandwidth bias for the case of regression functions with discontinuous derivatives.
247

Efficient inference in general semiparametric regression models

Maity, Arnab 15 May 2009 (has links)
Semiparametric regression has become very popular in statistics over the years. While ever more sophisticated models are being developed, the resulting theory and estimation procedures have become more and more involved. The main problems addressed in this work relate to efficient inferential procedures in general semiparametric regression problems. We first discuss efficient estimation of population-level summaries in general semiparametric regression models. Here our focus is on estimating general population-level quantities that combine the parametric and nonparametric parts of the model (e.g., population mean, probabilities, etc.). We place this problem in a general context, provide a general kernel-based methodology, and derive the asymptotic distributions of estimates of these population-level quantities, showing that in many cases the estimates are semiparametric efficient. Next, motivated by the problem of testing for genetic effects on complex traits in the presence of gene-environment interaction, we consider developing score tests in general semiparametric regression problems that involve a Tukey-style 1 d.f. form of interaction between parametrically and nonparametrically modeled covariates. We develop adjusted score statistics which are unbiased and asymptotically efficient and can be computed using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give easy interpretations of the target functions, which in turn allow us to develop estimation procedures that can be easily implemented using standard computational methods. Finally, we take up the important problem of estimation in a general semiparametric regression model when covariates are measured with an additive measurement error structure having normally distributed measurement errors. In contrast to methods that require solving an integral equation whose dimension equals that of the covariate measured with error, we propose methodology based on Monte Carlo corrected scores to estimate the model components, and we investigate the asymptotic behavior of the estimates. For each of these problems, we present simulation studies to assess the performance of the proposed inferential procedures. In addition, we apply the proposed methodology to analyze nontrivial real-life data sets and present the results.
248

Nonparametric Methods for Point Processes and Geostatistical Data

Kolodziej, Elizabeth Young 2010 August 1900 (has links)
In this dissertation, we explore the properties of correlation structure for spatio-temporal point processes and a quantitative spatial process. Spatio-temporal point processes are often assumed to be separable; we propose a formal approach for testing whether a particular data set is indeed separable. Because of the resampling methodology, the approach requires minimal conditions on the underlying spatio-temporal process to perform the hypothesis test, and thus is appropriate for a wide class of models. Africanized Honey Bees (AHBs, Apis mellifera scutellata) abscond more frequently and defend more quickly than colonies of European origin. Their use of smaller cavities for building colonies also expands the range of suitable hive locations to common objects in urban environments. The aim of the AHB study is to create a model of this quantitative spatial process to predict where AHBs are more likely to build a colony, and to explore what variables might be related to the occurrences of colonies. We constructed two generalized linear models to predict the habitation of water meter boxes, based on surrounding landscape classifications, whether there were colonies in surrounding areas, and other variables. The presence of colonies in the area was a strong predictor of whether AHBs occupied a water meter box, suggesting that AHBs tend to form aggregations, and that the removal of a colony from a water meter box may make other nearby boxes less attractive to the bees.
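A drastically simplified stand-in for the separability question can be sketched as a test of whether the spatial and temporal coordinates of observed points factor independently. This is only an illustration on hypothetical simulated points — the dissertation's resampling test handles far more general processes — but it shows the kind of statistic such a test compares against a resampled reference distribution:

```python
import random

def separability_stat(points, bins=4):
    """Squared distance between empirical space-time cell probabilities
    and the product of their marginals; near zero when the pattern
    factors into separate spatial and temporal components."""
    n = len(points)
    joint = [[0.0] * bins for _ in range(bins)]
    for s, t in points:                       # s, t assumed in [0, 1)
        i = min(int(s * bins), bins - 1)
        j = min(int(t * bins), bins - 1)
        joint[i][j] += 1.0 / n
    s_marg = [sum(row) for row in joint]
    t_marg = [sum(joint[i][j] for i in range(bins)) for j in range(bins)]
    return sum((joint[i][j] - s_marg[i] * t_marg[j]) ** 2
               for i in range(bins) for j in range(bins))

rng = random.Random(0)
separable = [(rng.random(), rng.random()) for _ in range(400)]   # s, t independent
u = [rng.random() for _ in range(400)]
nonseparable = [(x, x) for x in u]            # time tied to location
print(separability_stat(separable), separability_stat(nonseparable))
```

In a resampling test one would recompute this statistic on many datasets with the temporal coordinates reshuffled (which enforces separability) and reject when the observed statistic sits in the tail of that reference distribution.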
249

An Additive Bivariate Hierarchical Model for Functional Data and Related Computations

Redd, Andrew Middleton 2010 August 1900 (has links)
The work presented in this dissertation centers on regression and computation methodology. Functional data are an important class of longitudinal data, and principal component analysis is an important approach to regression with this type of data. Here we present an additive hierarchical bivariate functional data model employing principal components to identify random effects. This additive model extends the univariate functional principal component model. These models are implemented in the pfda package for R. To fit the curves from this class of models, orthogonalized spline bases are used to reduce the dimensionality of the fit while retaining flexibility. Methods for handling spline basis functions in a purely analytical manner are presented, including the orthogonalizing process and the computation of the penalty matrices used to fit the principal component models. These methods are implemented in the R package orthogonalsplinebasis. The projects discussed involve complicated coding for the implementations in R. To facilitate this, I created the NppToR utility to add R functionality to the popular Windows code editor Notepad++. A brief overview of the use of the utility is also included.
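The orthogonalization step mentioned above can be sketched generically. This is not the package's analytical method — it is a numerical Gram-Schmidt pass over basis columns, with a low-order polynomial basis standing in for a spline basis matrix (which would be treated identically):

```python
def gram_schmidt(columns):
    """Orthonormalize basis columns (equal-length lists) via classical
    Gram-Schmidt: subtract projections onto earlier columns, normalize."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    ortho = []
    for col in columns:
        v = list(col)
        for q in ortho:
            proj = dot(q, col)
            v = [a - proj * b for a, b in zip(v, q)]
        norm = dot(v, v) ** 0.5
        ortho.append([a / norm for a in v])
    return ortho

# Hypothetical stand-in basis: 1, x, x^2 evaluated on a 10-point grid.
xs = [i / 9 for i in range(10)]
basis = [[1.0] * 10, xs, [x * x for x in xs]]
Q = gram_schmidt(basis)
print(round(sum(a * b for a, b in zip(Q[0], Q[1])), 12))
```

An orthonormal basis makes the downstream principal component fit better conditioned, since the cross-products that enter the penalized least-squares system reduce to identity blocks.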
250

Inference and Visualization of Periodic Sequences

Sun, Ying 2011 August 1900 (has links)
This dissertation is composed of four articles describing inference and visualization of periodic sequences. In the first article, a nonparametric method is proposed for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator. The second article is the multivariate extension, where we present a CV method of estimating the periods of multiple periodic sequences when data are observed at evenly spaced time points. The basic idea is to borrow information from other correlated sequences to improve estimation of the period of interest. We show that the asymptotic behavior of the bivariate CV is the same as that of the CV for one sequence; for finite samples, however, the better the periods of the other correlated sequences are estimated, the more substantial the improvements that can be obtained. The third article proposes an informative exploratory tool, the functional boxplot, for visualizing functional data, as well as its generalization, the enhanced functional boxplot. Based on the center-outwards ordering induced by band depth for functional data, the descriptive statistics of a functional boxplot are the envelope of the 50 percent central region, the median curve, and the maximum non-outlying envelope. In addition, outliers can be detected by the empirical rule of 1.5 times the 50 percent central region. The last article proposes a simulation-based method to adjust functional boxplots for correlations when visualizing functional and spatio-temporal data, as well as detecting outliers. We start by investigating the relationship between the spatio-temporal dependence and the 1.5 times the 50 percent central region empirical outlier detection rule. Then, we propose to simulate observations without outliers based on a robust estimator of the covariance function of the data. We select the constant factor in the functional boxplot to control the probability of correctly detecting no outliers. Finally, we apply the selected factor to the functional boxplot of the original data.
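The band-depth ordering that drives the functional boxplot can be sketched directly. This toy uses the modified band depth with pairs of curves (j = 2) on hypothetical parallel curves plus one far-shifted outlier; the real tool adds the central-region envelopes and the 1.5-times fence on top of this ordering:

```python
from itertools import combinations

def modified_band_depth(curves):
    """Modified band depth (j = 2): for each curve, the average
    proportion of time points at which it lies inside the band
    spanned by a pair of curves from the sample."""
    n, T = len(curves), len(curves[0])
    n_pairs = n * (n - 1) / 2
    depths = []
    for f in curves:
        total = 0.0
        for g, h in combinations(curves, 2):
            inside = sum(1 for t in range(T)
                         if min(g[t], h[t]) <= f[t] <= max(g[t], h[t]))
            total += inside / T
        depths.append(total / n_pairs)
    return depths

# Hypothetical curves: six nearby parallel lines and one shifted far up.
curves = [[i + 0.1 * k for i in range(5)] for k in range(6)]
curves.append([i + 10 for i in range(5)])   # outlying curve
depths = modified_band_depth(curves)
print(depths.index(max(depths)))            # index of the deepest curve
```

Sorting curves by this depth gives the center-outwards ordering: the deepest curve plays the role of the functional median, the deepest 50 percent form the central region, and curves escaping the inflated central-region fence are flagged as outliers.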
