481 |
Three Aspects of Biostatistical Learning Theory. Neykov, Matey. 17 July 2015
In the present dissertation we consider three classical problems in biostatistics and statistical learning - classification, variable selection and statistical inference.
Chapter 2 is dedicated to multi-class classification. We characterize a class of loss functions which we deem relaxed Fisher consistent, whose local minimizers recover not only the Bayes rule but also the exact conditional class probabilities. Our class encompasses previously studied classes of loss functions, and includes non-convex functions, which are known to be less susceptible to outliers. We propose a generic greedy functional gradient-descent minimization algorithm for boosting weak learners, which works with any loss function in our class. We show that the boosting algorithm achieves a geometric rate of convergence in the case of a convex loss. In addition, we provide numerical studies and a real data example which serve to illustrate that the algorithm performs well in practice.
In Chapter 3, we provide insights on the behavior of sliced inverse regression in a high-dimensional setting under a single index model. We analyze two algorithms: a thresholding-based algorithm known as diagonal thresholding, and an L1 penalization algorithm (semidefinite programming), and show that they achieve optimal (up to a constant) sample size in terms of support recovery in the case of standard Gaussian predictors. In addition, we look into the performance of the linear regression LASSO in single index models with correlated Gaussian designs. We show that under certain restrictions on the covariance and signal, the linear regression LASSO can also enjoy optimal sample size in terms of support recovery. Our analysis extends existing results on LASSO's variable selection capabilities for linear models.
Chapter 4 develops a general inferential framework for testing and constructing confidence intervals for high-dimensional estimating equations. Such a framework has a variety of applications and allows us to provide tests and confidence regions for parameters estimated by algorithms such as the Dantzig Selector, CLIME and LDP, among others, none of which has previously been equipped with inferential procedures. / Biostatistics
|
482 |
On Causal Inference for Ordinal Outcomes. Lu, Jiannan. 04 December 2015
This dissertation studies the problem of causal inference for ordinal outcomes. Chapter 1 focuses on the sharp null hypothesis of no treatment effect on all experimental units, and develops a systematic procedure for closed-form construction of sequences of alternative hypotheses in increasing order of their departure from the sharp null hypothesis. The resulting construction procedure helps assess the power of randomization tests with ordinal outcomes. Chapter 2 proposes two new causal parameters, i.e., the probabilities that the treatment is beneficial and strictly beneficial for the experimental units, and derives their sharp bounds using only the marginal distributions, without imposing any assumptions on the joint distribution of the potential outcomes. Chapter 3 generalizes the framework in Chapter 2 to address noncompliance. / Statistics
|
483 |
Topics in Bayesian Inference for Causal Effects. Garcia Horton, Viviana. 04 December 2015
This manuscript addresses two topics in Bayesian inference for causal effects.
1) Treatment noncompliance is frequent in clinical trials, and because the treatment actually received may be different from that assigned, comparisons between groups as randomized will no longer assess the effect of the treatment received.
To address this complication, we create latent subgroups based on the potential outcomes of treatment received and focus on the subgroup of compliers, where under certain assumptions the estimands of causal effects of assignment can be interpreted as causal effects of receipt of treatment.
We propose estimands of causal effects for right-censored time-to-event endpoints, and discuss a framework to estimate those causal effects that relies on modeling survival times as parametric functions of pre-treatment variables.
Using a randomized clinical trial involving breast cancer patients, we demonstrate a Bayesian estimation strategy that multiply imputes the missing data from posterior predictive distributions.
Finally, we establish a connection with the commonly used parametric proportional hazards and accelerated failure time models, and briefly discuss the consequences of relaxing the assumption of independent censoring.
2) Bayesian inference for causal effects based on data obtained from ignorable assignment mechanisms can be sensitive to the model specified for the data.
Ignorability is defined with respect to specific models for an assignment mechanism and data, which we call the ``true'' data-generating models and which are generally unknown to the statistician; these, in turn, determine a true posterior distribution for a causal estimand of interest.
On the other hand, the statistician poses a set of models to conduct the analysis, which we call the ``statistician's'' models; a posterior distribution for the causal estimand can be obtained assuming these models.
Let $\Delta_M$ denote the difference between the true models and the statistician's models, and let $\Delta_D$ denote the difference between the true posterior distribution and the statistician's posterior distribution (for a specific estimand).
For fixed $\Delta_M$ and fixed sample size, $\Delta_D$ varies more with data-dependent assignment mechanisms than with data-free assignment mechanisms.
We illustrate this through a sequence of examples of $\Delta_M$ under various ignorable assignment mechanisms, namely, complete randomization design, rerandomization design, and the finite selection model design.
In each case, we create the 95\% posterior interval for an estimand under a statistician's model, and then compute its coverage probability under the correct posterior distribution; this Bayesian coverage probability is our chosen measure of $\Delta_D$.
The objective of these examples is to provide insights into the ranges of data models for which Bayesian inference for causal effects from datasets obtained through ignorable assignment mechanisms is approximately valid from the Bayesian perspective, and how these validities are influenced by data-dependent assignment mechanisms. / Statistics
|
484 |
`Time for a New Angle!': Unravel the Mystery of Split-Plot Designs via the Potential Outcomes Prism. Zhao, Anqi. 25 July 2017
This manuscript investigates two different approaches, namely the Neymanian randomization based (Neyman, 1923) method and the Bayesian model based (Rubin, 1978) method, towards the causal inference for 2-by-2 split-plot designs (Jones and Nachtsheim, 2009), both under the potential outcomes framework (Neyman, 1923; Rubin, 1974, 1978, 2005).
Chapters 1 to 5. Given two 2-level factors of interest, a 2-by-2 split-plot design (a) takes each of the 2-by-2 = 4 possible factorial combinations as a treatment, (b) identifies one factor as 'whole-plot,' (c) divides the experimental units into blocks, and (d) assigns the treatments in such a way that all units within the same block receive the same level of the whole-plot factor. Assuming the potential outcomes framework, we propose in Chapters 1 to 5 a randomization-based estimation procedure for causal inference under such designs. Sampling variances of the point estimates are derived in closed form as linear combinations of the between- and within-block covariances of the potential outcomes. Results are compared to those under complete randomization as measures of design efficiency. Interval estimates are constructed based on conservative estimates of the sampling variances, and their frequency coverage properties are evaluated via simulation. Superiority over existing model-based alternatives is reported under a variety of settings for both binary and continuous outcomes.
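The design rules (a) to (d) above can be sketched as a small randomization routine. This is only an illustration under stated assumptions, not the dissertation's procedure: the function name `split_plot_assignment`, the balanced half-and-half allocation at both levels, and the 0/1 coding of factor levels are all hypothetical choices made for the example.

```python
import random


def split_plot_assignment(blocks, seed=0):
    """Randomly assign a 2-by-2 split-plot design.

    blocks: list of lists of unit ids, one inner list per block.
    Returns a dict mapping unit id -> (whole_plot_level, sub_plot_level).
    Assumption: levels are split as evenly as possible at both stages.
    """
    rng = random.Random(seed)
    n_blocks = len(blocks)

    # Stage 1: randomize the whole-plot factor across blocks, so every
    # unit in a block shares the same whole-plot level (rule (d)).
    whole_levels = [0] * (n_blocks // 2) + [1] * (n_blocks - n_blocks // 2)
    rng.shuffle(whole_levels)

    assignment = {}
    for block, whole in zip(blocks, whole_levels):
        # Stage 2: randomize the other (sub-plot) factor within the block.
        half = len(block) // 2
        sub_levels = [0] * half + [1] * (len(block) - half)
        rng.shuffle(sub_levels)
        for unit, sub in zip(block, sub_levels):
            assignment[unit] = (whole, sub)
    return assignment


if __name__ == "__main__":
    example_blocks = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    for unit, levels in sorted(split_plot_assignment(example_blocks).items()):
        print(unit, levels)
```

Each unit ends up with one of the four factorial combinations, but the whole-plot level varies only between blocks, which is why the sampling variances involve both between- and within-block terms.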
Chapter 6. Causal inference compares the differences in outcomes over a particular set of experimental units. Whereas the randomization-based Neymanian inference focuses on the experimental units directly involved in the study, the Bayesian inferential framework provides a principled way to extend such finite-population concerns to the super-population (Rubin, 1978). We outline in this chapter the explicit procedure for analyzing 2-by-2 split-plot designs under this framework, and illustrate the various technical issues in the actual implementation via examples. / Statistics
|
485 |
A Comparison of Variance and Renyi's Entropy with Application to Machine Learning. Peccarelli, Adric M. 15 December 2017
This research explores parametric and nonparametric similarities and disagreements between variance and the information-theoretic measure of entropy, specifically Renyi’s entropy. The history of, and known relationships between, the two uncertainty measures are examined. Then, twenty discrete and continuous parametric families are tabulated with their respective variance and Renyi entropy functions, ordered to understand the behavior of these two measures of uncertainty. Finally, an algorithm for variable selection using Renyi’s Quadratic Entropy and its kernel estimation is explored and compared to other popular selection methods using real data.
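As a rough illustration of what kernel estimation of Renyi's quadratic entropy can look like, the sketch below uses the standard plug-in form: the quadratic entropy is $H_2 = -\log \int f(x)^2\,dx$, and with a Gaussian kernel density estimate of bandwidth $\sigma$ the integral reduces to an average of Gaussian densities with variance $2\sigma^2$ evaluated at all pairwise differences. The choice of a Gaussian kernel, the one-dimensional setting, and the function name `renyi_quadratic_entropy` are assumptions for this example; the abstract does not specify the estimator used in the thesis.

```python
import math


def renyi_quadratic_entropy(xs, bandwidth):
    """Plug-in estimate of Renyi's quadratic entropy for 1-D data.

    Assumes a Gaussian kernel density estimate with the given bandwidth.
    The integral of the squared estimate equals the mean, over all pairs
    (i, j), of a Gaussian density with variance 2 * bandwidth**2
    evaluated at xs[i] - xs[j].
    """
    n = len(xs)
    var = 2.0 * bandwidth ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    pair_sum = sum(
        norm * math.exp(-((xi - xj) ** 2) / (2.0 * var))
        for xi in xs
        for xj in xs
    )
    return -math.log(pair_sum / n ** 2)


if __name__ == "__main__":
    tight = [0.0, 1.0, 2.0, 3.0]
    spread = [0.0, 10.0, 20.0, 30.0]
    # The more dispersed sample should receive the larger entropy estimate.
    print(renyi_quadratic_entropy(tight, 0.5))
    print(renyi_quadratic_entropy(spread, 0.5))
```

Like variance, the estimate grows as the sample becomes more dispersed, which is the kind of agreement between the two uncertainty measures that the abstract sets out to examine.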
|
486 |
A Monte Carlo study of the robustness of coefficient alpha. Shultz, Sharon G. January 1993
The biasedness and efficiency of coefficient alpha were studied given various error score distributions, numbers of examinees, population reliability values, and numbers of subtests. A Monte Carlo methodology was used in which an additive true score matrix was constructed and an error score was added to each true score. The true scores were uniformly distributed, while the error score distributions used were the normal, mixed normal, exponential, and negative exponential. In addition, 1000 replications were used and the resulting sampling distributions were examined with respect to their skewness and kurtosis. The results indicated that, when 50, 100, and 200 examinees and 5, 10, 20, and 30 subtests were used, coefficient alpha was an unbiased estimator of population reliability values of 0.4, 0.6, 0.8, and 0.9. The efficiency of the estimate increased as the number of examinees or the population reliability increased, regardless of the error score distribution used. Of the four error score distributions used, the normal was the most efficient. In addition, the sampling distributions tended to become less skewed and more leptokurtic as the number of examinees and the population reliability increased. Surprisingly, an increase in the number of subtests did not appear to affect the biasedness and efficiency of coefficient alpha or the skewness and kurtosis of its sampling distribution. The results are related to the literature and suggestions for future study are given.
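A Monte Carlo study of this kind can be sketched in a few lines. The code below is an illustrative reconstruction, not the author's program: it assumes each examinee has one Uniform(0, 1) true score shared by all subtests, uses normal errors only, and sets the error variance from the standard reliability formula for a sum of parallel subtests, $\rho = k\sigma_T^2 / (k\sigma_T^2 + \sigma_E^2)$. The function names and the seed are hypothetical.

```python
import random
import statistics


def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows (one score per subtest)."""
    k = len(scores[0])
    subtest_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(subtest_vars) / total_var)


def simulate_alpha(n_examinees, n_subtests, reliability, n_reps, seed=0):
    """Return n_reps sample values of alpha at a given population reliability."""
    rng = random.Random(seed)
    true_var = 1.0 / 12.0  # variance of a Uniform(0, 1) true score
    # Assumption: parallel subtests, so the error variance per subtest
    # follows from the target reliability of the summed score.
    error_sd = (n_subtests * true_var * (1 - reliability) / reliability) ** 0.5
    estimates = []
    for _ in range(n_reps):
        scores = []
        for _ in range(n_examinees):
            true_score = rng.random()
            scores.append(
                [true_score + rng.gauss(0.0, error_sd) for _ in range(n_subtests)]
            )
        estimates.append(coefficient_alpha(scores))
    return estimates


if __name__ == "__main__":
    draws = simulate_alpha(n_examinees=100, n_subtests=5, reliability=0.8, n_reps=200)
    print("mean alpha:", statistics.mean(draws))
    print("sd of alpha:", statistics.stdev(draws))
```

Comparing the mean of the simulated values with the target reliability gives a rough check on bias, and their standard deviation a rough check on efficiency; swapping `rng.gauss` for another error generator would mimic the other error distributions in the study.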
|
487 |
Simulation, estimation des paramètres et prédiction pour un processus de Kendall [Simulation, parameter estimation and prediction for a Kendall process]. Mangin, Christian. January 1994
The Kendall process was proposed by Nucho (1979) as an adequate description of the fluctuation of demand between two points. We use simulation to test a method for estimating the parameters of the process. We develop an algorithm to compute a 95% prediction interval for the demand $t$ time units in the future, based on a realization of the process $t^*$ time units in the past. We verify the applicability of this model on a database describing the fluctuations of demand on a commercial network.
|
488 |
The use of Spearman's footrule in testing for trend when the data is incomplete. Charbonneau, Martin. January 1994
Abstract Not Available.
|
489 |
Linear regression with spatially correlated data. Bocci, Cynthia Jacqueline. January 1999
In this dissertation, the analysis of spatial data through regression is investigated. Multiple observations taken from sites are assumed to be spatially dependent. Our linear model includes a pure error and a spatial error term whose covariance structure is given by an unknown linear combination of 2 known covariograms. The pure and spatial error terms also have separate scale parameters. Our first concern is with the estimation of the parameters of this model. An algorithm to estimate these parameters is proposed as well as a consistent estimator for one of the spatial parameters. Numerical simulations support the use of our algorithm. The second main issue is that of asymptotics. To that end, a formula for the inverse of the variance-covariance matrix of observations is developed. Limits of the asymptotic variance of the parameter estimates as the number of observations per site increases are found with this formula. On the other hand, specific sampling schemes are studied when considering the asymptotics for the number of sites going to infinity. From the simulation and asymptotic results, some rules for experimental design are given. Extensions to more general models are made and areas of future research including possible applications are suggested.
|
490 |
Multivariate non-parametric tests of trend in the presence of missing data. Park, Jincheol. January 2000
When testing for trend, one may be interested in either a monotone trend or a step trend. The former assumes that the population shifts monotonically over time without specifying when the shift occurs. The latter assumes that the observations recorded before some specific time belong to a different population from those recorded after that time. Our interest will be focused on tests for monotone trend. There exist parametric as well as nonparametric methods, univariate and multivariate, to test for monotone trend. In practice, more often than not some portion of the collected data is missing. There is at present a way to analyze incomplete data in the univariate case. In this work, we introduce nonparametric multivariate test statistics to test for monotone trend in the presence of missing data and deduce some corresponding asymptotic properties. (Abstract shortened by UMI.)
|