401. Topics in Computational Advertising. Au, Timothy Chun-Wai, January 2014.
Computational advertising is an emerging scientific discipline that draws on tools and ideas from statistics, computer science, and economics. Born of the rapid growth of the Internet, it has since helped transform the online advertising business into a multi-billion-dollar industry.

The fundamental goal of computational advertising is to determine the "best" online ad to display to any given user. What counts as "best", however, depends on the specific context under consideration, and this leads to a variety of distinct problems, three of which are discussed in this thesis.

Chapter 1 briefly introduces online advertising and computational advertising. Chapter 2 proposes a numerical method to approximate the pure-strategy Nash equilibrium bidding functions in an independent private value first-price sealed-bid auction in which bidders draw their types from continuous, atomless distributions, a setting where solutions generally cannot be derived analytically even though they are known to exist and to be unique. Chapter 3 proposes a cross-domain recommender system that extends the Bayesian Probabilistic Matrix Factorization model to multiple domains. Chapter 4 discusses some of the tools and challenges of text mining, using the Trayvon Martin shooting incident as a case study in analyzing the lexical content and network connectivity structure of the political blogosphere. Finally, Chapter 5 presents some concluding remarks and briefly discusses other problems in computational advertising. / Dissertation
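As a point of reference for the numerical work described in Chapter 2, the symmetric special case of this auction does admit a closed-form equilibrium that can be evaluated by numerical integration. The sketch below is only that benchmark, not the thesis's method for the general asymmetric case; the Beta value distribution and the bidder count are illustrative assumptions.

```python
import numpy as np
from scipy import integrate, stats

def symmetric_fpa_bid(v, n, F):
    """Equilibrium bid in a symmetric IPV first-price sealed-bid auction:
    b(v) = v - [integral_0^v F(x)^(n-1) dx] / F(v)^(n-1)."""
    if F(v) == 0:
        return 0.0
    numer, _ = integrate.quad(lambda x: F(x) ** (n - 1), 0.0, v)
    return v - numer / F(v) ** (n - 1)

# Illustrative assumptions: 4 bidders with values drawn from a Beta(2, 2) distribution.
F = stats.beta(2, 2).cdf
grid = np.linspace(0.1, 0.9, 5)
print([round(symmetric_fpa_bid(v, n=4, F=F), 4) for v in grid])
```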
402. Nonparametric Bayes for Big Data. Yang, Yun, January 2014.
Classical asymptotic theory deals with models in which the sample size n goes to infinity while the number of parameters p stays fixed. Rapid advances in technology, however, have enabled today's scientists to collect a huge number of explanatory variables with which to predict a response. Many modern applications in science and engineering belong to the "big data" regime, in which both p and n may be very large, and a variety of genomic applications even have p substantially greater than n. With the advent of MCMC, Bayesian approaches have exploded in popularity, and Bayesian inference often allows easier interpretability than frequentist inference. It is therefore important to understand and evaluate Bayesian procedures for "big data" from a frequentist perspective.

In this dissertation, we address a number of questions related to solving large-scale statistical problems via Bayesian nonparametric methods.

It is well known that classical estimators can be inconsistent in the high-dimensional regime without any constraints on the model, so imposing additional low-dimensional structure on the high-dimensional ambient space becomes inevitable. In the first two chapters of the thesis, we study the prediction performance of high-dimensional nonparametric regression from a minimax point of view under two different low-dimensional constraints: (1) the response depends only on a small subset of the covariates; (2) the covariates lie on a low-dimensional manifold in the original high-dimensional ambient space. We also provide Bayesian nonparametric methods based on Gaussian process priors that are shown to adapt to unknown smoothness or low-dimensional manifold structure by attaining minimax convergence rates up to logarithmic factors. In Chapter 3, we consider high-dimensional classification problems in which all data are categorical, and we build a parsimonious model based on Bayesian tensor factorization for classification while performing inference on the important predictors.

It is generally believed that ensemble approaches, which combine multiple algorithms or models, can outperform any single algorithm at machine learning tasks such as prediction. In Chapter 5, we propose Bayesian convex and linear aggregation approaches motivated by regression applications. We show that the proposed approach is minimax optimal when the true data-generating model is a convex or linear combination of models in the list. Moreover, the method can adapt to sparsity structure in which certain models should receive zero weight, and unlike competing methods it is free of tuning parameters. More generally, under an M-open view in which the truth falls outside the space of all convex or linear combinations, our theory suggests that the posterior measure tends to concentrate on the best approximation of the truth at the minimax rate.

Chapter 6 is devoted to sequential Markov chain Monte Carlo algorithms for Bayesian online learning from big data. The last chapter justifies the use of the posterior distribution for statistical inference in semiparametric estimation problems (the semiparametric Bernstein-von Mises theorem) from a frequentist perspective. / Dissertation
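To make the Gaussian-process idea concrete, here is a minimal sketch using scikit-learn rather than the thesis's own priors and theory; the data-generating function, dimensions, and kernel are assumptions for illustration. One lengthscale per covariate (automatic relevance determination) lets covariates the response does not depend on be screened out by large fitted lengthscales, loosely mirroring the subset-of-covariates constraint.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n, p = 200, 10                                   # moderately high-dimensional ambient space
X = rng.uniform(size=(n, p))
# The response depends on only two of the ten covariates.
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)

# One lengthscale per covariate (ARD); irrelevant covariates tend to get large fitted lengthscales.
kernel = RBF(length_scale=np.ones(p)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
print(np.round(gp.kernel_.k1.length_scale, 2))   # relevant coordinates get shorter lengthscales
```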
403. Survey analysis: Methodology and application using CHIS data. Levy, Melanie E., 05 December 2014.
Over the past hundred years, advances in survey research and in the understanding of survey methodology and analysis have removed major biases, allowing small numbers of respondents to speak for larger groups and enabling modern polls to support inferences about populations. This project presents a brief history of survey methodology and applies common statistical procedures to the 2009 California Health Interview Survey (CHIS). Survey methodology and analysis are explored through examples that include survey linear regression analysis, canonical correlation, and multinomial logistic regression.

The project's goal is to create a greater understanding of the survey analysis process, as well as of some of the challenges survey researchers face. With this knowledge, more procedures can be adapted to incorporate the survey design, extending survey methodology and analysis to more diverse research needs.
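A minimal sketch of the kind of weighted regression involved is given below, using statsmodels; the file name and column names are placeholders rather than actual CHIS field names, and proper CHIS variance estimation would additionally use the replicate weights, which this sketch omits.

```python
import pandas as pd
import statsmodels.api as sm

# Placeholder file and column names -- not the actual CHIS variable names.
chis = pd.read_csv("chis_2009_adult.csv")
X = sm.add_constant(chis[["age", "income"]])
y = chis["bmi"]

# Point estimates weighted by the survey weight; design-based standard errors
# would normally come from the CHIS replicate weights (e.g. jackknife).
fit = sm.WLS(y, X, weights=chis["survey_weight"]).fit()
print(fit.params)
```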
404. Doubly censored prevalent cohort survival data. Guo, Hui, January 2006.
When survival data are collected as part of a prevalent cohort study, the recruited cases have already experienced their initiating event. When the initiating times are generated from a stationary Poisson process, such survival data are termed "length-biased". Our interest is in estimating the survival function from the initiating event time, which is often known only with uncertainty, to the failure or censoring time. We derive the likelihood function and propose two methods that estimate the survival function while incorporating this possible uncertainty. We verify the methods by means of simulation and illustrate them using real data on survival with dementia collected as part of the Canadian Study of Health and Aging (CSHA).
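A small simulation makes the length bias visible: onset times arrive according to a stationary process, and only subjects still alive at the recruitment date enter the prevalent cohort, so longer survival times are over-represented. This is a sketch with assumed rates and an exponential lifetime distribution, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
recruitment_time = 100.0                                      # calendar date of recruitment
onset = rng.uniform(0.0, recruitment_time, size=200_000)      # stationary onsets (uniform given the count)
lifetime = rng.exponential(scale=5.0, size=onset.size)        # true mean survival = 5

# Prevalent cases: onset before recruitment and still surviving at recruitment.
prevalent = onset + lifetime > recruitment_time
print("true mean lifetime:  ", 5.0)
print("mean among prevalent:", round(lifetime[prevalent].mean(), 2))  # close to the length-biased mean of 10
```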
405. Longitudinal data with change-points. Zhou, Wei, January 2005.
Longitudinal data often consist of a set of covariates and repeated measurements of an outcome over time. In clinical and health studies, a main concern is often the time at which a treatment takes effect, referred to in this thesis as the change-point. Modeling and measuring the impact of such a change in longitudinal data has not been thoroughly explored in the statistical literature. Viewed from the perspective of change-point analysis, longitudinal data have a structure that statisticians sometimes refer to as a multi-path change-point problem.

In multi-path change-point problems, it is often of interest to assess the impact of covariates on the change-point. We model the effect of covariates on the hazard of the change using Cox's proportional odds model, and we take a maximum quasi-likelihood (MqL) approach to estimation and statistical inference. To find the MqL estimators of the unknown parameters in our model, we use a variant of simulated annealing. By means of simulation, we compare our model with a model suggested by Diggle et al. for analyzing a set of longitudinal observations on the body weights of 27 cows.
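To illustrate fitting a change-point by simulated annealing, here is a toy single-path Gaussian version, not the thesis's multi-path quasi-likelihood model; scipy's dual_annealing stands in for the variant of simulated annealing used there.

```python
import numpy as np
from scipy.optimize import dual_annealing

rng = np.random.default_rng(2)
t = np.arange(100)
y = np.where(t < 60, 1.0, 3.0) + rng.standard_normal(t.size)   # mean shifts at t = 60

def neg_loglik(params):
    """Gaussian negative log-likelihood (unit variance, up to constants) for a mean shift at tau."""
    tau, mu1, mu2 = params
    mean = np.where(t < tau, mu1, mu2)
    return np.sum((y - mean) ** 2)

result = dual_annealing(neg_loglik, bounds=[(1, 99), (-5, 5), (-5, 5)], seed=3)
print(np.round(result.x, 2))   # approximately (60, 1, 3)
```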
406. Statistical analysis of reliability-validity studies. Guo, Tong, January 1998.
Reliability and validity studies are common in epidemiology, especially during the development of measurement instruments. Reliability is typically assessed quantitatively through an appropriate coefficient, such as the intra-class correlation coefficient (ICC). Validity, on the other hand, is assessed more informally through a series of tests and checks: for instance, construct validity may be established by testing the significance of factors that are supposed to influence the measure at hand. In general, an ICC is calculated as a ratio of variance components to the total variance, so the first step in computing an ICC is estimating the variance components from an appropriate statistical model. This thesis presents two approaches to the estimation of variance components in the context of reliability and validity studies: the ANOVA approach, based on the method of moments and suited especially to balanced data, and the mixed linear model approach, for the more general case of unbalanced data. Furthermore, a general framework is developed that permits treatment of reliability and validity within the same statistical model. By means of this model, a special case of the mixed linear model, appropriate ICCs for both reliability and validity can be computed, while construct validity can be established by testing the significance of appropriate fixed effects. The delta method and the bootstrap are proposed for calculating the standard errors and confidence intervals of the ICCs. Finally, an example of a case-vignette study is presented. All calculations were carried out using the SAS system.
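For the balanced case, the ANOVA (method-of-moments) version of the ICC can be computed directly from the one-way mean squares. A minimal sketch with simulated ratings follows; the subject and rater counts and the variance components are assumptions for illustration, and the calculation is the generic one-way ICC(1,1), not this thesis's SAS-based framework.

```python
import numpy as np

rng = np.random.default_rng(4)
n_subjects, k_raters = 30, 4
subject_effect = rng.normal(0.0, 2.0, size=(n_subjects, 1))                    # between-subject s.d. = 2
ratings = subject_effect + rng.normal(0.0, 1.0, size=(n_subjects, k_raters))   # error s.d. = 1

subject_means = ratings.mean(axis=1)
grand_mean = ratings.mean()
msb = k_raters * np.sum((subject_means - grand_mean) ** 2) / (n_subjects - 1)
msw = np.sum((ratings - subject_means[:, None]) ** 2) / (n_subjects * (k_raters - 1))

icc = (msb - msw) / (msb + (k_raters - 1) * msw)    # one-way random-effects ICC(1,1)
print(round(icc, 3))                                 # close to 4 / (4 + 1) = 0.8
```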
407. Combining the generalized linear model and spline smoothing to analyze examination data. Wang, Xiaohui, January 1993.
This thesis proposes and specifies a way to analyze test data. Analyzing test data requires estimating a special binary regression relation, the item characteristic curve, which relates the probability that examinees answer an item correctly to their abilities.

The statistical tools used in this thesis are generalized linear models and spline smoothing. The method combines the advantages of parametric modeling and nonparametric regression to obtain a good estimate of the item characteristic curve. A special spline basis is proposed, motivated by the properties of the item characteristic curve. From the resulting estimate of the item characteristic curve, a more stable estimate of the item information function can be generated. Illustrative analyses of simulated data are presented. The results seem to indicate that the method does have the advantages of both parametric modeling and nonparametric regression: it is faster to compute and more flexible than methods based on parametric models, such as the three-parameter model in psychometrics, while it produces more stable estimates of derivatives than purely nonparametric regression.
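In the same spirit, though with a standard B-spline basis rather than the special basis proposed in the thesis, an item characteristic curve can be estimated as a binomial GLM on a spline-expanded ability score. A minimal sketch with simulated responses:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
ability = rng.normal(size=1000)
p_correct = 1.0 / (1.0 + np.exp(-(1.5 * ability - 0.5)))       # "true" item characteristic curve
data = pd.DataFrame({"ability": ability,
                     "correct": rng.binomial(1, p_correct)})

# Binomial GLM with a cubic B-spline basis in ability (standard basis, not the thesis's).
icc_model = smf.glm("correct ~ bs(ability, df=5)", data=data,
                    family=sm.families.Binomial()).fit()

grid = pd.DataFrame({"ability": np.linspace(-2.0, 2.0, 9)})
print(icc_model.predict(grid).round(3))   # estimated item characteristic curve on a grid
```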
408. Laws of large numbers for sequences and arrays of random variables. Tilahun, Gelila, January 1996.
This thesis presents an up-to-date survey of results concerning laws of large numbers for sequences and arrays of random variables. We begin with Kolmogorov's pioneering result, the strong law of large numbers, and proceed through to the recent results of Hu et al. and of Gut for weakly dominated random variables, for which we provide a simpler proof. We emphasize in particular the proof techniques of Etemadi and of Jamison et al. Furthermore, analogues of the Marcinkiewicz-Zygmund theorem are given. The thesis illustrates the trade-off, in obtaining a strong law of large numbers, between requiring higher moments and relaxing the assumption that the sequences and arrays of random variables are i.i.d.
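As a quick empirical illustration of the strong law of large numbers surveyed here, sample means of i.i.d. draws settle on the population mean as n grows; the exponential distribution below is an arbitrary assumption, and the sketch illustrates the theorem rather than proving anything.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=1_000_000)               # i.i.d. draws with mean 2
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: sample mean = {running_mean[n - 1]:.4f}")   # converges to 2
```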
409. Evaluation of the performance of the generalized estimating equations method for the analysis of crossover designs. Valois, Marie-France, January 1997.
Crossover designs are widely used in clinical trials. The main advantage of this type of design is that treatments are compared within subjects: every subject provides a direct comparison of the treatments he or she has received, so a smaller number of subjects is generally needed to obtain the same precision as with a cross-sectional design. However, because of the within-subject correlations arising from the repeated measurements, the usual analysis of variance based on ordinary least squares (OLS) may be inappropriate for analyzing crossover designs. Approximate likelihood-based tests that take the structure of the covariance matrix into account have recently been proposed in the literature.

The aim of this thesis is to compare the performance of the OLS method and two of the approximate likelihood-based tests with that of a non-likelihood-based method, generalized estimating equations, for testing treatment and carryover effects in crossover designs under the assumption of multivariate normality.
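A minimal sketch of a GEE fit for a two-period, two-treatment crossover is shown below, using statsmodels; the data layout, column names, exchangeable working correlation, and omission of a carryover term are assumptions for illustration, not the thesis's exact setup.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_subj = 40
subj = np.repeat(np.arange(n_subj), 2)
period = np.tile([1, 2], n_subj)
# AB / BA sequences: even-numbered subjects receive treatment A first, odd-numbered B first.
treat = np.where((subj % 2 == 0) ^ (period == 2), "A", "B")
subj_effect = np.repeat(rng.normal(0, 1, n_subj), 2)       # induces within-subject correlation
y = 0.8 * (treat == "A") + 0.2 * (period == 2) + subj_effect + rng.normal(0, 1, 2 * n_subj)
df = pd.DataFrame({"y": y, "subject": subj, "period": period, "treatment": treat})

gee = sm.GEE.from_formula("y ~ C(treatment) + C(period)", groups="subject", data=df,
                          cov_struct=sm.cov_struct.Exchangeable(),
                          family=sm.families.Gaussian()).fit()
print(gee.summary())   # treatment effect estimated while accounting for within-subject correlation
```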
410. A comparative study of approximate tests for cross-over designs using Stein's estimator of the covariance structure of the data. D'Angelo, Giuseppina, January 1997.
Cross-over designs are often used to compare treatments administered to subjects in different, pre-determined sequences using relatively few trials. The within-subject correlations arising from the repeated measurements may make the usual analysis of variance based on ordinary least squares (OLS) inappropriate for analyzing such experiments. The aim of this thesis is to compare how the behavior of several tests used to analyze cross-over trials is affected by different structures and estimators of the covariance matrix. In particular, we replace the within-sequence sample dispersion matrix with Stein's estimator of the covariance matrix as a means of improving the robustness of two alternative tests, namely a modified F-test approximation (MFA) and an empirical generalized least squares (EGLS) approach. Monte Carlo simulations were run to compare the performance of the two covariance estimators under the OLS, MFA, and EGLS methods. The results show that the modified F-test approximation gives adequate control of the Type I error and that, in certain cases, Stein's estimator of the covariance matrix offers an improvement in robustness over the within-sequence sample dispersion matrix.
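Stein-type shrinkage replaces the raw sample covariance with a weighted combination of it and a simple structured target, which stabilizes the estimate when there are few subjects per sequence. The sketch below uses the Ledoit-Wolf estimator from scikit-learn as a readily available shrinkage estimator of this kind; it is not necessarily the exact Stein estimator studied in the thesis, and the dimensions and true covariance are assumptions for illustration.

```python
import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance

rng = np.random.default_rng(8)
n_per_sequence, n_periods = 12, 4                            # few subjects per sequence, several periods
true_cov = 0.5 * np.ones((n_periods, n_periods)) + 0.5 * np.eye(n_periods)
X = rng.multivariate_normal(np.zeros(n_periods), true_cov, size=n_per_sequence)

sample_cov = empirical_covariance(X)                         # within-sequence sample dispersion matrix
lw = LedoitWolf().fit(X)                                     # shrinks toward a scaled identity target
print("shrinkage weight:", round(lw.shrinkage_, 3))
print(np.round(lw.covariance_, 2))
```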