11

Exploring network models under sampling

Zhou, Shu
Master of Science / Department of Statistics / Perla Reyes

Networks are defined as sets of items and their connections. Interconnected items are represented by mathematical abstractions called vertices (or nodes), and the links connecting pairs of vertices are known as edges. Networks are easily seen in everyday life: a network of friends, the Internet, metabolic or citation networks. The increase of available data and the need to analyze networks have resulted in a proliferation of network models. However, for networks with billions of nodes and edges, computation and inference might not be achievable within a reasonable amount of time or budget. A sampling approach seems a natural choice, but traditional models assume access to the entire network. Moreover, when data are only available for a sampled sub-network, conclusions tend to be extrapolated to the whole network/population without regard to sampling error. The statistical problem this report addresses is how to sample a sub-network and then draw conclusions about the whole network. Are some sampling techniques better than others? Are there more efficient ways to estimate parameters of interest? How can we measure how effectively a method reproduces the original network? We explore these questions with a simulation study on the Mesa High School students' friendship network. First, to assess the characteristics of the whole network, we applied the traditional exponential random graph model (ERGM) and a stochastic blockmodel to the complete population of 205 students. Then, we drew simple random and stratified samples of 41 students, applied the traditional ERGM and the stochastic blockmodel again, and defined a way to generalize the sample findings to the population friendship network of 205 students. Finally, we used the degree distribution and other network statistics to compare the true friendship network with the projected one. We obtained the following results: 1) as expected, stratified sampling outperforms simple random sampling when selecting nodes; 2) the ERGM without restrictions offers a poor estimate for most of the tested parameters; and 3) Bayesian stochastic blockmodel estimation using a stratified sample of nodes achieves the best results.
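As a rough illustration of the node-sampling comparison described in this abstract, the sketch below contrasts a simple random sample with a stratified sample of 41 nodes. A synthetic two-block stochastic blockmodel, with made-up block sizes and edge probabilities, stands in for the Mesa friendship data, which are not reproduced here:

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the 205-student friendship network: two communities
# with illustrative within- and between-block edge probabilities.
G = nx.stochastic_block_model([120, 85], [[0.06, 0.01], [0.01, 0.08]], seed=1)

def induced_mean_degree(nodes):
    """Mean degree of the subgraph induced by the sampled nodes."""
    H = G.subgraph(nodes)
    return np.mean([d for _, d in H.degree()])

# Simple random sample of 41 nodes (a 20% sample, as in the report)
srs = rng.choice(G.number_of_nodes(), size=41, replace=False)

# Stratified sample: draw proportionally from each block
strat = []
for b in (0, 1):
    members = [v for v in G if G.nodes[v]["block"] == b]
    k = round(41 * len(members) / G.number_of_nodes())
    strat.extend(rng.choice(members, size=k, replace=False))

print("SRS mean degree:       ", induced_mean_degree(srs))
print("stratified mean degree:", induced_mean_degree(strat))
print("population mean degree:", np.mean([d for _, d in G.degree()]))
```

Both induced subgraphs understate the population mean degree, since edges to unsampled nodes are lost; that is precisely the kind of sampling error the report's projection step must correct for.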
12

Robust mixtures of regression models

Bai, Xiuqin
Master of Science / Department of Statistics / Weixin Yao

In fitting mixtures of linear regression models, the error term has traditionally been assumed normal, and the regression parameters are then estimated by maximum likelihood (MLE) using the EM algorithm. Under the normal assumption, the M step of the EM algorithm uses a weighted least squares estimate (LSE) for the regression parameters. It is well known that the LSE is sensitive to outliers and heavy-tailed error distributions. In this report, we propose a robust mixture of linear regression models, which replaces the least squares criterion with a robust criterion in the M step of the EM algorithm. In addition, we use a simulation study to demonstrate how sensitive the traditional mixture regression estimation method is to outliers or heavy-tailed error distributions, and we compare it with our proposed robust mixture regression estimation method. Based on our empirical studies, the proposed robust estimation method performs comparably to the traditional method when there are no outliers and the error is normally distributed, but is much better when there are outliers or the error has heavy tails (such as a t-distribution). A real data application is also provided to illustrate the effectiveness of the proposed methodology.
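To make the M-step replacement concrete, here is a minimal sketch of an EM loop for a two-component mixture of regressions in which the usual weighted least squares update is swapped for an IRLS update with Huber weights. The data, starting values, and tuning constant are all illustrative; this is not the report's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-component mixture of regressions with a few gross outliers
n = 300
x = rng.uniform(0, 10, n)
comp = rng.random(n) < 0.5
y = np.where(comp, 1 + 2 * x, 5 - x) + rng.standard_normal(n)
y[:10] += 25                                 # contaminate with outliers

X = np.column_stack([np.ones(n), x])

def huber_w(r, c=1.345):
    """Huber psi(r)/r weights: 1 near zero, downweighted in the tails."""
    a = np.maximum(np.abs(r), 1e-8)
    return np.where(a <= c, 1.0, c / a)

beta = np.array([[0.0, 1.0], [4.0, -0.5]])   # crude starting values
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

for _ in range(100):
    # E step: posterior membership probabilities under normal components
    dens = np.stack([pi[k] / sigma[k] *
                     np.exp(-0.5 * ((y - X @ beta[k]) / sigma[k]) ** 2)
                     for k in range(2)])
    w = dens / (dens.sum(axis=0) + 1e-300)

    # Robust M step: IRLS with Huber weights times posterior weights,
    # in place of the usual weighted least squares update
    for k in range(2):
        r = (y - X @ beta[k]) / sigma[k]
        wk = w[k] * huber_w(r)
        WX = X * wk[:, None]
        beta[k] = np.linalg.solve(X.T @ WX, WX.T @ y)
        resid = y - X @ beta[k]
        sigma[k] = np.sqrt((w[k] * resid**2).sum() / w[k].sum())
    pi = w.mean(axis=1)

# Components may come out in either order; expect roughly (1, 2) and (5, -1)
print("estimated betas:\n", beta)
```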
13

Efficacy of robust regression applied to fractional factorial treatment structures

McCants, Michael
Master of Science / Department of Statistics / James J. Higgins

Completely randomized and randomized block designs involving n factors, each at two levels, are used to screen for the effects of a large number of factors. With such designs, cost or time constraints may make it impossible to run each treatment combination more than once; in some cases, only a fraction of all the treatments may be run. With a large number of factors and limited observations, even one outlier can adversely affect the results. Robust regression methods are designed to down-weight the adverse effects of outliers. However, to our knowledge, practitioners do not routinely apply robust regression methods in the context of fractional replication of 2^n factorial treatment structures. The purpose of this report is to examine how robust regression methods perform in this context.
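A small sketch of the setting: a 2^(4-1) half fraction (defining relation D = ABC, chosen here purely for illustration) with a single planted outlier, comparing ordinary least squares against a Huber M-estimate via statsmodels. The response model and outlier magnitude are invented:

```python
import numpy as np
import statsmodels.api as sm
from itertools import product

# A 2^(4-1) fractional factorial: half fraction with defining relation D = ABC
runs = np.array([(a, b, c, a * b * c) for a, b, c in product([-1, 1], repeat=3)])
A, B, C, D = runs.T

# Illustrative response: main effects for A and C, plus noise and one outlier
rng = np.random.default_rng(2)
y = 10 + 3 * A - 2 * C + rng.normal(0, 0.5, 8)
y[0] += 15                                   # a single gross outlier

X = sm.add_constant(np.column_stack([A, B, C, D]))
ols = sm.OLS(y, X).fit()
rob = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()

print("OLS effects:  ", np.round(ols.params, 2))   # distorted by the outlier
print("Huber effects:", np.round(rob.params, 2))   # closer to (10, 3, 0, -2, 0)
```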
14

Individual mediating effects and the concept of terminal measures data

Serasinghe, Roshan Niranjala
Doctor of Philosophy / Department of Statistics / Gary Gadbury

Research in science and statistics often goes beyond the two-variable cause-and-effect relationship, seeking also to understand what connects the causal relationship and what changes its magnitude or direction between two variables, a predictor (T) and an outcome (Y). A mediator (Z) is a third variable that links a cause and an effect, whereby T causes Z and Z causes Y. In general, a variable may be said to function as a mediator to the extent that it accounts for the relation between the predictor and the outcome (Baron and Kenny, 1986). The initial question regards the appropriate characterization of a mediation effect. Most studies, when comparing one or more treatments, focus on an average mediating effect. This average can be misleading when the mediating effects vary from subject to subject in the population. The primary focus of this research is to investigate individual mediating effects in a population and to define a variance of these individual effects. A concept called subject-mediator (treatment) interaction is presented, and its role in evaluating a mediator's behavior on a population of units is studied. This is done using a framework sometimes called a counterfactual model. Some common experimental designs that provide different knowledge about this interaction term are studied. Subgroup analysis is the most common analytic approach for examining heterogeneity of mediating effects. In mediation analysis, situations can arise where Z and Y cannot both be measured on an individual unit. We refer to such data as terminal measures data. We show a design where a mediating effect cannot be estimated from terminal measures data, and another where it can be, under an assumption linked to the idea of pseudo-replication. These ideas are discussed, and a simulation study illustrates the issues involved in analyzing terminal measures data. We know of no currently available methods that specifically address terminal measures data.
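The distinction between average and individual mediating effects can be seen in a tiny counterfactual-style simulation. The model below, with a subject-specific effect of Z on Y, is a hypothetical stand-in for the dissertation's framework, not its actual formulation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Counterfactual-style sketch: each subject has potential mediator values
# Z(0), Z(1) and outcome Y = b_i * Z. The random slope b_i creates a
# subject-mediator interaction.
a = 1.0                                    # effect of T on Z, same for all
b = rng.normal(0.5, 0.4, n)                # subject-specific effect of Z on Y
Z0 = rng.normal(0, 1, n)
Z1 = Z0 + a
Y0, Y1 = b * Z0, b * Z1

indiv = Y1 - Y0                            # individual mediating effects a * b_i
print("average mediating effect:      ", indiv.mean())   # ~ a * E[b] = 0.5
print("variance of individual effects:", indiv.var())    # > 0: heterogeneity
```

The average effect of 0.5 masks the fact that many subjects have near-zero or even negative mediating effects, which is the motivation for studying the variance of the individual effects.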
15

A simulation comparison of two methods for controlling the experiment-wise Type I error rate of correlated tests for contrasts in one-way completely randomized designs

Jiao, Yuanfang
Master of Science / Department of Statistics / Paul I. Nelson

A Bonferroni and an ordered P-value solution to the problem of controlling the experiment-wise Type I error rate are studied and compared in terms of actual size and power when carrying out correlated tests. Although both solutions can be used in a wide variety of settings, they are investigated here only in the context of testing whether specified pairwise comparisons of means, selected before the data are collected, are all equal to zero in a completely randomized, balanced, one-factor design where the data are independent random samples from normal distributions having a common variance. Simulations indicate that both methods are very similar and effective in controlling the experiment-wise Type I error rate at a nominal level of 0.05. Because the ordered P-value method has, almost uniformly, slightly greater power, it is my recommendation for use in the setting of this report.
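The abstract does not spell out the ordered P-value procedure; assuming it is a Holm-type step-down method, a simulation comparing it with Bonferroni on correlated contrasts (all sharing a control group, which induces the correlation) might look like this sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, alpha, reps = 10, 0.05, 5000
mu = np.array([0.0, 0.0, 0.0, 1.0, 1.0])   # groups 3 and 4 truly differ from control
m = 4                                       # contrasts: group j vs. group 0

def holm_reject(p, alpha):
    """Holm step-down: compare ordered p-values to alpha/(m - i)."""
    order = np.argsort(p)
    reject = np.ones(len(p), dtype=bool)
    for i, idx in enumerate(order):
        if p[idx] > alpha / (len(p) - i):
            reject[order[i:]] = False       # accept this and all larger p-values
            break
    return reject

bonf_hits = holm_hits = 0.0
for _ in range(reps):
    xbar = mu + rng.normal(0, 1 / np.sqrt(n), 5)     # group means, n per group
    z = (xbar[1:] - xbar[0]) / np.sqrt(2 / n)        # correlated contrast tests
    p = 2 * stats.norm.sf(np.abs(z))
    bonf_hits += (p[2:] < alpha / m).sum()           # true non-nulls: contrasts 3, 4
    holm_hits += holm_reject(p, alpha)[2:].sum()

print("avg true rejections, Bonferroni:", bonf_hits / reps)
print("avg true rejections, Holm:      ", holm_hits / reps)
```

Under the complete null the two procedures reject at least one hypothesis in exactly the same cases, so the power edge of the step-down method only appears under partial alternatives like the one simulated here.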
16

Nonparametric tests for longitudinal data

Dong, Lei
Master of Science / Department of Statistics / Haiyan Wang

The purpose of this report is to numerically compare several tests that are applicable to longitudinal data when the experiment contains a large number of treatments or experimental conditions. Such data are increasingly common as technology advances. Of interest is whether there are significant main effects of treatment or time, and whether they interact. Traditional methods such as linear mixed-effects models (LME), generalized estimating equations (GEE), Wilks' lambda, Hotelling-Lawley, and Pillai's multivariate tests were developed under either parametric distributional assumptions or the assumption of a large number of replications. A few recent tests, such as those of Zhang (2008) and Bathke & Harrar (2008), were developed specifically for settings with a large number of treatments and possibly few replications. In this report, I present numerical studies of these tests, evaluating their performance on data generated from several distributions.
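As one concrete point of reference among the traditional methods named above, the sketch below simulates a many-treatments, few-replications layout and fits a GEE with an exchangeable working correlation via statsmodels. The data-generating model is invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

# Many treatments, few replications: a treatments, n subjects each, t time points
a, n, t = 20, 4, 3
df = pd.DataFrame({
    "trt":  np.repeat(np.arange(a), n * t),
    "subj": np.repeat(np.arange(a * n), t),
    "time": np.tile(np.arange(t), a * n),
})
# Mild treatment and time effects plus a subject random effect (correlation)
subj_re = rng.normal(0, 1, a * n)[df["subj"]]
df["y"] = 0.1 * df["trt"] + 0.5 * df["time"] + subj_re + rng.normal(0, 1, len(df))

# GEE with exchangeable working correlation, one classical point of comparison
gee = smf.gee("y ~ C(trt) + time", groups="subj", data=df,
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print("time effect:", round(gee.params["time"], 3),
      " p =", round(gee.pvalues["time"], 4))
```

With 20 treatments and only 4 subjects per treatment, asymptotics in the number of replications are dubious, which is exactly the regime the newer tests target.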
17

Some limit behaviors for the LS estimators in errors-in-variables regression model

Chen, Shu
Master of Science / Department of Statistics / Weixing Song

There has been continuing interest among statisticians in regression models wherein the independent variables are measured with error, and there is considerable literature on the subject. In this report, we discuss the errors-in-variables regression model $y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \epsilon_i$, $X_i = x_i + u_i$, $Z_i = z_i + v_i$, with i.i.d. errors $(\epsilon_i, u_i, v_i)$ for $i = 1, 2, \ldots, n$, and derive the least squares estimators for the parameters of interest. Both weak and strong consistency of the least squares estimators $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ of the unknown parameters $\beta_0$, $\beta_1$, and $\beta_2$ are obtained. Moreover, under regularity conditions, the asymptotic normality of the estimators is established.
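A quick simulation can illustrate the kind of limit behavior studied here: the LS estimates computed from the observed surrogates (X, Z) settle down as n grows. For the naive fit sketched below the limit need not equal the true coefficients when measurement error is present; the report's precise estimators and regularity conditions are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, beta2 = 1.0, 2.0, -1.0

def ls_estimate(n):
    """Least squares fit of y on the observed, error-contaminated (X, Z)."""
    x, z = rng.normal(0, 1, n), rng.normal(0, 1, n)
    eps, u, v = rng.normal(0, 0.3, (3, n))   # i.i.d. errors (eps_i, u_i, v_i)
    y = beta0 + beta1 * x + beta2 * z + eps
    Xmat = np.column_stack([np.ones(n), x + u, z + v])
    return np.linalg.lstsq(Xmat, y, rcond=None)[0]

# As n grows the estimates converge to a limit (consistency toward that limit,
# the sense of limit behavior this sketch can show)
for n in (100, 1_000, 10_000, 100_000):
    print(n, np.round(ls_estimate(n), 3))
```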
18

The robustness of confidence intervals for effect size in one-way designs with respect to departures from normality

Hembree, David
Master of Science / Department of Statistics / Paul Nelson

Effect size is a concept developed to bridge the gap between practical and statistical significance. In the context of completely randomized one-way designs, the setting considered here, inference for effect size has only been developed under normality. This report is a simulation study investigating the robustness of nominal 0.95 confidence intervals for effect size with respect to departures from normality, in terms of their coverage rates and lengths. In addition to the normal distribution, data are generated from four non-normal distributions: logistic, double exponential, extreme value, and uniform. The study finds that the coverage rates under the logistic, double exponential, and extreme value distributions drop as effect size increases, while, as expected, the coverage rate under the normal distribution remains steady at 0.95. Interestingly, the uniform distribution produced coverage rates above 0.95 that increased with effect size. Overall, within the scope of the settings considered, normal theory confidence intervals for effect size are robust for small effect sizes and not robust for large effect sizes. Since the magnitude of the effect size is typically not known, researchers are advised to investigate the assumption of normality before constructing normal theory confidence intervals for effect size.
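The study's design can be sketched as a coverage simulation. The code below uses a two-sample Cohen's d with a large-sample normal-theory standard error as a simplified stand-in for the report's one-way effect-size intervals, so the numbers are only indicative:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, d_true = 20, 4000, 1.0

def draw(dist, size):
    """Unit-variance samples from several shapes (a subset of the report's)."""
    if dist == "normal":
        return rng.normal(0, 1, size)
    if dist == "uniform":
        return rng.uniform(-np.sqrt(3), np.sqrt(3), size)
    if dist == "laplace":                        # double exponential
        return rng.laplace(0, 1 / np.sqrt(2), size)
    if dist == "logistic":
        return rng.logistic(0, np.sqrt(3) / np.pi, size)

for dist in ("normal", "logistic", "laplace", "uniform"):
    cover = 0
    for _ in range(reps):
        x, y = draw(dist, n), d_true + draw(dist, n)
        sp = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)   # pooled SD, equal n
        d = (y.mean() - x.mean()) / sp
        se = np.sqrt(2 / n + d**2 / (4 * n))    # large-sample normal-theory s.e.
        cover += (d - 1.96 * se <= d_true <= d + 1.96 * se)
    print(f"{dist:8s} coverage: {cover / reps:.3f}")
```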
19

Robust linear regression

Bai, Xue
Master of Science / Department of Statistics / Weixin Yao

In practice, when applying a statistical method, it often occurs that some observations deviate from the usual model assumptions. Least squares (LS) estimators are very sensitive to outliers: even a single atypical value may have a large effect on the regression parameter estimates. The goal of robust regression is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this paper, we review various robust regression methods, including the M, LMS, LTS, S, τ, MM, GM, and REWLS estimates. Finally, we compare these robust estimates in terms of robustness and efficiency through a simulation study. A real data application is also provided to compare the robust estimates with the traditional least squares estimator.
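Of the estimators reviewed, the M-estimates are easy to demonstrate with statsmodels (the high-breakdown LMS, LTS, S, τ, MM, GM, and REWLS estimates are not available there and are omitted from this sketch). A small example with 10% gross outliers in the response:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1, n)
y[:10] = 60 + rng.normal(0, 1, 10)          # 10% gross outliers in y

X = sm.add_constant(x)
fits = {
    "LS":           sm.OLS(y, X).fit(),
    "M (Huber)":    sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit(),
    "M (bisquare)": sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit(),
}
for name, f in fits.items():
    print(f"{name:13s} intercept={f.params[0]:6.2f}  slope={f.params[1]:6.2f}")
```

The LS fit is pulled toward the contaminated points, while both M-estimates stay close to the true line (2, 3), illustrating the robustness-versus-efficiency trade-off the simulation study quantifies.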
20

Semi-parametric estimation in Tobit regression models

Chen, Chunxia
Master of Science / Department of Statistics / Weixing Song

In the classical Tobit regression model, the regression error term is often assumed to have a zero-mean normal distribution with unknown variance, and the regression function is assumed to be linear. If the normality assumption is violated, the commonly used maximum likelihood estimate becomes inconsistent. Moreover, the likelihood function is very complicated if the regression function is nonlinear, even when the error density is normal, which makes the maximum likelihood procedure hard to implement. In the fully nonparametric setup, where both the regression function and the distribution of the error term ε are unknown, some nonparametric estimators of the regression function have been proposed. Although assuming a known error distribution is strict, it is widely adopted in the Tobit regression literature and is supported by many empirical studies in econometric research; in fact, a majority of the relevant research assumes that ε has a normal distribution with mean 0 and unknown standard deviation. In this report, we develop a semi-parametric estimation procedure for the regression function by assuming that the error term follows a distribution from a class of zero-mean symmetric location-scale families. A minimum distance procedure for estimating the parameters of the regression function, when it has a specified parametric form, is also constructed. Compared with existing semiparametric and nonparametric methods in the literature, our method is more efficient in that more information, in particular knowledge of the distribution of ε, is used, and the computation is relatively inexpensive. Since many applications do assume that ε has a normal or other known distribution, the current work provides practical tools for statistical inference in the Tobit regression model.
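For orientation, here is the classical benchmark this report builds on: the normal-error Tobit MLE, fitted by direct maximization of the censored likelihood on simulated data. The semi-parametric and minimum distance procedures themselves are not reproduced here:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(9)
n = 500
x = rng.uniform(-2, 2, n)
y_star = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # latent response
y = np.maximum(y_star, 0.0)                     # left-censored at zero

def negloglik(theta):
    """Classical Tobit log-likelihood under normal errors: density terms for
    observed responses, probability terms for censored ones."""
    b0, b1, log_s = theta
    s = np.exp(log_s)                           # parameterize sigma > 0
    mu = b0 + b1 * x
    obs = y > 0
    ll_obs = stats.norm.logpdf(y[obs], mu[obs], s)
    ll_cens = stats.norm.logcdf(-mu[~obs] / s)  # P(y* <= 0)
    return -(ll_obs.sum() + ll_cens.sum())

res = optimize.minimize(negloglik, x0=[0.0, 1.0, 0.0], method="BFGS")
b0, b1, log_s = res.x
print("MLE:", round(b0, 3), round(b1, 3), "sigma:", round(np.exp(log_s), 3))
```

It is this estimator whose consistency fails when the true error law is non-normal, which motivates replacing the normal density above with a member of a symmetric location-scale family.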
