121
Rahapelaaminen Suomessa vuonna 2015 ja siihen liittyvät ongelmat [Gambling in Finland in 2015 and related problems]. Ervasti, E. (Eetu). 23 January 2019.
The aim of this thesis is to obtain information on the amount of money that Finns aged 15–74 spend on gambling and on the factors that influence it. A further aim is to examine the problems that gambling causes for gamblers' close relatives and to survey Finns' opinions on problem gambling in Finland. The thesis also examines differences between respondents by gender, age, and level of education. The data used in the thesis are based on the Finnish Gambling 2015 survey (Rahapelitutkimus 2015) conducted by the Finnish Institute for Health and Welfare (Terveyden ja hyvinvoinnin laitos) in 2015.
122
Using Applied Statistics to Study a Pharmaceutical Manufacturing Process. Tiani, John P. 30 April 2004.
The pharmaceutical manufacturing process of interest produces a suspension for inhalation. Currently, the product is manufactured on two lines; a third and fourth line are being commissioned, and plans are in place to construct three additional lines. The manufacturing lines operate independently of one another, and each line consists of two actives compounding tanks so that their utilization can be rotated to improve manufacturing capacity. The objective of this project was to study the content uniformity assay values for the 0.25 mg/mL (0.5 mg) manufacturing process through the application of statistical techniques. The study focused on three separate topics:
1. Monitoring process behavior for the content uniformity assay values.
2. Ascertaining the equivalence of batches manufactured on Line 1 vs. Line 2.
3. Monitoring the signal-to-noise ratio of the content uniformity assay values.
To accomplish these three tasks, the following statistical techniques were applied:
1. Control chart techniques were applied to the data, including standard control charts (x-bar and S), individuals control charts, and modified limits.
2. An equivalence test for the means of the two processes was conducted.
3. A new control chart, the SNR chart, was developed and implemented.
The results and conclusions were:
1. The content uniformity assay values were in statistical process control with respect to modified-limit control chart techniques.
2. The Line 1 and Line 2 data were statistically equivalent.
3. The quantity (x-bar / s) was in statistical process control, and the SNR control chart displayed superior performance to the individuals control chart.
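As a rough illustration of the charting described above (not the project's actual data or limits), the Python sketch below computes x-bar and S chart limits for subgrouped assay values and the per-batch signal-to-noise ratio x-bar/s that an SNR chart would monitor; the data, subgroup size, and control-chart constants for n = 5 are assumptions.

```python
import numpy as np

# Hypothetical content-uniformity assay data: 20 batches, 5 assay values each.
rng = np.random.default_rng(1)
batches = rng.normal(loc=100.0, scale=2.0, size=(20, 5))

xbar = batches.mean(axis=1)          # per-batch mean
s = batches.std(axis=1, ddof=1)      # per-batch standard deviation
snr = xbar / s                       # quantity monitored by the SNR chart

# Standard control-chart constants for subgroup size n = 5.
A3, B3, B4 = 1.427, 0.0, 2.089

xbar_bar, s_bar = xbar.mean(), s.mean()
xbar_limits = (xbar_bar - A3 * s_bar, xbar_bar + A3 * s_bar)   # x-bar chart limits
s_limits = (B3 * s_bar, B4 * s_bar)                            # S chart limits

print("x-bar chart limits:", np.round(xbar_limits, 2))
print("S chart limits:    ", np.round(s_limits, 2))
print("out-of-control batches (x-bar):",
      np.where((xbar < xbar_limits[0]) | (xbar > xbar_limits[1]))[0])
```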
123
Missing plot techniques. Wu, Jinglan. January 2010.
Digitized by Kansas Correctional Industries
124
Statistical Methods for High Dimensional Variable Selection. Li, Kaiqiao. 19 April 2019.
This thesis focuses on high dimensional variable selection and addresses limitations of existing penalized likelihood-based prediction models, as well as multiple hypothesis testing issues in jump detection. In the first project, we proposed a weighted sparse network learning method that allows users to first estimate a data-driven network with a sparsity property. The estimated network is then optimally combined, through a weighted approach, with a known or partially known network structure. We adapted ℓ1 penalties and proved the oracle property of our proposed model, which aims to improve the accuracy of parameter estimation and achieve a parsimonious model in the high dimensional setting. We further implemented a stability selection method for tuning the parameters and compared its performance to the cross-validation approach. We implemented our proposed framework for several generalized linear models, including the Gaussian, logistic, and Cox proportional hazards (partial likelihood) models. We carried out extensive Monte Carlo simulations and compared the performance of our proposed model to existing methods. Results showed that, in the absence of prior information for constructing a known network, our approach showed significant improvement over elastic net models by using the data-driven estimated network structure. On the other hand, if the prior network is correctly specified in advance, our prediction model significantly outperformed other methods. Results further showed that our proposed method is robust to network misspecification and that the ℓ1 penalty improves prediction and variable selection regardless of the magnitude of the effect sizes. We also found that the stability selection method achieved more robust parameter tuning than the cross-validation approach for all three phenotypes (continuous, binary, and survival) considered in our simulation studies. Case studies on proteomic ovarian cancer and gene expression skin cutaneous melanoma data further demonstrated that our proposed model achieves good operating characteristics in predicting response to platinum-based chemotherapy and survival risk. We further extended our work in statistical predictive learning to nonlinear prediction, where traditional generalized linear models are insufficient. Nonlinear methods such as kernel methods are powerful for mapping a nonlinear space to a linear space and can easily be incorporated into generalized linear models; this thesis demonstrates how to apply multiple kernel tricks to generalized linear models. Simulation results show that our proposed multiple kernel learning method can successfully identify nonlinear likelihood functions under various scenarios.

The second project concerns jump detection in high frequency financial data. Nonparametric tests are popular and efficient methods for detecting jumps in high frequency financial data, but each method has its own advantages and disadvantages, and their performance can be affected by the underlying noise and dynamic structure. To address this, we proposed a robust p-value pooling method that aims to combine the advantages of each method. We focus on model validation within a Monte Carlo framework to assess reproducibility and the false discovery rate. Reproducibility analysis via correspondence curves and the irreproducible discovery rate was carried out with replicates to study local dependency and robustness across replicates. Extensive simulation studies of high frequency trading data at the minute level were carried out, and the operating characteristics of these methods were compared within a false discovery rate (FDR) control framework. Our proposed method was robust across all scenarios in the reproducibility and FDR analyses. Finally, we applied the method to minute-level data from the Limit Order Book System, the Efficient Reconstruction System (LOBSTER). An R package, JumpTest, implementing these methods is available on the Comprehensive R Archive Network (CRAN).
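The weighted sparse network model itself is not reproduced here, but the stability selection idea used for tuning can be illustrated with a generic ℓ1-penalized logistic regression: refit the model on many subsamples and keep the variables whose selection frequency exceeds a threshold. The simulated data, penalty value C, and 0.6 threshold below are placeholders, not values from the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:5] = 1.5                        # only the first 5 variables matter
y = (X @ beta + rng.logistic(size=n) > 0).astype(int)

n_subsamples, freq = 100, np.zeros(p)
for _ in range(n_subsamples):
    idx = rng.choice(n, size=n // 2, replace=False)       # subsample half the data
    fit = LogisticRegression(penalty="l1", C=0.3, solver="liblinear")
    fit.fit(X[idx], y[idx])
    freq += (fit.coef_.ravel() != 0)                      # record which variables were selected

freq /= n_subsamples
stable = np.where(freq >= 0.6)[0]        # keep variables selected in >= 60% of subsamples
print("stable variables:", stable)
```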
125
A Study of Some Issues of Goodness-of-Fit Tests for Logistic Regression. Unknown Date.
Goodness-of-fit tests are important for assessing how well a model fits a set of observations. The Hosmer-Lemeshow (HL) test is a popular and commonly used method for assessing goodness-of-fit in logistic regression. However, the HL test has two issues. The first is that the number of partition groups must be specified, and different groupings often suggest different decisions. In this study, we therefore propose several grouping tests that combine multiple HL tests computed over varying numbers of groups, rather than relying on one arbitrary grouping or searching for an optimal one, because the best grouping is data-dependent and not easy to find. The other drawback of the HL test is that it has little power to detect missing interactions between continuous and dichotomous covariates. Therefore, we propose global and interaction tests to capture such violations. Simulation studies are carried out to assess the Type I error and power of all the proposed tests. The tests are illustrated with the bone mineral density data from NHANES III. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / July 17, 2018. / Includes bibliographical references. / Dan McGee, Professor Co-Directing Dissertation; Qing Mai, Professor Co-Directing Dissertation; Cathy Levenson, University Representative; Xufeng Niu, Committee Member.
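As a hedged sketch of the first issue, the code below computes the HL statistic for several choices of the number of groups g and combines the resulting p-values with a simple Bonferroni-adjusted minimum (a generic combination rule, not necessarily one of the grouping tests proposed in the dissertation); simulated data stand in for the NHANES III example.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 500
x = rng.standard_normal((n, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + x[:, 0] - 0.8 * x[:, 1]))))

p_hat = LogisticRegression().fit(x, y).predict_proba(x)[:, 1]   # fitted probabilities

def hosmer_lemeshow(y, p_hat, g):
    """HL chi-square statistic and p-value with g groups formed from fitted probabilities."""
    edges = np.quantile(p_hat, np.linspace(0, 1, g + 1))
    groups = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, g - 1)
    stat = 0.0
    for k in range(g):
        m = groups == k
        obs, exp, nk = y[m].sum(), p_hat[m].sum(), m.sum()
        stat += (obs - exp) ** 2 / (exp * (1 - exp / nk))
    return stat, stats.chi2.sf(stat, df=g - 2)

group_sizes = [6, 8, 10, 12]
pvals = np.array([hosmer_lemeshow(y, p_hat, g)[1] for g in group_sizes])
combined = min(1.0, pvals.min() * len(pvals))     # Bonferroni-adjusted minimum p-value
print("per-g p-values:", np.round(pvals, 3), "combined:", round(combined, 3))
```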
126
Statistical methods for indirectly observed network data. McCormick, Tyler H. January 2011.
Social networks have become an increasingly common framework for understanding and explaining social phenomena. Yet, despite an abundance of sophisticated models, social network research has yet to realize its full potential, in part because of the difficulty of collecting social network data. In many cases, particularly in the social sciences, collecting complete network data is logistically and financially challenging. In contrast, Aggregated Relational Data (ARD) measure network structure indirectly by asking respondents how many connections they have with members of a certain subpopulation (e.g., How many individuals with HIV/AIDS do you know?). These data require no special sampling procedure and are easily incorporated into existing surveys. This dissertation proposes statistical methods for estimating social network and population characteristics from this type of social network data collected with standard surveys, culminating in a latent space model for ARD. First, a method is proposed to estimate both individual social network size (i.e., degree) and the distribution of network sizes in a population. A second method estimates the demographic characteristics of hard-to-reach groups, or latent demographic profiles. Such groups, including those with HIV/AIDS, unlawful immigrants, or the homeless, are often excluded from the sampling frame of standard social science surveys. A third method develops a latent space model for ARD. This method is similar in spirit to previous latent space models for networks (see Hoff, Raftery and Handcock (2002), for example) in that the dependence structure of the network is represented parsimoniously in a multidimensional geometric space. The key distinction from the complete network case is that instead of conditioning on the (latent) distance between two members of the network, the latent space model for ARD conditions on the expected distance between a survey respondent and the center of a subpopulation in the latent space. A spherical latent space facilitates tractable computation of this expectation. This model estimates relative homogeneity between groups in the population and variation in the propensity for interaction between respondents and group members.
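The latent space model itself is more involved, but the basic ARD scale-up estimator of individual degree (the Killworth-style ratio estimator on which such models build) is easy to state: scale each respondent's total reported connections to known subpopulations by the share of the population those subpopulations cover. The counts and subpopulation sizes below are invented for illustration.

```python
import numpy as np

N = 300_000_000                                   # total population size (assumed)
N_k = np.array([1_000_000, 2_500_000, 700_000])   # sizes of known subpopulations (assumed)

# y[i, k] = number of people respondent i reports knowing in subpopulation k
y = np.array([[2, 5, 1],
              [0, 3, 0],
              [4, 9, 2]])

# Scale-up estimate of each respondent's degree: d_i = N * (sum_k y_ik) / (sum_k N_k)
degree_hat = N * y.sum(axis=1) / N_k.sum()
print("estimated network sizes:", degree_hat.round(0))
```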
127
Some Nonparametric Methods for Clinical Trials and High Dimensional Data. Wu, Xiaoru. January 2011.
This dissertation addresses two problems from novel perspectives. In chapter 2, I propose an empirical likelihood based method to nonparametrically adjust for baseline covariates in randomized clinical trials and in chapter 3, I develop a survival analysis framework for multivariate K-sample problems. (I): Covariate adjustment is an important tool in the analysis of randomized clinical trials and observational studies. It can be used to increase efficiency and thus power, and to reduce possible bias. While most statistical tests in randomized clinical trials are nonparametric in nature, approaches for covariate adjustment typically rely on specific regression models, such as the linear model for a continuous outcome, the logistic regression model for a dichotomous outcome, and the Cox model for survival time. Several recent efforts have focused on model-free covariate adjustment. This thesis makes use of the empirical likelihood method and proposes a nonparametric approach to covariate adjustment. A major advantage of the new approach is that it automatically utilizes covariate information in an optimal way without fitting a nonparametric regression. The usual asymptotic properties, including the Wilks-type result of convergence to a chi-square distribution for the empirical likelihood ratio based test, and asymptotic normality for the corresponding maximum empirical likelihood estimator, are established. It is also shown that the resulting test is asymptotically most powerful and that the estimator for the treatment effect achieves the semiparametric efficiency bound. The new method is applied to the Global Use of Strategies to Open Occluded Coronary Arteries (GUSTO)-I trial. Extensive simulations are conducted, validating the theoretical findings. This work is not only useful for nonparametric covariate adjustment but also has theoretical value. It broadens the scope of the traditional empirical likelihood inference by allowing the number of constraints to grow with the sample size. (II): Motivated by applications in high-dimensional settings, I propose a novel approach to testing equality of two or more populations by constructing a class of intensity centered score processes. The resulting tests are analogous in spirit to the well-known class of weighted log-rank statistics that is widely used in survival analysis. The test statistics are nonparametric, computationally simple and applicable to high-dimensional data. We establish the usual large sample properties by showing that the underlying log-rank score process converges weakly to a Gaussian random field with zero mean under the null hypothesis, and with a drift under the contiguous alternatives. For the Kolmogorov-Smirnov-type and the Cramer-von Mises-type statistics, we also establish the consistency result for any fixed alternative. As a practical means to obtain approximate cutoff points for the test statistics, a simulation based resampling method is proposed, with theoretical justification given by establishing weak convergence for the randomly weighted log-rank score process. The new approach is applied to a study of brain activation measured by functional magnetic resonance imaging when performing two linguistic tasks and also to a prostate cancer DNA microarray data set.
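The covariate-adjustment machinery is not shown here, but the empirical likelihood building block it extends can be sketched in one dimension: Owen's empirical likelihood ratio test for a mean solves a single Lagrange multiplier equation and compares -2 log R to a chi-square(1) limit. This is a generic illustration with simulated data, not the GUSTO-I analysis.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_ratio_stat(x, mu0):
    """-2 log empirical likelihood ratio for H0: E[X] = mu0 (one-dimensional)."""
    z = x - mu0
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                        # mu0 outside the convex hull of the data
    # The multiplier lam solves sum z_i / (1 + lam z_i) = 0, with all weights positive.
    lo = (-1 + 1e-10) / z.max()
    hi = (-1 + 1e-10) / z.min()
    lam = brentq(lambda l: np.sum(z / (1 + l * z)), lo, hi)
    return 2 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=100)
stat = el_ratio_stat(x, mu0=2.0)
print("ELR statistic:", round(stat, 3), "p-value:", round(chi2.sf(stat, df=1), 3))
```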
128
On testing the change-point in the longitudinal bent line quantile regression model. Sha, Nanshi. January 2011.
The problem of detecting changes has received considerable attention in various fields. In general, the change-point problem is to identify the location(s) in an ordered sequence that divide the sequence into groups following different models. This dissertation considers the change-point problem in quantile regression for observational or clinical studies involving correlated data (e.g., longitudinal studies). Our research is motivated by the lack of ideal inference procedures for such models. Our contributions are two-fold. First, we extend the previously reported work on the bent line quantile regression model [Li et al. (2011)] to a longitudinal framework. Second, we propose a score-type test for hypothesis testing of the change-point problem using rank-based inference. The proposed test has several advantages over existing inferential approaches. Most importantly, it circumvents the difficulty of estimating nuisance parameters (e.g., the density function of the unspecified error) required for the Wald test in previous work, and is thus more reliable in finite samples. Furthermore, we demonstrate, through a series of simulations, that the proposed methods also outperform the extensively used bootstrap methods by providing more accurate and computationally efficient confidence intervals. To illustrate the usage of our methods, we apply them to two datasets from real studies: the Finnish Longitudinal Growth Study and an AIDS clinical trial. In each case, the proposed approach sheds light on the response pattern by providing an estimated location of abrupt change, along with its 95% confidence interval, at any quantile of interest: a key parameter with clinical implications. The proposed methods allow for different change-points at different quantile levels of the outcome. In this way, they offer a more comprehensive picture of the covariate effects on the response variable than is provided by other change-point models targeted exclusively on the conditional mean. We conclude that our framework and proposed methodology are valuable for studying the change-point problem involving longitudinal data.
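For intuition, a bent line (single change-point) quantile regression can be profiled in the cross-sectional case: for each candidate change-point tau, regress y on x and the hinge term (x - tau)+ with statsmodels' QuantReg, and keep the tau minimizing the check loss. This sketch ignores the longitudinal correlation and the rank-based score test developed in the thesis; the data and tau grid are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(4)
n, tau_true, q = 400, 3.0, 0.5
x = rng.uniform(0, 6, n)
y = 1.0 + 0.5 * x + 1.5 * np.maximum(x - tau_true, 0) + rng.standard_normal(n)

def check_loss(resid, q):
    # quantile (check) loss rho_q(u) = u * (q - 1{u < 0})
    return np.sum(resid * (q - (resid < 0)))

best = None
for tau in np.linspace(1.0, 5.0, 41):                 # candidate change-points
    X = sm.add_constant(np.column_stack([x, np.maximum(x - tau, 0)]))
    res = QuantReg(y, X).fit(q=q)
    loss = check_loss(y - res.predict(X), q)
    if best is None or loss < best[0]:
        best = (loss, tau, res.params)

loss, tau_hat, params = best
print(f"estimated change-point: {tau_hat:.2f}")
print("coefficients (intercept, slope, slope change):", np.round(params, 2))
```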
129
Self-controlled methods for postmarketing drug safety surveillance in large-scale longitudinal data. Simpson, Shawn E. January 2011.
A primary objective in postmarketing drug safety surveillance is to ascertain the relationship between time-varying drug exposures and adverse events (AEs) related to health outcomes. Surveillance can be based on longitudinal observational databases (LODs), which contain time-stamped patient-level medical information including periods of drug exposure and dates of diagnoses. Due to its desirable properties, we focus on the self-controlled case series (SCCS) method for analysis in this context. SCCS implicitly controls for fixed multiplicative baseline covariates since each individual acts as their own control. In addition, only exposed cases are required for the analysis, which is computationally advantageous. In the first part of this work we present how the simple SCCS model can be applied to the surveillance problem, and compare the results of simple SCCS to those of existing methods. Many current surveillance methods are based on marginal associations between drug exposures and AEs. Such analyses ignore confounding drugs and interactions and have the potential to give misleading results. In order to avoid these difficulties, it is desirable for an analysis strategy to incorporate large numbers of time-varying potential confounders such as other drugs. In the second part of this work we propose the Bayesian multiple SCCS approach, which deals with high dimensionality and can provide a sparse solution via a Laplacian prior. We present details of the model and optimization procedure, as well as results of empirical investigations. SCCS is based on a conditional Poisson regression model, which assumes that events at different time points are conditionally independent given the covariate process. This requirement is problematic when the occurrence of an event can alter the future event risk. In a clinical setting, for example, patients who have a first myocardial infarction (MI) may be at higher subsequent risk for a second. In the third part of this work we propose the positive dependence self-controlled case series (PD-SCCS) method: a generalization of SCCS that allows the occurrence of an event to increase the future event risk, yet maintains the advantages of the original by controlling for fixed baseline covariates and relying solely on data from cases. We develop the model and compare the results of PD-SCCS and SCCS on example drug-AE pairs.
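The core SCCS computation can be sketched for the simple single-exposure case: conditional on each case's total event count, the events are multinomially distributed across that person's observation intervals with probabilities proportional to interval length times exp(x*beta), and beta maximizes the resulting conditional log-likelihood. The toy intervals below are invented; the Bayesian multiple-SCCS prior and the PD-SCCS extension are not shown.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# One row per (person, interval): person id, days at risk, exposure indicator, event count.
person  = np.array([0, 0, 1, 1, 2, 2])
length  = np.array([300., 30., 250., 60., 320., 20.])
exposed = np.array([0,   1,   0,    1,   0,    1 ])
events  = np.array([1,   1,   0,    2,   1,    0 ])

def neg_loglik(beta):
    """Negative SCCS conditional log-likelihood (multinomial within each person)."""
    log_rate = np.log(length) + beta * exposed        # log(interval length * exp(x * beta))
    ll = 0.0
    for i in np.unique(person):
        m = person == i
        log_p = log_rate[m] - np.log(np.exp(log_rate[m]).sum())   # interval probabilities
        ll += np.sum(events[m] * log_p)
    return -ll

fit = minimize_scalar(neg_loglik)
print(f"estimated log incidence rate ratio for exposure: {fit.x:.3f}")
```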
130
Contributions to Semiparametric Inference to Biased-Sampled and Financial Data. Sit, Tony. January 2012.
This thesis develops statistical models and methods for the analysis of lifetime and financial data within a semiparametric framework. The first part studies the use of empirical likelihood for Lévy processes used to model the dynamics exhibited in financial data. The second part studies inferential procedures for survival data collected under various biased sampling schemes, in transformation and accelerated failure time models. During the last decade, Lévy processes with jumps have become increasingly popular for modelling market behaviour for both derivative pricing and risk management purposes. Chan et al. (2009) introduced the use of empirical likelihood methods to estimate the parameters of various diffusion processes via their characteristic functions, which are readily available in most cases. Return series from the market are used for estimation. In addition to the return series, many derivatives actively traded in the market have prices that also contain information about the parameters of the underlying process. This observation motivates us to combine the return series with the associated derivative prices observed in the market, so as to provide estimates that better reflect market movements and to achieve a gain in efficiency. The usual asymptotic properties, including consistency and asymptotic normality, are established under suitable regularity conditions. We performed simulation and case studies to demonstrate the feasibility and effectiveness of the proposed method. The second part of this thesis investigates a unified estimation method for semiparametric linear transformation models and the accelerated failure time model under general biased sampling schemes. The methodology was first investigated in Paik (2009), in which the length-biased case is considered for transformation models. The new estimator is obtained from a set of counting-process-based unbiased estimating equations, developed by introducing a general weighting scheme that offsets the sampling bias. The usual asymptotic properties, including consistency and asymptotic normality, are established under suitable regularity conditions. A closed-form formula is derived for the limiting variance, and the plug-in estimator is shown to be consistent. We demonstrate the unified approach through the special cases of left truncation, length bias, the case-cohort design, and variants thereof. Simulation studies and applications to real data sets are also presented.
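The empirical likelihood construction on characteristic functions is not reproduced here, but the underlying idea of confronting the empirical characteristic function of returns with a parametric one can be sketched for the simplest Lévy process, Brownian motion with drift, using least-squares matching over a grid of frequencies (a cruder criterion than the empirical likelihood used in the thesis). All numbers are simulated.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
dt = 1.0 / 252                                   # daily increments
mu_true, sigma_true = 0.05, 0.20
returns = rng.normal(mu_true * dt, sigma_true * np.sqrt(dt), size=2000)

u = np.linspace(1.0, 40.0, 40)                   # frequencies at which to match the CF
ecf = np.exp(1j * np.outer(u, returns)).mean(axis=1)   # empirical characteristic function

def model_cf(u, mu, sigma):
    # CF of a Brownian-motion-with-drift increment over time dt
    return np.exp(1j * u * mu * dt - 0.5 * sigma**2 * u**2 * dt)

def loss(theta):
    mu, log_sigma = theta
    return np.sum(np.abs(ecf - model_cf(u, mu, np.exp(log_sigma))) ** 2)

fit = minimize(loss, x0=np.array([0.0, np.log(0.1)]), method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")   # sigma is well identified; mu only weakly
```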