31 |
Re-calibration and discrimination in survival risk prediction functions
Song, Linye 22 January 2016
Risk prediction models are important tools intended to help clinicians make optimal treatment decisions. They are often developed on large reference samples for applications in different local cohorts. For example, consider transporting the US Framingham risk prediction function for coronary heart disease (CHD) to populations in Europe or Asia. In this process it is necessary to correctly re-calibrate the existing function for future applications.
In this thesis we propose a new re-calibration method that can be used when transporting a risk function from a reference cohort to a local cohort. The new method is compared with existing re-calibration methods through numerical simulations under various assumptions and on real-life population data. In a majority of settings it outperforms the existing methods. We also explore the strengths and limitations of each re-calibration method and provide guidance for their practical use. The re-calibration methods described can be used for any risk prediction model based on Cox proportional hazards regression. To facilitate convenient application we present an easy-to-use SAS macro.
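As an illustration of what re-calibrating a Cox-based risk function involves, here is a minimal Python sketch of the widely used existing approach: keep the reference coefficients and replace the covariate means and baseline survival with local-cohort values. It is not the new method proposed in the thesis, and the function name `cox_risk`, the coefficients, and all numeric inputs are made up for the example.

```python
import math

def cox_risk(x, beta, mean_x, baseline_survival):
    """t-year event risk from a Cox model: 1 - S0(t) ** exp(beta . (x - mean_x))."""
    lp = sum(b * (xi - mi) for b, xi, mi in zip(beta, x, mean_x))
    return 1.0 - baseline_survival ** math.exp(lp)

# Hypothetical coefficients estimated on the reference cohort
beta = [0.05, 0.60]                      # e.g. age in years, smoker (0/1)

# Reference-cohort covariate means and baseline survival (made-up values)
ref_mean, ref_s0 = [50.0, 0.40], 0.90
# Local-cohort means and baseline survival used for re-calibration (made-up values)
loc_mean, loc_s0 = [55.0, 0.20], 0.95

person = [60.0, 1.0]
risk_reference = cox_risk(person, beta, ref_mean, ref_s0)
risk_recalibrated = cox_risk(person, beta, loc_mean, loc_s0)
print(round(risk_reference, 3), round(risk_recalibrated, 3))
```

A person whose covariates equal the cohort means gets a risk of exactly one minus the baseline survival, which is a quick sanity check on any such function.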
Another essential feature of a successful risk prediction model is its discrimination, that is, its ability to separate those with events from those without events. One of the most popular measures of discrimination is the area under the Receiver Operating Characteristic (ROC) curve, often called the c statistic or just the area under the curve (AUC). Various authors have extended the AUC from binary outcome applications to survival data. However, these extensions are not unique. In this thesis we compare four of these extensions using simulations and practical applications to the Framingham risk functions as well as a breast cancer risk model. We conclude that the extension proposed by Harrell and described in detail by Pencina & D'Agostino is the metric most consistent with the most appropriate definition of discrimination in survival. We provide SAS code for its consistent estimator based on the work of Uno et al. We also observe large differences in magnitude between various C indices calculated on the same data and caution against comparisons across different C indices.
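To make the survival extension of the AUC concrete, the following Python sketch computes Harrell's C by direct pair counting. It is the plain Harrell estimator, not the Uno et al. consistent estimator that the thesis implements in SAS, and the function name and toy data are assumptions for illustration only.

```python
def harrell_c(times, events, scores):
    """Harrell's C: share of usable pairs in which the subject who fails
    first also has the higher risk score (score ties count one half)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when i has an observed event strictly before j's time
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable

# Toy data: follow-up time, event indicator (1 = event, 0 = censored), risk score
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c(times, events, scores)
print(c_index)  # 4 concordant out of 5 usable pairs -> 0.8
```

Because censored subjects only enter pairs as the later member, the set of usable pairs depends on the censoring pattern, which is one reason different C indices can disagree on the same data.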
|
32 |
Polygenic prediction and GWAS of depression, PTSD, and suicidal ideation/self-harm in a Peruvian cohort
Shen, Hanyang, Gelaye, Bizu, Huang, Hailiang, Rondon, Marta B., Sanchez, Sixto, Duncan, Laramie E. 01 September 2020
LED and HS have been funded by startup funds from Stanford and a pilot grant to LED from the Stanford Center for Clinical and Translation Research and Education (UL1 TR001085, PI Greenberg). LED has also been funded by Cohen Veterans Bioscience (CVB), and she is part of the CVB Working Group for PTSD Adaptive Platform Trial. BG has been funded by the NIH (R01-HD-059835, PI Williams) and CVB. HH has been funded by the NIH (NIH K01DK114379 and NIH R21AI139012), the Zhengxu and Ying He Foundation, and the Stanley Center for Psychiatric Research. MBR received funds from WPA Congress Mexico City 2018, Guayaquil CEPAM 2019, Asunción X CONGRESO LATINOAMERICANO DE LA FLAPB 2018, Guayaquil 2019 (Bago), and Lancet Psychiatry, London (commission on Violence against women) 2019. SS declares no potential conflict of interest. / Genome-wide approaches including polygenic risk scores (PRSs) are now widely used in medical research; however, few studies have been conducted in low- and middle-income countries (LMICs), especially in South America. This study was designed to test the transferability of psychiatric PRSs to individuals with different ancestral and cultural backgrounds and to provide genome-wide association study (GWAS) results for psychiatric outcomes in this sample. The PrOMIS cohort (N = 3308) was recruited from prenatal care clinics at the Instituto Nacional Materno Perinatal (INMP) in Lima, Peru. Three major psychiatric outcomes (depression, PTSD, and suicidal ideation and/or self-harm) were scored by interviewers using valid Spanish questionnaires. Illumina Multi-Ethnic Global chip was used for genotyping. Standard procedures for PRSs and GWAS were used along with extra steps to rule out confounding due to ancestry. Depression PRSs significantly predicted depression, PTSD, and suicidal ideation/self-harm and explained up to 0.6% of phenotypic variation (minimum p = 3.9 × 10⁻⁶).
The associations were robust to sensitivity analyses using more homogeneous subgroups of participants and alternative choices of principal components. Successful polygenic prediction of three psychiatric phenotypes in this Peruvian cohort suggests that genetic influences on depression, PTSD, and suicidal ideation/self-harm are at least partially shared across global populations. These PRS and GWAS results from this large Peruvian cohort advance genetic research (and the potential for improved treatments) for diverse global populations. / National Institutes of Health / Peer reviewed
|
33 |
Gender-Choice Behavior Linkages: An Investigation in the Hospitality Industry
Yavas, Ugur, Karatepe, Osman M., Babakus, Emin 01 January 2015
Purpose: This study investigates whether males and females differ in the emphasis they place on core service and relational service when choosing a hotel. Design/Methodology: Data were gathered from residents of a metropolitan area in the United States. Three hundred and forty-one residents participated in the study. The Del statistic, an underused technique, was applied. Findings: The results reveal that male and female guests place essentially the same importance on core and relational services when choosing a hotel. Originality of the research: Empirical research on the hotel choice behavior of female guests is scarce; this study addresses this under-researched issue.
|
34 |
APPLICATIONS OF THE HARDY-WEINBERG PRINCIPLE TO DETECTION OF LINKAGE DISEQUILIBRIUM AND GENOTYPING ERRORS IN THE CONTEXT OF ASSOCIATION STUDIES
Londono-Vasquez, Douglas 08 June 2007
No description available.
|
35 |
Mixed Model Selection Based on the Conceptual Predictive Statistic
Wenren, Cheng 05 August 2014
No description available.
|
36 |
Comparison of Prediction Intervals for the Gumbel Distribution
Fang, Lin 06 1900
The problem of obtaining a prediction interval at a specified confidence level to contain k future observations from the Gumbel distribution, based on an observed sample from the same distribution, is considered. An existing method due to Hahn, originally derived for the normal distribution, is adapted to the Gumbel case. Motivated by the equivalence between Hahn's prediction intervals and Bayesian predictive intervals for the normal, we develop Bayesian predictive intervals for the Gumbel both when the scale parameter b is known and when it is unknown. We then compare Hahn's and the Bayesian intervals. The Bayesian interval is better when b is known, while the two perform about the same when b is unknown. We then consider the maximum of Hahn's and the Bayesian predicted lower limits, which is shown to be a better predictor when b is unknown. All the discussions are based on Monte Carlo simulations. Finally, the results are applied to Ontario Power Generation data on feeder thicknesses. / Thesis / Master of Science (MSc)
|
37 |
Cluster-Based Profile Monitoring in Phase I Analysis
Chen, Yajuan 26 March 2014
Profile monitoring is a well-known approach used in statistical process control where the quality of the product or process is characterized by a profile, that is, a relationship between a response variable and one or more explanatory variables. Profile monitoring is conducted over two phases, labeled Phase I and Phase II. In Phase I profile monitoring, regression methods are used to model each profile and to detect the possible presence of out-of-control profiles in the historical data set (HDS). The out-of-control profiles can be detected by using the T² statistic. However, previous methods of calculating the T² statistic use all the data in the HDS, including data from the out-of-control process. Consequently, the effectiveness of this method can be distorted if the HDS contains data from the out-of-control process. This work provides a new profile monitoring methodology for Phase I analysis. The proposed method, referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase before calculating the T² statistic.
Before the proposed cluster-based method is introduced for profile monitoring, it is shown to work efficiently in robust regression, where it is referred to as cluster-based bounded influence regression, or CBI. It is demonstrated that the CBI method provides a robust, efficient and high-breakdown regression parameter estimator. The CBI method first represents the data space via a special set of points, referred to as anchor points. Then a collection of single-point-added ordinary least squares regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster containing at least half the observations, with the remaining observations comprising one or more minor clusters. An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point, is regression, scale and affine equivariant, and is asymptotically normally distributed. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures regarding coefficient stability, scale estimation and standard errors.
The cluster-based method in Phase I profile monitoring first replaces the data from each sampled unit with an estimated profile, using some appropriate regression method. The estimated parameters for the parametric profiles are obtained from parametric models, while the estimated parameters for the nonparametric profiles are obtained from the p-spline model. The cluster phase clusters the profiles based on their estimated parameters, which yields an initial main cluster containing at least half the profiles. The initial estimated parameters for the population average (PA) profile are obtained by fitting a mixed model (parametric or nonparametric) to the profiles in the main cluster. Profiles that are not contained in the initial main cluster are iteratively added to the main cluster provided their T² statistics are "small", and the mixed model (parametric or nonparametric) is used to update the estimated parameters for the PA profile. Profiles contained in the final main cluster are considered to result from the in-control process, while those not included are considered to result from an out-of-control process. This cluster-based method has been applied to monitor both parametric and nonparametric profiles. A simulated example, a Monte Carlo study and an application to a real data set illustrate the details of the algorithm and demonstrate the performance advantage of the proposed method over a non-cluster-based method, with respect to more accurate estimates of the PA parameters and improved classification performance criteria.
When the profiles can be represented by vectors, the profile monitoring process is equivalent to the detection of multivariate outliers. For this reason, we also compared our proposed method to a popular method used to identify outliers when dealing with a multivariate response. Our study demonstrated that when the out-of-control process corresponds to a sustained shift, the cluster-based method using the successive difference estimator is clearly the superior method, among those methods we considered, based on all performance criteria. In addition, the influence of accurate Phase I estimates on the performance of Phase II control charts is presented to show the further advantage of the proposed method. A simple example and Monte Carlo results show that more accurate estimates from Phase I would provide more efficient Phase II control charts. / Ph. D.
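The non-cluster-based baseline that the abstract contrasts itself with can be sketched in a few lines of Python. The sketch assumes simple linear profiles, assumes the monitoring statistic is a Hotelling-type T² computed from the pooled sample mean and covariance of the fitted parameters, and uses made-up data; `fit_line` and `t2_statistics` are hypothetical helper names, and none of this reproduces the thesis's mixed-model or clustering steps.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope for one profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return (my - slope * mx, slope)

def t2_statistics(params):
    """T^2 distance of each (intercept, slope) pair from the pooled mean,
    using the sample covariance of ALL profiles in the data set."""
    m = len(params)
    mean = [sum(p[k] for p in params) / m for k in range(2)]
    dev = [(p[0] - mean[0], p[1] - mean[1]) for p in params]
    s00 = sum(d[0] * d[0] for d in dev) / (m - 1)
    s01 = sum(d[0] * d[1] for d in dev) / (m - 1)
    s11 = sum(d[1] * d[1] for d in dev) / (m - 1)
    det = s00 * s11 - s01 * s01
    # d' S^{-1} d, written out with the closed-form inverse of a 2x2 matrix
    return [(s11 * d[0] ** 2 - 2 * s01 * d[0] * d[1] + s00 * d[1] ** 2) / det
            for d in dev]

# Made-up historical data set: four similar profiles and one shifted profile
x = [0.0, 1.0, 2.0, 3.0]
profiles = [
    [1.0, 2.1, 2.9, 4.0],
    [1.1, 1.9, 3.1, 4.1],
    [0.9, 2.0, 3.0, 3.9],
    [1.0, 2.0, 3.2, 4.0],
    [3.0, 4.6, 6.1, 7.4],   # shifted profile
]
params = [fit_line(x, y) for y in profiles]
t2 = t2_statistics(params)
```

Because the mean and covariance here are computed from every profile, a shifted profile pulls the mean toward itself and inflates the covariance, which is the kind of distortion that motivates estimating them from a main cluster first.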
|
38 |
Educação matemática nos cursos superiores de tecnologia: revelações sobre a formação estatística [Mathematics education in higher technology courses: revelations about statistical training]
Costa, Claudinei Aparecido da 19 June 2013
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This research aims to understand the configuration and the potential of Statistics in the training of students in Professional Education at the technological level, considering the formative aspects inherent to this mode of teaching. More specifically, it sets out to: (1) carry out a documentary analysis of higher education courses in technology, their emergence in Brazil, their legal aspects and curricular guidelines, and the marks of Mathematics Education, as well as Statistics Education, in technological education; (2) study the importance of Statistics in the training of technologists, its trajectory over recent decades, the emergence of Statistics Education, and the competences that make up its core: statistical literacy, reasoning and thinking; (3) produce an analysis of the curriculum in action, identifying information that offers clues for reconstructing its different meanings throughout history; (4) discuss the understanding present in the curricula of Professional Education at the technological level, evaluating the meanings attributed to Statistics in the organization and development of these curricula. It is a qualitative study that uses document analysis and interviews as data collection instruments. The results reveal the predominance of a traditional approach in the teaching of Statistics. In the graduates' opinion, the absence of statistical software in this subject stands out. The findings indicate a need to change this practice and suggest starting from contextualized examples in the students' areas of interest and valuing group work. This would allow students to participate actively in the construction of knowledge, with statistical literacy, reasoning and thinking as premises.
|
39 |
Distribution Theory of Some Nonparametric Statistics via Finite Markov Chain Imbedding Technique
Lee, Wan-Chen 16 April 2014
The ranking method used for testing the equivalence of two distributions has been studied for decades and is widely adopted for its simplicity. However, due to the complexity of calculations, the power of the test is either estimated by normal approximation or found when an appropriate alternative is given. Here, via a Finite Markov chain imbedding (FMCI) technique, we are able to establish the marginal and joint distributions of the rank statistics considering the shift and scale parameters, respectively and simultaneously, under two continuous distribution functions. Furthermore, the procedures of distribution equivalence tests and their power functions are discussed. Numerical results of a joint distribution of two rank statistics under the standard normal distribution and the powers for a sequence of alternative normal distributions with mean from -20 to 20 and standard deviation from 1 to 9 and their reciprocals are presented. In addition, we discuss the powers of the rank statistics under the Lehmann alternatives.
Wallenstein et al. (1993, 1994) discussed power via combinatorial calculations for the scan statistic against a pulse alternative; however, unless certain conditions are satisfied, computational difficulties exist. Our work extends their results and provides an alternative way to obtain the distribution of a scan statistic under various alternative conditions. An efficient and intuitive expression for the distribution, as well as for the power, of the scan statistic is introduced via the FMCI. Numerical results for the exact power of a discrete scan statistic against various alternatives are presented. Powers obtained through the finite Markov chain imbedding method and through a combinatorial algorithm for a continuous scan statistic, against a pulse alternative of a higher risk for a disease on a specified time subinterval, are also discussed and compared.
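The embedding idea for a discrete scan statistic can be illustrated with a small Python sketch. It tracks the last w - 1 outcomes of independent Bernoulli trials as the chain state and absorbs as soon as a window holds at least k successes. This is a simplified construction written for this listing, not the author's; the function names and the probabilities in the example are assumptions, and it presumes at least w trials.

```python
from itertools import product

def scan_tail_prob(probs, w, k):
    """P(some window of w consecutive trials holds >= k successes), where
    trial t succeeds independently with probability probs[t]."""
    dist = {(): 1.0}          # state (recent outcomes) -> probability, not yet absorbed
    absorbed = 0.0
    for p in probs:
        new = {}
        for state, prob in dist.items():
            for bit, pb in ((1, p), (0, 1.0 - p)):
                window = state + (bit,)
                if sum(window) >= k:
                    absorbed += prob * pb      # move to the absorbing state
                else:
                    nxt = window[-(w - 1):] if w > 1 else ()
                    new[nxt] = new.get(nxt, 0.0) + prob * pb
        dist = new
    return absorbed

def brute_force(probs, w, k):
    """Same probability by enumerating all outcome sequences (small n only)."""
    n = len(probs)
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if any(sum(seq[i:i + w]) >= k for i in range(n - w + 1)):
            weight = 1.0
            for bit, p in zip(seq, probs):
                weight *= p if bit else 1.0 - p
            total += weight
    return total

null = [0.1] * 10                          # constant success probability
pulse = [0.1] * 3 + [0.5] * 4 + [0.1] * 3  # higher risk on a subinterval
size = scan_tail_prob(null, 3, 2)          # rejection probability under the null
power = scan_tail_prob(pulse, 3, 2)        # rejection probability under the pulse
```

Passing per-trial probabilities lets the same recursion give both the null distribution and the power against a pulse alternative, and the brute-force enumerator offers a check on short sequences.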
|
40 |
Inference for the K-sample problem based on precedence probabilities
Dey, Rajarshi January 1900
Doctor of Philosophy / Department of Statistics / Paul I. Nelson / Rank-based inference using independent random samples to compare K > 1 continuous distributions, called the K-sample problem, is developed and explored based on precedence probabilities. There are many parametric and nonparametric approaches, most dealing with hypothesis testing, to this important, classical problem. Most existing tests are designed to detect differences among the location parameters of different distributions. The best known and most widely used of these is the F-test, which assumes normality. A comparable nonparametric test was developed by Kruskal and Wallis (1952). When dealing with location-scale families of distributions, both of these tests can perform poorly if the distributions differ in their scale parameters and not in their location parameters. Overall, existing tests are not effective in detecting changes in both location and scale. In this dissertation, I propose a new class of rank-based, asymptotically distribution-free tests, based on precedence probabilities, that are effective in detecting changes in both location and scale. Let $X_i$ be a random variable with distribution function $F_i$, and let $\pi$ be the set of all permutations of the numbers $(1, 2, \ldots, K)$. Then $P(X_{i_1} < \cdots < X_{i_K})$ is a precedence probability if $(i_1, \ldots, i_K)$ belongs to $\pi$. Properties of these tests are developed using the theory of U-statistics (Hoeffding, 1948). Some of these new tests are related to volumes under ROC (Receiver Operating Characteristic) surfaces, which are of particular interest in clinical trials whose goal is to use a score to separate subjects into diagnostic groups. Motivated by this goal, I propose three new index measures of the separation or similarity among two or more distributions. These indices may be used as “effect sizes”. In a related problem, properties of precedence probabilities are obtained and a bootstrap algorithm is used to estimate an interval for them.
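A precedence probability can be estimated from K independent samples by a simple counting U-statistic, sketched below in Python. The function name and toy samples are assumptions for illustration, and the sketch covers only the point estimate, not the tests, indices, or bootstrap interval developed in the dissertation.

```python
from itertools import product

def precedence_probability(samples):
    """Estimate P(X_1 < X_2 < ... < X_K) as the fraction of tuples, taking
    one observation from each sample in order, that are strictly increasing."""
    hits = 0
    total = 0
    for combo in product(*samples):
        total += 1
        if all(a < b for a, b in zip(combo, combo[1:])):
            hits += 1
    return hits / total

# Toy samples from K = 3 groups
samples = [[1, 4], [2, 5], [3, 6]]
estimate = precedence_probability(samples)
print(estimate)  # 4 of the 8 tuples are increasing -> 0.5
```

To estimate the precedence probability for another permutation, pass the samples in that order. With K = 2 the estimate is the familiar Mann-Whitney proportion, and with K = 3 it corresponds to an empirical volume under the ROC surface.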
|