Global ETD Search

11	Investigating Post-Earnings-Announcement Drift Using Principal Component Analysis and Association Rule Mining Schweickart, Ian R. W. 01 January 2017 Post-Earnings-Announcement Drift (PEAD) is commonly accepted in the fields of accounting and finance as evidence for stock market inefficiency. Less accepted are the numerous explanations for this anomaly. This project aims to investigate the cause of PEAD by harnessing the power of machine learning algorithms such as Principal Component Analysis (PCA) and a rule-based learning technique, applied to large stock market data sets. Based on the notion that the market is consumer driven, repeated occurrences of irrational behavior exhibited by traders in response to news events such as earnings reports are uncovered. The project produces findings in support of the PEAD anomaly using methods from neither accounting nor finance. In particular, this project finds evidence for delayed price response exhibited in trader behavior, a common manifestation of the PEAD phenomenon. 64P20 91B84 Applied Statistics Other Applied Mathematics
12	Improved Standard Error Estimation for Maintaining the Validities of Inference in Small-Sample Cluster Randomized Trials and Longitudinal Studies Tanner, Whitney Ford 01 January 2018 Data arising from Cluster Randomized Trials (CRTs) and longitudinal studies are correlated and generalized estimating equations (GEE) are a popular analysis method for correlated data. Previous research has shown that analyses using GEE could result in liberal inference due to the use of the empirical sandwich covariance matrix estimator, which can yield negatively biased standard error estimates when the number of clusters or subjects is not large. Many techniques have been presented to correct this negative bias; however, use of these corrections can still result in biased standard error estimates and thus test sizes that are not consistently at their nominal level. Therefore, there is a need for an improved correction such that nominal type I error rates will consistently result. First, GEEs are becoming a popular choice for the analysis of data arising from CRTs. We study the use of recently developed corrections for empirical standard error estimation and the use of a combination of two popular corrections. In an extensive simulation study, we find that nominal type I error rates can be consistently attained when using an average of two popular corrections developed by Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, Journal of the American Statistical Association 96, 1387-1396) (AVG MD KC). Use of this new correction was found to notably outperform the use of previously recommended corrections. Second, data arising from longitudinal studies are also commonly analyzed with GEE. 
We conduct a simulation study, finding two methods to attain nominal type I error rates more consistently than other methods in a variety of settings: first, a recently proposed method by Westgate and Burchett (2016, Statistics in Medicine 35, 3733-3744) that specifies both a covariance estimator and degrees of freedom, and second, AVG MD KC with degrees of freedom equaling the number of subjects minus the number of parameters in the marginal model. Finally, stepped wedge trials are an increasingly popular alternative to traditional parallel cluster randomized trials. Such trials often utilize a small number of clusters and numerous time intervals, and these components must be considered when choosing an analysis method. A generalized linear mixed model containing a random intercept and fixed time and intervention covariates is the most common analysis approach. However, the sole use of a random intercept applies assumptions that will be violated in practice. We show, using an extensive simulation study based on a motivating example and a more general design, that alternative analysis methods are preferable for maintaining the validity of inference in small-sample stepped wedge trials with binary outcomes. First, we show that the use of generalized estimating equations, with an appropriate bias correction and a degrees of freedom adjustment dependent on the study setting type, will result in nominal type I error rates. Second, we show that the use of a cluster-level summary linear mixed model can also achieve nominal type I error rates for equal cluster size settings. GEE group randomized trials degrees of freedom test size Applied Statistics Biostatistics
13	Provision of Hospital-based Palliative Care and the Impact on Organizational and Patient Outcomes Roczen, Marisa L 01 January 2016 Hospital-based palliative care services aim to streamline medical care for patients with chronic and potentially life-limiting illnesses by focusing on individual patient needs, efficient use of hospital resources, and providing guidance for patients, patients’ families and clinical providers toward making optimal decisions concerning a patient’s care. This study examined the nature of palliative care provision in U.S. hospitals and its impact on selected organizational and patient outcomes, including hospital costs, length of stay, in-hospital mortality, and transfer to hospice. Hospital costs and length of stay are viewed as important economic indicators. Specifically, lower hospital costs may increase a hospital’s profit margin and shorter lengths of stay can enable patient turnover and efficiency of care. Higher rates of hospice transfers and lower in-hospital mortality may be considered positive outcomes from a patient perspective, as the majority of patients prefer to die at home or outside of the hospital setting. Several data sources were utilized to obtain information about patient, hospital, and county characteristics; patterns of hospitals’ palliative care provision; and patients’ hospital costs, length of stay, in-hospital mortality, and transfer to hospice (if a patient survived hospitalization). The study sample consisted of 3,763,339 patients; 348 urban, general, short-term, acute care, non-federal hospitals; and 111 counties located in six states over a 5-year study (2007-2011). Hospital-based palliative care provision was measured by the presence of three palliative care services, including inpatient palliative care consultation services (PAL), inpatient palliative care units (IPAL), and hospice programs (HOSPC). 
Derived from Institutional Theory, Resource Dependence Theory, and Donabedian’s Structure-Process-Outcome framework, 13 hypotheses were tested using a hierarchical (generalized) linear modeling approach. The study findings suggested that hospital size was associated with a higher probability of hospital-based palliative care provision. Conversely, the presence of palliative care services through a hospital’s health system, network, or joint venture was associated with a lower probability of hospital-based palliative care provision. The study findings also indicated that hospitals with an IPAL or HOSPC incurred lower hospital costs, whereas hospitals with PAL incurred higher hospital costs. The presence of PAL, IPAL, and HOSPC was generally associated with a lower probability of in-hospital mortality and transfer to hospice. Finally, the effects of hospital-based palliative care services on length of stay were mixed, and further research is needed to understand this relationship. Health Services Research Multivariate Analysis Organizational Behavior and Theory Palliative Care
14	Model Specification Searches in Latent Growth Modeling: A Monte Carlo Study Kim, Min Jung 2012 May 1900 This dissertation investigated the optimal strategy for the model specification search in latent growth modeling. Although developing an initial model based on theory from prior research is favored, researchers may sometimes need to specify the starting model in the absence of theory. In this simulation study, the effectiveness of the start models in searching for the true population model was examined. The four possible start models adopted in this study were: the simplest mean and covariance structure model, the simplest mean and the most complex covariance structure model, the most complex mean and the simplest covariance structure model, and the most complex mean and covariance structure model. Six model selection criteria were used to determine the recovery of the true model: Likelihood ratio test (LRT), DeltaCFI, DeltaRMSEA, DeltaSRMR, DeltaAIC, and DeltaBIC. The results showed that specifying the most complex covariance structure (UN) with the most complex mean structure recovered the true mean trajectory most successfully, with an average hit rate above 90% using the DeltaCFI, DeltaBIC, DeltaAIC, and DeltaSRMR. In searching for the true covariance structure, LRT, DeltaCFI, DeltaAIC, and DeltaBIC performed successfully regardless of the searching method with different start models. Latent growth modeling LGM Longitudinal data analysis Mean structure Specification search
15	Applying Localized Realized Volatility Modeling to Futures Indices Fu, Luella 01 January 2011 This thesis extends the application of the localized realized volatility model created by Ying Chen, Wolfgang Karl Härdle, and Uta Pigorsch to other futures markets, particularly the CAC 40 and the NI 225. The research attempted to replicate results, though ultimately those results were invalidated by procedural difficulties. Volatility Modeling Applied Statistics Statistical Models
16	Modelos para a análise de dados de contagens longitudinais com superdispersão: estimação INLA / Models for data analysis of longitudinal counts with overdispersion: INLA estimation Everton Batista da Rocha 04 September 2015 Em ensaios clínicos é muito comum a ocorrência de dados longitudinais discretos. Para sua análise é necessário levar em consideração que dados observados na mesma unidade experimental ao longo do tempo possam ser correlacionados. Além dessa correlação inerente aos dados é comum ocorrer o fenômeno de superdispersão (ou sobredispersão), em que, existe uma variabilidade nos dados além daquela captada pelo modelo. Um caso que pode acarretar a superdispersão é o excesso de zeros, podendo também a superdispersão ocorrer em valores não nulos, ou ainda, em ambos os casos. Molenberghs, Verbeke e Demétrio (2007) propuseram uma classe de modelos para acomodar simultaneamente a superdispersão e a correlação em dados de contagens: modelo Poisson, modelo Poisson-gama, modelo Poisson-normal e modelo Poisson-normal-gama (ou modelo combinado). Rizzato (2011) apresentou a abordagem bayesiana para o ajuste desses modelos por meio do Método de Monte Carlo com Cadeias de Markov (MCMC). Este trabalho, para modelar a incerteza relativa aos parâmetros desses modelos, considerou a abordagem bayesiana por meio de um método determinístico para a solução de integrais, INLA (do inglês, Integrated Nested Laplace Approximations). Além dessa classe de modelos, como objetivo, foram propostos outros quatro modelos que também consideram a correlação entre medidas longitudinais e a ocorrência de superdispersão, além da ocorrência de zeros estruturais e não estruturais (amostrais): modelo Poisson inflacionado de zeros (ZIP), modelo binomial negativo inflacionado de zeros (ZINB), modelo Poisson inflacionado de zeros - normal (ZIP-normal) e modelo binomial negativo inflacionado de zeros - normal (ZINB-normal). 
Para ilustrar a metodologia desenvolvida, um conjunto de dados reais referentes à contagens de ataques epilépticos sofridos por pacientes portadores de epilepsia submetidos a dois tratamentos (um placebo e uma nova droga) ao longo de 27 semanas foi considerado. A seleção de modelos foi realizada utilizando-se medidas preditivas baseadas em validação cruzada. Sob essas medidas, o modelo selecionado foi o modelo ZIP-normal, sob o modelo corrente na literatura, modelo combinado. As rotinas computacionais foram implementadas no programa R e são parte deste trabalho. / Discrete and longitudinal structures naturally arise in clinical trial data. Such data are usually correlated, particularly when the observations are made within the same experimental unit over time and, thus, statistical analyses must take this situation into account. Besides this typical correlation, overdispersion is another common phenomenon in discrete data, defined as a greater observed variability than that predicted by the statistical model. The causes of overdispersion are usually related to an excess of observed zeros (zero-inflation), an excess of specific observed positive values, or both. Molenberghs, Verbeke and Demétrio (2007) developed a class of models that encompasses both overdispersion and correlation in count data: the Poisson, Poisson-gamma, Poisson-normal and Poisson-normal-gamma (combined) models. A Bayesian approach was presented by Rizzato (2011) to fit these models using the Markov Chain Monte Carlo method (MCMC). In this work, a Bayesian framework was adopted as well and, in order to consider the uncertainty related to the model parameters, the Integrated Nested Laplace Approximations (INLA) method was used. 
Along with the models considered in Rizzato (2011), another four new models were proposed including longitudinal correlation, overdispersion and zero-inflation by structural and random zeros, namely: the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated Poisson-normal (ZIP-normal) and zero-inflated negative binomial-normal (ZINB-normal) models. In order to illustrate the developed methodology, the models were fit to a real dataset, in which the response variable was taken to be the number of epileptic events per week in each individual. These individuals were split into two groups, one taking placebo and the other taking an experimental drug, and they were observed for up to 27 weeks. The model selection criteria were given by different predictive measures based on cross validation. In this setting, the ZIP-normal model was selected instead of the usual model in the literature (combined model). The computational routines were implemented in the R language and constitute a part of this work. Análise de dados longitudinais Contagens Inferência Bayesiana Superdispersão Bayesian inference Counts Longitudinal data analysis Overdispersion
17	The Generalized Monotone Incremental Forward Stagewise Method for Modeling Longitudinal, Clustered, and Overdispersed Count Data: Application Predicting Nuclear Bud and Micronuclei Frequencies Lehman, Rebecca 01 January 2017 With the influx of high-dimensional data there is an immediate need for statistical methods that are able to handle situations when the number of predictors greatly exceeds the number of samples. One such area of growth is in examining how environmental exposures to toxins impact the body long term. The cytokinesis-block micronucleus assay can measure the genotoxic effect of exposure as a count outcome. To investigate potential biomarkers, high-throughput assays that assess gene expression and methylation have been developed. It is of interest to identify biomarkers or molecular features that are associated with elevated micronuclei (MN) or nuclear bud (Nbud) frequency, measures of exposure to environmental toxins. Given our desire to model a count outcome (MN and Nbud frequency) using high-throughput genomic features as predictors, novel methods that can handle over-parameterized models need development. Overdispersion, when the variance of a count outcome is larger than its mean, is frequently observed with count response data. For situations where overdispersion is present, the negative binomial distribution is more appropriate. Furthermore, we expand the method to the longitudinal Poisson and longitudinal negative binomial settings for modeling a longitudinal or clustered outcome both when there is equidispersion and overdispersion. The method we have chosen to expand is the Generalized Monotone Incremental Forward Stagewise (GMIFS) method. We extend the GMIFS to the negative binomial distribution so it may be used to analyze a count outcome when both a high-dimensional predictor space and overdispersion are present. Our methods were compared to glmpath. 
We also extend the GMIFS to the longitudinal Poisson and longitudinal negative binomial distribution for analyzing a longitudinal outcome. Our methods were compared to glmmLasso and GLMMLasso. The developed methods were used to analyze two datasets, one from the Norwegian Mother and Child Cohort study and one from the breast cancer epigenomic study conducted by researchers at Virginia Commonwealth University. In both studies a count outcome measured exposure to potential genotoxins and either gene expression or high-throughput methylation data formed a high dimensional predictor space. Further, the breast cancer study was longitudinal such that outcomes and high-dimensional genomic features were collected at multiple time points during the study for each patient. Our goal is to identify biomarkers that are associated with elevated MN or NBud frequency. From the development of these methods, we hope to make available more comprehensive statistical models for analyzing count outcomes with high dimensional predictor spaces and either cross-sectional or longitudinal study designs. high-dimensional genomics longitudinal Biostatistics Microarrays
18	Chronic poverty concepts and measures : an application to Kazakhstan Kudebayeva, Alma January 2015 This thesis explores the concepts and measurements of chronic poverty, with application to Kazakhstan. A rigorous analysis of different approaches to the measurement of poverty and chronic poverty is presented in this study. Five matching techniques have been applied for the construction of unintended panel data based on KHBS 2001-2009. The reliability, representativeness and robustness of the constructed panel data have been tested extensively. The attrition biases of the longitudinal data have been studied rigorously. The appropriate equivalence scale has been determined through regression analysis of the Kazakhstan HBS. The sensitivity of conventional and chronic poverty measures to various poverty lines and equivalence scales is studied in this thesis. A stochastic dominance analysis of per adult equivalent consumption expenditures is presented. The chronic poverty measures and the determinants of chronic and transient poverty have been estimated. The analysis shows that the main correlates of chronic poverty are education, employment status of the head of household, household composition, the ownership of assets such as a dwelling other than the main dwelling, a car, access to water in the house and location. The correlates of transient poverty are similar to those of chronic poverty; however, some of them have opposite signs, for example the ethnicity of the head of household, household composition, ownership of a dwelling other than the main dwelling, location in an urban area and repayments of loans in 2008. The Oaxaca-Blinder decomposition analysis of the gap in consumption expenditures between the chronically and transiently poor, and between the chronically poor and the non-poor, explains the differences through returns to endowments. Poverty transition analysis shows an improvement in poverty dynamics in the later period of the study, 2006-2009. 
Long durations of poverty prevail among singles with children and couples with children. Poverty exit rates are higher than poverty entry rates for the whole period of 2001-2009. Multivariate hazard regression models are estimated to examine differences in people's experience of poverty over a period of time. For individuals who enter poverty, the total span of time that they spend in poverty consequently depends on both the chances of exit from poverty and the chances of re-entry to poverty. The results confirm the negative duration dependence of the hazards of poverty exit and re-entry for longer durations in a state. The only factor with a significantly positive influence on poverty exit is location in Almaty. Many correlates in the estimated models have the same signs for the hazard rates of poverty exit and re-entry. This means that these factors are common to the transient poor, who move in and out of poverty in a given period of time. As noted earlier, the presence of children under age six increases the hazard rate of poverty re-entry. 339.4
19	A Longitudinal Analysis to Compare a Tailored Web-Based Intervention and Tailored Phone Counseling to Usual Care for Improving Beliefs of Colorectal Cancer Screening Dorman, Hannah Louise 07 1900 Indiana University-Purdue University Indianapolis (IUPUI) / Longitudinal data on beliefs regarding colorectal cancer (CRC) screening, collected at three time points, were analyzed to determine whether the beliefs improved under the Web-Based, Phone-Based, or Web + Phone interventions compared to Usual Care. A mixed linear model adjusting for baseline and controlling for covariates was used to determine the effects of the interventions; the Web-Based intervention was the most efficacious in improving beliefs, and the phone intervention was also efficacious for several beliefs, compared to usual care. Cancer Colorectal Cancer Breast Cancer Longitudinal Data Analysis Mixed Linear Model
20	A Review and Comparison of Models and Estimation Methods for Multivariate Longitudinal Data of Mixed Scale Type Codd, Casey 23 September 2014 No description available. Quantitative Psychology generalized linear mixed model longitudinal data analysis mixed model
