121

Characterization of a Weighted Quantile Score Approach for Highly Correlated Data in Risk Analysis Scenarios

Carrico, Caroline 29 March 2013 (has links)
In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and inherent correlations among chemicals that occur together, traditional methods (e.g., ordinary or logistic regression) are unsuitable. We extend and characterize a weighted quantile score (WQS) approach to estimating an index for a set of highly correlated components. In the case of environmental chemicals, we use the WQS to identify “bad actors” and estimate body burden. The accuracy of the WQS was evaluated through extensive simulation studies in terms of validity (the ability of the WQS to select the correct components) and reliability (the variability of the estimated weights across bootstrap samples). The WQS demonstrated high validity and reliability in scenarios with relatively high correlations with the outcome, and moderate breakdown in cases where the correlation with the outcome was relatively small compared to the pairwise correlations among components. In cases where components are independent, the weights can be interpreted as association with the outcome relative to the other components. In cases with complex correlation patterns, the weights are influenced by both association with the outcome and the correlation structure. The WQS also showed improvements over ordinary regression and the LASSO in the simulations performed. To conclude, an application of this method to the association between environmental chemicals, nutrition, and liver toxicity, as measured by ALT (alanine aminotransferase), is presented. The application identifies environmental chemicals (PCBs, dioxins, furans, and heavy metals) that are associated with an increase in ALT, as well as a set of nutrients identified as non-chemical stressors due to their association with an increase in ALT.
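
As a rough illustration of the index construction described above, the following Python sketch quantile-scores each component, estimates non-negative weights summing to one on bootstrap samples, and averages them. The simple least-squares objective and the function names are assumptions for illustration, not the estimator developed in the dissertation.

```python
# Hypothetical sketch of a weighted quantile score (WQS) index; the objective
# and names are illustrative assumptions, not the dissertation's implementation.
import numpy as np
from scipy.optimize import minimize

def quantile_score(X, n_quantiles=4):
    """Convert each column of X into quantile scores 0..n_quantiles-1."""
    qs = np.zeros_like(X, dtype=float)
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], np.linspace(0, 1, n_quantiles + 1)[1:-1])
        qs[:, j] = np.digitize(X[:, j], cuts)
    return qs

def estimate_weights(Q, y):
    """Non-negative weights summing to 1 that best relate the index to y."""
    c = Q.shape[1]
    def loss(w):
        index = Q @ w
        beta = np.polyfit(index, y, 1)          # simple linear link
        return np.mean((y - np.polyval(beta, index)) ** 2)
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1},)
    res = minimize(loss, np.full(c, 1 / c), bounds=[(0, 1)] * c, constraints=cons)
    return res.x

def wqs_index(X, y, n_boot=100, rng=np.random.default_rng(0)):
    Q = quantile_score(X)
    n = len(y)
    W = np.array([estimate_weights(Q[idx], y[idx])
                  for idx in (rng.integers(0, n, n) for _ in range(n_boot))])
    w_bar = W.mean(axis=0)                       # average bootstrap weights
    return Q @ w_bar, w_bar
```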
122

Statistical Methods and Experimental Design for Inference Regarding Dose and/or Interaction Thresholds Along a Fixed-Ratio Ray

Yeatts, Sharon Dziuba 01 January 2006 (has links)
An alternative to the full factorial design, the ray design is appropriate for investigating a mixture of c chemicals that are present according to a fixed mixing ratio, called the mixture ray. Using single-chemical and mixture-ray data, we can investigate interaction among the chemicals in a particular mixture. Statistical models have been used to describe the dose-response relationship of the single agents and the mixture; additivity is tested through the significance of model parameters associated with the coincidence of the additivity and mixture models. It is often assumed that a chemical or mixture must be administered above an unknown dose threshold in order to produce an effect different from background. Risk assessors often assume that interactions are a high-dose phenomenon, indicating that doses below the unknown interaction threshold are associated with additivity. We developed methodology that allows the user to simultaneously estimate the dose threshold and the interaction threshold. This methodology allows us to test for interaction and, secondarily, to test for a region of additivity. The methodology and optimal design characteristics were illustrated using a mixture of nine haloacetic acids. The application of statistical optimality criteria to the development of experimental designs is vital to the successful study of complex mixtures. Since the optimal design depends on the model of interest and the planned method of analysis, developments in statistical methodology should be accompanied by consideration of the experimental design characteristics necessary to implement them. The Flexible Single Chemical Required (FSCR) methodology is based on an implicit statement of additivity. We developed a method for constructing the parameter covariance matrix, which forms the basis of many alphabetic optimality criteria, for the implicit FSCR models. The method was demonstrated for a fixed-ratio mixture of 18 chemicals; the original mixture experiment comprises the first-stage data, and the optimal second-stage design was presented. Wald-type procedures for hypothesis testing in nonlinear models are based on a linear approximation. As a result, likelihood ratio-based procedures may be preferred over Wald-type procedures. We developed a procedure for using the likelihood ratio-based lower confidence bound as an optimality criterion, which can be used to find the optimal second-stage design for improving the inference on a particular model parameter. The method was demonstrated for a single agent as a means of improving the inference on the dose threshold.
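
For intuition, here is a hedged sketch of the kind of dose-threshold model described above, fit by nonlinear least squares along a fixed-ratio ray. The functional form, toy data, and parameter names are illustrative assumptions rather than the FSCR models developed in the dissertation.

```python
# Illustrative sketch only: a simple dose-threshold model fit by nonlinear
# least squares; not the dissertation's implicit FSCR models.
import numpy as np
from scipy.optimize import curve_fit

def threshold_model(dose, background, slope, threshold):
    """Background response below the dose threshold, linear increase above it."""
    return background + slope * np.maximum(dose - threshold, 0.0)

# toy response data along a fixed-ratio ray (total dose)
dose = np.array([0, 1, 2, 4, 8, 16, 32], dtype=float)
resp = np.array([1.1, 1.0, 1.2, 1.1, 2.0, 3.9, 7.8])

params, cov = curve_fit(threshold_model, dose, resp, p0=[1.0, 0.2, 4.0])
print("estimated background, slope, dose threshold:", params)
print("approx. SEs from the parameter covariance:", np.sqrt(np.diag(cov)))
```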
123

Comparing Bootstrap and Jackknife Variance Estimation Methods for Area Under the ROC Curve Using One-Stage Cluster Survey Data

Dunning, Allison 15 June 2009 (has links)
The purpose of this research is to examine the bootstrap and jackknife as methods for estimating the variance of the AUC from a study using a complex sampling design, and to determine which characteristics of the sampling design affect this estimation. Data from a one-stage cluster sampling design with 10 clusters were examined. Factors included three true AUCs (.60, .75, and .90), three prevalence levels (50/50, 70/30, and 90/10 non-disease/disease), and three numbers of clusters sampled (2, 5, or 7). A simulated sample was constructed for each of the 27 combinations of AUC, prevalence, and number of clusters. Both the bootstrap and jackknife methods provide unbiased estimates of the AUC. In general, the bootstrap provided smaller variance estimates. For both the bootstrap and jackknife variance estimates, the rarer the disease in the population, the higher the variance estimate. As the true area increased, the variance estimate decreased for both methods. For both methods, the variance decreased as the number of clusters sampled increased, although the trend for the jackknife may be affected by outliers. The National Health and Nutrition Examination Survey (NHANES), conducted by the CDC, is a complex survey that implements the one-stage cluster sampling design. A subset of the 2001-2002 NHANES data, restricted to adult women, was created. A separate logistic regression analysis was conducted to determine whether exposure to certain furans in the environment has an effect on abnormal levels of four hormones (FSH, LH, TSH, and T4) in women. Bootstrap and jackknife variance estimation techniques were applied to estimate the AUC and its variance for the four logistic regressions. The AUC estimates provided by the bootstrap and jackknife methods were similar, with the exception of LH. Unlike in the simulation study, the jackknife variance estimation method provided consistently smaller variance estimates than the bootstrap. AUC estimates for all four hormones suggested that exposure to furans affects abnormal hormone levels more than expected by chance. The bootstrap variance estimation technique provided better variance estimates for the AUC when many clusters were sampled. When only a few clusters are sampled, or when, as in the NHANES analysis, the entire population is treated as a single cluster, the jackknife variance estimation method provides smaller variance estimates for the AUC.
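
The following minimal sketch (with assumed helper names) shows cluster-level bootstrap and delete-one-cluster jackknife variance estimates for the AUC under one-stage cluster sampling, in the spirit of the comparison above.

```python
# A minimal sketch of cluster-resampling variance estimates for the AUC;
# assumes each reduced/resampled sample still contains both outcome classes.
import numpy as np
from sklearn.metrics import roc_auc_score

def cluster_bootstrap_var(y, score, cluster, n_boot=500, rng=np.random.default_rng(1)):
    ids = np.unique(cluster)
    aucs = []
    for _ in range(n_boot):
        pick = rng.choice(ids, size=len(ids), replace=True)    # resample clusters
        idx = np.concatenate([np.where(cluster == c)[0] for c in pick])
        if len(np.unique(y[idx])) == 2:                        # need both classes
            aucs.append(roc_auc_score(y[idx], score[idx]))
    return np.var(aucs, ddof=1)

def cluster_jackknife_var(y, score, cluster):
    ids = np.unique(cluster)
    aucs = np.array([roc_auc_score(y[cluster != c], score[cluster != c]) for c in ids])
    k = len(ids)
    return (k - 1) / k * np.sum((aucs - aucs.mean()) ** 2)     # delete-one-cluster
```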
124

Probe Level Analysis of Affymetrix Microarray Data

Kennedy, Richard Ellis 01 January 2008 (has links)
The analysis of Affymetrix GeneChip® data is a complex, multistep process. Most often, methods condense the multiple probe-level intensities into single probeset-level measures (such as the Robust Multi-chip Average (RMA), dChip, and Microarray Suite version 5.0 (MAS5)), which are then followed by the application of statistical tests to determine which genes are differentially expressed. An alternative approach is a probe-level analysis, which tests for differential expression directly using the probe-level data. Probe-level models offer the potential advantage of more accurately capturing sources of variation in microarray experiments. However, this has not been thoroughly investigated, since current research efforts have largely focused on the development of improved expression summary methods. This research project will review current approaches to the analysis of probe-level data and discuss extensions of two examples, the S-Score and the Random Variance Model (RVM). The S-Score is a probe-level algorithm based on an error model in which the detected signal is proportional to the probe pair signal for highly expressed genes, but approaches a background level (rather than 0) for genes with low levels of expression. Initial results with the S-Score have been promising, but the method has been limited to two-chip comparisons. This project presents extensions to the S-Score that permit comparisons of multiple chips and "borrowing" of information across probes to increase statistical power. The RVM is a probeset-level algorithm that models the variance of the probeset intensities as a random sample from a common distribution to "borrow" information across genes. This project presents extensions to the RVM for probe-level data, using multivariate statistical theory to model the covariance among probes in a probeset. Both of these methods show the advantages of probe-level, rather than probeset-level, analysis in detecting differential gene expression for Affymetrix GeneChip data. Future research will focus on refining the probe-level models of both the S-Score and RVM algorithms to increase the sensitivity and specificity of microarray experiments.
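
For context, the sketch below shows a generic probe-level test: a linear model with probe and group effects fit to the log-intensities of a single simulated probeset. It illustrates probe-level versus probeset-level testing in general; it is not the S-Score or RVM algorithms.

```python
# Hedged illustration of probe-level testing on one simulated probeset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_probes, n_chips = 11, 6
group = np.repeat(["control", "treated"], n_chips // 2)

# toy probeset: probe affinities + a small treatment effect + noise
probe_effect = rng.normal(0, 1, n_probes)
y = (probe_effect[:, None] + np.where(group == "treated", 0.5, 0.0)
     + rng.normal(0, 0.3, (n_probes, n_chips)))

df = pd.DataFrame({
    "log_intensity": y.ravel(),
    "probe": np.repeat(np.arange(n_probes), n_chips).astype(str),
    "group": np.tile(group, n_probes),
})
fit = smf.ols("log_intensity ~ C(probe) + C(group)", data=df).fit()
print(fit.params["C(group)[T.treated]"], fit.pvalues["C(group)[T.treated]"])
```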
125

Power and Sample Size for Three-Level Cluster Designs

Cunningham, Tina 05 November 2010 (has links)
Over the past few decades, Cluster Randomized Trials (CRTs) have become a design of choice in many research areas. One of the most critical issues in planning a CRT is ensuring that the study design is sensitive enough to capture the intervention effect. The assessment of power and sample size in such studies often faces many challenges due to several methodological difficulties. While studies on power and sample size for cluster designs with one and two levels are abundant, the evaluation of the required sample size for three-level designs has generally been overlooked. First, the nesting effect introduces more than one intracluster correlation into the model. Second, the variance structure of the estimated treatment difference is more complicated. Third, sample size results are needed at several levels. In this work, we developed sample size and power formulas for three-level data structures based on the generalized linear mixed model approach. We derived explicit and general power and sample size equations for detecting a hypothesized effect on continuous Gaussian outcomes and on binary outcomes. To confirm the accuracy of the formulas, we conducted several simulation studies and compared the results. To connect the theoretical formulas with their applications, we developed a SAS user-interface macro that allows researchers to estimate the sample size for a three-level design under different scenarios, which depend on the level at which randomization is assigned and on whether or not there is an interaction effect.
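
For a rough sense of scale, here is a hedged back-of-the-envelope sketch of a cluster count calculation for a three-level design with randomization at the top level. The textbook design-effect form and the notation (n1, n2, rho2, rho3) are assumptions, not the GLMM-based formulas derived in the dissertation.

```python
# Back-of-the-envelope clusters-per-arm for a two-arm continuous outcome,
# randomization at the third (top) level; design effect is a textbook form.
from scipy.stats import norm

def clusters_per_arm(delta, sigma, n1, n2, rho2, rho3, alpha=0.05, power=0.80):
    """delta: mean difference; n1: subjects per subcluster; n2: subclusters per
    cluster; rho2/rho3: within-subcluster / within-cluster intraclass correlations."""
    design_effect = 1 + (n1 - 1) * rho2 + n1 * (n2 - 1) * rho3
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_subjects_per_arm = 2 * (z * sigma / delta) ** 2 * design_effect
    return n_subjects_per_arm / (n1 * n2)

print(clusters_per_arm(delta=0.3, sigma=1.0, n1=10, n2=4, rho2=0.05, rho3=0.02))
```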
126

Meta-Analysis of Open vs Closed Surgery of Mandibular Condyle Fractures

Nussbaum, Marcy Lauren 01 January 2006 (has links)
A review of the literature reveals a difference of opinion regarding whether open or closed reduction of condylar fractures produces the best results. It would be beneficial to critically analyze past studies that have directly compared the two methods in an attempt to answer this question. A Medline search for articles using the key words 'mandibular condyle fractures' and 'mandibular condyle fractures surgery' was performed. The articles chosen for the meta-analysis contained data on at least one of the following: postoperative maximum mouth opening, lateral excursion, protrusion, deviation on opening, asymmetry, and joint pain or muscle pain. Several common statistical methods were used to test for differences between open and closed surgery, including the weighted average method for fixed and random effects as well as the Mantel-Haenszel method for fixed effects. Some of the outcome variables were found to be statistically significant but were interpreted with caution because of the poor quality of the studies assessed. There is a need for more standardized data collection as well as patient randomization to treatment groups.
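
For reference, a minimal inverse-variance fixed-effect pooling sketch of the kind of weighted average method mentioned above; the effect sizes below are illustrative, not the thesis data.

```python
# Fixed-effect inverse-variance pooling of study-level mean differences
# (illustrative numbers only).
import numpy as np
from scipy.stats import norm

# per-study mean differences in maximum mouth opening (mm) and their SEs
effects = np.array([2.1, -0.5, 1.3, 0.8])
ses = np.array([1.0, 0.9, 1.2, 0.7])

w = 1 / ses**2                                  # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
z = pooled / pooled_se
p = 2 * norm.sf(abs(z))
print(f"pooled difference = {pooled:.2f} ± {1.96*pooled_se:.2f}, p = {p:.3f}")
```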
127

Stereotype Logit Models for High Dimensional Data

Williams, Andre 29 October 2010 (has links)
Gene expression studies are of growing importance in the field of medicine. In fact, subtypes within the same disease have been shown to have differing gene expression profiles (Golub et al., 1999). Often, researchers are interested in differentiating a disease by a categorical classification indicative of disease progression. For example, it may be of interest to identify genes that are associated with progression and to accurately predict the state of progression using gene expression data. One challenge when modeling microarray gene expression data is that there are more genes (variables) than there are observations. In addition, the genes usually demonstrate a complex variance-covariance structure. Therefore, modeling a categorical variable reflecting disease progression using gene expression data requires methods capable of handling an ordinal outcome in the presence of a high-dimensional covariate space. In this research, we present a method that combines the stereotype regression model (Anderson, 1984) with an elastic net penalty (Friedman et al., 2010) as a method capable of modeling an ordinal outcome for high-throughput genomic datasets. Results from applying the proposed method to both simulated and gene expression data will be reported, and the effectiveness of the proposed method compared with univariable and heuristic approaches will be discussed.
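
To make the model concrete, the sketch below writes out a stereotype logit log-likelihood with an elastic-net penalty on the gene coefficients; the parameterization and the placement of the penalty are assumptions for illustration, not the exact estimator proposed in the dissertation.

```python
# Penalized negative log-likelihood of a stereotype logit model (sketch).
import numpy as np

def stereotype_neg_loglik(alpha, phi, beta, X, y, lam=1.0, l1_ratio=0.5):
    """alpha: (K,) intercepts; phi: (K,) category scores (with identifiability
    constraints, e.g. phi[0]=0, phi[-1]=1); beta: (p,) gene coefficients;
    y: integer categories 0..K-1."""
    eta = alpha[None, :] + np.outer(X @ beta, phi)        # n x K linear predictors
    log_probs = eta - np.logaddexp.reduce(eta, axis=1, keepdims=True)
    nll = -log_probs[np.arange(len(y)), y].sum()
    penalty = lam * (l1_ratio * np.abs(beta).sum()
                     + 0.5 * (1 - l1_ratio) * np.sum(beta**2))
    return nll + penalty
```

In practice, this objective would be minimized subject to ordering constraints on the category scores phi, for example with a general-purpose constrained optimizer.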
128

Deriving Optimal Composite Scores: Relating Observational/Longitudinal Data with a Primary Endpoint

Ellis, Rhonda 09 September 2009 (has links)
In numerous clinical/experimental studies, multiple endpoints are measured on each subject. It is often not clear which of these endpoints should be designated as of primary importance. The desirability function approach is a way of combining multiple responses into a single unitless composite score. The response variables may include multiple types of data: binary, ordinal, count, and interval data. Each response variable is transformed to a unitless 0 to 1 scale, with zero representing a completely undesirable response and one representing the ideal value. In desirability function methodology, weights on the individual components can be incorporated to allow different levels of importance to be assigned to different outcomes. The assignment of the weight values is subjective and based on individual or group expert opinion. In this dissertation, our goal is to find the weights or response variable transformations that optimize an external empirical objective criterion. For example, we find the optimal weights/transformations that minimize the generalized variance of a prediction regression model relating the score to an external response variable in pre-clinical and clinical data. To apply the weighting/transformation scheme, initial weight or transformation values are obtained, and the corresponding value of the composite score is calculated. Based on the selected empirical model for the analyses, parameter estimates are found using the usual iterative algorithms (e.g., Gauss-Newton). A direct search algorithm (e.g., the Nelder-Mead simplex algorithm) is then used to minimize the given objective criterion, i.e., the generalized variance. The search for optimal weights/transformations can also be viewed as a model building process: relative importance levels are assigned to each variable in the score, and less important variables are down-weighted and essentially eliminated.
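
A hedged sketch of this scheme follows: a desirability-type composite score whose weights are chosen by Nelder-Mead to minimize an external criterion. The linear 0-1 transform, the geometric-mean aggregation, and the residual-variance criterion are illustrative assumptions standing in for the generalized variance objective.

```python
# Desirability-style composite score with weights optimized by direct search.
import numpy as np
from scipy.optimize import minimize

def desirability(x, low, high):
    """Linear 0-1 transform: 0 at the worst acceptable value, 1 at the ideal."""
    return np.clip((x - low) / (high - low), 0.0, 1.0)

def composite(D, w):
    w = np.abs(w) / np.abs(w).sum()              # keep weights positive, summing to 1
    return np.exp(np.sum(w * np.log(np.maximum(D, 1e-12)), axis=1))  # weighted geometric mean

def objective(w, D, external):
    score = composite(D, w)
    resid = external - np.polyval(np.polyfit(score, external, 1), score)
    return resid.var()                           # stand-in for a generalized variance

rng = np.random.default_rng(3)
D = rng.uniform(0, 1, size=(50, 3))              # three 0-1 scaled endpoints
external = D @ np.array([0.6, 0.3, 0.1]) + rng.normal(0, 0.05, 50)
res = minimize(objective, x0=np.ones(3) / 3, args=(D, external), method="Nelder-Mead")
print("optimized weights:", np.abs(res.x) / np.abs(res.x).sum())
```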
129

DOES PAIR-MATCHING ON ORDERED BASELINE MEASURES INCREASE POWER: A SIMULATION STUDY

Jin, Yan 18 July 2012 (has links)
It has been shown that pair-matching on an ordered baseline with normally distributed measures reduces the variance of the estimated treatment effect (Park and Johnson, 2006). The main objective of this study is to examine whether pair-matching improves power when the baseline distribution is a mixture of two normal distributions. Multiple scenarios combining different sample sizes and parameter values are simulated. Power curves are provided, with and without matching, for three analyses: analysis of post-intervention data only, analysis with baseline added as a covariate, and the classic pre-post comparison. The study shows that the additional variance reduction provided by pair-matching in the pre-post design is limited when the correlation is high; when the correlation is low, there is a significant power increase. Pair-matching on baseline is shown to improve power when the two component means of the normal mixture are widely separated, and this pattern becomes clearer for low correlation.
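
A small simulation sketch in the spirit of the study design described above follows; the parameter values and the consecutive-pairs matching rule are assumptions for illustration.

```python
# Compare the variance of the estimated treatment effect with and without
# pair-matching on an ordered baseline drawn from a two-component normal mixture.
import numpy as np

def one_replicate(n_pairs, rho, effect, rng):
    n = 2 * n_pairs
    comp = rng.random(n) < 0.5
    baseline = np.where(comp, rng.normal(-2, 1, n), rng.normal(2, 1, n))   # mixture baseline
    post = rho * baseline + np.sqrt(1 - rho**2) * rng.normal(0, 1, n)

    # unmatched: completely randomize half the subjects to treatment
    t_unmatched = np.zeros(n, dtype=bool)
    t_unmatched[rng.permutation(n)[:n_pairs]] = True

    # matched: sort on baseline, randomize one member of each consecutive pair
    order = np.argsort(baseline)
    pick_second = rng.random(n_pairs) < 0.5
    t_matched = np.zeros(n, dtype=bool)
    t_matched[np.where(pick_second, order[1::2], order[0::2])] = True

    est = []
    for t in (t_unmatched, t_matched):
        y = post + effect * t
        est.append(y[t].mean() - y[~t].mean())
    return est

rng = np.random.default_rng(4)
results = np.array([one_replicate(50, 0.3, 0.5, rng) for _ in range(2000)])
print("variance of estimate, unmatched vs matched:", results.var(axis=0))
```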
130

Peptide Identification: Refining a Bayesian Stochastic Model

Acquah, Theophilus Barnabas Kobina 01 May 2017 (has links)
Notwithstanding the challenges associated with existing methods of peptide identification, alternative methods have been explored over the years. The complexity, size, and computational demands of peptide-based data sets call for further work in this area. By relying on prior information about the average relative abundances of bond cleavages and the prior probability of any specific amino acid sequence, we refine a previously developed Bayesian approach to identifying peptides. The likelihood function is improved by adding additional ions to the model, and its size is driven by two overall goodness-of-fit measures. Given the complexity of the posterior density, a Markov chain Monte Carlo algorithm coupled with simulated annealing is used to simulate candidate sequences from the posterior distribution of the peptide sequence, and the peptide with the largest posterior density is taken as the estimate of the true peptide.
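
A toy sketch of the simulated-annealing Metropolis search over a discrete sequence space described above is given below; the scoring function is a stand-in for the spectrum-based posterior, and the target peptide, cooling schedule, and single-site proposals are illustrative assumptions.

```python
# Metropolis sampling with simulated annealing over candidate peptide sequences.
import numpy as np

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

def log_posterior(seq, target):
    """Toy score: agreement with a hypothetical 'true' peptide (stand-in only)."""
    return sum(a == b for a, b in zip(seq, target))

def anneal(target, n_iter=5000, rng=np.random.default_rng(5)):
    seq = [rng.choice(AMINO_ACIDS) for _ in target]        # random starting peptide
    best, best_lp = list(seq), log_posterior(seq, target)
    for i in range(n_iter):
        temp = max(0.05, 1.0 - i / n_iter)                 # cooling schedule
        prop = list(seq)
        prop[rng.integers(len(seq))] = rng.choice(AMINO_ACIDS)   # single-site change
        delta = log_posterior(prop, target) - log_posterior(seq, target)
        if delta >= 0 or rng.random() < np.exp(delta / temp):    # Metropolis step
            seq = prop
        lp = log_posterior(seq, target)
        if lp > best_lp:
            best, best_lp = list(seq), lp
    return "".join(best), best_lp

print(anneal("PEPTIDEK"))
```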
