  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

COMPARISONS BETWEEN ALGORITHMS TO CONSTRUCT THE RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE FOR MULTIPLE SCREENING/DIAGNOSTIC TESTS

Hamid, Muhammad 08 1900 (has links)
<p>Cancer can be an extremely aggressive disease, with a poor survival rate among patients in its advanced stages. Early intervention can significantly affect the outcome of the disease, which makes reliable cancer screening essential. Regular screening, followed by timely treatment when cancer is diagnosed, can help reduce cancer mortality. Several tests are available for the early detection and diagnosis of cancer. When multiple diagnostic tests are performed on an individual, or multiple disease markers are available, it may be possible to combine the information to diagnose disease and thereby optimize diagnostic accuracy. The combination of ultrasound and mammography as markers for cancer diagnosis, for example, could be useful for early intervention. Selecting a statistical tool capable of assessing the performance of a combination of different diagnostic tests is important in choosing the most suitable diagnostic standard. One way of determining the performance of any combination of diagnostic tests is the receiver operating characteristic (ROC) curve. Baker (2000) proposed three ranking algorithms that optimize the ROC curve. The objective of this study was to implement these ranking algorithms and select the one providing the optimal area under the ROC curve (AUC) for differentiating cancer from benign disease. Statistically, the Unordered algorithm proved best of the three, with an average AUC of 0.96510, followed by the Jagged Ordered and Rectangular Ordered algorithms with average AUCs of 0.96396 and 0.94314, respectively. Clinically, the ordered algorithms seem to be the better choice because of their convenience.</p> / Master of Science (MS)
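Baker's ranking algorithms are not reproduced here, but the quantity they optimize, the area under the ROC curve, has a simple empirical form: the probability that a randomly chosen diseased subject is ranked above a randomly chosen non-diseased one (the Mann-Whitney statistic). A minimal sketch; the scores and labels are illustrative, not data from the thesis:

```python
import numpy as np

def empirical_auc(scores, labels):
    """Empirical AUC via the Mann-Whitney statistic: the fraction of
    (diseased, non-diseased) pairs ranked correctly, counting ties as half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]        # diseased subjects' scores
    neg = scores[labels == 0]        # non-diseased subjects' scores
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Any rule that combines multiple tests into a single ranking can be scored this way, which is how competing combination algorithms can be compared on the same data.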
2

Statistical Issues in a Meta-analysis of Studies of Integrated Treatment Programs for Women with Substance Use Problems and Their Children

Liu, Jennifer January 2011 (has links)
<p>Meta-analysis is a statistical technique for combining findings from independent studies. A meta-analysis was performed to evaluate the effectiveness of integrated treatment programs for women with substance use issues and their children. Primary outcomes included substance use, maternal and child well-being, length of treatment, and parenting. A total of 9 randomized controlled trials (RCTs) and 84 observational studies were included in the final analysis. The <em>p</em>-value method was used to combine studies using different measures of treatment effect, and the capture-recapture method was used to evaluate the completeness of the literature search. Modified weights incorporating study quality were used to assess the impact of study quality on treatment effects. We also conducted a sensitivity analysis of correlation coefficients on combined estimates as a method for handling missing data.</p> <p>Study-quality-adjusted weights and traditional inverse-variance weights gave different results for combined estimates of birth weight outcomes measured by standardized mean difference. Weighting by study quality gave a statistically significant combined estimate of 0.2644 (95% CI: 0.0860, 0.4428), while the traditional method gave a non-significant combined estimate of 0.3032 (95% CI: -0.0725, 0.6788). The sensitivity analyses of correlation coefficients (r) on combined estimates of maternal depression effects were similar, with confidence intervals that narrowed as r increased. Values of r = 0.2, 0.5, 0.65, 0.75, and 0.85 gave corresponding estimates (95% CI) of 0.67 (-0.10, 1.45), 0.67 (0.04, 1.3), 0.67 (0.12, 1.2), 0.67 (0.18, 1.15), and 0.66 (0.25, 1.07).
Robustness of the sensitivity analysis for study quality weighting and choice of correlation coefficient on combined estimates revealed benefits of integrated treatment programs for birth weight outcomes and maternal depression.</p> <p>Evidence of benefit for at least some of the clients was apparent for parenting attitude measured by the Adult-Adolescent Parenting Inventory (AAPI). Results for each subscale of the AAPI were reported by timing of assessment (≤4, 5-8, ≥9 months). Combined <em>p</em>-values were 0.0006, <0.0001, and <0.0001 for Inappropriate Expectations; 0.1938, 0.1656, and <0.0001 for Lack of Empathy; 0.0007, <0.0001, and <0.0001 for Corporal Punishment; 0.0352, 0.0002, and <0.0001 for Role Reversal; and 0.5178 (5-8 months) for Power Independence. There was insufficient evidence to conclude a significant effect of treatment on neonatal behavioural assessments measured by Apgar scores: combined <em>p</em>-values of 0.6980 and 0.3294 were obtained for the 1-minute and 5-minute Apgar, respectively.</p> <p>The number of missing articles estimated by the capture-recapture method was 8 (95% CI: 2, 24), which suggests a 90% capture rate of the relevant world literature. This result indicates that enough studies were retrieved to avoid bias in the results of the meta-analysis.</p> <p>Conclusions regarding the effectiveness of integrated treatment programs were limited by poor-quality evidence from individual studies. We suggest the use of statistical methods such as the <em>p</em>-value method, capture-recapture, study quality weighting, and sensitivity analysis of correlation coefficients for handling missing data to address meta-analytic research questions and to direct higher-quality research in the future.</p> / Master of Science (MS)
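The study-quality weighting contrasted with traditional weighting above can be sketched as a modification of standard fixed-effect inverse-variance pooling. The thesis does not give its exact weighting scheme here, so multiplying each inverse-variance weight by a quality score in (0, 1] is an assumption for illustration, as are the example estimates and variances:

```python
import math

def combine_inverse_variance(estimates, variances, quality=None):
    """Fixed-effect pooled estimate with inverse-variance weights,
    optionally down-weighting lower-quality studies by a factor q in (0, 1].
    Returns the pooled estimate and a 95% confidence interval."""
    if quality is None:
        quality = [1.0] * len(estimates)
    weights = [q / v for q, v in zip(quality, variances)]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)   # approximate SE; exact for q = 1
    return est, (est - 1.96 * se, est + 1.96 * se)
```

With quality scores below 1 the effective total weight shrinks, so the interval widens, which matches the intuition that poor-quality evidence should yield less certain pooled conclusions.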
3

Bias and Efficiency of Logistic Regression involving a Binary Covariate with Missing Observations

Zhao, Kai 08 1900 (has links)
<p>In the statistical analysis of a health research study, it is quite common to have missing data after data collection. Typically in a clinical trial, the treatment variable is completely recorded, but the associated covariates may not be. A multivariable analysis is often conducted by including all the medically important covariates, with the expectation that a valid estimate of the treatment effect can be obtained by properly adjusting for them. In this scenario, if the covariate data are Missing Not at Random (MNAR), the situation becomes complicated and the estimate of the treatment effect will be invalid. The situation when the data are Missing Completely at Random (MCAR) is interesting because a dilemma exists: if covariates with a high missing proportion are included, the analysis loses power, although the validity might be good; if they are excluded, the validity might be in question but the precision is good. Although the literature suggests that validity is more important, there may be cases where the precision improves substantially, with little sacrifice in validity, by omitting the covariate from the analysis. In this thesis, this dilemma is evaluated in the context of multivariable logistic regression, with the hope that some of the results shed light on the situation. This work is significant in that it could potentially change the data collection process. For example, at the research design stage, if we expect that a covariate will have a high rate of missingness, there might be little to gain by collecting it, and the relevant resources could be released for other important aspects of the study.</p> / Master of Science (MS)
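The power/validity trade-off under MCAR described above can be explored by simulation: fit the logistic model on all records, then again on complete cases after covariate values are deleted completely at random, and compare the standard error of the treatment effect. A sketch with a hand-rolled Newton-Raphson fit; all parameter values, the 40% missingness rate, and the sample size are illustrative, not the thesis's design:

```python
import numpy as np

def logistic_fit(X, y, iters=25):
    """Maximum likelihood logistic regression via Newton-Raphson.
    Returns coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])   # observed information
        beta += np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

# Illustrative MCAR experiment: delete 40% of a binary covariate at
# random; the complete-case analysis stays (approximately) unbiased
# under MCAR but the treatment SE inflates with the lost sample size.
rng = np.random.default_rng(0)
n = 4000
t = rng.integers(0, 2, n)                  # treatment indicator
z = rng.integers(0, 2, n)                  # binary covariate
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * t + 1.0 * z)))
y = rng.binomial(1, p)
X = np.column_stack([np.ones(n), t, z])
full_beta, full_se = logistic_fit(X, y)
keep = rng.random(n) > 0.4                 # MCAR missingness in z
cc_beta, cc_se = logistic_fit(X[keep], y[keep])
```

Comparing `full_se[1]` with `cc_se[1]` quantifies the precision lost to missingness, which is exactly the quantity weighed against validity in the dilemma above.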
4

Inferences in the Interval Censored Exponential Regression Model

Peng, Defen 12 1900 (has links)
<p>The problem of estimation when the data are interval censored has been investigated by several authors. Lindsey and Ryan (1998) considered the application of conventional methods to interval mid (or end) points and showed that they tended to underestimate the standard errors of the estimated parameters and could give misleading results. MacKenzie (1999) and Blagojevic (2002) conjectured that the estimator of the parameter is artificially precise when inspection times are analyzed as if they were exact and the time-to-event data follow an exponential distribution. In this thesis, we derive formulae for the pseudo and true (or exact) likelihoods in the exponential regression model in order to examine the consequences for inference on the parameters when the pseudo-likelihood is used in place of the true likelihood. We pay particular attention to the approximate bias of the maximum likelihood estimates under the true likelihood. In particular, we present analytical work proving that the conjectures of Lindsey and Ryan (1998), MacKenzie (1999) and Blagojevic (2002) hold, at least for the exponential distribution with categorical or continuous covariates.<br /><br />We undertake a simulation study to quantify and compare the relative performance of maximum likelihood estimation under both likelihoods. The numerical evidence suggests that the estimates from the true likelihood are more accurate. We apply the proposed method to a set of real interval-censored data collected in a Medical Research Council (MRC, UK) multi-centre randomized controlled trial of teletherapy in age-related macular degeneration (the ARMD study).</p> / Master of Science (MS)
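The contrast between the pseudo-likelihood (interval midpoints treated as exact event times) and the true interval-censored likelihood can be written down directly for the exponential model. This one-parameter sketch without covariates is an illustration of that contrast, not the regression model of the thesis; the example intervals are invented:

```python
import math

def true_loglik(lam, intervals):
    """Exact interval-censored exponential log-likelihood:
    an event known to lie in (L, R] contributes
    log(P(L < T <= R)) = log(exp(-lam*L) - exp(-lam*R))."""
    return sum(math.log(math.exp(-lam * L) - math.exp(-lam * R))
               for L, R in intervals)

def pseudo_mle(intervals):
    """Pseudo-likelihood MLE: treat midpoints as exact exponential
    event times, so lam-hat = n / sum(midpoints)."""
    mids = [(L + R) / 2.0 for L, R in intervals]
    return len(mids) / sum(mids)

def true_mle(intervals, lo=1e-6, hi=50.0, iters=200):
    """Maximise the true log-likelihood by ternary search
    (the log-likelihood is unimodal in lam for this model)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if true_loglik(m1, intervals) < true_loglik(m2, intervals):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0
```

Comparing the curvature of the two log-likelihoods at their maxima is what reveals the artificially small standard errors of the pseudo approach that the cited conjectures describe.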
5

Variable Selection Methods for Population-based Genetic Association Studies: SPLS and HSIC

Qin, Maochang January 2011 (has links)
<p>This project aims to identify single nucleotide polymorphisms (SNPs) associated with muscle size and strength in Caucasians. Two methods, sparse partial least squares (SPLS) and the sparse Hilbert-Schmidt independence criterion (HSIC), were applied for dimension reduction and variable selection in the Functional SNPs Associated with Muscle Size and Strength (FAMuSS) Study. The selection ability of the two methods was compared by simulation. The genetic determinants of skeletal muscle size and strength before and after exercise training in Caucasians were then selected using these two methods.</p> / Master of Science (MS)
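HSIC-based variable selection ranks covariates by an empirical, kernel-based measure of dependence with the outcome. A minimal sketch of the standard biased HSIC estimator with Gaussian kernels follows; the bandwidth choice and the screening use are illustrative, not the thesis's exact procedure:

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels, via the trace
    formula (1/n^2) tr(K H L H), where H centres the Gram matrices.
    Larger values suggest stronger dependence between x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    def gram(v):
        d = v[:, None] - v[None, :]
        return np.exp(-d ** 2 / (2.0 * sigma ** 2))
    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2
```

In a screening setting one would compute this statistic between each candidate SNP and the phenotype and keep the top-ranked variables; unlike correlation, it can flag non-linear association.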
6

An Application of a Cox Model for Lifetimes of HIV Patients

Sabina, Sanjel 09 1900 (has links)
<p>In this project, an application of the Cox proportional hazards model is considered. The model is fitted to estimate the effect of the covariates age and drugs on the survival of HIV-positive patients. These estimates agree with the estimates obtained by numerical methods. The likelihood ratio, Wald, and score tests are applied to test the significance of the estimates, and the power of these tests is estimated by Monte Carlo simulation. Simulated powers for sample sizes n = 10, 20, and 30, β = 0.1, 0.2, and 0.4, and significance levels of 1%, 5%, and 10% are tabulated.</p> / Master of Science (MS)
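The Cox fit described above maximizes the partial likelihood, which involves only the ordering of event times and the risk set at each event. A one-covariate Newton-Raphson sketch, assuming no tied event times; the example data below are invented, not the HIV data of the project:

```python
import math

def cox_beta(times, events, x, iters=50):
    """Newton-Raphson for a one-covariate Cox partial likelihood
    (Breslow-style risk sets, no tied event times assumed)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    beta = 0.0
    for _ in range(iters):
        U = 0.0   # score (first derivative of log partial likelihood)
        I = 0.0   # information (negative second derivative)
        for k, i in enumerate(order):
            if not events[i]:
                continue              # censored: contributes only to risk sets
            risk = order[k:]          # subjects still at risk at this event
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            U += x[i] - s1 / s0
            I += s2 / s0 - (s1 / s0) ** 2
        beta += U / I
    return beta
```

The Wald, score, and likelihood ratio tests mentioned in the abstract are all built from the same quantities `U`, `I`, and the log partial likelihood evaluated at the estimate and at β = 0.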
7

Implementation of Fixed and Sequential Multilevel Acceptance Sampling: The R Package MFSAS

Chen, Yalin 07 1900 (has links)
<p>Manufacturers and consumers often use acceptance sampling to determine the acceptability of a lot from an outgoing production run or incoming shipment based on a sample. Multilevel acceptance sampling for attributes is applied when the product has multiple levels of quality or multiple types of (mutually exclusive) possible defects.</p> <p>The aim of this project is to develop an <strong>R</strong> package, <strong>MFSAS</strong>, which provides tools to create, evaluate, plot, and display multilevel acceptance sampling plans for attributes, for both fixed and sequential sampling. Dirichlet recursive functions are used to calculate the cumulative probabilities of several common multivariate distributions needed in the package.</p> / Master of Science (MS)
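The package's Dirichlet recursions are not reproduced here, but for small fixed-sample plans the acceptance probability of a multilevel attributes plan can be computed by direct multinomial enumeration: the lot is accepted if the sample count of each defect type stays at or below its acceptance number. A sketch; the plan parameters in the test are illustrative:

```python
from itertools import product
from math import comb, prod

def accept_prob(n, p, c):
    """P(accept) for a fixed-sample multilevel attributes plan:
    a sample of size n is accepted if the count of defect type j
    is <= c[j] for every j. p gives the probabilities of the
    mutually exclusive defect types; 1 - sum(p) is the good fraction."""
    p_good = 1.0 - sum(p)
    total = 0.0
    for counts in product(*(range(cj + 1) for cj in c)):
        d = sum(counts)
        if d > n:
            continue
        coef, rem = 1, n          # multinomial coefficient, built up type by type
        for k in counts:
            coef *= comb(rem, k)
            rem -= k
        total += coef * prod(pj ** kj for pj, kj in zip(p, counts)) * p_good ** (n - d)
    return total
```

With a single defect type this reduces to the familiar binomial single-sampling plan, which makes a convenient correctness check.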
8

Bayesian Mixture Models

Liu, Zhihui 08 1900 (has links)
<p>Mixture distributions are typically used to model data in which each observation belongs to one of some number of different groups. They also provide a convenient and flexible class of models for density estimation. When the number of components <em>k</em> is assumed known, the Gibbs sampler can be used for Bayesian estimation of the component parameters. We present an implementation of the Gibbs sampler for mixtures of Normal distributions and show that spurious modes can be avoided by introducing a Gamma prior in the Kiefer-Wolfowitz example.</p> <p>Although adopting a Bayesian approach for mixture models has certain advantages, it is not without problems. One typical problem associated with mixtures is non-identifiability of the component parameters. This causes label switching in the Gibbs sampler output and makes inference for the individual components meaningless. We show that the usual approach to this problem, imposing simple identifiability constraints on the mixture parameters, is sometimes inadequate, and present an alternative approach that arranges the mixture components in order of non-decreasing means while choosing priors that are slightly more informative. We illustrate the success of our approach on the fishery example.</p> <p>When the number of components <em>k</em> is considered unknown, more sophisticated methods are required for Bayesian analysis. One method is the reversible jump MCMC algorithm described by Richardson and Green (1997), which they applied to univariate Normal mixtures. Alternatively, selection of <em>k</em> can be based on comparing models fitted with different numbers of components using joint measures of model fit and model complexity. We review these methods and illustrate how to use them to compare competing mixture models using the acidity data.</p> <p>We conclude with some suggestions for further research.</p> / Master of Science (MS)
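The Gibbs sampler for a fixed-<em>k</em> Normal mixture alternates between sampling component allocations, component means, and mixture weights. The sketch below is a deliberately simplified version, assuming known unit variances, N(0, 10²) priors on the means, a symmetric Dirichlet prior on the weights, and the ordered-means relabelling discussed in the abstract; none of the priors or data correspond to the thesis's examples:

```python
import numpy as np

def gibbs_mixture(y, k=2, iters=300, seed=1):
    """Gibbs sampler for a k-component Normal mixture with known unit
    variance. Returns the posterior mean of the ordered component means,
    averaged over the second half of the chain."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = np.sort(rng.choice(y, k))        # initialise means from the data
    w = np.full(k, 1.0 / k)
    keep = []
    for it in range(iters):
        # 1. sample allocations z given (mu, w)
        dens = w * np.exp(-0.5 * (y[:, None] - mu[None, :]) ** 2)
        dens /= dens.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=row) for row in dens])
        # 2. sample each mean given its allocated observations
        for j in range(k):
            yj = y[z == j]
            prec = len(yj) + 1.0 / 100.0  # posterior precision under N(0,100) prior
            mu[j] = rng.normal(yj.sum() / prec, 1.0 / np.sqrt(prec))
        # 3. sample weights from the Dirichlet full conditional
        counts = np.bincount(z, minlength=k)
        w = rng.dirichlet(1.0 + counts)
        # relabel so means are non-decreasing (identifiability constraint)
        order = np.argsort(mu)
        mu, w = mu[order], w[order]
        if it >= iters // 2:
            keep.append(mu.copy())
    return np.mean(keep, axis=0)
```

The relabelling step in each sweep is the simple ordering constraint the abstract refers to; its point is that without it, label switching makes the componentwise posterior averages meaningless.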
9

TEST STATISTICS AND Q-VALUES TO IDENTIFY DIFFERENTIALLY EXPRESSED GENES IN MICROARRAYS

Ye, Chang 08 1900 (has links)
Master of Science (MS)
10

The Measurement of Integrated Human Services Network (The Children’s Treatment Network of Simcoe York)

Ye, Chenglin 12 1900 (has links)
<p>Community-based human services have traditionally been provided by autonomous service agencies, each with its own funding source and independent processes. Integration has been advocated as a key strategy for bringing different agencies together to provide multiple services for a targeted community. The Children's Treatment Network (CTN) of Simcoe York is a network of agencies and organizations providing services to children with multiple needs and their families in Simcoe County and York Region. This study was designed to evaluate the effect on outcomes of different levels of integrated service approaches for children. The study consisted of two parts: phase I and phase II measurement.</p> <p>Our project covered the phase I measurement, with the following objectives. Clinically, we aimed to evaluate the agencies' integration in the network, promote discussion, and determine any interrelationship between the network's integration and its functioning. The statistical objectives were to quantify the network integration of each agency, represent the overall integration, quantify the association between the network's integration and its functioning, and assess the sensitivity of the results.</p> <p>We measured each agency's integration through its agreement in collaboration with the other agencies in the network; higher agreement in collaboration indicates better service integration. We defined four agreement measures from different perspectives, with agreement based on the group's perception as the primary measure. We used the mean difference, percentage agreement, and the Kappa statistic to measure agreement for each agency. Correlation and regression analyses were applied to investigate the association between the network's integration and its functioning. The sensitivity of the results was analyzed by examining the re-sampling bias of bootstrapped regression models.</p> <p>Agreement measures were consistent for each agency. In Simcoe, agencies had an average agreement of 0.874 (S.D. 0.213) in mean difference, 46.63 (S.D. 12.91) in percentage, and 0.258 (S.D. 0.143) in Kappa. Agencies in York had average agreements of 0.662 (S.D. 0.204), 49.36 (S.D. 13.06), and 0.282 (S.D. 0.121), respectively. Agencies 10 and 33 in Simcoe and Agency 14 in York had the highest agreement; Agencies 3 and 21 in Simcoe and Agencies 8 and 9 in York had the lowest. Graphical displays illustrated that the overall agreement in collaboration was low and that the agencies in York generally had higher agreement. Correlation analysis showed that synergy and agencies' perception of pros and cons were significantly correlated with the primary percentage agreement. In the regression analysis we did not find any significant functioning component; however, synergy was much more strongly associated with agreement than the other components, with estimates of 11.48% (-1.03%, 24.00%) and 11.21% (-2.48%, 24.90%) in the un-weighted and weighted models, respectively. Bootstrapped regression analysis showed that the results were robust to a change of sample.</p> <p>We concluded that the level of integration of the CTN was low because the agencies generally had poor agreement in collaboration. Synergy was the most important component associated with the network's integration. The other functioning components detected were also associated with the integrating process but were less clinically important. We discuss the statistical approaches used, their application in other contexts, and some of their strengths and weaknesses, and we consider some key limitations of the study. This study was a baseline measurement of the CTN of Simcoe York for further analysis; the results provide a basis for future enhancement of the network's integration, and our experience suggests ways to improve the design and analysis of integrated network measurement.</p> / Master of Science (MS)
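One of the agreement measures reported above is the Kappa statistic, which corrects raw agreement for the agreement expected by chance. A minimal two-rater Cohen's kappa sketch; the network's actual pairwise construction across agencies is not reproduced, and the example ratings are invented:

```python
def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two raters.
    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected comes from each rater's marginal category frequencies."""
    n = len(a)
    cats = set(a) | set(b)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)
```

Values near 0 mean agreement no better than chance and values near 1 mean near-perfect agreement, which is why the average Kappas around 0.26-0.28 reported above are read as low integration.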
