About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Model comparison and assessment by cross validation

Shen, Hui 11 1900 (has links)
Cross validation (CV) is widely used for model assessment and comparison. In this thesis, we first review and compare three v-fold CV strategies: best single CV, repeated and averaged CV, and double CV. The mean squared errors of the CV strategies in estimating the best predictive performance are illustrated by using simulated and real data examples. The results show that repeated and averaged CV is a good strategy and outperforms the other two CV strategies for finite samples in terms of the mean squared error in estimating prediction accuracy and the probability of choosing an optimal model. In practice, when we need to compare many models, conducting the repeated and averaged CV strategy is not computationally feasible. We develop an efficient sequential methodology for model comparison based on CV. It also takes into account the randomness in CV. The number of models is reduced via an adaptive, multiplicity-adjusted sequential algorithm, where poor performers are quickly eliminated. By exploiting matching of individual observations, it is sometimes even possible to establish the statistically significant inferiority of some models with just one execution of CV. This adaptive and computationally efficient methodology is demonstrated on a large cheminformatics data set from PubChem. Cross-validated mean squared error (CVMSE) is widely used to estimate the prediction mean squared error (MSE) of statistical methods. For linear models, we show how CVMSE depends on the number of folds, v, used in cross validation, the number of observations, and the number of model parameters. We establish that the bias of CVMSE in estimating the true MSE decreases with v and increases with model complexity. In particular, the bias may be very substantial for models with many parameters relative to the number of observations, even if v is large. These results are used to correct CVMSE for its bias. We compare our proposed bias correction with that of Burman (1989), through simulated and real examples. We also illustrate that our method of correcting for the bias of CVMSE may change the results of model selection. / Science, Faculty of / Statistics, Department of / Graduate
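The "repeated and averaged" strategy compared in this abstract can be illustrated with a short sketch. The following Python snippet is only a minimal illustration under assumed settings: simulated linear data, an ordinary least-squares model, 10 folds, and 20 repeats are arbitrary demonstration choices, not the models or settings studied in the thesis.

```python
# Minimal sketch of repeated and averaged v-fold CV for estimating prediction MSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)  # simulated linear data

def cv_mse(model, X, y, v=10, seed=0):
    """One run of v-fold CV: average squared error over the held-out folds."""
    folds = KFold(n_splits=v, shuffle=True, random_state=seed)
    errors = []
    for train, test in folds.split(X):
        model.fit(X[train], y[train])
        errors.append(np.mean((y[test] - model.predict(X[test])) ** 2))
    return np.mean(errors)

# Repeated and averaged CV: rerun v-fold CV with different random fold
# assignments and average the results, reducing the variability that comes
# from any single random split.
repeats = [cv_mse(LinearRegression(), X, y, v=10, seed=s) for s in range(20)]
print("single CV run:            ", repeats[0])
print("repeated and averaged CV: ", np.mean(repeats))
```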
2

Modelling Issues in Three-state Progressive Processes

Kopciuk, Karen January 2001 (has links)
This dissertation focuses on several issues pertaining to three-state progressive stochastic processes. Casting survival data within a three-state framework is an effective way to incorporate intermediate events into an analysis. These events can yield valuable insights into treatment interventions and the natural history of a process, especially when the right censoring is heavy. Exploiting the uni-directional nature of these processes allows for more effective modelling of the types of incomplete data commonly encountered in practice, as well as time-dependent explanatory variables and different time scales. In Chapter 2, we extend the model developed by Frydman (1995) by incorporating explanatory variables and by permitting interval censoring for the time to the terminal event. The resulting model is quite general and combines features of the models proposed by Frydman (1995) and Kim <i>et al</i>. (1993). The decomposition theorem of Gu (1996) is used to show that all of the estimating equations arising from Frydman's log likelihood function are self-consistent. An AIDS data set analyzed by these authors is used to illustrate our regression approach. Estimating the standard errors of our regression model parameters, by adopting a piecewise constant approach for the baseline intensity parameters, is the focus of Chapter 3. We also develop data-driven algorithms which select changepoints for the intervals of support, based on the Akaike and Schwarz Information Criteria. A sensitivity study is conducted to evaluate these algorithms. The AIDS example is considered here once more; standard errors are estimated for several piecewise constant regression models selected by the model criteria. Our results indicate that for both the example and the sensitivity study, the resulting estimated standard errors of certain model parameters can be quite large. Chapter 4 evaluates the goodness-of-link function for the transition intensity between states 2 and 3 in the regression model we introduced in chapter 2. By embedding this hazard function in a one-parameter family of hazard functions, we can assess its dependence on the specific parametric form adopted. In a simulation study, the goodness-of-link parameter is estimated and its impact on the regression parameters is assessed. The logistic specification of the hazard function from state 2 to state 3 is appropriate for the discrete, parametric-based data sets considered, as well as for the AIDS data. We also investigate the uniqueness and consistency of the maximum likelihood estimates based on our regression model for these AIDS data. In Chapter 5 we consider the possible efficiency gains realized in estimating the survivor function when an intermediate auxiliary variable is incorporated into a time-to-event analysis. Both Markov and hybrid time scale frameworks are adopted in the resulting progressive three-state model. We consider three cases for the amount of information available about the auxiliary variable: the observation is completely unknown, known exactly, or known to be within an interval of time. In the Markov framework, our results suggest that observing subjects at just two time points provides as much information about the survivor function as knowing the exact time of the intermediate event. There was generally a greater loss of efficiency in the hybrid time setting. The final chapter identifies some directions for future research.
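As a rough illustration of the progressive three-state framework described above, the sketch below simulates a process that moves from state 1 through an intermediate state 2 to a terminal state 3, with constant transition intensities and independent right censoring. The intensity and censoring values are hypothetical, and the constant-intensity, exactly-observed setup is a simplification; the dissertation's models (interval censoring, covariates, piecewise constant intensities) are considerably richer.

```python
# Minimal sketch: simulate a progressive three-state process (1 -> 2 -> 3)
# with constant intensities and independent right censoring (assumed values).
import numpy as np

rng = np.random.default_rng(1)

def simulate_progressive(n, lam12=0.5, lam23=0.3, censor_rate=0.2):
    """Return event times and observation indicators for n subjects."""
    t12 = rng.exponential(1.0 / lam12, size=n)      # time from state 1 to state 2
    t23 = rng.exponential(1.0 / lam23, size=n)      # sojourn time in state 2
    c = rng.exponential(1.0 / censor_rate, size=n)  # independent censoring time
    intermediate_observed = t12 <= c                # intermediate event seen before censoring
    terminal_time = t12 + t23
    terminal_observed = terminal_time <= c          # terminal event seen before censoring
    return t12, terminal_time, c, intermediate_observed, terminal_observed

t12, t13, c, saw_mid, saw_end = simulate_progressive(1000)
print("proportion with intermediate event observed:", saw_mid.mean())
print("proportion with terminal event observed:    ", saw_end.mean())
```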
3

Screening for Insulin Resistance in Patients with Liver Disease in Tertiary Centers

Ahmed, Waheeda Siddiqui January 2016 (has links)
Background: The liver is a vital organ that plays a major role in glucose production and regulation throughout the body (Musso et al., 2012). Liver disease has long been linked with insulin resistance (IR), dating back to 1906 (Megyesi et al., 1967). IR has been found to be prevalent in a range of liver diseases, including chronic Hepatitis C Virus (HCV), hemochromatosis, and alcoholic liver disease (Goswami et al., 2014). Liver disease is highly prevalent in the United States, with 30 million people (or one out of ten Americans) suffering from some type of liver disease (Peery et al., 2015). Although research demonstrates a significant relationship between liver disease and IR, the University of Arizona (UA) hepatology clinic does not currently screen liver disease patients for IR. The homeostatic model assessment for insulin resistance (HOMA-IR) score is used to study IR in non-insulin-resistant populations. The HOMA-IR score is calculated as fasting plasma glucose (mmol/L) times fasting serum insulin (mU/L) divided by 22.5 (Bonora et al., 2002). Low HOMA-IR values (HOMA-IR < 2.0) indicate high insulin sensitivity, whereas high HOMA-IR values (HOMA-IR > 2.0) indicate low insulin sensitivity, that is, insulin resistance (Bonora et al., 2002).
Objective: The purpose of this quality improvement (QI) project is to show the prevalence of IR in euglycemic liver disease patients at the UA hepatology clinic by using their HOMA-IR scores as a screening tool. By screening euglycemic liver disease patients for IR based on their HOMA-IR score, providers at the UA hepatology clinic can prevent liver disease progression and complications associated with IR early on. By doing so, the providers can improve the quality of care for liver disease patients. An essential part of calculating HOMA-IR is the availability of labs (serum glucose and serum insulin). A part of this QI project is to determine whether the UA hepatology clinic has the labs necessary to calculate HOMA-IR for euglycemic liver disease patients. A related matter is whether there is a correlation between liver disease patients' HOMA-IR score and Model for End-stage Liver Disease (MELD) score. If there is a direct correlation between HOMA-IR and MELD scores, providers can identify the severity and progression of liver disease in euglycemic liver disease patients.
Design: A retrospective case-control study.
Study Questions: 1) Do UA hepatology clinic providers order sufficient labs (fasting plasma glucose and fasting plasma insulin) to calculate HOMA-IR in euglycemic patients? 2) What is the prevalence of IR in euglycemic liver patients as indicated by HOMA-IR score? 3) Is there any correlation between HOMA-IR score and MELD score in euglycemic liver disease patients?
Participants: Data were collected from 1000 liver disease patients at the UA hepatology clinic, a tertiary-level referral center.
Settings: Banner University Medical Center (UMC) in Tucson, Arizona, from January 1, 2011 until December 31, 2014.
Measurements: HOMA-IR score, calculated from fasting serum glucose and fasting serum insulin laboratory values, and MELD score, used to identify the severity of liver disease in euglycemic liver disease patients.
Results: Among 1000 patients, 605 (60.5%) were found to have a previous diagnosis of T2DM and 395 (39.5%) were euglycemic liver disease patients (Figure 1). Of the 395 euglycemic liver disease patients, 217 (55%) had both an insulin level and a glucose level in their charts; 178 (45%) were missing either the insulin level or the glucose level needed to calculate the HOMA-IR score (Figure 2). Of the 217 euglycemic liver disease patients, 54.8% had HOMA-IR > 2 and 45.2% had HOMA-IR < 2 (Figure 3). The Pearson correlation between HOMA-IR > 2 and MELD scores was 0.092, with a two-tailed significance value of 0.321 (Table 4).
Conclusion: The results showed a high prevalence of IR in euglycemic patients, with 54.8% having a HOMA-IR score > 2 compared to 45.2% with a score < 2. Furthermore, 178 (45%) euglycemic liver disease patients were missing either the insulin level or the glucose level needed to calculate the HOMA-IR score. This is a substantial number of patients missing the labs needed to identify them as being at high risk for IR. This QI project identified HOMA-IR as an important screening tool that should be used both in hepatology clinics and in primary healthcare settings. Use of such a tool will lead to improved quality of care for euglycemic liver disease patients.
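The HOMA-IR calculation used as the screening tool above is simple enough to spell out. The sketch below applies the formula and the 2.0 cut-off quoted in the abstract (Bonora et al., 2002); the patient values and function names are hypothetical illustrations, not anything taken from the project's data or code.

```python
# Minimal sketch of the HOMA-IR screening calculation described in the abstract.
def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_mu_l: float) -> float:
    """HOMA-IR = fasting plasma glucose (mmol/L) x fasting serum insulin (mU/L) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_mu_l / 22.5

def flag_insulin_resistance(glucose: float, insulin: float, cutoff: float = 2.0):
    """Return (score, flag); the flag is True when the score exceeds the cut-off."""
    score = homa_ir(glucose, insulin)
    return score, score > cutoff

# Hypothetical patient: fasting glucose 5.2 mmol/L, fasting insulin 12 mU/L.
score, resistant = flag_insulin_resistance(5.2, 12.0)
print(f"HOMA-IR = {score:.2f}, flagged as insulin resistant: {resistant}")
```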
4

Modelling Land Susceptibility to Wind Erosion in Western Queensland, Australia

Mr Nicholas Webb Unknown Date (has links)
No description available.
5

Case and covariate influence: implications for model assessment

Duncan, Kristin A. 12 October 2004 (has links)
No description available.
6

eScience Approaches to Model Selection and Assessment : Applications in Bioinformatics

Eklund, Martin January 2009 (has links)
High-throughput experimental methods, such as DNA and protein microarrays, have become ubiquitous and indispensable tools in biology and biomedicine, and the number of high-throughput technologies is constantly increasing. They provide the power to measure thousands of properties of a biological system in a single experiment and have the potential to revolutionize our understanding of biology and medicine. However, the high expectations on high-throughput methods are challenged by the problem of statistically modelling the wealth of data in order to translate it into concrete biological knowledge, new drugs, and clinical practices. In particular, the huge number of properties measured in high-throughput experiments makes statistical model selection and assessment exigent. To use high-throughput data in critical applications, it must be warranted that the models we construct reflect the underlying biology and are not just hypotheses suggested by the data. We must furthermore have a clear picture of the risk of making incorrect decisions based on the models. The rapid improvement of computers and information technology has opened up new ways to approach the problem of model selection and assessment. Specifically, eScience, i.e. computationally intensive science that is carried out in distributed network environments, provides computational power and the means to efficiently access previously acquired scientific knowledge. This thesis investigates how we can use eScience to improve our chances of constructing biologically relevant models from high-throughput data. Novel methods for model selection and assessment are proposed that leverage computational power and prior scientific information to "guide" the model selection toward models that are a priori likely to be relevant. In addition, a software system for deploying new methods and making them easily accessible to end users is presented.
7

Understanding energy-economy models: survey evidence from model users and developers in Canada

Craig, Kira 06 August 2021 (has links)
Energy-economy models are important tools used by policy-makers and researchers to design effective climate policy. However, there has been limited research that compares models against consistent characteristics to understand their impacts on climate policy projections. This can make it difficult for policy-makers to identify suitable models for their specific policy questions and develop effective climate policies. A web-based survey of energy-economy model users and developers in Canada’s public, private, and non-profit sectors (n=14) was conducted to systematically compare seventeen models against a framework of seven characteristics: technology characteristics, micro-, and macro-economic characteristics, policy representations, treatment of uncertainty, high-resolution spatial and temporal representations, and data transparency. It was found that for the most part, models represent technology, micro-, and macro-economic characteristics according to the classic typology of bottom-up, top-down, and hybrid models. However, our findings show that several modelling evolutions have occurred. Some top-down models can explicitly represent technologies and some bottom-up models incorporate microeconomic characteristics. Models differ in the types of policies they can simulate, sometimes underrepresenting performance regulations, government procurement, and research and development programs. All models incorporate at least one type of uncertainty analysis, models infrequently have high-resolution spatial and/or temporal representations, and most models lack publicly accessible methodological documents. Implications for researchers and policy-makers that use energy-economy models and/or develop policies are discussed. / Graduate
