Global ETD Search

21	DEFT guessing: using inductive transfer to improve rule evaluation from limited data Reid, Mark Darren, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 Algorithms that learn sets of rules describing a concept from its examples have been widely studied in machine learning and have been applied to problems in medicine, molecular biology, planning and linguistics. Many of these algorithms use a separate-and-conquer strategy, repeatedly searching for rules that explain different parts of the example set. When examples are scarce, however, it is difficult for these algorithms to evaluate the relative quality of two or more rules which fit the examples equally well. This dissertation proposes, implements and examines a general technique for modifying rule evaluation in order to improve learning performance in these situations. This approach, called Description-based Evaluation Function Transfer (DEFT), adjusts the way rules are evaluated on a target concept by taking into account the performance of similar rules on a related support task that is supplied by a domain expert. Central to this approach is a novel theory of task similarity that is defined in terms of syntactic properties of rules, called descriptions, which define what it means for rules to be similar. Each description is associated with a prior distribution over classification probabilities derived from the support examples, and a rule's evaluation on a target task is combined with the relevant prior using Bayes' rule. Given some natural conditions regarding the similarity of the target and support task, it is shown that modifying rule evaluation in this way is guaranteed to improve estimates of the true classification probabilities. Algorithms to efficiently implement DEFT are described, analysed and used to measure the effect these improvements have on the quality of induced theories.
Empirical studies of this implementation were carried out on two artificial and two real-world domains. The results show that the inductive transfer of evaluation bias based on rule similarity is an effective and practical way to improve learning when training examples are limited. Machine learning. Transfer learning. Inductive transfer. Empirical Bayes. Multitask learning. Computer programming. Logic programming. Induction (Logic)
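As a rough illustration of how a prior built from support examples can separate two rules that fit scarce target examples equally well, here is a minimal sketch. It assumes a Beta prior over a rule's classification probability and a binomial likelihood; the abstract does not say which prior family the thesis uses, and the function name and prior parameters are hypothetical.

```python
def posterior_precision(pos, neg, prior_a, prior_b):
    """Posterior mean of a rule's classification probability.

    Assumption (not from the abstract): the prior is Beta(prior_a, prior_b)
    and the target examples covered by the rule are binomial observations,
    so Bayes' rule gives a Beta(prior_a + pos, prior_b + neg) posterior.
    """
    if pos < 0 or neg < 0 or prior_a <= 0 or prior_b <= 0:
        raise ValueError("counts must be non-negative and prior parameters positive")
    return (prior_a + pos) / (prior_a + prior_b + pos + neg)


# Two rules that both cover 3 positive and 0 negative target examples.
# Hypothetical priors: rules similar to rule A did well on the support
# task, rules similar to rule B did poorly.
rule_a = posterior_precision(3, 0, prior_a=8.0, prior_b=2.0)
rule_b = posterior_precision(3, 0, prior_a=2.0, prior_b=8.0)

print(f"rule A: {rule_a:.3f}, rule B: {rule_b:.3f}")
```

On the target data alone the two rules are indistinguishable; under these assumed priors the evaluation favours rule A.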
22	Revisiting Empirical Bayes Methods and Applications to Special Types of Data Duan, Xiuwen 29 June 2021 Empirical Bayes methods have been around for a long time and have a wide range of applications. These methods provide a way in which historical data can be aggregated to provide estimates of the posterior mean. This thesis revisits some of the empirical Bayesian methods and develops new applications. We first look at a linear empirical Bayes estimator and apply it to ranking and symbolic data. Next, we consider Tweedie’s formula and show how it can be applied to analyze a microarray dataset. The application of the formula is simplified with the Pearson system of distributions. Saddlepoint approximations enable us to generalize several results in this direction. The results show that the proposed methods perform well in applications to real data sets. Empirical Bayes Ranking data Symbolic data Tweedie’s formula Pearson system Saddlepoint approximation
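For readers unfamiliar with Tweedie's formula: in the normal case, where z given mu is N(mu, sigma^2), it states that E[mu | z] = z + sigma^2 * d/dz log f(z), with f the marginal density of z. The sketch below checks this numerically for a normal prior, where the answer is known in closed form. It is only an illustration of the formula, not the thesis's Pearson-system or saddlepoint method; the function names and numeric values are hypothetical.

```python
import math


def log_marginal_normal(z, prior_var, noise_var):
    """Log marginal density of z when mu ~ N(0, prior_var), z | mu ~ N(mu, noise_var)."""
    total = prior_var + noise_var
    return -0.5 * math.log(2.0 * math.pi * total) - z * z / (2.0 * total)


def tweedie_posterior_mean(z, log_marginal, noise_var, h=1e-5):
    """Tweedie's formula: z + noise_var * d/dz log f(z).

    The derivative is approximated with a central finite difference.
    """
    score = (log_marginal(z + h) - log_marginal(z - h)) / (2.0 * h)
    return z + noise_var * score


prior_var, noise_var, z = 4.0, 1.0, 2.5
estimate = tweedie_posterior_mean(
    z, lambda t: log_marginal_normal(t, prior_var, noise_var), noise_var
)
# Closed-form posterior mean for the normal-normal model, for comparison.
closed_form = z * prior_var / (prior_var + noise_var)

print(estimate, closed_form)
```

In an empirical Bayes setting the marginal density would be estimated from the data rather than assumed known, which is where the choice of density family matters.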
23	Study designs and statistical methods for pharmacogenomics and drug interaction studies Zhang, Pengyue 01 April 2016 Indiana University-Purdue University Indianapolis (IUPUI) / Adverse drug events (ADEs) are injuries resulting from drug-related medical interventions. ADEs can be induced either by a single drug or by a drug-drug interaction (DDI). In order to prevent unnecessary ADEs, many regulatory agencies in public health maintain pharmacovigilance databases for detecting novel drug-ADE associations. However, pharmacovigilance databases usually contain a significant portion of false associations due to the nature of their structure (i.e. false drug-ADE associations caused by co-medications). Besides pharmacovigilance studies, the risks of ADEs can be minimized by understanding their mechanisms, which include abnormal pharmacokinetics/pharmacodynamics due to genetic factors and synergistic effects between drugs. During the past decade, pharmacogenomics studies have successfully identified several predictive markers to reduce ADE risks. However, pharmacogenomics studies are usually limited by sample size and budget. In this dissertation, we develop statistical methods for pharmacovigilance and pharmacogenomics studies. Firstly, we propose an empirical Bayes mixture model to identify significant drug-ADE associations. The proposed approach can be used for both signal generation and ranking. Following this approach, the portion of false associations from the detected signals can be well controlled. Secondly, we propose a mixture dose response model to investigate the functional relationship between increased dimensionality of drug combinations and the ADE risks. Moreover, this approach can be used to identify high-dimensional drug combinations that are associated with escalated ADE risks at a significantly low local false discovery rate. Finally, we propose a cost-efficient design for pharmacogenomics studies.
To further improve cost-efficiency, the proposed design combines DNA pooling with a two-stage design approach. Compared with a traditional design, the cost under the proposed design is reduced dramatically with an acceptable compromise on statistical power. The proposed methods are examined by extensive simulation studies. Furthermore, the proposed methods to analyze pharmacovigilance databases are applied to the FDA’s Adverse Event Reporting System (FAERS) database and a local electronic medical record (EMR) database. For different scenarios of pharmacogenomics study, optimized designs to detect a functioning rare allele are given as well. FAERS Drug-drug interaction Empirical Bayes Pharmacogenomics Pharmacovigilance Two-stage design
24	Safety Improvements On Multilane Arterials A Before And After Evaluation Using The Empirical Bayes Method Devarasetty, Prem Chand 01 January 2009 This study examines the safety effects of the improvements made on multi-lane arterials. The improvements were divided into two categories: 1) corridor level improvements, and 2) intersection improvements. The Empirical Bayes method, which is one of the most accepted approaches for conducting before-after evaluations, has been used to assess the safety effects of the improvement projects. Safety effects are estimated not only in terms of all crashes but also rear-end (most common type) as well as severe crashes (crashes involving incapacitating and/or fatal injuries) and also angle crashes for intersection improvements. The Safety Performance Functions (SPFs) used in this study are negative binomial crash frequency estimation models that use the information on ADT, length of the segments, speed limit, and number of lanes for corridors. For intersections, the explanatory variables used are ADT, number of lanes, speed limit on the major road, and number of lanes on the minor road. The GENMOD procedure in SAS was used to develop the SPFs. Corridor SPFs are segregated by crash groups (all, rear-end, and severe), length of the segments being evaluated, and land use (urban, suburban and rural). The results of the analysis show that the resulting changes in safety following corridor level improvements vary widely. Although the safety effect of projects involving the same type of improvement varied, the overall effectiveness of each of the corridor level improvements was found to be positive in terms of reduction in crashes of each crash type considered (total, severe, and rear-end), except for resurfacing projects, where the total number of crashes slightly increased after the roadway section was resurfaced.
Evaluating additional improvements carried out with resurfacing activities showed that all of them (other than sidewalk improvements for total crashes) consistently led to improvements in safety of multilane arterial sections. This suggests that additional improvements are worth undertaking along with resurfacing when it is cost-effective to do so. It was also found that the addition of turning lanes (left and/or right) and paving shoulders were two improvements associated with a project's relative performance in terms of reduction in rear-end crashes. No improvements were found to be associated with a resurfacing project's relative performance in terms of changes in (i.e., reducing) severe crashes. For intersection improvements also, the individual results of each project varied widely. Except for adding turn lane(s), all other improvements showed a positive impact on safety in terms of reducing the number of crashes for all the crash types (total, severe, angle, and rear-end) considered, indicating that the design guidelines for this work type have to be revisited and that safety has to be considered when implementing them. In all, it can be concluded that FDOT is doing a good job in selecting the sites for treatment and is very successful in improving the safety of the sections being treated, although the main objective(s) of the treatments are not necessarily safety related. Safety effectiveness Improvements Safety Performance Functions Empirical Bayes Multi-lane Arterials Crashes Civil Engineering Engineering
25	Robust Bayes in Hierarchical Modeling and Empirical Bayes Analysis in Multivariate Estimation Wang, Xiaomu January 2015 No description available. Statistics
26	Performing Network Level Crash Evaluation Using Skid Resistance McCarthy, Ross James 09 September 2015 Evaluation of crash count data as a function of roadway characteristics allows Departments of Transportation to predict expected average crash risks in order to assist in identifying segments that could benefit from various treatments. Currently, the evaluation is performed using negative binomial regression, as a function of average annual daily traffic (AADT) and other variables. For this thesis, a crash study was carried out for the interstate, primary and secondary routes in the Salem District of Virginia. The data used in the study included the following information obtained from Virginia Department of Transportation (VDOT) records: 2010 to 2012 crash data, 2010 to 2012 AADT, and horizontal radius of curvature (CV). Additionally, tire-pavement friction or skid resistance was measured using a continuous friction measurement, fixed-slip device called a Grip Tester. In keeping with the current practice, negative binomial regression was used to relate the crash data to the AADT, skid resistance and CV. To determine which of the variables to include in the final models, the Akaike Information Criterion (AIC) and Log-Likelihood Ratio Tests were performed. By mathematically combining the information acquired from the negative binomial regression models and the information contained in the crash counts, the parameters of each network's true average crash risks were empirically estimated using the Empirical Bayes (EB) approach. The new estimated average crash risks were then used to rank segments according to their empirically estimated crash risk and to prioritize segments according to their expected crash reduction if a friction treatment were applied. / Master of Science skid resistance Poisson Poisson-Gamma Negative Binomial Safety Performance Function Empirical Bayes
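The combination of a regression prediction with an observed crash count can be sketched as follows. This is a minimal illustration of the standard Poisson-Gamma empirical Bayes estimate, assuming a negative binomial model with variance mu + mu^2 / phi; it is not taken from the thesis, and the segment names, counts, predictions and dispersion value are hypothetical. Some software reports the reciprocal of phi as the dispersion parameter, in which case the weight becomes 1 / (1 + k * mu).

```python
def eb_expected_crashes(observed, predicted, dispersion_phi):
    """Empirical Bayes estimate of a segment's expected crash count.

    observed       -- crashes recorded on the segment in the study period
    predicted      -- crashes predicted by the safety performance function
                      for the same period
    dispersion_phi -- negative binomial inverse-dispersion parameter,
                      assuming variance = mu + mu**2 / phi
    """
    if predicted <= 0 or dispersion_phi <= 0:
        raise ValueError("predicted and dispersion_phi must be positive")
    weight = 1.0 / (1.0 + predicted / dispersion_phi)
    return weight * predicted + (1.0 - weight) * observed


# Hypothetical segments: (name, observed crashes, SPF prediction).
segments = [("A", 12, 6.0), ("B", 9, 9.5), ("C", 3, 1.0)]
phi = 2.0  # assumed dispersion value, for illustration only

estimates = [(name, eb_expected_crashes(obs, pred, phi)) for name, obs, pred in segments]
ranked = sorted(estimates, key=lambda item: item[1], reverse=True)

for name, value in ranked:
    print(f"segment {name}: EB expected crashes = {value:.2f}")
```

The weight shrinks each observed count toward the model prediction, more strongly when the prediction is small relative to phi, which is what makes the resulting ranking less sensitive to random fluctuation in short crash histories.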
27	Melhor preditor empírico aplicado aos modelos beta mistos / Empirical best predictor for mixed beta regression models Zerbeto, Ana Paula 21 February 2014 Mixed beta regression models are widely used to analyse data that have a hierarchical structure and take values in a known restricted interval. To propose a prediction method for their random components, results previously obtained in the literature for the empirical Bayes predictor were extended to beta regression models with a normally distributed random intercept.
The proposed predictor, called the empirical best predictor (EBP), can be applied in two situations: predicting individual effects for new elements of groups that were already part of the data used to fit the model, and predicting them for elements of new groups. Simulation studies were designed, and their results indicated that the EBP performed efficiently and satisfactorily in a range of scenarios. When the proposal was used to analyse two health databases, the results agreed with those of the simulations in both cases, and good performance was observed. The proposed method is therefore promising for prediction in mixed beta regression models. empirical Bayes predictor mixed beta regression model prediction random effects
29	Comparing survival from cancer using population-based cancer registry data - methods and applications Yu, Xue Qin January 2007 Doctor of Philosophy / Over the past decade, population-based cancer registry data have been used increasingly worldwide to evaluate and improve the quality of cancer care. The utility of the conclusions from such studies relies heavily on the data quality and the methods used to analyse the data. Interpretation of comparative survival from such data, examining either temporal trends or geographical differences, is generally not easy. The observed differences could be due to methodological and statistical approaches or to real effects. For example, geographical differences in cancer survival could be due to a number of real factors, including access to primary health care, the availability of diagnostic and treatment facilities and the treatment actually given, or to artefact, such as lead-time bias, stage migration, sampling error or measurement error. Likewise, a temporal increase in survival could be the result of earlier diagnosis and improved treatment of cancer; it could also be due to artefact after the introduction of screening programs (adding lead time), changes in the definition of cancer, stage migration or several of these factors, producing both real and artefactual trends. In this thesis, I report methods that I modified and applied, some technical issues in the use of such data, and an analysis of data from the State of New South Wales (NSW), Australia, illustrating their use in evaluating and potentially improving the quality of cancer care, showing how data quality might affect the conclusions of such analyses. This thesis describes studies of comparative survival based on population-based cancer registry data, with three published papers and one accepted manuscript (subject to minor revision).
In the first paper, I describe a modified method for estimating spatial variation in cancer survival using empirical Bayes methods (which was published in Cancer Causes and Control 2004). I demonstrate in this paper that the empirical Bayes method is preferable to standard approaches and show how it can be used to identify cancer types where a focus on reducing area differentials in survival might lead to important gains in survival. In the second paper (published in the European Journal of Cancer 2005), I apply this method to a more complete analysis of spatial variation in survival from colorectal cancer in NSW and show that estimates of spatial variation in colorectal cancer can help to identify subgroups of patients for whom better application of treatment guidelines could improve outcome. I also show how estimates of the numbers of lives that could be extended might assist in setting priorities for treatment improvement. In the third paper, I examine time trends in survival from 28 cancers in NSW between 1980 and 1996 (published in the International Journal of Cancer 2006) and conclude that for many cancers, falls in excess deaths in NSW from 1980 to 1996 are unlikely to be attributable to earlier diagnosis or stage migration; thus, advances in cancer treatment have probably contributed to them. In the accepted manuscript, I described an extension of the work reported in the second paper, investigating the accuracy of staging information recorded in the registry database and assessing the impact of error in its measurement on estimates of spatial variation in survival from colorectal cancer. The results indicate that misclassified registry stage can have an important impact on estimates of spatial variation in stage-specific survival from colorectal cancer. Thus, if cancer registry data are to be used effectively in evaluating and improving cancer care, the quality of stage data might have to be improved. 
Taken together, the four papers show that creative, informed use of population-based cancer registry data, with appropriate statistical methods and acknowledgement of the limitations of the data, can be a valuable tool for evaluating and possibly improving cancer care. Use of these findings to stimulate evaluation of the quality of cancer care should enhance the value of the investment in cancer registries. They should also stimulate improvement in the quality of cancer registry data, particularly that on stage at diagnosis. The methods developed in this thesis may also be used to improve estimation of geographical variation in other count-based health measures when the available data are sparse.
30	Prediction of recurrent events Fredette, Marc January 2004 In this thesis, we will study issues related to prediction problems and put an emphasis on those arising when recurrent events are involved. First we define the basic concepts of frequentist and Bayesian statistical prediction in the first chapter. In the second chapter, we study frequentist prediction intervals and their associated predictive distributions. We will then present an approach based on asymptotically uniform pivotals that is shown to dominate the plug-in approach under certain conditions. The following three chapters consider the prediction of recurrent events. The third chapter presents different prediction models when these events can be modeled using homogeneous Poisson processes. Amongst these models, those using random effects are shown to possess interesting features. In the fourth chapter, the time homogeneity assumption is relaxed and we present prediction models for non-homogeneous Poisson processes. The behavior of these models is then studied for prediction problems with a finite horizon. In the fifth chapter, we apply the concepts discussed previously to a warranty dataset coming from the automobile industry. The number of processes in this dataset being very large, we focus on methods providing computationally rapid prediction intervals. Finally, we discuss the possibilities of future research in the last chapter. Statistics Prediction methods random effects models longitudinal study nonhomogeneous Poisson processes coverage probability calibration approximate pivotals empirical Bayes
