1

A comparison of hypothesis testing procedures for two population proportions

Hort, Molly January 1900 (has links)
Master of Science / Department of Statistics / John E. Boyer Jr / It has been shown that the most straightforward approach to testing for the difference of two independent population proportions, called the Wald procedure, tends to declare differences too often. Because of this poor performance, various researchers have proposed simple adjustments to the Wald approach that tend to provide significance levels closer to the nominal level. Additionally, several tests that take advantage of different methodologies have been proposed. This paper extends the work of Tebbs and Roths (2008), who wrote an R program to compare confidence interval coverage for a variety of these procedures when used to estimate a contrast in two or more binomial parameters. Their program has been adapted to generate exact significance levels and power for the two-parameter hypothesis-testing situation. Several combinations of binomial parameters and sample sizes are considered. Recommendations on the choice of procedure are made for practical situations.
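As an illustrative aside, the core calculation can be sketched briefly. The Python fragment below is not the thesis's adapted R program; it computes the Wald z-statistic, an "add one success and one failure" adjusted version, and the exact significance level obtained by enumerating all binomial outcomes under the null hypothesis. The function names and the particular adjustment are assumptions made for illustration.

```python
# Minimal sketch: exact size of the Wald and adjusted Wald tests under H0: p1 = p2 = p.
import numpy as np
from scipy.stats import binom, norm

def wald_z(x1, n1, x2, n2, adjust=0):
    """Wald z-statistic; adjust=1 adds one success and one failure to each sample."""
    p1 = (x1 + adjust) / (n1 + 2 * adjust)
    p2 = (x2 + adjust) / (n2 + 2 * adjust)
    se = np.sqrt(p1 * (1 - p1) / (n1 + 2 * adjust) +
                 p2 * (1 - p2) / (n2 + 2 * adjust))
    if se == 0:                       # degenerate samples (all failures or all successes)
        return 0.0 if p1 == p2 else np.inf
    return (p1 - p2) / se

def exact_size(n1, n2, p, alpha=0.05, adjust=0):
    """Exact probability of rejecting H0 when both true proportions equal p."""
    zcrit = norm.ppf(1 - alpha / 2)
    size = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if abs(wald_z(x1, n1, x2, n2, adjust)) >= zcrit:
                size += binom.pmf(x1, n1, p) * binom.pmf(x2, n2, p)
    return size

# Compare exact sizes of the unadjusted and adjusted tests at a nominal 5% level.
print(exact_size(20, 20, 0.3, adjust=0))
print(exact_size(20, 20, 0.3, adjust=1))
```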
2

Predicting Maximal Oxygen Consumption (VO2max) Levels in Adolescents

Shepherd, Brent A. 09 March 2012 (has links) (PDF)
Maximal oxygen consumption (VO2max) is considered by many to be the best overall measure of an individual's cardiovascular health. Collecting the measurement, however, requires subjecting an individual to prolonged periods of intense exercise until their maximal level, the point at which their body uses no additional oxygen from the air despite increased exercise intensity, is reached. Obtaining accurate VO2max data also requires expensive equipment and entails considerable subject discomfort. Because of this inherent difficulty, the measurement is often avoided despite its usefulness. In this research, we propose a set of Bayesian hierarchical models to predict VO2max levels in adolescents, ages 12 through 17, using less extreme measurements. Two models are developed separately, one that uses submaximal exercise data and one that uses physical fitness questionnaire data. The best submaximal model was found to include age, gender, BMI, heart rate, rate of perceived exertion, treadmill speed in miles per hour, and an interaction between age and heart rate. The second model, designed for those with physical limitations, uses age, gender, BMI, and two separate questionnaire results measuring physical activity levels and functional ability levels, as well as an interaction between the physical activity level score and gender. Both models use separate model variances for males and females.
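As a rough illustration of the modelling structure described, not the authors' actual model or code, the following PyMC sketch specifies a normal regression with separate residual variances for males and females. The prior choices, variable names, and design-matrix layout are assumptions.

```python
# Hedged sketch of a hierarchical-style regression with gender-specific residual variance.
import numpy as np
import pymc as pm

def build_model(X, gender_idx, vo2max):
    # X: (n, p) design matrix with age, BMI, heart rate, RPE, treadmill speed,
    #    an age-by-heart-rate interaction column, etc.; gender_idx holds 0 or 1.
    with pm.Model() as model:
        intercept = pm.Normal("intercept", mu=0.0, sigma=100.0)
        beta = pm.Normal("beta", mu=0.0, sigma=10.0, shape=X.shape[1])
        gender_effect = pm.Normal("gender_effect", mu=0.0, sigma=10.0)
        # separate residual standard deviations for males and females
        sigma = pm.HalfNormal("sigma", sigma=10.0, shape=2)
        mu = intercept + pm.math.dot(X, beta) + gender_effect * gender_idx
        pm.Normal("vo2max", mu=mu, sigma=sigma[gender_idx], observed=vo2max)
    return model

# Usage (illustrative): with build_model(X, gender_idx, y): idata = pm.sample(1000, tune=1000)
```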
3

Integration strategies for toxicity data from an empirical perspective

Yang, L., Neagu, Daniel January 2014 (has links)
Recent developments in information technology, especially state-of-the-art "big data" solutions, enable the extraction, gathering, and processing of large amounts of toxicity information from multiple sources. Facilitated by this technological advance, a framework named integrated testing strategies (ITS) has been proposed in the predictive toxicology domain, in an effort to use multiple heterogeneous toxicity data records jointly and intelligently (through data fusion, grouping, interpolation/extrapolation, etc.) for toxicity assessment. This will ultimately contribute to accelerating the development cycle of chemical products, reducing animal use, and decreasing development costs. Most current work on ITS is based on a group of consensus processes, termed weight of evidence (WoE), which quantitatively integrate all the relevant data instances for the same endpoint into a single decision, weighted by data quality. Several WoE implementations for the particular case of toxicity data fusion have been presented in the literature, and they are studied collectively in this paper. Noting that these uncertainty-handling methodologies are usually not developed directly from conventional probability theory, because large datasets are unavailable, this paper first investigates the mathematical foundations of these approaches. The investigated data integration models are then applied to a representative case in the predictive toxicology domain, and the experimental results are compared and analysed.
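To make the weight-of-evidence idea concrete, here is a deliberately minimal Python sketch of quality-weighted fusion of toxicity records for a single endpoint. The record structure, quality scale, and weighting rule are illustrative assumptions, not the specific WoE implementations reviewed in the paper.

```python
# Toy quality-weighted fusion of toxicity records toward one endpoint.
from dataclasses import dataclass

@dataclass
class ToxicityRecord:
    value: float    # e.g. a measured LD50 for the endpoint of interest
    quality: float  # data-quality score in (0, 1], higher means more reliable

def weight_of_evidence(records):
    """Integrate records into one value, weighting each record by its quality score."""
    total_weight = sum(r.quality for r in records)
    fused = sum(r.value * r.quality for r in records) / total_weight
    confidence = total_weight / len(records)   # average quality as a crude confidence measure
    return fused, confidence

records = [ToxicityRecord(320.0, 0.9), ToxicityRecord(410.0, 0.5), ToxicityRecord(290.0, 0.7)]
print(weight_of_evidence(records))
```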
4

Duomenų tyrybos empirinių Bajeso metodų tyrimas ir taikymas / Analysis and application of empirical Bayes methods in data mining

Jakimauskas, Gintautas 23 April 2014 (has links)
The object of this research is empirical Bayes methods and algorithms for data mining, applied to the analysis of large populations of large dimension. The aim of the research is to create methods and algorithms for testing nonparametric hypotheses for large populations and for estimating the parameters of data models. The following problems are solved to reach these objectives: 1. To create an efficient partitioning algorithm for large-dimensional data. 2. To apply the partitioning algorithm for large-dimensional data in testing nonparametric hypotheses. 3. To apply the empirical Bayes method in testing the independence of components of large-dimensional data vectors. 4. To develop an algorithm for estimating probabilities of rare events in large populations, using the empirical Bayes method and comparing the Poisson-gamma and Poisson-Gaussian mathematical models, by selecting an optimal model and a respective empirical Bayes estimator. 5. To create an algorithm for logistic regression of rare events using the empirical Bayes method. The results obtained enable us to perform very fast and efficient partitioning of large-dimensional data, to test the independence of selected components of large-dimensional data, and to select the optimal model for estimating probabilities of rare events, using the Poisson-gamma and Poisson-Gaussian mathematical models and empirical Bayes estimators. A nonsingularity condition for the Poisson-gamma model is presented.
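A minimal sketch of the empirical Bayes idea behind objective 4, under the Poisson-gamma model, is shown below. The method-of-moments fitting step and the variable names are assumptions made for illustration, not the dissertation's algorithm.

```python
# Empirical Bayes for rare-event rates: x_i ~ Poisson(n_i * lambda_i), lambda_i ~ Gamma(a, b),
# with (a, b) fitted from the data and each observed rate shrunk toward the population mean.
import numpy as np

def eb_poisson_gamma(x, n):
    x, n = np.asarray(x, float), np.asarray(n, float)
    rates = x / n
    m = rates.mean()
    # subtract the average Poisson sampling variance to estimate the between-unit variance
    var_lambda = max(rates.var(ddof=1) - m * np.mean(1.0 / n), 1e-12)
    b = m / var_lambda          # Gamma(a, b): mean a/b, variance a/b^2
    a = m * b
    return (a + x) / (b + n)    # posterior means E[lambda_i | x_i]

events   = np.array([0, 1, 0, 3, 2, 0, 1])
exposure = np.array([500, 800, 400, 1200, 900, 300, 700])
print(eb_poisson_gamma(events, exposure))
```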
6

ReGen: Optimizing Genetic Selection Algorithms for Heterogeneous Computing

Winkleblack, Scott Kenneth Swinkleb 01 June 2014 (has links) (PDF)
GenSel is a genetic selection analysis tool used to determine which genetic markers are informative for a given trait. Performing genetic-selection analyses is a time-consuming and computationally expensive task. Due to an expected increase in the number of genotyped individuals, analysis times will increase dramatically; therefore, optimization efforts must be made to keep analysis times reasonable. This thesis focuses on optimizing one of GenSel's underlying algorithms for heterogeneous computing. The resulting algorithm exposes task-level and data-level parallelism that is present but inaccessible in the original algorithm. The heterogeneous computing solution, ReGen, outperforms the optimized CPU implementation, achieving a 1.84× speedup.
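The two kinds of parallelism mentioned can be illustrated generically. The Python sketch below is not ReGen's heterogeneous-computing code; it vectorizes a per-marker update over all individuals (data-level parallelism) and runs independent analyses concurrently (task-level parallelism). The update formula, sizes, and variance ratio are placeholders.

```python
# Generic illustration of data-level and task-level parallelism.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def marker_update(genotypes_j, residuals, variance_ratio):
    # data-level parallelism: one fused reduction over all individuals
    rhs = genotypes_j @ residuals
    lhs = genotypes_j @ genotypes_j + variance_ratio
    return rhs / lhs            # conditional estimate of one marker effect

def run_analysis(seed):
    # task-level parallelism: independent analyses (chains, traits) are separate tasks
    rng = np.random.default_rng(seed)
    genotypes = rng.integers(0, 3, size=(1000, 50)).astype(float)
    residuals = rng.normal(size=1000)
    return [marker_update(genotypes[:, j], residuals, 10.0) for j in range(50)]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_analysis, [1, 2, 3, 4]))
```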
7

Comparing survival from cancer using population-based cancer registry data - methods and applications

Yu, Xue Qin January 2007 (has links)
Doctor of Philosophy / Over the past decade, population-based cancer registry data have been used increasingly worldwide to evaluate and improve the quality of cancer care. The utility of the conclusions from such studies relies heavily on the data quality and the methods used to analyse the data. Interpretation of comparative survival from such data, examining either temporal trends or geographical differences, is generally not easy. The observed differences could be due to methodological and statistical approaches or to real effects. For example, geographical differences in cancer survival could be due to a number of real factors, including access to primary health care, the availability of diagnostic and treatment facilities and the treatment actually given, or to artefact, such as lead-time bias, stage migration, sampling error or measurement error. Likewise, a temporal increase in survival could be the result of earlier diagnosis and improved treatment of cancer; it could also be due to artefact after the introduction of screening programs (adding lead time), changes in the definition of cancer, stage migration or several of these factors, producing both real and artefactual trends. In this thesis, I report methods that I modified and applied, some technical issues in the use of such data, and an analysis of data from the State of New South Wales (NSW), Australia, illustrating their use in evaluating and potentially improving the quality of cancer care, showing how data quality might affect the conclusions of such analyses. This thesis describes studies of comparative survival based on population-based cancer registry data, with three published papers and one accepted manuscript (subject to minor revision). In the first paper, I describe a modified method for estimating spatial variation in cancer survival using empirical Bayes methods (which was published in Cancer Causes and Control 2004). I demonstrate in this paper that the empirical Bayes method is preferable to standard approaches and show how it can be used to identify cancer types where a focus on reducing area differentials in survival might lead to important gains in survival. In the second paper (published in the European Journal of Cancer 2005), I apply this method to a more complete analysis of spatial variation in survival from colorectal cancer in NSW and show that estimates of spatial variation in colorectal cancer can help to identify subgroups of patients for whom better application of treatment guidelines could improve outcome. I also show how estimates of the numbers of lives that could be extended might assist in setting priorities for treatment improvement. In the third paper, I examine time trends in survival from 28 cancers in NSW between 1980 and 1996 (published in the International Journal of Cancer 2006) and conclude that for many cancers, falls in excess deaths in NSW from 1980 to 1996 are unlikely to be attributable to earlier diagnosis or stage migration; thus, advances in cancer treatment have probably contributed to them. In the accepted manuscript, I described an extension of the work reported in the second paper, investigating the accuracy of staging information recorded in the registry database and assessing the impact of error in its measurement on estimates of spatial variation in survival from colorectal cancer. The results indicate that misclassified registry stage can have an important impact on estimates of spatial variation in stage-specific survival from colorectal cancer. 
Thus, if cancer registry data are to be used effectively in evaluating and improving cancer care, the quality of stage data might have to be improved. Taken together, the four papers show that creative, informed use of population-based cancer registry data, with appropriate statistical methods and acknowledgement of the limitations of the data, can be a valuable tool for evaluating and possibly improving cancer care. Use of these findings to stimulate evaluation of the quality of cancer care should enhance the value of the investment in cancer registries. They should also stimulate improvement in the quality of cancer registry data, particularly that on stage at diagnosis. The methods developed in this thesis may also be used to improve estimation of geographical variation in other count-based health measures when the available data are sparse.
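The empirical Bayes shrinkage idea used for the area-level estimates can be sketched in a few lines. The Python fragment below is a simplified moment-based illustration with invented numbers, not the method published in the papers.

```python
# Shrink each area's observed survival proportion toward the overall mean,
# with more shrinkage for areas that have fewer patients.
import numpy as np

def eb_shrink(p_obs, n):
    p_obs, n = np.asarray(p_obs, float), np.asarray(n, float)
    s2 = p_obs * (1 - p_obs) / n                           # binomial sampling variances
    mu = np.average(p_obs, weights=n)                      # pooled overall mean
    tau2 = max(np.var(p_obs, ddof=1) - s2.mean(), 1e-12)   # crude between-area variance
    w = tau2 / (tau2 + s2)                                 # reliability weights
    return w * p_obs + (1 - w) * mu                        # small areas shrink the most

areas_5yr_survival = [0.52, 0.61, 0.48, 0.70, 0.55]        # invented example values
patients_per_area  = [40,   300,  25,   15,   500]
print(eb_shrink(areas_5yr_survival, patients_per_area))
```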
9

Modélisation des données d'attractivité hospitalière par les modèles d'utilité / Modeling hospital attractivity data by using utility models

Saley, Issa 29 November 2017 (has links)
Understanding how patients choose hospitals is of major importance both for hospital managers and for healthcare decision makers: for the former, it concerns managing patient flow and the supply of care; for the latter, implementing reforms in the health system. In this thesis we propose several ways of modelling patient admission data as a function of distance to a hospital, in order to forecast patient flow and to compare a hospital's attractiveness with that of other hospitals. The first two use Bayesian hierarchical models for count data with possible spatial dependence, illustrated on patient admission data from the Languedoc-Roussillon region. The third uses discrete choice models (random utility models, RUMs). Because these models have certain limitations with respect to our goal, we relax the utility-maximization assumption in favour of a more flexible one, under which an agent (patient) may choose a product (hospital) as soon as the utility it provides reaches a certain satisfaction threshold, taking certain aspects into account. This approach is illustrated on asthma admissions in 2009 at three hospitals in Hérault, in order to compute the territorial reach of a given hospital.
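The contrast between utility maximization and the relaxed threshold rule can be shown with a toy example; the utility function and threshold below are placeholders, not the thesis's estimated model.

```python
# Utility maximization picks one hospital; the threshold rule accepts any "good enough" one.
import numpy as np

def choose_max_utility(utilities):
    return int(np.argmax(utilities))            # classical RUM-style choice

def acceptable_hospitals(utilities, threshold):
    return [h for h, u in enumerate(utilities) if u >= threshold]

# toy utility: decreasing in travel distance (km)
distances = np.array([5.0, 18.0, 32.0])
utilities = 1.0 - 0.03 * distances
print(choose_max_utility(utilities))            # -> 0, the single best hospital
print(acceptable_hospitals(utilities, 0.4))     # -> [0, 1], all hospitals above the threshold
```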
10

Amélioration de la vitesse et de la qualité d'image du rendu basé image / Improving speed and image quality of image-based rendering

Ortiz Cayón, Rodrigo 03 February 2017 (has links)
Traditional photo-realistic rendering requires intensive manual and computational effort to create scenes and render realistic images. For this reason, the creation of high-quality digital imagery has largely been limited to experts, and highly realistic rendering still requires significant computation time. Image-Based Rendering (IBR) is an alternative with the potential to make high-quality content creation and rendering applications accessible to casual users, since it can generate high-quality photo-realistic imagery without the limitations mentioned above. We identified three important shortcomings of current IBR methods. First, each algorithm has different strengths and weaknesses, depending on the quality of the 3D reconstruction and the scene content, and often no single algorithm offers the best image quality everywhere in the image. Second, these algorithms produce strong artifacts when rendering partially reconstructed or missing objects. Third, most methods still suffer from significant visual artifacts in image regions where the reconstruction is poor. Overall, this thesis addresses significant shortcomings of IBR in both rendering speed and image quality, offering novel and effective solutions based on selective rendering, learning-based model substitution, and depth-error prediction and correction.
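The selective-rendering idea, choosing for each image region the algorithm expected to give the lowest error, can be sketched schematically. The algorithm names and error maps below are invented placeholders, not the thesis's renderer.

```python
# Per-region algorithm selection from predicted error maps.
import numpy as np

def select_algorithms(predicted_errors):
    """predicted_errors: dict mapping algorithm name to a (rows, cols) error map."""
    names = list(predicted_errors)
    stacked = np.stack([predicted_errors[n] for n in names])   # (algorithms, rows, cols)
    best = np.argmin(stacked, axis=0)                          # index of the best algorithm per region
    return np.array(names)[best]                               # algorithm name per region

errors = {
    "planar_warp": np.array([[0.1, 0.4], [0.5, 0.2]]),
    "mesh_reproj": np.array([[0.3, 0.2], [0.1, 0.6]]),
}
print(select_algorithms(errors))
```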
