Global ETD Search

171	A Bayesian test of independence of two categorical variables obtained from a small area: an application to BMD and BMI Zhou, Jingran 19 December 2011 Scientists usually need to understand the extent of the association of two attributes, and the data are typically presented in two-way categorical tables. In science, the chi-squared test is routinely used to analyze data from such tables. However, in many applications the chi-squared test can be defective. For example, when the sample size is small, the chi-squared test may not be applicable. The terms "small area" and "local area" are commonly used to denote a small geographical area, such as a county. If a survey has been carried out, the sample size within any particular small area may be too small to generate accurate estimates from the data, and a chi-squared test may be invalid (i.e., expected frequencies in some cells of the table are less than five). To deal with this problem we use Bayesian small area estimation, because it "borrows strength" from related or similar areas: it enhances the information of each area with common exchangeable information. We use a Bayesian model to estimate a Bayes factor to test the independence of the two variables. We apply the model to test for the independence between bone mineral density (BMD) and body mass index (BMI) from 31 counties and we compare the results with a direct Bayes factor test. We have also obtained numerical and sampling errors; both the numerical and sampling errors of our Bayes factor are small. Our model is shown to be much less sensitive to the specification of the prior distribution than the direct Bayes factor test, which is based on each area only. small area categorical variables independence Bayesian test
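As a rough illustration of what a direct Bayes factor test of independence on a single two-way table can look like, here is a minimal sketch. It assumes multinomial sampling with symmetric Dirichlet priors, which is an illustrative choice made here; the abstract does not state the priors or the small-area model used in the thesis, and the function names are hypothetical.

```python
from math import lgamma


def log_multivariate_beta(alphas):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alphas) - lgamma(sum(alphas))


def log_bf_independence(table, prior=1.0):
    """Log Bayes factor of independence versus association for one table.

    Assumes multinomial sampling with symmetric Dirichlet(prior) priors on
    the row margins, the column margins, and the full set of cell
    probabilities (illustrative assumptions, not taken from the thesis).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    cells = [x for r in table for x in r]

    def log_marginal(counts):
        posterior = [n + prior for n in counts]
        return (log_multivariate_beta(posterior)
                - log_multivariate_beta([prior] * len(counts)))

    # The multinomial coefficient is shared by both models and cancels.
    log_m_independent = log_marginal(rows) + log_marginal(cols)
    log_m_associated = log_marginal(cells)
    return log_m_independent - log_m_associated
```

Positive values favor independence and negative values favor association. With very small counts the result depends noticeably on `prior`, which is the kind of sensitivity the abstract attributes to the per-area test.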
172	Supervised learning for back analysis of excavations in the observational method Jin, Yingyan January 2018 In the past few decades, demand for construction in underground spaces has increased dramatically in urban areas with high population densities. However, the impact of the construction of underground structures on surrounding infrastructure raises concerns, since movements caused by deep excavations might damage adjacent buildings. Unfortunately, the prediction of geotechnical behaviour is difficult due to uncertainties and a lack of information on the underground environment. Therefore, to ensure safety, engineers tend to choose very conservative designs that require unnecessary material and longer construction time. The observational method, which was proposed by Peck in 1969 and formalised in Eurocode 7 in 1987, provides a way to avoid such redundancy by modifying the design based on the knowledge gathered during construction. The review process within the observational method is recognised as back analysis. Supervised learning can aid in this process, providing a systematic procedure to assess soil parameters based on monitoring data and prediction of the ground response. A probabilistic model is developed in this research to account for the uncertainties in the problem. Sequential Bayesian inference is used to update the soil parameters at each excavation stage when observations are available. The accuracy of the prediction for future stages improves at each stage. Meanwhile, the uncertainty contained in the prediction decreases, and therefore the confidence in the corresponding design also increases. Moreover, the Bayesian method integrates subjective engineering experience and objective observations in a rational and quantitative way, which enables the model to update soil parameters even when the amount of data is very limited.
It also allows the use of the knowledge learnt from comparable ground conditions, which is particularly useful in the absence of site-specific information on ground conditions. Four probabilistic models are developed in this research. The first two incorporate empirical excavation design methods. These simple models are used to examine the practicality of the approach with several cases. The next two are coupled with a program called FREW, which is able to simulate the excavation process, still in a relatively simplistic way. One is a baseline model with simple assumptions on model error; the other is a more sophisticated model that considers measurement error and spatial relationships among the observations. Their efficiency and accuracy are verified using a synthetic case and tested on a case history from the London Crossrail project. In the end, the models are compared and their flexibility in different cases is discussed.
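The stage-by-stage updating described in entry 172 can be illustrated with a deliberately simplified sketch. It assumes a single scalar soil parameter that is observed directly with Gaussian noise of known variance, so that each update is a conjugate normal update. The thesis itself couples the inference to excavation models (empirical design methods and FREW), which this sketch does not attempt to reproduce; all names and numbers are hypothetical.

```python
def update_normal(prior_mean, prior_var, observation, noise_var):
    """One conjugate update of a normal prior given one noisy observation."""
    precision = 1.0 / prior_var + 1.0 / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + observation / noise_var)
    return post_mean, post_var


def sequential_update(prior_mean, prior_var, stage_observations, noise_var):
    """Update the parameter after each excavation stage and keep the history."""
    history = [(prior_mean, prior_var)]
    mean, var = prior_mean, prior_var
    for obs in stage_observations:
        mean, var = update_normal(mean, var, obs, noise_var)
        history.append((mean, var))
    return history


# Hypothetical prior (engineering judgement) and three stage observations.
history = sequential_update(30.0, 100.0, [42.0, 38.0, 40.0], 25.0)
```

In this toy setting the posterior variance shrinks at every stage, mirroring the abstract's statement that prediction uncertainty decreases as observations accumulate.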
173	Determining effective methods of presenting Bayesian problems to a general audience Dewitt, Stephen Harrison January 2017 The thesis presents six experiments designed to further understanding of effective methods of presenting Bayesian problems to a general audience. The first four experiments (Part I) focus on general Bayesian reasoning. The final two experiments (Part II) focus specifically on the legal domain. Experiment one compares two leading theories for Bayesian presentation: Macchi's (2000) 'nested sets' approach, and Krynski and Tenenbaum's (2007) 'causal' approach. It also uses a think aloud protocol, requiring thought-process recording during solution. A nested sets framing effect is found, but no causal framing effect. From the think aloud data, a five-stage solution process (the 'nested sets' process), modal among successful individuals, is found. In experiment two, Macchi's approach is tested on a problem with greater ecological validity. An increase in accuracy is still seen. Experiment two also finds that conversion of the problem to integers by participants is highly associated with accuracy. Experiment three confirms the null causal finding of experiment one and finds that the think aloud protocol itself increases accuracy. Experiment four experimentally tests whether prompting problem conversion to integers, and prompting individuals to follow the nested sets process, improve accuracy. No effect is found for conversion, but an effect is found for the nested sets process prompt. Experiment five tested whether statistically untrained individuals can undertake accurate Bayesian reasoning of a legal case including necessary forensic error rates (Fenton et al., 2014). No single individual is found to provide the normative answer. Instead a range of heuristics are found.
Building upon this, experiment six compares two approaches to presenting the Bayesian output of a legal case: the popular event tree diagram, and the Bayesian network diagram recommended by Fenton et al. (2014). Without inclusion of false positives and negatives, the event tree diagram was rated more trustworthy and easier to understand than the Bayesian network diagram. However, when these error types were included, this pattern reversed.
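To make the two presentation formats in entry 173 concrete, the sketch below solves one Bayesian problem twice: once in probability form, and once after converting it to whole-number counts in nested sets. The problem and its numbers (1% base rate, 80% hit rate, 10% false-positive rate, population of 1000) are hypothetical and are not taken from the thesis's materials.

```python
from fractions import Fraction


def posterior_from_probabilities(base_rate, hit_rate, false_positive_rate):
    """Bayes' theorem in probability form: P(condition | positive result)."""
    true_pos = base_rate * hit_rate
    false_pos = (1 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)


def posterior_from_counts(population, base_rate, hit_rate, false_positive_rate):
    """The same quantity after converting the problem to counts of people."""
    with_condition = population * base_rate
    without_condition = population - with_condition
    true_pos = with_condition * hit_rate            # positives who have it
    false_pos = without_condition * false_positive_rate  # positives who do not
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


# Hypothetical problem: exact fractions keep both routes comparable.
base, hit, fpr = Fraction(1, 100), Fraction(8, 10), Fraction(1, 10)
by_probability = posterior_from_probabilities(base, hit, fpr)
true_pos, false_pos, by_counts = posterior_from_counts(1000, base, hit, fpr)
```

With these numbers the count version reads as "8 of the 107 people who test positive have the condition", which is the nested-sets structure: the positives with the condition are a subset of all positives.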
174	Extremal martingales with applications and a Bayesian approach to model selection Dümbgen, Moritz January 2015 No description available. 510
175	Bayesian hierarchical models for linear networks Al-Kaabawi, Zainab A. A. January 2018 A motorway network is handled as a linear network. The purpose of this study is to highlight dangerous motorways by estimating the intensity of accidents and studying its pattern across the UK motorway network. Two mechanisms have been adopted to achieve this aim. In the first, the motorway-specific intensity is estimated by modelling the point pattern of the accident data using a homogeneous Poisson process. The homogeneous Poisson process is used to model all intensities, but heterogeneity across motorways is incorporated using two-level hierarchical models. The data structure is multilevel since each motorway consists of junctions that are joined by grouped segments. In the second mechanism, the segment-specific intensity is estimated by modelling the point pattern of the accident data. The homogeneous Poisson process is used to model accident data within segments, but heterogeneity across segments is incorporated using three-level hierarchical models. A Bayesian method via Markov Chain Monte Carlo simulation algorithms is used to estimate the unknown parameters in the models, and the sensitivity to the prior choice is assessed. The performance of the proposed models is checked through a simulation study and an application to traffic accidents in 2016 on the UK motorway network. The performance of the three-level frequentist model was poor. The deviance information criterion (DIC) and the widely applicable information criterion (WAIC) are employed to choose between the two-level Bayesian hierarchical model and the three-level Bayesian hierarchical model; the results showed that the best fitting model was the three-level Bayesian hierarchical model.
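A minimal sketch of the idea behind the motorway-level model in entry 175, under strong simplifying assumptions: accident counts on each motorway follow a homogeneous Poisson process, and the per-kilometre intensities share a Gamma prior whose hyperparameters are fixed here. In a full hierarchical model those hyperparameters would receive their own priors and be estimated by MCMC, as the abstract describes. The function name and the numbers are hypothetical.

```python
def posterior_intensity(count, length_km, prior_shape, prior_rate):
    """Posterior Gamma(shape, rate) for an accident intensity per km.

    Model: count ~ Poisson(intensity * length_km),
           intensity ~ Gamma(prior_shape, prior_rate)  (rate parameterisation).
    Returns the posterior shape, rate, and mean.
    """
    shape = prior_shape + count
    rate = prior_rate + length_km
    return shape, rate, shape / rate


# Hypothetical motorway: 12 accidents over 40 km, prior mean 2 / 10 = 0.2 per km.
shape, rate, mean = posterior_intensity(12, 40.0, 2.0, 10.0)
```

The posterior mean lies between the raw rate (12 / 40 = 0.3) and the prior mean (0.2), which is the shrinkage effect that lets a hierarchical model express heterogeneity across motorways while pooling information.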
176	Bayesian criterion-based model selection in structural equation models. / CUHK electronic theses & dissertations collection January 2010 Structural equation models (SEMs) are commonly used in behavioral, educational, medical, and social sciences. Many software packages, such as EQS, LISREL, MPlus, and WinBUGS, can be used for the analysis of SEMs. Many methods have also been developed to analyze SEMs. One popular method is the Bayesian approach. An important issue in the Bayesian analysis of SEMs is model selection. In the literature, the Bayes factor and the deviance information criterion (DIC) are commonly used statistics for Bayesian model selection. However, as commented in Chen et al. (2004), the Bayes factor relies on posterior model probabilities, for which proper prior distributions are needed. Specifying prior distributions for all models under consideration is usually a challenging task, in particular when the model space is large. In addition, it is well known that the Bayes factor and posterior model probability are generally sensitive to the choice of the prior distributions of the parameters. Furthermore, the computational burden of the Bayes factor is heavy. Alternatively, criterion-based methods are attractive in the sense that they do not require proper prior distributions in general, and the computation is quite simple. One of the commonly used criterion-based methods is DIC, which however assumes the posterior mean to be a good estimator. For some models, like the mixture SEMs, WinBUGS does not provide the DIC values. Moreover, if the difference in DIC values is small, only reporting the model with the smallest DIC value may be misleading. In this thesis, motivated by the above limitations of the Bayes factor and DIC, a Bayesian model selection criterion called the Lv measure is considered. It is a combination of the posterior predictive variance and bias, and can be viewed as a Bayesian goodness-of-fit statistic.
The calibration distribution of the Lv measure, defined as the prior predictive distribution of the difference between the Lv measures of the candidate model and the criterion minimizing model, is discussed to help understanding the Lv measure in detail. The computation of the Lv measure is quite simple, and the performance is satisfactory. Thus, it is an attractive model selection statistic. In this thesis, the application of the Lv measure to various kinds of SEMs will be studied, and some illustrative examples will be conducted to evaluate the performance of the Lv measure for model selection of SEMs. To compare different model selection methods, Bayes factor and DIC will also be computed. Moreover, different prior inputs and sample sizes are considered to check the impact of the prior information and sample size on the performance of the Lv measure. In this thesis, when the performances of two models are similar, the simpler one is selected. / Li, Yunxian. / Adviser: Song Xinyuan. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 116-122). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. Bayesian statistical decision theory Structural equation modeling
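The description of the Lv measure in entry 176 (posterior predictive variance combined with bias) can be sketched as below. The form used here, the summed predictive variance plus a weight v times the summed squared bias, is an assumption based on that description and on how the L measure is commonly written; the abstract does not give the exact definition used in the thesis. The function name and input layout are hypothetical.

```python
from statistics import fmean, pvariance


def lv_measure(observed, predictive_draws, v=0.5):
    """Assumed form: sum of predictive variances + v * sum of squared biases.

    predictive_draws[s][i] is posterior predictive draw s of the replicate
    for observation i. Smaller values indicate a better-fitting model.
    """
    total_variance = 0.0
    total_squared_bias = 0.0
    for i, y in enumerate(observed):
        draws_i = [draw[i] for draw in predictive_draws]
        total_variance += pvariance(draws_i)
        total_squared_bias += (fmean(draws_i) - y) ** 2
    return total_variance + v * total_squared_bias
```

Because the function needs only predictive draws, it can be computed from the output of any sampler without evaluating marginal likelihoods, which is consistent with the abstract's remark that the computation is simple.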
177	Bayesian statistical analysis for nonrecursive nonlinear structural equation models. / CUHK electronic theses & dissertations collection January 2007 Keywords: Bayesian analysis, Finite mixture, Gibbs sampler, Langevin-Hastings sampler, MH sampler, Model comparison, Nonrecursive nonlinear structural equation model, Path sampling. / Structural equation models (SEMs) have been applied extensively in management, marketing, behavioral, and social sciences, etc., for studying relationships among manifest and latent variables. Motivated by the more complex data structures that have appeared in various fields, more complicated models have recently been developed. In the development of SEMs, there is a usual assumption about the regression coefficients of the underlying latent variables on themselves; more specifically, it is generally assumed that the structural equation model is recursive. However, in practice, nonrecursive SEMs are not uncommon. Thus, this fundamental assumption is not always appropriate. / The main objective of this thesis is to relax this assumption by developing some efficient procedures for some complex nonrecursive nonlinear SEMs (NNSEMs). The work in the thesis is based on Bayesian statistical analysis for NNSEMs. The first chapter introduces some background knowledge about NNSEMs. In chapter 2, Bayesian estimates of NNSEMs are given, then some statistical analysis topics such as standard errors, model comparison, etc., are discussed. In chapter 3, we develop an efficient hybrid MCMC algorithm to obtain Bayesian estimates for NNSEMs with mixed continuous and ordered categorical data. Also, some statistical analysis topics are discussed. In chapter 4, finite mixture NNSEMs are analyzed with the Bayesian approach. The newly developed methodologies are all illustrated with simulation studies and real examples. Finally, some conclusions and discussion are included in Chapter 5. / Li, Yong. / "July 2007." / Adviser: Sik-yum Lee.
/ Source: Dissertation Abstracts International, Volume: 69-01, Section: B, page: 0398. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2007. / Includes bibliographical references (p. 99-111). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307. Bayesian statistical decision theory Structural equation modeling
178	Toward a Robust and Universal Crowd Labeling Framework Khattak, Faiza Khan January 2017 The advent of fast and economical computers with large electronic storage has led to a large volume of data, most of which is unlabeled. While computers provide expeditious, accurate and low-cost computation, they still lag behind in many tasks that require human intelligence such as labeling medical images, videos or text. Consequently, current research focuses on a combination of computer accuracy and human intelligence to complete labeling tasks. In most cases labeling needs to be done by domain experts; however, because of the variability in expertise, experience, and intelligence of human beings, experts can be scarce. As an alternative to using domain experts, help is sought from non-experts, also known as Crowd, to complete tasks that cannot be readily automated. Since crowd labelers are non-expert, multiple labels per instance are acquired for quality purposes. The final label is obtained by combining these multiple labels. It is very common that the ground truth, instance difficulty, and the labeler ability are unknown entities. Therefore, the aggregation task becomes a "chicken and egg" problem to start with. Despite the fact that much research using machine learning and statistical techniques has been conducted in this area (e.g., [Dekel and Shamir, 2009; Hovy et al., 2013a; Liu et al., 2012; Donmez and Carbonell, 2008]), many questions remain unresolved. These include: (a) What are the best ways to evaluate labelers? (b) It is common to use expert-labeled instances (ground truth) to evaluate labeler ability (e.g., [Le et al., 2010; Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012; Khattak and Salleb-Aouissi, 2013]). The question is, what should be the cardinality of the set of expert-labeled instances to have an accurate evaluation?
(c) Which factors other than labeler expertise (e.g., difficulty of instance, prevalence of class, bias of a labeler toward a particular class) can affect the labeling accuracy? (d) Is there any optimal way to combine multiple labels to get the best labeling accuracy? (e) Should the labels provided by oppositional/malicious labelers be discarded and blocked? Or is there a way to use the "information" provided by oppositional/malicious labelers? (f) How can labelers and instances be evaluated if the ground truth is not known with certitude? In this thesis, we investigate these questions. We present methods that rely on few expert-labeled instances (usually 0.1%-10% of the dataset) to evaluate various parameters using a frequentist and a Bayesian approach. The estimated parameters are then used for label aggregation to produce one final label per instance. In the first part of this thesis, we propose a method called Expert Label Injected Crowd Estimation (ELICE) and extend it to different versions and variants. ELICE is based on a frequentist approach for estimating the underlying parameters. The first version of ELICE estimates the parameters, i.e., labeler expertise and data instance difficulty, using the accuracy of crowd labelers on expert-labeled instances [Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012]. The multiple labels for each instance are combined using weighted majority voting. These weights are the scores of labeler reliability on any given instance, which are obtained by inputting the parameters in the logistic function. In the second version of ELICE [Khattak and Salleb-Aouissi, 2013], we introduce entropy as a way to estimate the uncertainty of labeling. This provides the advantage of differentiating between good, random and oppositional/malicious labelers.
The aggregation of labels for ELICE version 2 flips the label (for binary classification) provided by the oppositional/malicious labeler, thus utilizing the information that is generally discarded by other labeling methodologies. Both versions of ELICE have a cluster-based variant in which, rather than making a random choice of instances from the whole dataset, clusters of data are first formed using any clustering approach, e.g., K-means. Then an equal number of instances from each cluster are chosen randomly to get expert-labels. This is done to ensure equal representation of each class in the test dataset. Besides taking advantage of expert-labeled instances, the third version of ELICE [Khattak and Salleb-Aouissi, 2016] incorporates pairwise/circular comparison of labelers to labelers and instances to instances. The idea here is to improve accuracy by using the crowd labels, which unlike expert-labels are available for the whole dataset and may provide a more comprehensive view of the labeler ability and instance difficulty. This is especially helpful for the case when the domain experts do not agree on one label and ground truth is not known for certain. Therefore, incorporating more information beyond expert labels can provide better results. We test the performance of ELICE on simulated labels as well as real labels obtained from Amazon Mechanical Turk. Results show that ELICE is effective as compared to state-of-the-art methods. All versions and variants of ELICE are capable of delaying phase transition. The main contribution of ELICE is that it makes use of all possible information available from crowd and experts. Next, we also present a theoretical framework to estimate the number of expert-labeled instances needed to achieve certain labeling accuracy. Experiments are presented to demonstrate the utility of the theoretical bound.
In the second part of this thesis, we present Crowd Labeling Using Bayesian Statistics (CLUBS) [Khattak and Salleb-Aouissi, 2015; Khattak et al., 2016b; Khattak et al., 2016a], a new approach for crowd labeling to estimate labeler and instance parameters along with label aggregation. Our approach is inspired by Item Response Theory (IRT). We introduce new parameters and refine the existing IRT parameters to fit the crowd labeling scenario. The main challenge is that unlike IRT, in the crowd labeling case, the ground truth is not known and has to be estimated based on the parameters. To overcome this challenge, we acquire expert-labels for a small fraction of instances in the dataset. Our model estimates the parameters based on the expert-labeled instances. The estimated parameters are used for weighted aggregation of crowd labels for the rest of the dataset. Experiments conducted on synthetic data and real datasets with heterogeneous quality crowd-labels show that our methods perform better than many state-of-the-art crowd labeling methods. We also conduct significance tests between our methods and other state-of-the-art methods to check the significance of the accuracy of these methods. The results show the superiority of our method in most cases. Moreover, we present experiments to demonstrate the impact of the accuracy of final aggregated labels when used as training data. The results essentially emphasize the need for high accuracy of the aggregated labels. In the last part of the thesis, we present past and contemporary research related to crowd labeling. We conclude with the future of crowd labeling and further research directions. To summarize, in this thesis, we have investigated different methods for estimating crowd labeling parameters and using them for label aggregation. We hope that our contribution will be useful to the crowd labeling community. Computer science Labels Bayesian statistical decision theory
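The weighted majority voting step described for ELICE in entry 178 can be sketched as follows. The abstract says only that the weights come from passing the estimated parameters through a logistic function and that version 2 flips labels from oppositional labelers. The specific choices below (the product of a labeler score and an instance score, and rescaling the logistic output to the interval (-1, 1) so that a negative score flips the vote) are assumptions made for illustration and are not the published method.

```python
import math


def reliability_weight(labeler_expertise, instance_ease=1.0):
    """Map a labeler/instance score to a weight in (-1, 1) via the logistic function."""
    sigmoid = 1.0 / (1.0 + math.exp(-labeler_expertise * instance_ease))
    return 2.0 * sigmoid - 1.0


def aggregate(labels, expertise, instance_ease=1.0):
    """Weighted majority vote over binary labels in {-1, +1}.

    labels maps labeler id -> label; expertise maps labeler id -> score.
    A negative expertise score yields a negative weight, which effectively
    flips that labeler's vote instead of discarding it.
    """
    score = sum(reliability_weight(expertise[j], instance_ease) * y
                for j, y in labels.items())
    return 1 if score >= 0 else -1


# Hypothetical crowd: one strong labeler outvotes two near-random ones.
final = aggregate({"a": 1, "b": -1, "c": -1}, {"a": 3.0, "b": 0.2, "c": 0.1})
```

An unweighted majority vote would return -1 for this hypothetical crowd; the weighting returns +1 because the two dissenting labelers carry almost no weight.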
179	Detecting short adjacent repeats in multiple sequences: a Bayesian approach. January 2010 Li, Qiwei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (p. 75-85). / Abstracts in English and Chinese. / Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- Repetitive DNA Sequence / Chapter 1.1.1 --- Definition and Categorization of Repetitive DNA Sequence / Chapter 1.1.2 --- Definition and Categorization of Tandem Repeats / Chapter 1.1.3 --- Definition and Categorization of Interspersed Repeats / Chapter 1.2 --- Research Significance / Chapter 1.3 --- Contributions / Chapter 1.4 --- Thesis Organization / Chapter 2 --- Literature Review and Overview of Our Method / Chapter 2.1 --- Existing Methods / Chapter 2.2 --- Overview of Our Method / Chapter 3 --- Theoretical Background / Chapter 3.1 --- Multinomial Distributions / Chapter 3.2 --- Dirichlet Distribution / Chapter 3.3 --- Metropolis-Hastings Sampling / Chapter 3.4 --- Gibbs Sampling / Chapter 4 --- Problem Description / Chapter 4.1 --- Generative Model / Chapter 4.1.1 --- Input Data R / Chapter 4.1.2 --- Parameters A (Repeat Segment Starting Positions) / Chapter 4.1.3 --- Parameters S (Repeat Segment Structures) / Chapter 4.1.4 --- Parameters θ (Motif Matrix) / Chapter 4.1.5 --- Parameters Φ (Background Distribution) / Chapter 4.1.6 --- An Example of the Model Schematic Diagram / Chapter 4.2 --- Parameter Structure / Chapter 4.3 --- Posterior Distribution / Chapter 4.3.1 --- The Full Posterior Distribution / Chapter 4.3.2 --- The Collapsed Posterior Distribution / Chapter 4.4 --- Conclusion
/ Chapter 5 --- Methodology / Chapter 5.1 --- Schematic Procedure / Chapter 5.1.1 --- The Basic Schematic Procedure / Chapter 5.1.2 --- The Improved Schematic Procedure / Chapter 5.2 --- Initialization / Chapter 5.3 --- Predictive Update Step for θ_n and Φ_n / Chapter 5.4 --- Gibbs Sampling Step for a_n / Chapter 5.5 --- Metropolis-Hastings Sampling Step for s_n / Chapter 5.5.1 --- Rear Indel Move / Chapter 5.5.2 --- Partial Shift Move / Chapter 5.5.3 --- Front Indel Move / Chapter 5.6 --- Phase Shifts / Chapter 5.7 --- Conclusion / Chapter 6 --- Results and Discussion / Chapter 6.1 --- Settings / Chapter 6.2 --- Experiment on Synthetic Data / Chapter 6.3 --- Experiment on Real Data / Chapter 7 --- Conclusion and Future Work / Chapter 7.1 --- Conclusion / Chapter 7.2 --- Future Work / Bibliography Sequences (Mathematics) Bayesian statistical decision theory
180	Bayesian inference of point-source waves based on a set of independent noisy detectors / CUHK electronic theses & dissertations collection January 2015 Waves are everywhere. Biological waves, such as gastric slow waves, and electromagnetic waves, such as TV signals and radio waves, are typical examples that we encounter in everyday life. Many waves are emitted from a point source, and their wavefront can be approximated by a line if the point source is far away. When an experimenter records a propagating wave, the data are subject to noise contamination, posing great difficulty in wave analysis. In this thesis, we consider the situation where at most one wave propagates in a two-dimensional space at any particular time and the detector recordings are noisy. We introduce two parametric generative models for wave propagation and one parametric model for noise generation, and develop a multistage procedure which identifies the number of waves in a given data set, followed by inference under the Bayesian paradigm on important variables, including the location of the point source, the velocity of the wave and indicator variables of spikes. The procedure is illustrated with two real-life examples. The first is a study on the effect of potassium ion channels using cultured heart cells. The other is on the propagation characteristics of the Tohoku tsunami in 2011. / Waves are everywhere. Biological waves such as gastric slow waves, and electromagnetic waves such as TV signals and radio waves, are typical examples of the waves we often encounter in daily life. Many waves come from point sources, and when a wave is emitted from a distant point source its wavefront approximates a straight line. When an experimenter records wave data, the data are very likely to be contaminated by noise, which makes the wave data harder to analyze. This thesis considers the case where, in a two-dimensional space, at most one wave is propagating at any particular time and the wave data are contaminated by noise. We propose two parametric models for the generation and propagation of waves, and one parametric model for the generation of noise. We also establish a multistage procedure that first identifies the number of waves in the data and then, under Bayesian theory, classifies spikes as wave spikes or noise spikes and estimates the important parameters of the wave spikes, including the location of the point source and the velocity of the wave. The proposed method is applied to two real data sets. The first concerns a study of how cellular potassium ion channels affect cultured cardiac cells, and the other analyzes the propagation characteristics of the 2011 Tohoku tsunami in Japan. / Lau, Yuk Fai. / Thesis M.Phil. Chinese University of Hong Kong 2015. / Includes bibliographical references (leaves 71-74). / Abstracts also in Chinese. / Title from PDF title page (viewed on 18, October, 2016).
/ Detailed summary in vernacular field only. Bayesian statistical decision theory QA279.5 .L386 2015
