Global ETD Search

651	Toward a Robust and Universal Crowd Labeling Framework Khattak, Faiza Khan January 2017 (has links) The advent of fast and economical computers with large electronic storage has led to a large volume of data, most of which is unlabeled. While computers provide expeditious, accurate and low-cost computation, they still lag behind in many tasks that require human intelligence, such as labeling medical images, videos or text. Consequently, current research focuses on combining computer accuracy and human intelligence to complete labeling tasks. In most cases labeling needs to be done by domain experts; however, because of the variability in expertise, experience, and intelligence of human beings, experts can be scarce. As an alternative to using domain experts, help is sought from non-experts, also known as the crowd, to complete tasks that cannot be readily automated. Since crowd labelers are non-experts, multiple labels per instance are acquired for quality purposes. The final label is obtained by combining these multiple labels. It is very common that the ground truth, instance difficulty, and the labeler ability are unknown entities. Therefore, the aggregation task becomes a “chicken and egg” problem to start with. Despite the fact that much research using machine learning and statistical techniques has been conducted in this area (e.g., [Dekel and Shamir, 2009; Hovy et al., 2013a; Liu et al., 2012; Donmez and Carbonell, 2008]), many questions remain unresolved. These include: (a) What are the best ways to evaluate labelers? (b) It is common to use expert-labeled instances (ground truth) to evaluate labeler ability (e.g., [Le et al., 2010; Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012; Khattak and Salleb-Aouissi, 2013]). The question is, what should be the cardinality of the set of expert-labeled instances to have an accurate evaluation? 
(c) Which factors other than labeler expertise (e.g., difficulty of instance, prevalence of class, bias of a labeler toward a particular class) can affect the labeling accuracy? (d) Is there any optimal way to combine multiple labels to get the best labeling accuracy? (e) Should the labels provided by oppositional/malicious labelers be discarded and blocked? Or is there a way to use the “information” provided by oppositional/malicious labelers? (f) How can labelers and instances be evaluated if the ground truth is not known with certitude? In this thesis, we investigate these questions. We present methods that rely on a few expert-labeled instances (usually 0.1%-10% of the dataset) to evaluate various parameters using a frequentist and a Bayesian approach. The estimated parameters are then used for label aggregation to produce one final label per instance. In the first part of this thesis, we propose a method called Expert Label Injected Crowd Estimation (ELICE) and extend it to different versions and variants. ELICE is based on a frequentist approach for estimating the underlying parameters. The first version of ELICE estimates the parameters, i.e., labeler expertise and data instance difficulty, using the accuracy of crowd labelers on expert-labeled instances [Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012]. The multiple labels for each instance are combined using weighted majority voting. These weights are the scores of labeler reliability on any given instance, which are obtained by inputting the parameters into the logistic function. In the second version of ELICE [Khattak and Salleb-Aouissi, 2013], we introduce entropy as a way to estimate the uncertainty of labeling. This provides the advantage of differentiating between good, random and oppositional/malicious labelers. 
The aggregation of labels for ELICE version 2 flips the label (for binary classification) provided by the oppositional/malicious labeler, thus utilizing information that is generally discarded by other labeling methodologies. Both versions of ELICE have a cluster-based variant in which, rather than making a random choice of instances from the whole dataset, clusters of data are first formed using any clustering approach, e.g., K-means. Then an equal number of instances from each cluster are chosen randomly to get expert labels. This is done to ensure equal representation of each class in the test dataset. Besides taking advantage of expert-labeled instances, the third version of ELICE [Khattak and Salleb-Aouissi, 2016] incorporates pairwise/circular comparison of labelers to labelers and instances to instances. The idea here is to improve accuracy by using the crowd labels, which, unlike expert labels, are available for the whole dataset and may provide a more comprehensive view of the labeler ability and instance difficulty. This is especially helpful when the domain experts do not agree on one label and the ground truth is not known for certain. Therefore, incorporating more information beyond expert labels can provide better results. We test the performance of ELICE on simulated labels as well as real labels obtained from Amazon Mechanical Turk. Results show that ELICE is effective as compared to state-of-the-art methods. All versions and variants of ELICE are capable of delaying phase transition. The main contribution of ELICE is that it makes use of all possible information available from crowd and experts. Next, we also present a theoretical framework to estimate the number of expert-labeled instances needed to achieve a certain labeling accuracy. Experiments are presented to demonstrate the utility of the theoretical bound. 
In the second part of this thesis, we present Crowd Labeling Using Bayesian Statistics (CLUBS) [Khattak and Salleb-Aouissi, 2015; Khattak et al., 2016b; Khattak et al., 2016a], a new approach for crowd labeling to estimate labeler and instance parameters along with label aggregation. Our approach is inspired by Item Response Theory (IRT). We introduce new parameters and refine the existing IRT parameters to fit the crowd labeling scenario. The main challenge is that, unlike IRT, in the crowd labeling case the ground truth is not known and has to be estimated based on the parameters. To overcome this challenge, we acquire expert labels for a small fraction of instances in the dataset. Our model estimates the parameters based on the expert-labeled instances. The estimated parameters are used for weighted aggregation of crowd labels for the rest of the dataset. Experiments conducted on synthetic data and real datasets with heterogeneous quality crowd labels show that our methods perform better than many state-of-the-art crowd labeling methods. We also conduct significance tests between our methods and other state-of-the-art methods to check whether the differences in accuracy are statistically significant. The results show the superiority of our method in most cases. Moreover, we present experiments to demonstrate the impact of the accuracy of final aggregated labels when used as training data. The results essentially emphasize the need for high accuracy of the aggregated labels. In the last part of the thesis, we present past and contemporary research related to crowd labeling. We conclude with the future of crowd labeling and further research directions. To summarize, in this thesis, we have investigated different methods for estimating crowd labeling parameters and using them for label aggregation. We hope that our contribution will be useful to the crowd labeling community. Computer science Labels Bayesian statistical decision theory
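The weighted majority voting described in entry 651 can be illustrated with a rough sketch. It is not the ELICE formulas from the thesis: the agreement score, the `scale` parameter, the mapping `2 * logistic(x) - 1`, and all function names are assumptions made for illustration. Labels are binary (+1/-1), each labeler is scored by agreement with a few expert-labeled instances, and the score is passed through a logistic function to obtain a vote weight. A negative weight effectively flips the votes of a labeler who mostly disagrees with the experts.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def labeler_scores(crowd_labels, expert_labels):
    """Score each labeler in [-1, 1] by agreement with expert labels.

    crowd_labels: {labeler: {instance: +1 or -1}}
    expert_labels: {instance: +1 or -1} for a small expert-labeled subset
    """
    scores = {}
    for labeler, labels in crowd_labels.items():
        graded = [i for i in expert_labels if i in labels]
        if not graded:
            scores[labeler] = 0.0  # no evidence about this labeler
            continue
        agreement = sum(
            1 if labels[i] == expert_labels[i] else -1 for i in graded
        )
        scores[labeler] = agreement / len(graded)
    return scores


def aggregate(crowd_labels, expert_labels, scale=4.0):
    """Weighted majority vote over all instances (illustrative only)."""
    scores = labeler_scores(crowd_labels, expert_labels)
    # Assumed mapping: squash the score into (-1, 1); the sign of the
    # weight decides whether a labeler's vote is kept or flipped.
    weights = {l: 2.0 * logistic(scale * s) - 1.0 for l, s in scores.items()}
    instances = {i for labels in crowd_labels.values() for i in labels}
    final = {}
    for i in sorted(instances):
        total = sum(
            weights[l] * labels[i]
            for l, labels in crowd_labels.items()
            if i in labels
        )
        final[i] = 1 if total >= 0 else -1
    return final


if __name__ == "__main__":
    crowd = {
        "good": {0: 1, 1: -1, 2: 1, 3: -1},
        "random": {0: 1, 1: 1, 2: -1, 3: -1},
        "oppositional": {0: -1, 1: 1, 2: -1, 3: 1},
    }
    expert = {0: 1, 1: -1}  # hypothetical expert labels for two instances
    print(aggregate(crowd, expert))
```

In this toy data the oppositional labeler disagrees with the experts on every graded instance, so its weight is negative and its votes count toward the opposite label, while the labeler who agrees only half the time gets weight zero.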
652	Gravitation and phase transitions in the early universe Krauss, Lawrence Maxwell January 1982 (has links) Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Physics, 1982. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND SCIENCE / Vita. / Includes bibliographical references. / by Lawrence Maxwell Krauss. / Ph.D. Physics. Cosmology Gravitation
653	Application of statistical methods to problems in epidemiological research Ho, Lai Ping 01 January 2003 (has links) No description available. China Epidemiology Hong Kong Research Statistical methods
654	Scale Setting and Topological Observables in Pure SU(2) LGT Clarke, David A. 31 January 2019 (has links) In this dissertation, we investigate the approach of pure SU(2) lattice gauge theory to its continuum limit using the deconfinement temperature, six gradient scales, and six cooling scales. We find that cooling scales exhibit similarly good scaling behavior as gradient scales, while being computationally more efficient. In addition, we estimate systematic error in continuum limit extrapolations of scale ratios by comparing standard scaling to asymptotic scaling. Finally we study topological observables in pure SU(2) using cooling to smooth the gauge fields, and investigate the sensitivity of cooling scales to topological charge. We find that large numbers of cooling sweeps lead to metastable charge sectors, without destroying physical instantons, provided the lattice spacing is fine enough and the volume is large enough. Continuum limit estimates of the topological susceptibility are obtained, of which we favor χ^{1/4}/T_c = 0.643(12). Differences between cooling scales in different topological sectors turn out to be too small to be detectable within our statistical error.
655	Detecting short adjacent repeats in multiple sequences: a Bayesian approach. January 2010 (has links) Li, Qiwei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (p. 75-85). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Repetitive DNA Sequence --- p.3 / Chapter 1.1.1 --- Definition and Categorization of Repetitive DNA Sequence --- p.3 / Chapter 1.1.2 --- Definition and Categorization of Tandem Repeats --- p.4 / Chapter 1.1.3 --- Definition and Categorization of Interspersed Repeats --- p.6 / Chapter 1.2 --- Research Significance --- p.7 / Chapter 1.3 --- Contributions --- p.9 / Chapter 1.4 --- Thesis Organization --- p.11 / Chapter 2 --- Literature Review and Overview of Our Method --- p.13 / Chapter 2.1 --- Existing Methods --- p.14 / Chapter 2.2 --- Overview of Our Method --- p.17 / Chapter 3 --- Theoretical Background --- p.22 / Chapter 3.1 --- Multinomial Distributions --- p.23 / Chapter 3.2 --- Dirichlet Distribution --- p.23 / Chapter 3.3 --- Metropolis-Hastings Sampling --- p.25 / Chapter 3.4 --- Gibbs Sampling --- p.26 / Chapter 4 --- Problem Description --- p.28 / Chapter 4.1 --- Generative Model --- p.29 / Chapter 4.1.1 --- Input Data R --- p.31 / Chapter 4.1.2 --- Parameters A (Repeat Segment Starting Positions) --- p.32 / Chapter 4.1.3 --- Parameters S (Repeat Segment Structures) --- p.33 / Chapter 4.1.4 --- Parameters θ (Motif Matrix) --- p.35 / Chapter 4.1.5 --- Parameters Φ (Background Distribution) --- p.36 / Chapter 4.1.6 --- An Example of the Model Schematic Diagram --- p.37 / Chapter 4.2 --- Parameter Structure --- p.38 / Chapter 4.3 --- Posterior Distribution --- p.40 / Chapter 4.3.1 --- The Full Posterior Distribution --- p.41 / Chapter 4.3.2 --- The Collapsed Posterior Distribution --- p.42 / Chapter 4.4 --- Conclusion --- p.43 / Chapter 5 --- Methodology --- p.45 / Chapter 5.1 --- Schematic Procedure --- p.46 / Chapter 5.1.1 --- The Basic Schematic Procedure --- p.46 / Chapter 5.1.2 --- The Improved Schematic Procedure --- p.47 / Chapter 5.2 --- Initialization --- p.49 / Chapter 5.3 --- Predictive Update Step for θ_n and Φ_n --- p.50 / Chapter 5.4 --- Gibbs Sampling Step for a_n --- p.50 / Chapter 5.5 --- Metropolis-Hastings Sampling Step for s_n --- p.51 / Chapter 5.5.1 --- Rear Indel Move --- p.53 / Chapter 5.5.2 --- Partial Shift Move --- p.56 / Chapter 5.5.3 --- Front Indel Move --- p.56 / Chapter 5.6 --- Phase Shifts --- p.57 / Chapter 5.7 --- Conclusion --- p.58 / Chapter 6 --- Results and Discussion --- p.60 / Chapter 6.1 --- Settings --- p.61 / Chapter 6.2 --- Experiment on Synthetic Data --- p.63 / Chapter 6.3 --- Experiment on Real Data --- p.69 / Chapter 7 --- Conclusion and Future Work --- p.72 / Chapter 7.1 --- Conclusion --- p.72 / Chapter 7.2 --- Future Work --- p.74 / Bibliography --- p.75 Sequences (Mathematics) Bayesian statistical decision theory
656	Bayesian inference of point-source waves based on a set of independent noisy detectors / CUHK electronic theses & dissertations collection January 2015 (has links) Waves are everywhere. Biological waves, such as gastric slow waves, and electromagnetic waves, such as TV signals and radio waves, are typical examples that we encounter in everyday life. Many waves are emitted from a point source, and their wavefront can be approximated by a line if the point source is far away. When an experimenter records a propagating wave, the data is subject to noise contamination, posing great difficulty for wave analysis. In this thesis, we consider the situation where at most one wave propagates in a two-dimensional space at any particular time and the detector recordings are noisy. We introduce two parametric generative models for wave propagation and one parametric model for noise generation, and develop a multistage procedure which identifies the number of waves in a given data set, followed by inference on important variables, including the location of the point source, the velocity of the wave and indicator variables of spikes, under the Bayesian paradigm. The procedure is illustrated with two real-life examples. The first is a study on the effect of potassium ion channels using cultured heart cells. The other is on the propagation characteristics of the Tohoku tsunami in 2011. / 波是無處不在的。生物波如胃慢波，以及電磁波如電視信號和無線電波，都是我們在日常生活中常遇到的波的典型例子。許多波都是點源，而當波從一個遠的點源發射，其波陣面會近似一條直線。當實驗者記錄波數據時，數據很大機會受到雜訊污染，增加了分析波數據的難度。本文考慮在一個二維空間內，任何特定的時間中，最多只有一個波在傳播，而波數據受到雜訊污染。我們提出了兩個參數模型模擬波的產生和傳播，以及一個參數模型模擬雜訊的產生。我們並建立了一個多階段程序先識別數據中波的數量，然後根據貝葉斯理論，將尖峰訊號分類成波尖峰訊號或雜訊尖峰訊號，以及對波尖峰訊號的重要參數，包括點源的位置和波的速度進行估算。本文提出的方法將應用於兩組真實數據上。第一組是關於細胞鉀離子通道如何影響心肌培養細胞研究，而另一組則分析2011年日本東北海嘯的傳播特性。 / Lau, Yuk Fai. / Thesis M.Phil. Chinese University of Hong Kong 2015. / Includes bibliographical references (leaves 71-74). / Abstracts also in Chinese. / Title from PDF title page (viewed on 18, October, 2016). 
/ Detailed summary in vernacular field only. Bayesian statistical decision theory QA279.5 .L386 2015
657	Properties of the maximum likelihood and Bayesian estimators of availability Kuo, Way January 2011 (has links) Typescript (photocopy). / Digitized by Kansas Correctional Industries Probabilities Bayesian statistical decision theory Statistics
658	Sobre a Equivalência entre Ferromagnetos com Campos Aleatórios e Antiferromagnetos Diluídos / On the equivalence between ferromagnets with random fields and diluted antiferromagnets. Baeta Segundo, José Augusto 24 September 1990 (has links) Usando um método proposto por van Hemmen nós computamos a energia livre da versão Curie-Weiss do modelo de Ising antiferromagnético com diluição de sítio na presença de um campo magnético uniforme. A solução apresenta uma correspondência exata entre as termodinâmicas deste modelo e da versão Curie-Weiss do modelo de Ising ferromagnético na presença de um campo magnético aleatório. Os diagramas de fase são discutidos e mostra-se a existência de um ponto tricrítico. Apresentamos também uma derivação alternativa dos resultados de van Hemmen a qual permite uma comparação com os métodos usuais de campo médio contidos na literatura. A solução obtida a partir da transformação de Hubbard-Stratonovich permite o cálculo das flutuações do parâmetro de ordem dos respectivos modelos e a constatação de uma equivalência também a este nível, em particular com a igualdade dos expoentes críticos relevantes. / Using a method proposed by van Hemmen, we compute the free energy of the Curie-Weiss version of the site-dilute antiferromagnetic Ising model in the presence of a uniform magnetic field. The solution displays an exact thermodynamic correspondence between this model and the Curie-Weiss version of the ferromagnetic Ising model in the presence of a random magnetic field. The phase diagrams are discussed and a tricritical point is shown to exist. We also present an alternative derivation of van Hemmen's results, which allows an easy comparison with the usual mean-field methods used in the literature. The solution obtained via the Hubbard-Stratonovich transformation allows the computation of the fluctuations of the order parameter in both models and shows that their equivalence also holds at this level, in particular with equality of the relevant critical exponents. Física Mecânica estatística Physics Statistical mechanics
659	Modeling Subset Behavior: Prescriptive Analytics for Professional Basketball Data Bynum, Lucius 01 January 2018 (has links) Sports analytics problems have become increasingly prominent in the past decade. Modern image processing capabilities allow coaching staff to easily capture detailed game-time statistics on their players, opponents, team configurations, and plays. The challenge is to turn that data into meaningful insights for team managers and coaches. This project uses descriptive and predictive techniques on publicly available NBA basketball data to identify powerful combinations of players and predict how they will perform against other teams. Applied Statistics Other Applied Mathematics Statistical Models
660	A Statistical Analysis of Apprentice Program Dropouts and Completers in Utah: 1969-1974 Randle, Mark Douglas 01 May 1975 (has links) The purpose of this study is to examine a sample of former Utah apprentices who either completed or dropped out of a registered apprenticeship program during the five-year period from 1969 to 1974. Comparisons were made between the dropouts and completers in order to determine how the two groups differed and what factors influenced their decisions to complete or cancel their indentures. Significant differences were found between the two groups with respect to their opinions of the training they received as apprentices. Especially significant differences were seen between the dropouts' and completers' responses to the questions related to their on-the-job training. The study concludes with a discussion of the implications of the findings for the future course of action to be pursued by apprenticeship labor officials in the state. statistical analysis apprentice program dropout complete utah
