11

Approaches to Estimation of Haplotype Frequencies and Haplotype-trait Associations

Li, Xiaohong 01 February 2009 (has links)
Characterizing the genetic contributors to complex disease traits will inevitably require consideration of haplotypic phase, the specific alignment of alleles on a single homologous chromosome. In population-based studies, however, phase is generally unobservable, as standard genotyping techniques provide investigators only with data on unphased genotypes. Several statistical methods have been described for estimating haplotype frequencies and their association with a trait in the context of phase ambiguity. These methods are limited, however, to diploid populations in which individuals have exactly two homologous chromosomes each and are thus not suitable for more general infectious disease settings. Specifically, in the context of malaria and HIV, the number of infections is also unknown. In addition, for both diploid and non-diploid settings, the challenge of high dimensionality and an unknown model of association remains. Our research includes: (1) extending the expectation-maximization approach of Excoffier and Slatkin to address the challenges of unobservable phase and the unknown numbers of infections; (2) extending the method of Lake et al. to estimate simultaneously both haplotype frequencies and the haplotype-trait associations in the non-diploid settings; and (3) applying two Bayesian approaches to the mixed modeling framework with unobservable cluster (haplotype) identifiers, to address the challenges associated with high-dimensional data. Simulation studies are presented as well as applications to data arising from a cohort of children multiply infected with malaria and a cohort of HIV-infected individuals at risk for antiretroviral-associated dyslipidemia. This research is joint work with Drs. S.M. Rich, R.M. Yucel, J. Staudenmayer and A.S. Foulkes.
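As context for contribution (1), the sketch below illustrates the classical gene-counting EM of Excoffier and Slatkin for diploid, unphased genotype data; the thesis's extensions to an unknown number of infections and to haplotype-trait association are not shown. The data encoding, function names, and toy example are illustrative assumptions, not the author's implementation.

```python
import itertools
from collections import defaultdict

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.
    A genotype is a tuple of per-locus allele pairs, e.g. ((0, 1), (0, 0))."""
    pairs = set()
    for choice in itertools.product((0, 1), repeat=len(genotype)):
        h1 = tuple(g[c] for g, c in zip(genotype, choice))
        h2 = tuple(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return list(pairs)

def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-8):
    """Gene-counting EM in the spirit of Excoffier and Slatkin (1995):
    haplotype frequency estimates from unphased diploid genotypes,
    assuming Hardy-Weinberg equilibrium."""
    haps = {h for g in genotypes for pair in compatible_pairs(g) for h in pair}
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each phasing given current frequencies.
            w = [(2.0 if h1 != h2 else 1.0) * freqs[h1] * freqs[h2]
                 for h1, h2 in pairs]
            total = sum(w)
            for (h1, h2), wi in zip(pairs, w):
                counts[h1] += wi / total
                counts[h2] += wi / total
        # M-step: expected haplotype counts, renormalized to frequencies.
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        if max(abs(new[h] - freqs[h]) for h in haps) < tol:
            return new
        freqs = new
    return freqs

# Toy example: three individuals typed at two biallelic loci.
data = [((0, 1), (0, 1)), ((0, 0), (0, 1)), ((1, 1), (1, 1))]
print(em_haplotype_freqs(data))
```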
12

Comparing Approaches to Initializing the Expectation-Maximization Algorithm

Dicintio, Sabrina 09 October 2012 (has links)
The expectation-maximization (EM) algorithm is a widely utilized approach to maximum likelihood estimation in the presence of missing data; this thesis focuses on its application within the model-based clustering framework. The performance of the EM algorithm can be highly dependent on how the algorithm is initialized. Several ways of initializing the EM algorithm have been proposed; however, the best method to use for initialization remains a somewhat controversial topic. From the attempt to obtain a superior method of initializing the EM algorithm comes the concept of using multiple existing methods together in what will be called a 'voting' procedure. This procedure uses several common initialization methods to cluster the data, and a final starting ẑ_ig matrix is then obtained in two ways. The hard 'voting' method follows a majority rule, whereas the soft 'voting' method takes an average of the multiple group memberships. The final ẑ_ig matrix obtained from both methods dictates the starting values of the mixture proportions π̂_g, means μ̂_g, and covariance matrices Σ̂_g used to initialize the EM algorithm.
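A minimal sketch of the two 'voting' combinations described above, under the assumption that the candidate membership matrices have already been label-aligned to a common group ordering: hard voting takes a majority over the individual hard assignments, soft voting averages the matrices, and the last function shows the usual first M-step that turns a combined ẑ_ig into starting values for the mixture proportions, means, and covariances. Function names and the alignment assumption are illustrative, not the author's code.

```python
import numpy as np

def hard_vote(z_list):
    """Majority vote over several n-by-G membership matrices (assumed to be
    label-aligned): each observation gets the group it wins most often."""
    n, G = z_list[0].shape
    votes = np.zeros((n, G))
    for z in z_list:
        votes[np.arange(n), z.argmax(axis=1)] += 1
    z_start = np.zeros((n, G))
    z_start[np.arange(n), votes.argmax(axis=1)] = 1.0
    return z_start

def soft_vote(z_list):
    """Average the label-aligned membership matrices and renormalize rows."""
    z_mean = np.mean(z_list, axis=0)
    return z_mean / z_mean.sum(axis=1, keepdims=True)

def gmm_start_values(X, z):
    """Starting proportions, means, and covariances implied by a membership
    matrix z (the usual first M-step of the EM algorithm)."""
    n, G = z.shape
    pi = z.sum(axis=0) / n
    mu = (z.T @ X) / z.sum(axis=0)[:, None]
    Sigma = np.stack([
        ((z[:, g, None] * (X - mu[g])).T @ (X - mu[g])) / z[:, g].sum()
        for g in range(G)
    ])
    return pi, mu, Sigma
```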
13

A new normalized EM algorithm for clustering gene expression data

Nguyen, Phuong Minh, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2008 (has links)
Microarray data clustering represents a basic exploratory tool to find groups of genes exhibiting similar expression patterns or to detect relevant classes of molecular subtypes. Among the wide range of clustering approaches proposed and applied in the gene expression community to analyze microarray data, mixture model-based clustering has received much attention due to its sound statistical framework and its flexibility in data modeling. However, clustering algorithms following the model-based framework suffer from two serious drawbacks. The first is that the performance of these algorithms depends critically on the starting values for their iterative clustering procedures. Additionally, they are not capable of working directly with very high dimensional data sets in the sample clustering problem, where the dimension of the data reaches into the hundreds or thousands. The thesis focuses on these two challenges and includes the following contributions. First, the thesis introduces the statistical model of our proposed normalized expectation-maximization (EM) algorithm, followed by an analysis of its clustering performance on a number of real microarray data sets. The normalized EM is stable even with random initializations of its EM iterative procedure. The stability of the normalized EM is demonstrated through a performance comparison with other related clustering algorithms. Furthermore, the normalized EM is the first mixture model-based clustering approach capable of working directly with very high dimensional microarray data sets in the sample clustering problem, where the number of genes is much larger than the number of samples. This advantage of the normalized EM is illustrated through comparison with the unnormalized EM (the conventional EM algorithm for Gaussian mixture model-based clustering). In addition, for experimental microarray data sets in which class labels of the data points are available, an interesting property of the convergence speed of the normalized EM with respect to the radius of the hypersphere in its corresponding statistical model is uncovered. Second, to support the performance comparison of different clusterings, a new internal index is derived using fundamental concepts from information theory. This index allows the comparison of clustering approaches in which the closeness between data points is evaluated by their cosine similarity. The method for deriving this internal index can be used to design other new indexes for comparing clustering approaches that employ a common similarity measure.
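The normalized EM itself is specific to this thesis, so no sketch of it is attempted here; for context, the snippet below runs the conventional (unnormalized) Gaussian-mixture EM baseline from several random starts on toy data, which is one way to see the initialization sensitivity the abstract describes. The use of scikit-learn and the toy data are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy "expression" matrix: two groups of samples, 20 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 20)),
               rng.normal(3.0, 1.0, (50, 20))])

# Conventional EM with random starts: the spread of the converged
# objective values across seeds reflects the dependence on initialization.
for seed in range(5):
    gm = GaussianMixture(n_components=2, covariance_type="diag",
                         init_params="random", n_init=1,
                         random_state=seed).fit(X)
    print(seed, round(gm.lower_bound_, 4))
```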
14

Image Thresholding Technique Based On Fuzzy Partition And Entropy Maximization

Zhao, Mansuo January 2005 (has links)
Thresholding is a commonly used technique in image segmentation because of its fast and easy application. For this reason threshold selection is an important issue. There are two general approaches to threshold selection. One approach is based on the histogram of the image, while the other is based on the gray-scale information located in small local areas. The histogram of an image contains statistical data on its gray-scale or color ingredients. In this thesis, an adaptive logical thresholding method is first proposed for the binarization of blueprint images. The new method exploits the geometric features of blueprint images. This is implemented by utilizing a robust windows operation, which is based on the assumption that the objects have a "C" shape in a small area. We make use of multiple window sizes in the windows operation. This not only reduces computation time but also effectively separates thin lines from wide lines. Our method can automatically determine the threshold of images. Experiments show that our method is effective for blueprint images and achieves good results over a wide range of images. Second, fuzzy set theory, along with probability partition and maximum entropy theory, is explored to compute the threshold based on the histogram of the image. Fuzzy set theory has been widely used in many fields where ambiguous phenomena exist since it was proposed by Zadeh in 1965, and many thresholding methods have been developed using this theory. The concept used here is called fuzzy partition. Fuzzy partition means that the histogram is parted into several groups by fuzzy sets which represent the fuzzy membership of each group, since our method is based on the histogram of the image. Probability partition is associated with fuzzy partition: the probability distribution of each group is derived from the fuzzy partition. Entropy, which originates in thermodynamics, was introduced into communication theory as a commonly used criterion for measuring the information transmitted through a channel, and it has been adopted in image processing as a measure of the information contained in the processed images. Thus it is applied in our method as a criterion for selecting the optimal fuzzy sets which partition the histogram. To find the threshold, the histogram of the image is partitioned by fuzzy sets which satisfy a certain entropy restriction. The search for the best possible fuzzy sets therefore becomes an important issue. There is no efficient method for this searching procedure, so expansion to multiple-level thresholding with fuzzy partition becomes extremely time consuming or even impossible. In this thesis, the relationship between a probability partition (PP) and a fuzzy C-partition (FP) is studied. This relationship and the entropy approach are used to derive a thresholding technique to select the optimal fuzzy C-partition. The measure of the selection quality is the entropy function defined by the PP and FP. A necessary condition for the entropy function to reach a maximum is derived. Based on this condition, an efficient search procedure for two-level thresholding is derived, which makes the search so efficient that extension to multilevel thresholding becomes possible. A novel fuzzy membership function is proposed for three-level thresholding which produces a better result because a new relationship among the fuzzy membership functions is presented.
This new relationship gives more flexibility in the search for the optimal fuzzy sets, although it also complicates the search for the fuzzy sets in multi-level thresholding. This complication is solved by a new method called the "Onion-Peeling" method. Because the relationship between the fuzzy membership functions is so complicated, it is impossible to obtain all the membership functions at once. The search procedure is therefore decomposed into several layers of three-level partitions, except for the last layer, which may be a two-level one. The larger problem is thus reduced to three-level partitions, so that the two outermost membership functions can be obtained without worrying about the complicated intersections among the membership functions. The method is further revised for images with a dominant area of background or a dominant object, which affects the appearance of the histogram of the image. The histogram is the basis of our method as well as of many other methods, and a "bad" shape of the histogram will result in a badly thresholded image. A quadtree scheme is adopted to decompose the image into homogeneous areas and heterogeneous areas, and a multi-resolution thresholding method based on the quadtree and fuzzy partition is then devised to deal with these images. Extension of fuzzy partition methods to color images is also examined: an adaptive thresholding method for color images based on fuzzy partition is proposed which can determine the number of thresholding levels automatically. This thesis concludes that the "C" shape assumption and the varying window sizes in the windows operation contribute to a better segmentation of blueprint images. The efficient search procedure for the optimal fuzzy sets in the fuzzy 2-partition of the histogram accelerates the process so much that it enables the extension to multilevel thresholding. In the three-level fuzzy partition, the new relationship among the three fuzzy membership functions makes more sense than the conventional assumption and, as a result, performs better. A novel method, the "Onion-Peeling" method, is devised for dealing with the complexity at the intersections among the multiple membership functions in the multilevel fuzzy partition. It decomposes the multilevel partition into fuzzy 3-partitions and fuzzy 2-partitions by transposing the partition space in the histogram, and is thus efficient in multilevel thresholding. A multi-resolution method which applies the quadtree scheme to distinguish heterogeneous areas from homogeneous areas is designed for images with large homogeneous areas, which usually distort the histogram of the image. The new histogram, based only on the heterogeneous areas, is adopted for partition and outperforms the old one, while validity checks filter out fragmented points, which are only a small portion of the whole image; this gives good thresholded images for human face images.
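The fuzzy-partition entropy criterion and its efficient search are the thesis's own contributions and are not reproduced here; as background, the sketch below implements the classical crisp maximum-entropy (Kapur-style) threshold over a gray-level histogram, the special case that the fuzzy 2-partition generalizes by replacing the hard split with membership functions. The histogram construction and names are illustrative assumptions.

```python
import numpy as np

def max_entropy_threshold(hist):
    """Crisp maximum-entropy threshold (Kapur-style) for a 256-bin gray-level
    histogram: choose the split t that maximizes the sum of the entropies of
    the background (levels <= t) and object (levels > t) distributions."""
    p = hist.astype(float) / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        p0, p1 = p[:t + 1].sum(), p[t + 1:].sum()
        if p0 <= 0.0 or p1 <= 0.0:
            continue
        q0 = p[:t + 1][p[:t + 1] > 0] / p0
        q1 = p[t + 1:][p[t + 1:] > 0] / p1
        h = -(q0 * np.log(q0)).sum() - (q1 * np.log(q1)).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

# Toy bimodal histogram standing in for an image with dark and bright regions.
rng = np.random.default_rng(1)
pixels = np.clip(np.concatenate([rng.normal(70, 12, 4000),
                                 rng.normal(180, 15, 6000)]), 0, 255)
hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
print(max_entropy_threshold(hist))
```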
15

Inferential methods for censored bivariate normal data

Kim, Jeong-Ae. Balakrishnan, N., January 1900 (has links)
Thesis (Ph.D.)--McMaster University, 2004. / Supervisor: N. Balakrishnan. Includes bibliographical references (p. 186-191).
16

Improved iterative schemes for REML estimation of variance parameters in linear mixed models

Knight, Emma Jane. January 2008 (has links)
Thesis (Ph.D.)--University of Adelaide, School of Agriculture, Food and Wine, Discipline of Biometrics SA, 2008. / "October 2008." Includes bibliography (p. 283-290). Also available in print form.
17

Statistical models for catch-at-length data with birth cohort information

Chung, Sai-ho. January 2005 (has links)
Thesis (Ph. D.)--University of Hong Kong, 2006. / Also available in print.
18

Sequence comparison and stochastic model based on multiorder Markov models

Fang, Xiang. January 2009 (has links)
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2009. / Title from title screen (site viewed February 25, 2010). PDF text: ii, 93 p. : ill. ; 1 Mb. UMI publication number: AAT 3386580. Includes bibliographical references. Also available in microfilm and microfiche formats.
19

Distributed estimation in resource-constrained wireless sensor networks

Li, Junlin. January 2008 (has links)
Thesis (Ph.D.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009. / Committee Chair: Ghassan AlRegib; Committee Member: Elliot Moore; Committee Member: Monson H. Hayes; Committee Member: Paul A. Work; Committee Member: Ying Zhang. Part of the SMARTech Electronic Thesis and Dissertation Collection.
20

Calibration of multivariate generalized hyperbolic distributions using the EM algorithm, with applications in risk management, portfolio optimization and portfolio credit risk

Hu, Wenbo. Kercheval, Alec. January 2005 (has links)
Thesis (Ph. D.)--Florida State University, 2005. / Advisor: Alec Kercheval, Florida State University, College of Arts and Sciences, Dept. of Mathematics. Title and description from dissertation home page (viewed Jan. 26, 2006). Document formatted into pages; contains xii, 103 pages. Includes bibliographical references.
