About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Comparing Approaches to Initializing the Expectation-Maximization Algorithm

Dicintio, Sabrina 09 October 2012 (has links)
The expectation-maximization (EM) algorithm is a widely utilized approach to maximum likelihood estimation in the presence of missing data; this thesis focuses on its application within the model-based clustering framework. The performance of the EM algorithm can be highly dependent on how it is initialized. Several ways of initializing the EM algorithm have been proposed; however, the best method to use for initialization remains a somewhat controversial topic. From an attempt to obtain a superior method of initializing the EM algorithm comes the concept of using multiple existing methods together in what will be called a 'voting' procedure. This procedure uses several common initialization methods to cluster the data, and a final starting ẑ_ig matrix is then obtained in two ways. The hard 'voting' method follows a majority rule, whereas the soft 'voting' method takes an average of the multiple group memberships. The final ẑ_ig matrix obtained from either method dictates the starting values of π̂_g, μ̂_g, and Σ̂_g used to initialize the EM algorithm.
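The closing step of the procedure described above — converting a starting membership matrix ẑ into initial parameter values for a Gaussian mixture — can be sketched as follows. This is a minimal illustration assuming Gaussian components; the function name and array layout are invented for the example, not taken from the thesis.

```python
import numpy as np

def params_from_z(X, z):
    """X: (n, d) data; z: (n, G) membership matrix (hard 0/1 or soft probabilities)."""
    n, d = X.shape
    G = z.shape[1]
    n_g = z.sum(axis=0)                 # effective group sizes
    pi = n_g / n                        # mixing proportions (pi-hat)
    mu = (z.T @ X) / n_g[:, None]       # component means (mu-hat)
    Sigma = np.empty((G, d, d))
    for g in range(G):
        R = X - mu[g]
        # membership-weighted covariance for component g (Sigma-hat)
        Sigma[g] = (R * z[:, [g]]).T @ R / n_g[g]
    return pi, mu, Sigma
```

A soft ẑ (averaged memberships) and a hard ẑ (majority vote) both pass through the same formulas; only the entries of `z` differ.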
2

Segmentation of the Brain from MR Images

Caesar, Jenny January 2005 (has links)
KTH's Division of Neuronic Engineering has a finite element model of the head. However, this model does not contain a detailed model of the brain. This thesis project consists of finding a method to extract brain tissues from T1-weighted MR images of the head. The method should be automatic, to be suitable for patient-specific modeling. A summary of the most common segmentation methods is presented and one of them is implemented. The implemented method is based on the assumption that the probability density function (pdf) of an MR image can be described by parametric models. The intensity distribution of each tissue class is modeled as a Gaussian distribution; thus, the total pdf is a sum of Gaussians. However, the voxel values are also influenced by intensity inhomogeneities, which affect the pdf. The implemented method is based on the expectation-maximization algorithm and corrects for intensity inhomogeneities. The result of the algorithm is a classification of the voxels. The brain is then extracted from the classified voxels using morphological operations.
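As a hedged illustration of the approach this abstract describes — modelling the intensity distribution as a sum of Gaussians and classifying voxels with EM — here is a minimal one-dimensional EM sketch. It omits the inhomogeneity correction and the morphological post-processing, and all names are invented for the example.

```python
import numpy as np

def em_gmm_1d(x, K=2, iters=100):
    """Fit a K-component 1-D Gaussian mixture to intensities x by EM."""
    # deterministic initialization from intensity quantiles
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibility of each Gaussian for each intensity
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        n_k = r.sum(axis=0)
        pi = n_k / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_k
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
        var = np.maximum(var, 1e-9)  # guard against variance collapse
    # classify each voxel intensity by its most responsible component
    return pi, mu, var, r.argmax(axis=1)
```

In the thesis's setting each component would correspond to a tissue class, with the bias field estimated alongside these parameters.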
3

Statistical Learning in Drug Discovery via Clustering and Mixtures

Wang, Xu January 2007 (has links)
In drug discovery, thousands of compounds are assayed to detect activity against a biological target. The goal of drug discovery is to identify compounds that are active against the target (e.g. inhibit a virus). Statistical learning in drug discovery seeks to build a model that uses descriptors characterizing molecular structure to predict biological activity. However, the characteristics of drug discovery data can make it difficult to model the relationship between molecular descriptors and biological activity. Among these characteristics are the rarity of active compounds, the large volume of compounds tested by high-throughput screening, and the complexity of molecular structure and its relationship to activity. This thesis focuses on the design of statistical learning algorithms/models and their applications to drug discovery. The two main parts of the thesis are an algorithm-based statistical method and a more formal model-based approach. Both approaches can facilitate and accelerate the process of developing new drugs. A unifying theme is the use of unsupervised methods as components of supervised learning algorithms/models. In the first part of the thesis, we explore a sequential screening approach, Cluster Structure-Activity Relationship Analysis (CSARA). Sequential screening integrates high-throughput screening with mathematical modeling to sequentially select the best compounds. CSARA is a cluster-based and algorithm-driven method. To gain further insight into this method, we use three carefully designed experiments to compare its predictive accuracy with Recursive Partitioning, a popular structure-activity relationship analysis method. The experiments show that CSARA outperforms Recursive Partitioning. Comparisons include problems with many descriptor sets and situations in which many descriptors are not important for activity. In the second part of the thesis, we propose and develop constrained mixture discriminant analysis (CMDA), a model-based method.
The main idea of CMDA is to model the distribution of the observations given the class label (e.g. active or inactive class) as a constrained mixture distribution, and then use Bayes' rule to predict the probability of being active for each observation in the testing set. Constraints are used to deal with the otherwise explosive growth of the number of parameters with increasing dimensionality. CMDA is designed to address several challenges in modeling drug data sets, such as multiple mechanisms, the rare-target problem (i.e. imbalanced classes), and the identification of relevant subspaces of descriptors (i.e. variable selection). We focus on the CMDA1 model, in which univariate densities form the building blocks of the mixture components. Due to the unboundedness of the CMDA1 log-likelihood function, it is easy for the EM algorithm to converge to degenerate solutions. A special multi-step EM algorithm is therefore developed and explored via several experimental comparisons. Using the multi-step EM algorithm, the CMDA1 model is compared to model-based clustering discriminant analysis (MclustDA). The CMDA1 model is either superior to or competitive with the MclustDA model, depending on which model generates the data. The CMDA1 model performs better than the MclustDA model when the data are high-dimensional and unbalanced, an essential feature of the drug discovery problem. An alternative approach to the problem of degeneracy is penalized estimation. By introducing a group of simple penalty functions, we consider penalized maximum likelihood estimation of the CMDA1 and CMDA2 models. This strategy improves the convergence of the conventional EM algorithm and helps avoid degenerate solutions. Extending techniques from Chen et al. (2007), we prove that the PMLEs of the two-dimensional CMDA1 model can be asymptotically consistent.
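The Bayes'-rule step at the heart of this kind of classifier can be sketched generically: given class-conditional densities (which in CMDA would come from the fitted constrained mixtures) and a class prior, the posterior probability of activity follows directly. The function below is a generic illustration under assumed names, not the thesis's implementation.

```python
import numpy as np

def posterior_active(dens_active, dens_inactive, prior_active):
    """Bayes' rule: P(active | x) from class-conditional densities and a prior.

    dens_active, dens_inactive: class-conditional densities evaluated at the
    test observations (e.g. under two fitted mixture models)."""
    num = prior_active * dens_active
    return num / (num + (1.0 - prior_active) * dens_inactive)
```

With a rare active class (a small prior), an observation needs a much larger density under the active-class mixture to earn a high posterior, which is how the class imbalance enters the prediction.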
6

EM-Based Joint Detection and Estimation for Two-Way Relay Network

Yen, Kai-wei 01 August 2012 (has links)
In this thesis, the channel estimation problem for a two-way relay network (TWRN) is considered under two different wireless channel assumptions. Previous works have proposed training-based channel estimation methods to obtain the channel state information (CSI). In practice, however, the channel changes from one data block to another, which may cause performance degradation due to outdated CSI. To maintain performance, the system has to insert more training signals. To improve the bandwidth efficiency instead, we propose a joint channel estimation and data detection method based on the expectation-maximization (EM) algorithm. Simulation results show that the proposed method can combat the effect of the fading channel, with MSE results very close to the Cramer-Rao Lower Bound (CRLB) in the high signal-to-noise ratio (SNR) region. Additionally, compared with the previous work, the proposed scheme also has better detection performance for both time-varying and time-invariant channels.
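To make the joint estimation/detection idea concrete, here is a heavily simplified sketch: a single flat-fading coefficient, BPSK symbols, and an alternation between hard-decision detection and least-squares channel re-estimation. This is a caricature of an EM-style receiver under invented names, not the TWRN algorithm of the thesis.

```python
import numpy as np

def joint_estimate_detect(y, pilots, iters=10):
    """y: received samples; the first len(pilots) entries carry known pilots."""
    n_p = len(pilots)
    # initial least-squares channel estimate from the pilot block only
    h = (np.conj(pilots) * y[:n_p]).sum() / (np.abs(pilots) ** 2).sum()
    for _ in range(iters):
        # detection step: hard BPSK decisions given the current channel estimate
        s = np.sign((y[n_p:] * np.conj(h)).real)
        # re-estimation step: refine the channel using pilots plus detected symbols
        full = np.concatenate([pilots, s])
        h = (np.conj(full) * y).sum() / (np.abs(full) ** 2).sum()
    return h, s
```

Reusing the detected data symbols as "virtual pilots" is what buys bandwidth efficiency: fewer true training symbols are needed for the same estimation quality.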
7

Analysis of circular data in the dynamic model and mixture of von Mises distributions

Lan, Tian, active 2013 10 December 2013 (has links)
Analysis of circular data is becoming more and more popular in many fields of study. In this report, I present two statistical analyses of circular data using von Mises distributions. First, the expectation-maximization algorithm is reviewed and used to classify and estimate circular data from a mixture of von Mises distributions. Second, the Forward Filtering Backward Smoothing method via particle filtering is reviewed and implemented for circular data appearing in dynamic state-space models.
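A minimal EM sketch for a mixture of von Mises distributions follows; the concentration update uses the common approximation of Banerjee et al. rather than an exact Bessel-ratio inversion, and all names are invented for the example.

```python
import numpy as np

def em_vonmises_mixture(theta, K=2, iters=50):
    """Fit a K-component von Mises mixture to angles theta (radians) by EM."""
    # deterministic init: spread component means over part of the circle
    mu = np.linspace(-np.pi / 2, np.pi / 2, K)
    kappa = np.full(K, 1.0)
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities under each von Mises density
        dens = pi * np.exp(kappa * np.cos(theta[:, None] - mu)) / (2 * np.pi * np.i0(kappa))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weights, circular means, and concentrations
        n_k = r.sum(axis=0)
        pi = n_k / len(theta)
        C = (r * np.cos(theta[:, None])).sum(axis=0)
        S = (r * np.sin(theta[:, None])).sum(axis=0)
        mu = np.arctan2(S, C)
        rbar = np.sqrt(C ** 2 + S ** 2) / n_k          # mean resultant length
        kappa = rbar * (2 - rbar ** 2) / (1 - rbar ** 2 + 1e-12)  # Banerjee et al. approx.
        kappa = np.minimum(kappa, 50.0)  # guard against numerical blow-up
    return pi, mu, kappa, r.argmax(axis=1)
```

The structure mirrors the Gaussian-mixture EM exactly; only the component density and the circular-mean/concentration updates change.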
8

Towards Finding Optimal Mixture Of Subspaces For Data Classification

Musa, Mohamed Elhafiz Mustafa 01 October 2003 (has links)
In pattern recognition, when data has different structures in different parts of the input space, fitting one global model can be slow and inaccurate. Local learning methods can quickly learn the structure of the data in local regions, consequently offering faster and more accurate model fitting. However, breaking the training data set into smaller subsets may lead to the curse of dimensionality, as a training subset may not be enough for estimating the required set of parameters for the submodels, and increasing the size of the training data may not be possible in many situations. Interestingly, the data in local regions become more correlated. Therefore, by decorrelation methods we can reduce the data dimension and hence the number of parameters. In other words, we can find uncorrelated low-dimensional subspaces that capture most of the data variability. Current subspace modelling methods have shown better performance than global modelling methods for this type of training data structure. Nevertheless, these methods still need more research, as they suffer from two limitations: (1) there is no standard method to specify the optimal number of subspaces, and (2) there is no standard method to specify the optimal dimensionality for each subspace. In current models these two parameters are determined beforehand. In this dissertation we propose and test algorithms that try to find a suboptimal number of principal subspaces and a suboptimal dimensionality for each principal subspace automatically.
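The second open question — choosing a dimensionality for each local subspace — is often handled by keeping enough principal directions to explain a fixed fraction of the local variance. Here is a minimal sketch of that common heuristic (not the dissertation's own algorithm) applied to one local region.

```python
import numpy as np

def local_subspace_dim(X, var_threshold=0.95):
    """Smallest PCA dimensionality explaining var_threshold of the variance in X."""
    # PCA on one local region: eigendecompose the sample covariance
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending eigenvalues
    ratio = np.cumsum(evals) / evals.sum()           # cumulative explained variance
    # first index where the cumulative ratio crosses the threshold
    return int(np.searchsorted(ratio, var_threshold) + 1)
```

Running this per region yields a (possibly different) dimensionality for each local subspace, which is exactly the quantity the dissertation seeks to choose automatically.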
9

Gaussian copula modelling for integer-valued time series

Lennon, Hannah January 2016 (has links)
This thesis is concerned with the modelling of integer-valued time series. Such data occur naturally in various areas whenever a number of events is observed over time. The model considered in this study consists of a Gaussian copula with autoregressive-moving average (ARMA) dependence and discrete margins that can be specified or unspecified, with or without covariates. It can be interpreted as a 'digitised' ARMA model. An ARMA model is used for the latent process so that well-established methods in time series analysis can be applied. Still, the computation of the log-likelihood poses many problems, because it is a sum of 2^N terms involving the Gaussian cumulative distribution function, where N is the length of the time series. We consider a Monte Carlo Expectation-Maximisation (MCEM) algorithm for maximum likelihood estimation of the model, which works well for small to moderate N. Then an Approximate Bayesian Computation (ABC) method is developed to take advantage of the fact that data can be simulated easily from an ARMA model and digitised; a spectral comparison method is used in the rejection-acceptance step. This is shown to work well for large N. Finally, we write the model in an R-vine copula representation and use a sequential algorithm for the computation of the log-likelihood. We evaluate the score and Hessian of the log-likelihood and give analytic solutions for the standard errors. The proposed methodologies are illustrated using simulation studies and highlight the advantages of incorporating classic ideas from time series analysis into modern methods of model fitting. For illustration, we compare the three methods on US polio incidence data (Zeger, 1988) and discuss their relative merits.
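The 'digitised ARMA' construction can be illustrated in its simplest case: a latent stationary AR(1) process pushed through the Gaussian cdf and then a Poisson quantile function, giving counts with exactly Poisson margins and ARMA-type dependence. This is a simulation sketch under assumed parameter names, not the thesis's estimation code.

```python
import numpy as np
from math import erf, sqrt

def poisson_ppf(u, lam):
    """Inverse cdf of Poisson(lam): smallest k with CDF(k) >= u, per element."""
    out = np.zeros(len(u), dtype=int)
    for i, ui in enumerate(u):
        k = 0
        pmf = cdf = np.exp(-lam)         # pmf and cdf at k = 0
        while cdf < ui:                  # accumulate pmf terms until the cdf crosses u
            k += 1
            pmf *= lam / k
            cdf += pmf
        out[i] = k
    return out

def simulate_digitised_ar1(phi, lam, n, seed=0):
    """Counts from a 'digitised' AR(1): latent Gaussian -> uniforms -> Poisson quantiles."""
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    z[0] = rng.standard_normal()         # stationary start, unit marginal variance
    for t in range(1, n):
        z[t] = phi * z[t - 1] + sqrt(1 - phi ** 2) * rng.standard_normal()
    # Gaussian cdf maps each z_t to a uniform; the Poisson quantile digitises it
    u = np.array([0.5 * (1 + erf(zt / sqrt(2))) for zt in z])
    return poisson_ppf(u, lam)
```

Because the latent process has unit marginal variance, each u_t is exactly uniform and each count exactly Poisson(lam); the serial dependence of the counts is inherited from the AR(1) correlation, which is what makes likelihood evaluation hard in the other direction.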
10

Hawkes Process Models for Unsupervised Learning on Uncertain Event Data

Haghdan, Maysam January 2017 (has links)
No description available.
