  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

An investigation into fuzzy clustering quality and speed : fuzzy C-means with effective seeding

Stetco, Adrian January 2017
Cluster analysis, the automatic procedure by which large data sets can be split into similar groups of objects (clusters), has innumerable applications in a wide range of problem domains. Improvements in clustering quality (as captured by internal validation indexes) and speed (number of iterations until cost function convergence), the main focus of this work, have many desirable consequences. They can result, for example, in faster and more precise detection of illness onset based on symptoms, or in rapid detection and visualization of patterns in financial time series for investors. Partitional clustering, one of the most popular ways of doing cluster analysis, can be classified into two main categories: hard (where the clusters discovered are disjoint) and soft (also known as fuzzy; clusters are non-disjoint, or overlapping). In this work we consider how improvements in the speed and solution quality of the soft partitional clustering algorithm Fuzzy C-means (FCM) can be achieved through more careful and informed initialization based on data content. The resulting FCM++ approach samples the starting cluster centers during the initialization phase in a way that disperses them through the data space. The cluster centers are well spread in the input space, resulting in both faster convergence times and higher quality solutions. Moreover, we allow the user to specify a parameter indicating how far apart the cluster centers should be picked in the data space right at the beginning of the clustering procedure. We show FCM++'s superior behaviour in both convergence times and quality compared with existing methods, on a wide range of artificially generated and real data sets. We consider a case study where we propose a methodology based on FCM++ for pattern discovery on synthetic and real world time series data.
We discuss a method that uses both Pearson correlation and Multi-Dimensional Scaling to reduce data dimensionality, remove noise and make the dataset easier to interpret and analyse. We show that by using FCM++ we can make a positive impact on the quality (with the Xie-Beni index being lower in nine out of ten cases for FCM++) and speed (with on average 6.3 iterations compared with 22.6 iterations) when trying to cluster these lower-dimensional, noise-reduced representations of the time series. This methodology provides a clearer picture of the cluster analysis results and helps in detecting similarly behaving time series which could otherwise come from any domain. Further, we investigate the use of Spherical Fuzzy C-Means (SFCM) with the seeding mechanism used for FCM++ on news text data retrieved from a popular British newspaper. The methodology allows us to visualize and group hundreds of news articles based on the topics discussed within. The positive impact made by SFCM++ translates into a faster process (with on average 12.2 iterations compared with the 16.8 needed by the standard SFCM) and a higher quality solution (with the Xie-Beni index being lower for SFCM++ in seven out of every ten runs).
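The idea of seeding Fuzzy C-means with well-spread centers can be illustrated with a minimal sketch. The abstract does not state the exact sampling rule, so the code below assumes a k-means++-style rule in which each new center is drawn with probability proportional to its distance from the nearest already chosen center, raised to a power. The function names and the `spread` parameter are hypothetical stand-ins, not the thesis's implementation. The sketch also counts iterations until the cost function converges and computes the Xie-Beni index, where lower values indicate a more compact and better separated solution.

```python
import numpy as np

def seed_centers(X, c, spread=2.0, rng=None):
    """Pick c rows of X as initial centers, favouring points far from chosen ones.

    Assumption: k-means++-style sampling with a distance exponent `spread`;
    larger values push the initial centers further apart.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center: uniform at random
    for _ in range(1, c):
        chosen = np.array(centers)
        # distance from every point to its nearest already chosen center
        d = np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        w = d ** spread
        total = w.sum()
        probs = w / total if total > 0 else np.full(n, 1.0 / n)
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)

def fuzzy_c_means(X, centers, m=2.0, tol=1e-5, max_iter=300):
    """Standard FCM updates; returns centers, memberships and iterations used."""
    centers = centers.copy()
    prev_cost = np.inf
    for it in range(1, max_iter + 1):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)               # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d2_new = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cost = (Um * d2_new).sum()
        if abs(prev_cost - cost) < tol:           # cost function has converged
            break
        prev_cost = cost
    return centers, U, it

def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni index: compactness divided by n times the minimum center separation."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    sep = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)                 # ignore zero self-distances
    return compactness / (X.shape[0] * sep.min())
```

Comparing the iteration count and Xie-Beni value of a run started from `seed_centers` against a run started from uniformly random rows of the data mirrors, in spirit, the quality and speed comparison the abstract reports.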
2

Experiments in Image Segmentation for Automatic US License Plate Recognition

Diaz Acosta, Beatriz 09 July 2004
License plate recognition/identification (LPR/I) applies image processing and character recognition technology to identify vehicles by automatically reading their license plates. In the United States, however, each state has its own standard-issue plates, plus several optional styles, which are referred to as special license plates or varieties. There is a clear absence of standardization, and multi-colored, complex backgrounds are becoming more frequent in license plates. Commercially available optical character recognition (OCR) systems generally fail when confronted with textured or poorly contrasted backgrounds, which creates the need for proper image segmentation prior to classification. The image segmentation problem in LPR is examined in two stages: license plate region detection and license plate character extraction from background. Three different approaches for license plate detection in a scene are presented: region distance from eigenspace, border location by edge detection and the Hough transform, and text detection by spectral analysis. The experiments for character segmentation involve the RGB, HSV/HSI and 1976 CIE L*a*b* color spaces as well as their Karhunen-Loève transforms. The segmentation techniques applied include multivariate hierarchical agglomerative clustering and minimum-variance color quantization. The trade-off between accuracy and computational expense is used to select a final reliable algorithm for license plate detection and character segmentation. The spectral analysis approach together with the K-L L*a*b* transformed color quantization are found experimentally to be the best alternatives for the two identified image segmentation stages for US license plate recognition. / Master of Science
3

Shluková analýza pro funkcionální data / Cluster analysis for functional data

Zemanová, Barbora January 2012
In this work we deal with cluster analysis for functional data. Functional data contain a set of subjects that are characterized by repeated measurements of a variable. Based on these measurements we want to split the subjects into groups (clusters). The subjects in a single cluster should be similar and differ from subjects in the other clusters. The first approach we use is the reduction of data dimension followed by the clustering method K-means. The second approach is to use a finite mixture of normal linear mixed models. We estimate parameters of the model by maximum likelihood using the EM algorithm. Throughout the work we apply all described procedures to real meteorological data.
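The first approach, dimension reduction followed by K-means, can be sketched as follows. The abstract does not say which reduction is used, so this sketch assumes each subject's repeated measurements form one row of a matrix and projects the rows onto their leading principal components. The function names `pca_scores` and `kmeans` are hypothetical, and the K-means routine is a plain Lloyd iteration with random initial centers.

```python
import numpy as np

def pca_scores(Y, n_components=2):
    """Project each row of Y (one subject's measurements) onto the top principal components."""
    Yc = Y - Y.mean(axis=0)                      # centre each measurement point
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc @ Vt[:n_components].T

def kmeans(Z, k, n_iter=100, rng=None):
    """Plain Lloyd's algorithm on the reduced scores Z."""
    rng = np.random.default_rng(rng)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)               # assign to nearest center
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):    # assignments have stabilised
            break
        centers = new_centers
    return labels, centers
```

For example, with a matrix `Y` of shape (subjects, time points), `labels, _ = kmeans(pca_scores(Y, 2), k=3, rng=0)` assigns each subject to one of three clusters based on its first two component scores.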
4

Účinky vybraných opatření k prevenci malárie: analýza panelových dat / The Effects of Different Malaria Prevention Measures: Panel Data Analysis

Pavelková, Adéla January 2020
The main aim of this diploma thesis was to explore the topic of malaria preventive measures: specifically, to study which preventive measures are useful and to see how they are distributed around the world. For international organizations, this is very important as they need to know whether funds allocated for malaria aid are distributed effectively. This study uses manually compiled data from the World Health Organization for all countries threatened by malaria, mostly from 2001 to 2018. For this purpose, panel data regression methods using robust standard errors, bootstrapping and cluster analysis were used. The results showed that generally, the most useful preventive measures are indoor-residual sprayings, a combination of sprayings and insecticide-treated nets, and rapid diagnostic tests. Furthermore, the effect of the population living in rural areas is significant. In addition, gross domestic product is a very important factor for African countries. The stability analysis (bootstrapping) confirmed our results. However, we found that insecticide-treated nets are still the most widely distributed measure. From the cluster analysis, we observed that countries on the same continent should not be treated similarly, and we highlighted countries that should receive higher attention. Overall, the...
