1

Improved Approaches for Attribute Clustering Based on the Group Genetic Algorithm

Lin, Feng-Shih 09 September 2011
Feature selection is a pre-processing step in data mining and machine learning, and plays an important role in analyzing high-dimensional data. Appropriately selected features can not only reduce the complexity of the mining or learning process, but also improve the accuracy of the results. In the past, the concept of performing feature selection by attribute clustering was proposed. If similar attributes are clustered into groups, an attribute can easily be replaced by another in the same group when some of its values are missing. Hong et al. also proposed several genetic algorithms for finding appropriate attribute clusters. Their approaches, however, suffered from the weakness that multiple chromosomes could represent the same attribute clustering result (feasible solution) due to the combinatorial property, making the search space larger than needed. In this thesis, we therefore attempt to improve the performance of the GA-based attribute-clustering process by using the grouping genetic algorithm (GGA). Two GGA-based attribute clustering approaches are proposed. In the first approach, the general GGA representation and operators are used to reduce the redundancy of the chromosome representation for attribute clustering. In the second approach, a new encoding scheme with corresponding crossover and mutation operators is designed, and an improved fitness function is proposed to achieve faster convergence and provide more flexible alternatives than the first approach. Finally, experiments are conducted to compare the efficiency and accuracy of the proposed approaches with those of the previous ones.
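The redundancy mentioned in the abstract can be illustrated with a small sketch. It assumes a plain label-based encoding in which gene i stores the group index of attribute i; the abstract does not state which encoding Hong et al. used, so this encoding and the helper names `canonical` and `count_encodings` are hypothetical and for illustration only. Under that assumption, any permutation of the group labels yields a different chromosome for the same clustering.

```python
from itertools import product


def canonical(chromosome):
    """Relabel groups in order of first appearance.

    Chromosomes that differ only by a permutation of group labels
    map to the same canonical tuple (assumed label-based encoding).
    """
    relabel = {}
    result = []
    for group in chromosome:
        if group not in relabel:
            relabel[group] = len(relabel)
        result.append(relabel[group])
    return tuple(result)


def count_encodings(n_attributes, k):
    """Count label chromosomes that use exactly k groups, and the
    number of distinct attribute clusterings they represent."""
    chromosomes = [
        c for c in product(range(k), repeat=n_attributes)
        if len(set(c)) == k
    ]
    partitions = {canonical(c) for c in chromosomes}
    return len(chromosomes), len(partitions)


if __name__ == "__main__":
    # Two different chromosomes, one clustering: {a0, a2}, {a1}, {a3}
    print(canonical((1, 0, 1, 2)) == canonical((0, 2, 0, 1)))
    # Chromosomes vs. distinct clusterings for 5 attributes in 3 groups
    print(count_encodings(5, 3))
```

In this toy setting each clustering into k groups is encoded by k! chromosomes, which is the kind of inflated search space a group-oriented representation is meant to avoid.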
2

Improving Classification and Attribute Clustering: An Iterative Semi-supervised Approach

Seifi, Farid January 2015
This thesis proposes a novel approach to attribute clustering. It exploits the strength of semi-supervised learning to improve the quality of attribute clustering, particularly when labeled data is limited. The significance of this work derives in part from the broad and increasingly important use of attribute clustering to address outstanding problems within the machine learning community. This form of clustering has also been shown to have strong practical applications, being usable in heavyweight industrial applications. Although researchers have focused on supervised and unsupervised attribute clustering in recent years, semi-supervised attribute clustering has not received substantial attention. In this research, we propose an innovative two-step iterative semi-supervised attribute clustering framework. In each iteration, this framework uses the result of attribute clustering to improve a classifier. It then uses the classifier to augment the training data used by attribute clustering in the next iteration. The framework outputs an improved classifier and an improved attribute clustering at the same time. It gives more accurate clusters of attributes, which better fit the real relations between attributes. In this study we also propose two new uses of attribute clustering to improve classification: solving the automatic view definition problem for multi-view learning, and improving missing attribute-value handling at induction and prediction time. The application of these two new uses within our proposed semi-supervised attribute clustering is evaluated using real-world data sets from different domains.
