Enhanced Similarity Matching by Grouping of Features

In this report we introduce a classification system named Grouping of Features (GoF), together with a theoretical exploration of some of the important concepts in the Instance-Based Learning (IBL) field that are related to this system.

The GoF system groups a dataset's original features into abstract features. Each of these groups may capture inherent structures in one of the classes in the data. A genetic algorithm is used to extract a tree of such groups that can be used for measuring similarity between samples. As each class may have different inherent structures, different trees of groups are found for the different classes. To adjust the importance of a group with regard to the classifier, the concept of the power average is used. A group's power average may let either the smallest or the largest value in the group dominate, or take any value in between. Tests show that the GoF system outperforms kNN at many classification tasks.

The system started as a research project by Verdande Technology, and a set of algorithms had been fully or partially implemented before the start of this thesis project. No documentation existed, however, so we have built an understanding of the fields on which the system relies, analyzed their properties, documented this understanding in explicit method descriptions, and tested, modified and extended the original system.

During this project we found that scaling or weighting features, either as a data pre-processing step or during classification, is often crucial for the performance of the classification algorithm. Our hypothesis was then that by letting the weights vary between features and between groups of features, more complex structures could be captured. This would also make the classifier less dependent on how the features are originally scaled. We therefore implemented Weighted Grouping of Features (WGoF), an extension of the GoF system.

Notable results in this thesis include 95.48 percent and 100.00 percent correct classification of the non-scaled UCI Wine dataset using the GoF and WGoF systems, respectively.
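
The power average mentioned in the abstract can be illustrated with the standard power (generalized) mean. This record does not give the thesis's exact definition, so the following is only a minimal sketch under that assumption: the function name `power_average`, the exponent parameter `p`, and the restriction to strictly positive values are illustrative choices, not taken from the GoF system itself.

```python
import math


def power_average(values, p):
    """Power (generalized) mean of strictly positive values.

    Hypothetical helper for illustration; not the thesis's implementation.
    p -> -inf approaches the smallest value, p -> +inf the largest,
    p = 1 is the arithmetic mean, and p = 0 is the geometric mean.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive values")

    n = len(values)
    if p == math.inf:
        return max(values)
    if p == -math.inf:
        return min(values)
    if p == 0:
        # Limit of the power mean as p -> 0: the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


if __name__ == "__main__":
    # Similarity scores of the features inside one hypothetical group.
    group = [0.2, 0.8]
    for p in (-math.inf, -50, 0, 1, 50, math.inf):
        print(p, power_average(group, p))
```

With a strongly negative exponent the result stays close to the smallest value in the group, with a strongly positive exponent it stays close to the largest, and exponents in between give intermediate values. This is the behaviour the abstract describes for a group's power average.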

ntnudaim:6964

MTDT datateknikk (Computer Science)

Intelligente systemer (Intelligent Systems)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-20114
Date	January 2012
Creators	Landstad, Andreas Ståleson
Publisher	Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
