Classification of heterogeneous data based on data type impact of similarity

Real-world datasets are increasingly heterogeneous, mixing numerical, categorical and other feature types. The main challenge in mining heterogeneous datasets is dealing with the heterogeneity present in the dataset records. Although some existing classifiers (such as decision trees) can handle heterogeneous data in specific circumstances, the performance of such models could still be improved, because heterogeneity calls for specific adjustments to similarity measurements and calculations. Moreover, heterogeneous data is still treated inconsistently and in an ad hoc manner. In this paper, we study the problem of heterogeneous data classification: our purpose is to use heterogeneity as a positive feature of the data classification effort by consistently using the similarity between data objects. We address the heterogeneity issue by studying the impact of mixing data types on the calculation of the similarity between data objects. To reach our goal, we propose an algorithm that divides the initial data records, based on pairwise similarity, into classification subtasks, with the aim of increasing the quality of the data subsets, and then applies specialized classifier models to them. The performance of the proposed approach is evaluated on 10 publicly available heterogeneous datasets. The results show that the models achieve better performance on heterogeneous datasets when using the proposed similarity process.
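
The abstract describes the approach only at a high level, so the following is a minimal Python sketch, under stated assumptions, of the general idea of keeping similarity separate per data type and splitting records before classification. It is not the authors' algorithm. The record layout (numeric features, categorical features), the Gower-style range-normalised numeric similarity, the simple matching coefficient for categorical features, the routing rule (a record goes to the data type on which it is, on average, more similar to the other records) and the 1-nearest-neighbour "specialised" classifier are all hypothetical stand-ins chosen for brevity. Keeping one score per data type gives every pair of records a point in a two-dimensional similarity space instead of a single blended score.

```python
from statistics import mean

# Assumed toy layout: a record is (numeric_features, categorical_features).
# All function names below are hypothetical, for illustration only.

def feature_ranges(records):
    """Range (max - min) of each numeric feature over the training records."""
    return [max(col) - min(col) for col in zip(*(num for num, _ in records))]

def numeric_similarity(a, b, ranges):
    """Gower-style similarity in [0, 1]: mean of 1 - |x - y| / range.
    A constant feature (range 0) is treated as agreeing."""
    if not a:
        return 0.0  # no numeric features: no evidence of similarity
    return mean(max(0.0, 1 - abs(x - y) / r) if r > 0 else 1.0
                for x, y, r in zip(a, b, ranges))

def categorical_similarity(a, b):
    """Simple matching coefficient: fraction of categorical features that agree."""
    if not a:
        return 0.0
    return mean(1.0 if x == y else 0.0 for x, y in zip(a, b))

def similarity_2d(r1, r2, ranges):
    """One similarity per data type: a point in a 2-D similarity space."""
    return (numeric_similarity(r1[0], r2[0], ranges),
            categorical_similarity(r1[1], r2[1]))

def dominant_type(pairs):
    """Hypothetical routing rule: the data type with the higher mean similarity."""
    num = mean(p[0] for p in pairs)
    cat = mean(p[1] for p in pairs)
    return "numeric" if num >= cat else "categorical"

def split_records(records, ranges):
    """Divide the training records into one subset per dominant data type."""
    groups = {"numeric": [], "categorical": []}
    for i, r in enumerate(records):
        pairs = [similarity_2d(r, s, ranges)
                 for j, s in enumerate(records) if j != i]
        groups[dominant_type(pairs)].append(i)
    return groups

def predict(query, records, labels, ranges):
    """1-nearest neighbour inside the subset the query is routed to, using only
    that subset's similarity component (a stand-in for a specialised model)."""
    groups = split_records(records, ranges)
    pairs = [similarity_2d(query, s, ranges) for s in records]
    key = dominant_type(pairs)
    axis = 0 if key == "numeric" else 1
    members = groups[key] or list(range(len(records)))  # fall back if empty
    best = max(members, key=lambda i: pairs[i][axis])
    return labels[best]
```

For example, with records `([0.0], ["a"])`, `([100.0], ["a"])` and `([50.0], ["b"])`, `split_records` puts the first two records in the categorical subset (they agree on the category but are far apart numerically) and the third in the numeric subset. A single blended score would average these two signals away, which is the kind of data-type effect the abstract refers to.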

Keywords: Heterogeneous datasets; Similarity measures; Two-dimensional similarity space; Classification algorithms

Identifier	oai:union.ndltd.org:BRADFORD/oai:bradscholars.brad.ac.uk:10454/16760
Date	11 August 2018
Creators	Ali, N., Neagu, Daniel, Trundle, Paul R.
Source Sets	Bradford Scholars
Language	English
Type	Conference paper, Accepted Manuscript
Rights	© Springer Nature Switzerland AG 2019. Reproduced in accordance with the publisher's self-archiving policy. The final publication is available at Springer via https://doi.org/10.1007/978-3-319-97982-3_21.
