Global ETD Search

Odhadování přesnosti klasifikačních metod na základě vlastností dat / Estimating performance of classifiers from dataset properties

This thesis explores the impact of a dataset's distributional properties on classification performance. We use Gaussian copulas to generate 1000 artificial datasets and train classifiers on them. We train generalized linear models, distributed random forests, extremely randomized trees, and gradient boosting machines via the H2O.ai machine learning platform, accessed from R. Classification performance on these datasets is evaluated, and empirical observations on the influence of the distributional properties are presented. Secondly, we use the real Australian credit dataset and predict which classifier is likely to work best on it. The predicted performance for any individual method is based on penalizing the differences between the Australian dataset and the artificial datasets on which the method performed comparatively better, but the approach failed to predict correctly.
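The abstract does not describe how the artificial datasets were built, and the thesis itself worked in R with H2O.ai. As a rough illustration only, the following Python sketch shows the general idea of sampling from a Gaussian copula: draw correlated standard normals, map them to uniforms with the normal CDF, then push the uniforms through inverse marginal CDFs. The function names, the two marginals (exponential and uniform), and the correlation value 0.7 are assumptions made for this example, not taken from the thesis.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n (x, y) pairs whose dependence follows a bivariate
    Gaussian copula with correlation parameter rho (hypothetical helper)."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, mean 0 and sigma 1
    pairs = []
    for _ in range(n):
        # Step 1: correlated standard normals via the 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 2: the normal CDF turns each coordinate into a uniform on [0, 1].
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        # Step 3: inverse marginal CDFs give the chosen marginals.
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs


def exponential_inv_cdf(u, rate=1.0):
    """Inverse CDF of an exponential distribution (assumed marginal)."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) at the upper edge
    return -math.log(1.0 - u) / rate


def uniform_inv_cdf(u, low=0.0, high=10.0):
    """Inverse CDF of a uniform distribution on [low, high] (assumed marginal)."""
    return low + (high - low) * u


# One artificial two-feature dataset with 1000 rows.
data = sample_gaussian_copula(
    1000, rho=0.7,
    inv_cdf_x=exponential_inv_cdf,
    inv_cdf_y=uniform_inv_cdf,
    seed=42,
)
```

Varying `rho` and the marginals across many such draws would give a family of datasets with different distributional properties, which is the kind of variation the abstract refers to.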

http://www.nusl.cz/ntk/nusl-388668

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:388668
Date	January 2018
Creators	Todt, Michal
Contributors	Polák, Petr; Baruník, Jozef
Source Sets	Czech ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
