
Contributions to Unsupervised and Semi-Supervised Learning

This thesis studies two problems in theoretical machine learning. The first
part investigates the statistical stability of clustering algorithms. In the
second part, we study the advantage that access to unlabeled data confers in
classification problems.

Clustering stability has been proposed and used as a model selection method
for clustering tasks. The main idea is to draw two independent samples from a
given data set and cluster each of them separately with the same clustering
algorithm under the same parameter settings. If the two resulting clusterings
are close in some metric, one concludes that the clustering algorithm and its
parameter settings match the data set, and that the clusterings obtained are
meaningful. We study the asymptotic properties of this method for certain
types of cost-minimizing clustering algorithms and relate their asymptotic
stability to the number of optimal solutions of the underlying optimization
problem.
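
A minimal sketch of the two-sample stability protocol described above,
assuming a centroid-based algorithm (k-means via scikit-learn) and the
adjusted Rand index as the closeness metric; both choices, and the
nearest-centroid extension to the full data set, are illustrative
assumptions rather than the thesis's exact construction, which treats
general cost-minimizing algorithms:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability_score(data, k, rng):
    """Two-sample stability of k-means with k clusters on `data`."""
    n = len(data)
    perm = rng.permutation(n)
    half = n // 2
    sample_a = data[perm[:half]]
    sample_b = data[perm[half:2 * half]]

    # Cluster each sample independently with the same algorithm
    # and the same parameter setting (here: the number of clusters k).
    km_a = KMeans(n_clusters=k, n_init=10, random_state=0).fit(sample_a)
    km_b = KMeans(n_clusters=k, n_init=10, random_state=1).fit(sample_b)

    # Each clustering extends to the whole data set by nearest-centroid
    # assignment; the adjusted Rand index measures how well the two
    # induced labellings agree, up to a permutation of cluster labels.
    labels_a = km_a.predict(data)
    labels_b = km_b.predict(data)
    return adjusted_rand_score(labels_a, labels_b)

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
for k in (2, 3, 5):
    print(f"k={k}: stability={stability_score(data, k, rng):.3f}")
```

For model selection, one would compute such a score for each candidate
parameter setting and prefer the most stable one.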

In classification problems, labeled data are often expensive to obtain,
whereas unlabeled data are often plentiful and cheap. We study how access to
unlabeled data can reduce the amount of labeled data needed, in the worst-case
sense. We propose an extension of the probably approximately correct (PAC)
model in which this question can be studied naturally. We show that for
certain basic tasks, access to unlabeled data can, at best, halve the amount
of labeled data needed.
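
As a reading aid for the "at best halve" claim, the inequality below restates
it in symbols; the sample-complexity notation m_SL and m_SSL is introduced
here for illustration and is not taken from the thesis:

```latex
% Hypothetical notation (ours, for illustration only):
%   m_SL(eps, delta)  -- labeled examples needed by a supervised learner
%   m_SSL(eps, delta) -- labeled examples needed when unlabeled data
%                        (even the full marginal distribution) is available
\[
  m_{\mathrm{SSL}}(\varepsilon, \delta)
    \;\ge\; \tfrac{1}{2}\, m_{\mathrm{SL}}(\varepsilon, \delta)
\]
% For the basic tasks considered, unlabeled data saves at most a factor
% of two in labeled examples at accuracy eps and confidence delta.
```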

Identifier: oai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/4445
Date: 21 May 2009
Creators: Pal, David
Source Sets: University of Waterloo Electronic Theses Repository
Language: English
Type: Thesis or Dissertation
