Semi-supervised Information Fusion for Clustering, Classification and Detection Applications

Information fusion techniques have been widely applied in many areas, including clustering, classification, and detection. The major objective is to achieve better performance by using information derived from multiple sources than can be achieved by using any one source individually. In our previous work, we demonstrated that information fusion improves the performance of electroencephalography (EEG)-based seizure detection. In detection problems, the optimal fusion rule is usually derived under the assumption that local decisions are conditionally independent given the hypotheses. However, because the local detectors observe the same phenomenon, their decisions are likely to be correlated. To address this correlation, we implement the fusion rule sub-optimally by first estimating the unknown parameters under one of the hypotheses and then treating them as known when estimating the remaining unknown parameters.
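
As a point of reference for the conditionally independent case mentioned above, the following is a minimal sketch of the classical log-likelihood-ratio fusion of binary local decisions. It is not the sub-optimal rule for correlated decisions developed in the thesis. The function name `fuse_decisions` and its parameters (per-detector detection and false-alarm probabilities, assumed known, and a prior for the "event present" hypothesis) are illustrative assumptions.

```python
import math

def fuse_decisions(decisions, p_detect, p_false_alarm, prior_h1=0.5):
    """Fuse binary local decisions, assuming conditional independence.

    decisions      -- list of 0/1 local decisions
    p_detect       -- assumed detection probability of each local detector
    p_false_alarm  -- assumed false-alarm probability of each local detector
    prior_h1       -- assumed prior probability of hypothesis H1
    Returns (global_decision, log_likelihood_ratio).
    """
    # Start from the log prior odds of H1 versus H0.
    llr = math.log(prior_h1 / (1.0 - prior_h1))
    for u, pd, pf in zip(decisions, p_detect, p_false_alarm):
        if u == 1:
            # A "1" is more credible when pd is high and pf is low.
            llr += math.log(pd / pf)
        else:
            llr += math.log((1.0 - pd) / (1.0 - pf))
    return (1 if llr > 0 else 0), llr

# One reliable detector can outweigh two weak ones, unlike a majority vote.
decision, score = fuse_decisions([1, 0, 0], [0.99, 0.6, 0.6], [0.01, 0.4, 0.4])
```

Each detector's vote is weighted by its reliability, which is what independence buys; when the local decisions are correlated, the product form behind this sum no longer holds.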

In the aforementioned scenario, the hypotheses are uniquely defined, i.e., all local detectors follow the same labeling convention. However, in certain applications, the regions of interest (decisions, hypotheses, clusters, etc.) are not unique, i.e., they may vary locally (from source to source). In this case, information fusion becomes more complicated. Historically, this problem was first observed in classification and clustering. In classification applications, the categories are pre-defined and training data is required. Therefore, a classification problem can be viewed as a detection problem by treating the pre-defined classes as the hypotheses. However, information fusion in clustering applications is more difficult because of the lack of prior information and the correspondence problem caused by symbolic cluster labels.
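
The correspondence problem can be illustrated with a small sketch: two clusterings may describe the same partition while using different symbolic labels, so the labels must be aligned before they can be compared or combined. The helper `align_labels` below is a hypothetical illustration, not a method from the thesis. It searches all label permutations by brute force, which is only practical for a handful of clusters, and assumes both clusterings use the same number of clusters.

```python
from itertools import permutations

def align_labels(reference, other):
    """Relabel `other` so that it agrees with `reference` as much as possible.

    Both arguments are equal-length lists of cluster labels.
    Returns (relabelled_other, number_of_agreeing_points).
    """
    ref_labels = sorted(set(reference))
    other_labels = sorted(set(other))
    if len(ref_labels) != len(other_labels):
        raise ValueError("this sketch assumes the same number of clusters")
    best_map, best_agree = None, -1
    # Try every one-to-one mapping from `other` labels to `reference` labels.
    for perm in permutations(ref_labels):
        mapping = dict(zip(other_labels, perm))
        agree = sum(1 for r, o in zip(reference, other) if mapping[o] == r)
        if agree > best_agree:
            best_map, best_agree = mapping, agree
    return [best_map[o] for o in other], best_agree

# Same partition, different symbolic labels.
aligned, agreement = align_labels([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
```

Without this alignment step, a naive label-by-label comparison of the two clusterings above would report zero agreement even though they group the data identically.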

In the literature, information fusion in clustering problems is usually referred to as the clustering ensemble problem. Most existing clustering ensemble methods are unsupervised. In this thesis, we propose two semi-supervised clustering ensemble algorithms (SEA). Like existing ensemble methods, SEA consists of two major steps: the generation and the fusion of base clusterings. By analogy with distributed detection, we propose a distributed clustering system that consists of a base clustering generator and a decision fusion center. The base clustering generator produces multiple base clusterings of the given data set. The decision fusion center combines all base clusterings into a single consensus clustering. Although conventional clustering algorithms are usually unsupervised and do not require training data, in many applications expert opinions are available to label a small portion of the data observations. These labels can be used as guidance information in the fusion process. Therefore, we design two operational modes for the fusion center, depending on whether training data is absent or present. In the unsupervised mode, any existing unsupervised clustering ensemble method can be implemented as the fusion rule. In the semi-supervised mode, the proposed semi-supervised clustering ensemble methods can be implemented.
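
To make the idea of label-guided fusion concrete, here is a minimal sketch of one simple way a fusion center could use a few labeled observations. It is an illustrative assumption, not the SEA algorithms proposed in the thesis. Each base clustering maps its clusters to the majority class among the labeled points they contain, and the relabelled base clusterings are then combined by majority vote. The function name `semi_supervised_fuse` and its inputs are hypothetical.

```python
from collections import Counter

def semi_supervised_fuse(base_clusterings, known_labels):
    """Combine base clusterings using a small set of labeled points.

    base_clusterings -- list of label lists, one per base clustering
    known_labels     -- dict {point_index: class_label} for the labeled subset
    Returns one class label per point, or None where no vote was cast.
    """
    n_points = len(base_clusterings[0])
    votes = [Counter() for _ in range(n_points)]
    for clustering in base_clusterings:
        # Count which known classes fall inside each symbolic cluster.
        per_cluster = {}
        for idx, cls in known_labels.items():
            per_cluster.setdefault(clustering[idx], Counter())[cls] += 1
        # Map each cluster to the majority class of its labeled members.
        cluster_to_class = {c: cnt.most_common(1)[0][0]
                            for c, cnt in per_cluster.items()}
        # Clusters containing no labeled point abstain from voting.
        for i, c in enumerate(clustering):
            if c in cluster_to_class:
                votes[i][cluster_to_class[c]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

base = [[0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],   # same partition, swapped labels
        [0, 0, 1, 1, 1, 1]]   # disagrees on the third point
consensus = semi_supervised_fuse(base, {0: "A", 5: "B"})
```

Because every base clustering is translated into the shared class vocabulary before voting, the symbolic-label correspondence problem is sidestepped for any cluster that contains at least one labeled point.
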
In addition, a parallel distributed clustering system is proposed to reduce the computation time of clustering high-volume data sets. Moreover, we propose a new cluster detection algorithm based on SEA. It is implemented in the system to provide feedback information. When data observations from a new class (other than the existing training classes) are detected, a signal is sent out to request new training data or to switch from the semi-supervised mode to the unsupervised mode.

Thesis / Doctor of Philosophy (PhD)

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/21975
Date: January 2017
Creators: Li, Huaying
Contributors: Jeremic, Aleksandar, Electrical and Computer Engineering
Source Sets: McMaster University
Language: English
Detected Language: English
Type: Thesis