Multimodal Deep Learning for Multi-Label Classification and Ranking Problems

In recent years, deep neural network models have been shown to outperform many state-of-the-art algorithms. One reason is that unsupervised pretraining with multi-layered deep neural networks has been shown to learn better features, which in turn improve performance on many supervised tasks. These models not only automate the feature extraction process but also provide robust features for various machine learning tasks. However, unsupervised pretraining and feature extraction with multi-layered networks are applied only to the input features and not to the output. The performance of many supervised learning algorithms (or models) depends on how well they handle output dependencies [Dembczyński et al., 2012]. Adapting standard neural networks to handle these output dependencies for a specific type of problem has been an active area of research [Zhang and Zhou, 2006, Ribeiro et al., 2012].
On the other hand, inference on multimodal data is considered a difficult problem in machine learning, and recently 'deep multimodal neural networks' have shown significant results [Ngiam et al., 2011, Srivastava and Salakhutdinov, 2012]. These models have been shown to perform very well on several problems, such as classification with complete or missing modality data and generation of the missing modality. In this work, we consider three nontrivial supervised learning tasks: (i) multi-class classification (MCC), (ii) multi-label classification (MLC) and (iii) label ranking (LR), listed in order of increasing output complexity. Multi-class classification predicts one class for every instance, multi-label classification predicts more than one class for every instance, and label ranking assigns a rank to each label for every instance. Work in this field has centered on formulating new error functions that force the network to identify the output dependencies.
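To make the difference between the three output types concrete, the following is a minimal sketch in Python. The label names, the per-label scores and the 0.5 threshold are all hypothetical and chosen only for illustration; they do not come from the thesis.

```python
# Assumed per-label scores that some model produced for a single instance.
# Label names and values are illustrative only.
scores = {"sports": 0.7, "politics": 0.2, "music": 0.6}


def multi_class(scores):
    """MCC: exactly one label, the one with the highest score."""
    return max(scores, key=scores.get)


def multi_label(scores, threshold=0.5):
    """MLC: a binary indicator per label; any number of labels may be on."""
    return {label: int(s >= threshold) for label, s in scores.items()}


def label_ranking(scores):
    """LR: a rank for every label, with rank 1 the most relevant."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {label: rank for rank, label in enumerate(ordered, start=1)}


mcc = multi_class(scores)
mlc = multi_label(scores)
lr = label_ranking(scores)
print(mcc)
print(mlc)
print(lr)
```

The same scores yield a single class for MCC, a subset of labels for MLC and a full ordering for LR, which is the sense in which the output grows more complex across the three tasks.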
The aim of our work is to adapt neural networks to implicitly handle feature extraction (dependencies) for the output within the network structure, removing the need for hand-crafted error functions. We show that multimodal deep architectures can be adapted for these types of problems (or data) by considering labels as one of the modalities. This also brings unsupervised pretraining to the output along with the input. We show that these models not only outperform standard deep neural networks, but also outperform standard adaptations of neural networks for individual domains under various metrics over the several data sets we considered. We observe that the advantage of our models over other models grows as the complexity of the output/problem increases.
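The idea of treating labels as a modality can be illustrated with a small forward-pass sketch. This is not the architecture from the thesis: the layer sizes, the random untrained weights, the omission of bias terms and the choice to fill the missing label modality with zeros at prediction time are all assumptions made for illustration, loosely following the general bimodal setup of a shared joint layer.

```python
import math
import random

random.seed(0)

# Hypothetical sizes: 4 input features, 3 labels, 5 joint hidden units.
N_FEATURES, N_LABELS, N_HIDDEN = 4, 3, 5


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols):
    # Untrained random weights; a real model would learn these.
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_x = rand_matrix(N_HIDDEN, N_FEATURES)  # feature modality -> joint layer
W_y = rand_matrix(N_HIDDEN, N_LABELS)    # label modality -> joint layer
W_out = rand_matrix(N_LABELS, N_HIDDEN)  # joint layer -> reconstructed labels


def joint_representation(x, y):
    """Shared hidden layer fed by both modalities (biases omitted)."""
    hidden = []
    for row_x, row_y in zip(W_x, W_y):
        pre = sum(w * v for w, v in zip(row_x, x))
        pre += sum(w * v for w, v in zip(row_y, y))
        hidden.append(sigmoid(pre))
    return hidden


def predict_labels(x):
    """Treat the labels as a missing modality and reconstruct them."""
    y_missing = [0.0] * N_LABELS  # assumption: zeros stand in for unknown labels
    h = joint_representation(x, y_missing)
    return [sigmoid(sum(w * v for w, v in zip(row, h))) for row in W_out]


x = [0.1, 0.9, 0.3, 0.5]
h = joint_representation(x, [1.0, 0.0, 1.0])  # both modalities, as in training
y_hat = predict_labels(x)                     # labels unknown, as at test time
print(y_hat)
```

Because the weights are random, the printed scores are meaningless; the sketch only shows how a joint layer lets label structure enter the network itself rather than a hand-crafted error function.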

Neural Networks

Deep Neural Network Models

Neural Network Architecture

Multimodal Deep Neural Networks

Multimodal Deep Learning

Multi-Label Classification (MLC)

Multi-class Classification (MCC)

Label Ranking

Multimodal Neural Networks

Supervised Learning

Multilayer Neural Network

Perceptron Model

Computer Science

Identifier	oai:union.ndltd.org:IISc/oai:etd.iisc.ernet.in:2005/3681
Date	January 2015
Creators	Dubey, Abhishek
Contributors	Dukkipati, Ambedkar
Source Sets	India Institute of Science
Language	en_US
Detected Language	English
Type	Thesis
Relation	G26906
