31 |
EFFICIENT INFERENCE AND DOMINANT-SET BASED CLUSTERING FOR FUNCTIONAL DATA
Xiang Wang (18396603) 03 June 2024
<p dir="ltr">This dissertation addresses three progressively more fundamental problems in functional data analysis: (1) To perform efficient inference for the functional mean model while accounting for within-subject correlation, we propose a refined and bias-corrected empirical likelihood method. (2) To identify functional subjects that potentially come from different populations, we propose a dominant-set based unsupervised clustering method that uses the similarity matrix. (3) To learn the similarity matrix from various similarity metrics for functional data clustering, we propose a modularity-guided, dominant-set based semi-supervised clustering method.</p><p dir="ltr">In the first problem, the empirical likelihood method is used for inference on the mean function of functional data by constructing a refined and bias-corrected estimating equation. The proposed estimating equation not only improves efficiency but also makes empirical likelihood inference practically feasible by properly incorporating within-subject correlation, which previous studies have not achieved.</p><p dir="ltr">In the second problem, a dominant-set based unsupervised clustering method is proposed to maximize within-cluster similarity and is applied to functional data with a flexible choice of similarity measures between curves. The proposed method is a hierarchical bipartition procedure under a penalized optimization framework; it is inspired by the concept of a dominant set in graph theory and solved by replicator dynamics from game theory, and its tuning parameter is selected by maximizing a clustering criterion, the modularity of the resulting two clusters. 
This approach is robust not only to imbalanced group sizes but also to outliers, which overcomes a limitation of many existing clustering methods.</p><p dir="ltr">In the third problem, a metric-based semi-supervised clustering method is proposed in which the similarity metric is learned by modularity maximization and then passed to the dominant-set based clustering procedure described above. In the semi-supervised setting, where some cluster memberships are known, the goal is to determine the best linear combination of candidate similarity metrics to use as the final metric and thereby improve clustering performance. Besides the global metric-based algorithm, a second algorithm is proposed to learn an individual metric for each cluster, which permits overlapping cluster membership and distinguishes the approach from many existing methods. The method is well suited to functional data, for which various similarity metrics between curves are available, and it is robust to imbalanced group sizes, a property inherited from the dominant-set based clustering approach.</p><p dir="ltr">In all three problems, the advantages of the proposed methods are demonstrated through extensive empirical investigations using simulations as well as real data applications.</p>
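The dominant-set idea mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a symmetric, non-negative similarity matrix with a zero diagonal and runs plain discrete-time replicator dynamics from the uniform starting point. The function names, the cutoff value and the toy matrix are illustrative assumptions; the penalization, modularity-based tuning and hierarchical bipartition described in the dissertation are not reproduced here.

```python
def replicator_dynamics(A, iters=1000, tol=1e-10):
    """Iterate x_i <- x_i * (A x)_i / (x^T A x) starting from uniform weights."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(x[i] * Ax[i] for i in range(n))
        if xAx == 0.0:
            break  # no similarity mass left to redistribute
        new_x = [x[i] * Ax[i] / xAx for i in range(n)]
        delta = sum(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if delta < tol:
            break
    return x


def dominant_set(A, cutoff=1e-4):
    """Indices whose converged weight exceeds an (assumed) cutoff."""
    x = replicator_dynamics(A)
    return [i for i, w in enumerate(x) if w > cutoff]


if __name__ == "__main__":
    # Toy similarity matrix: items 0-2 are mutually similar, items 3-4 form a weaker pair.
    A = [
        [0.0, 0.9, 0.9, 0.1, 0.1],
        [0.9, 0.0, 0.9, 0.1, 0.1],
        [0.9, 0.9, 0.0, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.0, 0.8],
        [0.1, 0.1, 0.1, 0.8, 0.0],
    ]
    print(dominant_set(A))
```

On this toy matrix the weights concentrate on the three tightly connected items, which is the behaviour a bipartition step would rely on to peel off one cohesive group at a time.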
|
32 |
Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval
Rohan Sarkar (19065215) 11 July 2024
<p dir="ltr">Object Recognition and Retrieval is a fundamental problem in Computer Vision that involves
recognizing objects and retrieving similar object images through visual queries. While
deep metric learning is commonly employed to learn image embeddings for solving such
problems, the representations learned using existing methods are not robust to changes in
viewpoint, pose, and object state, especially for fine-grained recognition and retrieval tasks.
To overcome these limitations, this dissertation aims to learn robust object representations
that remain invariant to such transformations for fine-grained tasks. First, it focuses on
learning dual pose-invariant embeddings to facilitate recognition and retrieval at both the
category and finer object-identity levels by learning category and object-identity specific representations
in separate embedding spaces simultaneously. For this, the PiRO framework is
introduced that utilizes an attention-based dual encoder architecture and novel pose-invariant
ranking losses for each embedding space to disentangle the category and object representations
while learning pose-invariant features. Second, the dissertation introduces ranking
losses that cluster multi-view images of an object together in both the embedding spaces
while simultaneously pulling the embeddings of two objects from the same category closer in
the category embedding space to learn fundamental category-specific attributes and pushing
them apart in the object embedding space to learn discriminative features to distinguish
between them. Third, the dissertation addresses state-invariance and introduces a novel ObjectsWithStateChange
dataset to facilitate research in recognizing fine-grained objects with
state changes involving structural transformations in addition to pose and viewpoint changes.
Fourth, it proposes a curriculum learning strategy to progressively sample object images that
are harder to distinguish for training the model, enhancing its ability to capture discriminative
features for fine-grained tasks amidst state changes and other transformations. Experimental
evaluations demonstrate significant improvements in object recognition and retrieval
performance compared to previous methods, validating the effectiveness of the proposed
approaches across several challenging datasets under various transformations.</p>
|
33 |
Données multimodales pour l'analyse d'image
Guillaumin, Matthieu 27 September 2010
This thesis addresses the use of textual metadata for image analysis. We seek to use this additional information as weak supervision for learning visual recognition models. There has been a recent and growing interest in methods able to exploit this type of data because they can potentially remove the need for manual annotations, which are costly in time and resources. We focus on two types of visual data associated with textual information. First, we use news images accompanied by descriptive captions to tackle several problems related to face recognition. Among these problems, face verification is the task of deciding whether two images depict the same person, and face naming seeks to associate the faces in a database with their correct names. Next, we explore models for automatically predicting the relevant labels for images, a problem known as automatic image annotation. These models can also be used to search for images from keywords. Finally, we study a semi-supervised multimodal learning scenario for image categorization. In this setting, the labels are assumed to be present for the training data, whether manually annotated or not, and absent from the test data. Our work builds on the observation that most of these problems can be solved if well-adapted similarity measures are used. We therefore propose new approaches that combine distance (metric) learning, nearest-neighbor models and graph-based methods to learn, from visual and textual data, visual similarities specific to each problem. 
In the case of faces, our similarities focus on the identity of individuals, whereas for images they concern more general semantic concepts. Experimentally, our approaches achieve state-of-the-art performance on several challenging datasets. For both types of data considered, we clearly show that learning benefits from the additional textual information, resulting in improved performance of visual recognition systems.
|
34 |
Clustering exploratoire pour la segmentation de données clients / Exploratory clustering for customer data segmentation
El Moussawi, Adnan 25 September 2018
Les travaux de cette thèse s’intéressent à l’exploration de la multiplicité des solutions de clustering. Le but est de proposer aux experts marketing un outil interactif d’exploration des données clients qui considère les préférences des experts sur l’espace des attributs. Nous donnons d’abord la définition d’un système de clustering exploratoire. Nous proposons ensuite une nouvelle méthode de clustering semi-supervisée qui considère des préférences quantitatives de l’utilisateur sur les attributs d’analyse et qui gère la sensibilité à ces préférences. Notre méthode tire profit de l’apprentissage de métrique pour trouver une solution de compromis entre la structure des données et les préférences de l’expert. Enfin, nous proposons un prototype de clustering exploratoire pour la segmentation des données de la relation client intégrant la nouvelle méthode de clustering proposée, mais aussi des fonctionnalités de visualisation et d’aide à l’interprétation de résultats permettant de réaliser un processus complet de clustering exploratoire. / The research work presented in this thesis focuses on the exploration of the multiplicity of clustering solutions. The goal is to provide marketing experts with an interactive tool for exploring customer data that takes into account expert preferences over the attribute space. We first give the definition of an exploratory clustering system. Then, we propose a new semi-supervised clustering method that considers the user’s quantitative preferences on the analysis attributes and manages the sensitivity to these preferences. Our method takes advantage of metric learning to find a compromise solution that is both well adapted to the data structure and consistent with the expert’s preferences. Finally, we propose a prototype of exploratory clustering for customer relationship data segmentation that integrates the proposed method. 
The prototype also integrates visual and interaction components essential for the implementation of the exploratory clustering process.
|
35 |
Triangular similarity metric learning : A siamese architecture approach / Apprentissage métrique de similarité triangulaire : Une approche d'architecture siamois
Zheng, Lilei 10 May 2016
Dans de nombreux problèmes d’apprentissage automatique et de reconnaissance des formes, il y a toujours un besoin de fonctions métriques appropriées pour mesurer la distance ou la similarité entre des données. La fonction métrique est une fonction qui définit une distance ou une similarité entre chaque paire d’éléments d’un ensemble de données. Dans cette thèse, nous proposons une nouvelle methode, Triangular Similarity Metric Learning (TSML), pour spécifier une fonction métrique de données automatiquement. Le système TSML proposée repose une architecture Siamese qui se compose de deux sous-systèmes identiques partageant le même ensemble de paramètres. Chaque sous-système traite un seul échantillon de données et donc le système entier reçoit une paire de données en entrée. Le système TSML comprend une fonction de coût qui définit la relation entre chaque paire de données et une fonction de projection permettant l’apprentissage des formes de haut niveau. Pour la fonction de coût, nous proposons d’abord la similarité triangulaire (Triangular Similarity), une nouvelle similarité métrique qui équivaut à la similarité cosinus. Sur la base d’une version simplifiée de la similarité triangulaire, nous proposons la fonction triangulaire (the triangular loss) afin d’effectuer l’apprentissage de métrique, en augmentant la similarité entre deux vecteurs dans la même classe et en diminuant la similarité entre deux vecteurs de classes différentes. Par rapport aux autres distances ou similarités, la fonction triangulaire et sa fonction gradient nous offrent naturellement une interprétation géométrique intuitive et intéressante qui explicite l’objectif d’apprentissage de métrique. 
En ce qui concerne la fonction de projection, nous présentons trois fonctions différentes: une projection linéaire qui est réalisée par une matrice simple, une projection non-linéaire qui est réalisée par Multi-layer Perceptrons (MLP) et une projection non-linéaire profonde qui est réalisée par Convolutional Neural Networks (CNN). Avec ces fonctions de projection, nous proposons trois systèmes de TSML pour plusieurs applications: la vérification par paires, l’identification d’objet, la réduction de la dimensionnalité et la visualisation de données. Pour chaque application, nous présentons des expérimentations détaillées sur des ensembles de données de référence afin de démontrer l’efficacité de nos systèmes de TSML. / In many machine learning and pattern recognition tasks, there is a need for appropriate metric functions to measure pairwise distance or similarity between data, where a metric function is a function that defines a distance or similarity between each pair of elements of a set. In this thesis, we propose Triangular Similarity Metric Learning (TSML) for automatically specifying a metric from data. A TSML system is built on a siamese architecture which consists of two identical sub-systems sharing the same set of parameters. Each sub-system processes a single data sample and thus the whole system receives a pair of data samples as input. The TSML system includes a cost function parameterizing the pairwise relationship between data and a mapping function allowing the system to learn high-level features from the training data. In terms of the cost function, we first propose the Triangular Similarity, a novel similarity metric which is equivalent to the well-known Cosine Similarity in measuring a data pair. Based on a simplified version of the Triangular Similarity, we further develop the triangular loss function in order to perform metric learning, i.e. 
to increase the similarity between two vectors in the same class and to decrease the similarity between two vectors of different classes. Compared with other distance or similarity metrics, the triangular loss and its gradient naturally offer us an intuitive and interesting geometrical interpretation of the metric learning objective. In terms of the mapping function, we introduce three different options: a linear mapping realized by a simple transformation matrix, a nonlinear mapping realized by Multi-layer Perceptrons (MLP) and a deep nonlinear mapping realized by Convolutional Neural Networks (CNN). With these mapping functions, we present three different TSML systems for various applications, namely, pairwise verification, object identification, dimensionality reduction and data visualization. For each application, we carry out extensive experiments on popular benchmarks and datasets to demonstrate the effectiveness of the proposed systems.
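As a rough illustration of the siamese setup with the linear mapping option, the Python sketch below applies one shared matrix to both inputs and compares the results with cosine similarity, which the abstract describes as equivalent to the Triangular Similarity. The function names and the toy matrix are illustrative assumptions, and the triangular loss itself is not reproduced because the abstract does not define it.

```python
import math


def linear_map(W, x):
    """Apply the shared transformation matrix W (list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def siamese_similarity(W, x1, x2):
    """Both branches use the same W, as in a siamese architecture."""
    return cosine_similarity(linear_map(W, x1), linear_map(W, x2))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity mapping as a placeholder for a learned matrix
    print(siamese_similarity(W, [1.0, 2.0], [2.0, 4.0]))
    print(siamese_similarity(W, [1.0, 0.0], [0.0, 1.0]))
```

Metric learning in this setting would adjust W so that same-class pairs score higher and different-class pairs score lower; the MLP and CNN variants replace `linear_map` with a nonlinear mapping.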
|
36 |
A Unified View of Local Learning : Theory and Algorithms for Enhancing Linear Models / Une Vue Unifiée de l'Apprentissage Local : Théorie et Algorithmes pour l'Amélioration de Modèles Linéaires
Zantedeschi, Valentina 18 December 2018
Dans le domaine de l'apprentissage machine, les caractéristiques des données varient généralement dans l'espace des entrées : la distribution globale pourrait être multimodale et contenir des non-linéarités. Afin d'obtenir de bonnes performances, l'algorithme d'apprentissage devrait alors être capable de capturer et de s'adapter à ces changements. Même si les modèles linéaires ne parviennent pas à décrire des distributions complexes, ils sont réputés pour leur passage à l'échelle, en entraînement et en test, aux grands ensembles de données en termes de nombre d'exemples et de nombre de fonctionnalités. Plusieurs méthodes ont été proposées pour tirer parti du passage à l'échelle et de la simplicité des hypothèses linéaires afin de construire des modèles aux grandes capacités discriminatoires. Ces méthodes améliorent les modèles linéaires, dans le sens où elles renforcent leur expressivité grâce à différentes techniques. Cette thèse porte sur l'amélioration des approches d'apprentissage locales, une famille de techniques qui infère des modèles en capturant les caractéristiques locales de l'espace dans lequel les observations sont intégrées.L'hypothèse fondatrice de ces techniques est que le modèle appris doit se comporter de manière cohérente sur des exemples qui sont proches, ce qui implique que ses résultats doivent aussi changer de façon continue dans l'espace des entrées. La localité peut être définie sur la base de critères spatiaux (par exemple, la proximité en fonction d'une métrique choisie) ou d'autres relations fournies, telles que l'association à la même catégorie d'exemples ou un attribut commun. On sait que les approches locales d'apprentissage sont efficaces pour capturer des distributions complexes de données, évitant de recourir à la sélection d'un modèle spécifique pour la tâche. 
Cependant, les techniques de pointe souffrent de trois inconvénients majeurs : elles mémorisent facilement l'ensemble d'entraînement, ce qui se traduit par des performances médiocres sur de nouvelles données ; leurs prédictions manquent de continuité dans des endroits particuliers de l'espace ; elles évoluent mal avec la taille des ensembles des données. Les contributions de cette thèse examinent les problèmes susmentionnés dans deux directions : nous proposons d'introduire des informations secondaires dans la formulation du problème pour renforcer la continuité de la prédiction et atténuer le phénomène de la mémorisation ; nous fournissons une nouvelle représentation de l'ensemble de données qui tient compte de ses spécificités locales et améliore son évolutivité. Des études approfondies sont menées pour mettre en évidence l'efficacité de ces contributions pour confirmer le bien-fondé de leurs intuitions. Nous étudions empiriquement les performances des méthodes proposées tant sur des jeux de données synthétiques que sur des tâches réelles, en termes de précision et de temps d'exécution, et les comparons aux résultats de l'état de l'art. Nous analysons également nos approches d'un point de vue théorique, en étudiant leurs complexités de calcul et de mémoire et en dérivant des bornes de généralisation serrées. / In the field of Machine Learning, data characteristics usually vary over the input space: the overall distribution might be multi-modal and contain non-linearities. In order to achieve good performance, the learning algorithm should then be able to capture and adapt to these changes. Even though linear models fail to describe complex distributions, they are renowned for their scalability, at training and at testing, to datasets that are large in terms of both number of examples and number of features. Several methods have been proposed to take advantage of the scalability and the simplicity of linear hypotheses to build models with great discriminatory capabilities. 
These methods empower linear models, in the sense that they enhance their expressive power through different techniques. This dissertation focuses on enhancing local learning approaches, a family of techniques that infers models by capturing the local characteristics of the space in which the observations are embedded. The founding assumption of these techniques is that the learned model should behave consistently on examples that are close, implying that its results should also change smoothly over the space. The locality can be defined on spatial criteria (e.g. closeness according to a selected metric) or other provided relations, such as the association to the same category of examples or a shared attribute. Local learning approaches are known to be effective in capturing complex distributions of the data, avoiding the need to select a model specific to the task. However, state-of-the-art techniques suffer from three major drawbacks: they easily memorize the training set, resulting in poor performance on unseen data; their predictions lack smoothness in particular locations of the space; they scale poorly with the size of the datasets. The contributions of this dissertation investigate the aforementioned pitfalls in two directions: we propose to introduce side information in the problem formulation to enforce smoothness in prediction and attenuate the memorization phenomenon; we provide a new representation for the dataset which takes into account its local specificities and improves scalability. Thorough studies are conducted to highlight the effectiveness of these contributions and to confirm the soundness of their intuitions. We empirically study the performance of the proposed methods both on toy and real tasks, in terms of accuracy and execution time, and compare it to state-of-the-art results. 
We also analyze our approaches from a theoretical standpoint, by studying their computational and memory complexities and by deriving tight generalization bounds.
|
37 |
Video motion description based on histograms of sparse trajectories
Oliveira, Fábio Luiz Marinho de 05 September 2016
Descrição de movimento tem sido um tema desafiador e popular há muitos anos em
visão computacional e processamento de sinais, mas também intimamente relacionado a
aprendizado de máquina e reconhecimento de padrões. Frequentemente, para realizar essa
tarefa, informação de movimento é extraída e codificada em um descritor. Este trabalho
apresenta um método simples e de rápida computação para extrair essa informação e
codificá-la em descritores baseados em histogramas de deslocamentos relativos. Nossos
descritores são compactos, globais, que agregam informação de quadros inteiros, e o que
chamamos de auto-descritor, que não depende de informações de sequências senão aquela
que pretendemos descrever. Para validar estes descritores e compará-los com outros trabalhos,
os utilizamos no contexto de Reconhecimento de Ações Humanas, no qual cenas
são classificadas de acordo com as ações nelas exibidas. Nessa validação, obtemos resultados
comparáveis aos do estado-da-arte para a base de dados KTH. Também avaliamos
nosso método utilizando as bases UCF11 e Hollywood2, com menores taxas de reconhecimento,
considerando suas maiores complexidades. Nossa abordagem é promissora, pelas
razoáveis taxas de reconhecimento obtidas com um método muito menos complexo que os
do estado-da-arte, em termos de velocidade de computação e compacidade dos descritores
obtidos. Adicionalmente, experimentamos com o uso de Aprendizado de Métrica para a
classificação de nossos descritores, com o intuito de melhorar a separabilidade e a compacidade
dos descritores. Os resultados com Aprendizado de Métrica apresentam taxas
de reconhecimento inferiores, mas grande melhoria na compacidade dos descritores. / Motion description has been a challenging and popular theme over many years within
computer vision and signal processing, but also very closely related to machine learning
and pattern recognition. Very frequently, to address this task, one extracts motion
information from image sequences and encodes this information into a descriptor. This
work presents a simple and fast-to-compute method to extract this information and encode
it into descriptors based on histograms of relative displacements. Our descriptors
are compact, global, meaning they aggregate information from whole frames, and what we
call self-descriptors, meaning they do not depend on information from sequences other
than the one we want to describe. To validate these descriptors and compare them to
other works, we use them in the context of Human Action Recognition, where scenes are
classified according to the action portrayed. In this validation, we achieve results that are
comparable to the state of the art on the KTH dataset. We also evaluate our
method on the UCF11 and Hollywood2 datasets, with lower recognition rates, considering
their higher complexity. Our approach is a promising one, due to the fairly good recognition
rates we obtain with a much less complex method than those of the state of the art,
in terms of speed of computation and final descriptor compactness. Additionally, we experiment
with the use of Metric Learning in the classification of our descriptors, aiming
to improve the separability and compactness of the descriptors. Our results for Metric
Learning show inferior recognition rates, but great improvement in the compactness of
the descriptors.
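To make the idea of a displacement histogram concrete, here is a minimal Python sketch. It assumes that each trajectory is a list of (x, y) positions of one tracked point over consecutive frames, that a displacement is the difference between consecutive positions, and that displacements are binned by direction and weighted by magnitude. These choices, the function name and the bin count are illustrative assumptions, not the descriptor defined in the thesis.

```python
import math


def displacement_histogram(tracks, n_bins=8):
    """Normalized histogram of displacement directions, weighted by magnitude."""
    hist = [0.0] * n_bins
    for track in tracks:
        # Pair each position with the next one along the same trajectory.
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            dx, dy = x1 - x0, y1 - y0
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue  # a stationary point carries no direction
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            bin_index = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
            hist[bin_index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0.0 else hist


if __name__ == "__main__":
    tracks = [
        [(0, 0), (1, 0), (2, 0)],  # a point moving right for two steps
        [(0, 0), (1, 2)],          # a point moving up and to the right
    ]
    print(displacement_histogram(tracks))
```

Because the histogram has a fixed length regardless of how many points are tracked, a descriptor of this kind stays compact and depends only on the sequence being described.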
|
38 |
Multi-modal Models for Product Similarity : Comparative evaluation of unimodal and multi-modal architectures for product similarity prediction and product retrieval / Multimodala modeller för produktlikhet
Frantzolas, Christos January 2023
With the rapid growth of e-commerce, enabling effective product recommendation systems and improving product search for shoppers plays a crucial role in driving customer satisfaction. Traditional product retrieval approaches have mainly relied on unimodal models focusing on text data. However, to capture auxiliary context and improve the accuracy of similarity predictions, it is crucial to explore architectures that can leverage additional sources of information, such as images. This thesis compares the performance of multi- and unimodal methods for product similarity prediction and product retrieval. Both approaches are applied to two e-commerce datasets, one containing English and another containing Swedish product descriptions. A pre-trained multi-modal model called CLIP is used as a feature extractor. Different models are trained on CLIP embeddings using either text-only, image-only or image-text inputs. An extension of triplet loss with margins is tested, along with various training setups. Given the lack of similarity labels between products, product similarity prediction is studied by measuring the performance of a K-Nearest Neighbour classifier implemented on features extracted by the trained models. The thesis results demonstrate that multi-modal architectures outperform unimodal models in predicting product similarity. The same is true for product retrieval. Combining textual and visual information seems to lead to more accurate predictions than models relying on only one modality. The findings of this research have considerable implications for e-commerce platforms and recommendation systems, providing insights into the effectiveness of multi-modal models for product-related tasks. Overall, the study contributes to the existing body of knowledge by highlighting the advantages of leveraging multiple sources of information for deep learning. It also presents recommendations for designing and implementing effective multi-modal architectures. 
/ I och med den snabba tillväxten av e-handel spelar att möjliggöra effektivare produktrekommendationssystem och att förbättra produktsök för konsumenter en viktig roll för att öka kundnöjdheten. Traditionella angreppsätt för produktsök har huvudsakligen tillförlitat sig på unimodala textmodeller. För att fånga ett bredare kontext och förbättra exaktheten av prediktioner av likhet mellan produkter är det viktigt att utforska arkitekturer som kan utnyttja fler informationskällor så som bilder. Den här avhandlingen jämför prestanda hos multimodala och unimodala metoder för produktlikhetsprediktioner och produktsök. Båda angreppsätten är tillämpade på två e-handelsdatamängder, en med engelska produktbeskrivningar och en med svenska. En förtränad multimodal modell kallad CLIP används för att skapa produktrepresentationer. Olika modeller har tränats på CLIPs representationer, antingen med enbart text, enbart bild eller både bild och text. En utökning av ett triplettmått med marginaler har testats som träningskriterium, i kombination med olika träningsinställningar. Givet en avsaknad av likhetsannoteringar mellan produkter så har produktlikhetsprediktion studerats genom att mäta prestandan av K-närmaste-grannar-klassificering genom att använda vektor-representationer från de tränade modellerna. Avhandlingens resultat visar att multimodala arkitekturer överträffar unimodala modeller för produktlikhetsprediktion. Att kombinera textuell och visuell information verkar leda till mer korrekta prediktioner jämfört med modeller som förlitar sig på endast en modalitet. Forskningsresultaten har markanta implikationer för e-handelsplattformar och rekommendationssystem, genom att tillhandahålla insikter i multimodala modellers effektivitet i produktrelaterade uppgifter. Överlag så bidrar studien till den existerande litteraturen genom att förtydliga fördelarna av att utnyttja flera informationskällor för djupinlärning. 
Den resulterar också i rekommendationer för att designa och implementera effektiva multimodala modellarkitekturer.
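The evaluation pipeline described in this abstract (embeddings trained with a margin-based triplet loss, then scored with a K-Nearest Neighbour classifier) can be sketched in Python as follows. This shows the standard triplet margin loss and a plain majority-vote k-NN on Euclidean distance; the thesis's extension of the loss, the CLIP feature extraction and the margin value are not reproduced, and all names and toy vectors here are illustrative assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by at least the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def knn_predict(query, embeddings, labels, k=3):
    """Majority label among the k embeddings closest to the query."""
    order = sorted(range(len(embeddings)), key=lambda i: euclidean(query, embeddings[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
    labels = ["a", "a", "a", "b", "b"]
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    print(knn_predict([0.2, 0.1], embeddings, labels))
```

Using k-NN accuracy on the learned features is a common proxy when, as the abstract notes, no direct similarity labels between products are available.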
|
39 |
Attribute Embedding for Variational Auto-Encoders : Regularization derived from triplet loss / Inbäddning av attribut för Variationsautokodare : Strukturering av det Latenta Rummet
E. L. Dahlin, Anton January 2022
Techniques for imposing a structure on the latent space of neural networks have seen much development in recent years. Clustering techniques used for classification have been used to great success, and with this work we hope to bridge the gap between contrastive losses and generative models. We introduce an embedding loss derived from Triplet loss to show that attributes and information can be clustered in specific dimensions in the latent space of Variational Auto-Encoders. This allows control over the embedded attributes via manipulation of these latent space dimensions. This work also serves to take steps towards the usage of any data augmentation when applying Triplet loss to Variational Auto-Encoders. In this work three different Variational Auto-Encoders are trained on three different datasets to embed information in three different ways using this novel method. Our results show the method working to varying degrees depending on the implementation and the information embedded. Two experiments using image data and one using waveform audio show that the method is modality invariant. / Tekniker för att införa en struktur i det latenta utrymmet i neurala nätverk har sett mycket utveckling under de senaste åren. Kluster metoder som används för klassificering har använts till stor framgång, och med detta arbete hoppas vi kunna brygga gapet mellan kontrastiva förlustfunktioner och generativa modeller. Vi introducerar en förlustfunktion för inbäddning härledd från triplet loss för att visa att attribut och information kan klustras i specifika dimensioner i det latenta utrymmet hos variationsautokodare. Detta tillåter kontroll över de inbäddade attributen via manipulering av dessa dimensioner i latenta utrymmet. Detta arbete tjänar också till att ta steg mot användningen av olika data augmentationer när triplet loss tillämpas på generativa modeller. 
Tre olika Variationsautokodare tränas på tre olika dataset för att bädda in information på tre olika sätt med denna nya metod. Våra resultat visar att metoden fungerar i varierande grad beroende på hur den tillämpas och vilken information som inbäddas. Två experiment använder bild-data och ett använder sig av ljud, vilket visar på att metoden är modalitetsinvariant.
|
40 |
Improving Zero-Shot Learning via Distribution Embeddings
Chalumuri, Vivek January 2020
Zero-Shot Learning (ZSL) for image classification aims to recognize images from novel classes for which we have no training examples. A common approach to tackling such a problem is by transferring knowledge from seen to unseen classes using some auxiliary semantic information of class labels in the form of class embeddings. Most of the existing methods represent image features and class embeddings as point vectors, and such vector representation limits the expressivity in terms of modeling the intra-class variability of the image classes. In this thesis, we propose three novel ZSL methods that represent image features and class labels as distributions and learn their corresponding parameters as distribution embeddings. Therefore, the intra-class variability of image classes is better modeled. The first model is a Triplet model, where image features and class embeddings are projected as Gaussian distributions in a common space, and their associations are learned by metric learning. Next, we have a Triplet-VAE model, where two VAEs are trained with triplet based distributional alignment for ZSL. The third model is a simple Probabilistic Classifier for ZSL, which is inspired by energy-based models. When evaluated on the common benchmark ZSL datasets, the proposed methods result in an improvement over the existing state-of-the-art methods for both traditional ZSL and more challenging Generalized-ZSL (GZSL) settings. / Zero-Shot Learning (ZSL) för bildklassificering syftar till att känna igen bilder från nya klasser som vi inte har några utbildningsexempel för. Ett vanligt tillvägagångssätt för att ta itu med ett sådant problem är att överföra kunskap från sett till osynliga klasser med hjälp av någon semantisk information om klassetiketter i form av klassinbäddningar. 
De flesta av de befintliga metoderna representerar bildfunktioner och klassinbäddningar som punktvektorer, och sådan vektorrepresentation begränsar uttrycksförmågan när det gäller att modellera bildklassernas variation inom klass. I denna avhandling föreslår vi tre nya ZSL-metoder som representerar bildfunktioner och klassetiketter som distributioner och lär sig deras motsvarande parametrar som distributionsinbäddningar. Därför är bildklassernas variation inom klass bättre modellerad. Den första modellen är en Triplet-modell, där bildfunktioner och klassinbäddningar projiceras som Gaussiska fördelningar i ett gemensamt utrymme, och deras föreningar lärs av metrisk inlärning. Därefter har vi en Triplet-VAE-modell, där två VAEs tränas med tripletbaserad fördelningsinriktning för ZSL. Den tredje modellen är en enkel Probabilistic Classifier för ZSL, som är inspirerad av energibaserade modeller. När de utvärderas på de vanliga ZSLdatauppsättningarna, resulterar de föreslagna metoderna i en förbättring jämfört med befintliga toppmoderna metoder för både traditionella ZSL och mer utmanande Generalized-ZSL (GZSL) -inställningar.
|