Global ETD Search

21	Discriminative object categorization with external semantic knowledge Hwang, Sung Ju 25 September 2013 (has links) Visual object category recognition is one of the most challenging problems in computer vision. Even assuming that we can obtain a near-perfect instance level representation with the advances in visual input devices and low-level vision techniques, object categorization still remains as a difficult problem because it requires drawing boundaries between instances in a continuous world, where the boundaries are solely defined by human conceptualization. Object categorization is essentially a perceptual process that takes place in a human-defined semantic space. In this semantic space, the categories reside not in isolation, but in relation to others. Some categories are similar, grouped, or co-occur, and some are not. However, despite this semantic nature of object categorization, most of the today's automatic visual category recognition systems rely only on the category labels for training discriminative recognition with statistical machine learning techniques. In many cases, this could result in the recognition model being misled into learning incorrect associations between visual features and the semantic labels, from essentially overfitting to training set biases. This limits the model's prediction power when new test instances are given. Using semantic knowledge has great potential to benefit object category recognition. First, semantic knowledge could guide the training model to learn a correct association between visual features and the categories. Second, semantics provide much richer information beyond the membership information given by the labels, in the form of inter-category and category-attribute distances, relations, and structures. Finally, the semantic knowledge scales well as the relations between categories become larger with an increasing number of categories. My goal in this thesis is to learn discriminative models for categorization that leverage semantic knowledge for object recognition, with a special focus on the semantic relationships among different categories and concepts. To this end, I explore three semantic sources, namely attributes, taxonomies, and analogies, and I show how to incorporate them into the original discriminative model as a form of structural regularization. In particular, for each form of semantic knowledge I present a feature learning approach that defines a semantic embedding to support the object categorization task. The regularization penalizes the models that deviate from the known structures according to the semantic knowledge provided. The first semantic source I explore is attributes, which are human-describable semantic characteristics of an instance. While the existing work treated them as mid-level features which did not introduce new information, I focus on their potential as a means to better guide the learning of object categories, by enforcing the object category classifiers to share features with attribute classifiers, in a multitask feature learning framework. This approach essentially discovers the common low-dimensional features that support predictions in both semantic spaces. Then, I move on to the semantic taxonomy, which is another valuable source of semantic knowledge. The merging and splitting criteria for the categories on a taxonomy are human-defined, and I aim to exploit this implicit semantic knowledge. Specifically, I propose a tree of metrics (ToM) that learns metrics that capture granularity-specific similarities at different nodes of a given semantic taxonomy, and uses a regularizer to isolate granularity-specific disjoint features. This approach captures the intuition that the features used for the discrimination of the parent class should be different from the features used for the children classes. Such learned metrics can be used for hierarchical classification. The use of a single taxonomy can be limited in that its structure is not optimal for hierarchical classification, and there may exist no single optimal semantic taxonomy that perfectly aligns with visual distributions. Thus, I next propose a way to overcome this limitation by leveraging multiple taxonomies as semantic sources to exploit, and combine the acquired complementary information across multiple semantic views and granularities. This allows us, for example, to synthesize semantics from both 'Biological', and 'Appearance'-based taxonomies when learning the visual features. Finally, as a further exploration of more complex semantic relations different from the previous two pairwise similarity-based models, I exploit analogies, which encode the relational similarities between two related pairs of categories. Specifically, I use analogies to regularize a discriminatively learned semantic embedding space for categorization, such that the displacements between the two category embeddings in both category pairs of the analogy are enforced to be the same. Such a constraint allows for a more confusing pair of categories to benefit from a clear separation in the matched pair of categories that share the same relation. All of these methods are evaluated on challenging public datasets, and are shown to effectively improve the recognition accuracy over purely discriminative models, while also guiding the recognition to be more semantic to human perception. Further, the applications of the proposed methods are not limited to visual object categorization in computer vision, but they can be applied to any classification problems where there exists some domain knowledge about the relationships or structures between the classes. Possible applications of my methods outside the visual recognition domain include document classification in natural language processing, and gene-based animal or protein classification in computational biology. / text Computer vision Machine learning Object categorization Object recognition Feature learning Metric learning Multitask learning Multiple kernel learning Embedding Manifold learning Regularization method Structured sparsity Structured regularization Hierarchical model
22	Multi-Modal Similarity Learning for 3D Deformable Registration of Medical Images Michel, Fabrice 04 October 2013 (has links) (PDF) Even though the prospect of fusing images issued by different medical imagery systems is highly contemplated, the practical instantiation of it is subject to a theoretical hurdle: the definition of a similarity between images. Efforts in this field have proved successful for select pairs of images; however defining a suitable similarity between images regardless of their origin is one of the biggest challenges in deformable registration. In this thesis, we chose to develop generic approaches that allow the comparison of any two given modality. The recent advances in Machine Learning permitted us to provide innovative solutions to this very challenging problem. To tackle the problem of comparing incommensurable data we chose to view it as a data embedding problem where one embeds all the data in a common space in which comparison is possible. To this end, we explored the projection of one image space onto the image space of the other as well as the projection of both image spaces onto a common image space in which the comparison calculations are conducted. This was done by the study of the correspondences between image features in a pre-aligned dataset. In the pursuit of these goals, new methods for image regression as well as multi-modal metric learning methods were developed. The resulting learned similarities are then incorporated into a discrete optimization framework that mitigates the need for a differentiable criterion. Lastly we investigate on a new method that discards the constraint of a database of images that are pre-aligned, only requiring data annotated (segmented) by a physician. Experiments are conducted on two challenging medical images data-sets (Pre-Aligned MRI images and PET/CT images) to justify the benefits of our approach. [SPI:OTHER] Engineering Sciences/Other Machine-learning Deformable registration Metric-learning
23	Supervised metric learning with generalization guarantees Bellet, Aurélien 11 December 2012 (has links) (PDF) In recent years, the crucial importance of metrics in machine learningalgorithms has led to an increasing interest in optimizing distanceand similarity functions using knowledge from training data to make them suitable for the problem at hand.This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample. The learned metrics are generally used in nearest-neighbor and clustering algorithms.When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix. Recent methods offer good scalability to large datasets.Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures. Most of the work has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning an object into another.We identify two important limitations of current supervised metric learning approaches. First, they allow to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored.In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Unlike other string kernels, it is guaranteed to be valid and parameter-free. Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (epsilon,gamma,tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (epsilon,gamma,tau)-goodness. The similarity is learned based on global constraints that are more appropriate to linear classification. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms. It is based on a simple adaptation of the notion of algorithmic robustness and allows the derivation of bounds for various loss functions and regularizers. Metric learning Statistical learning Convex optimization Classification Structured data Edit distance Generalization bounds
24	Apprentissage de représentations pour la reconnaissance visuelle / Learning representations for visual recognition Saxena, Shreyas 12 December 2016 (has links) Dans cette dissertation, nous proposons des méthodes d’apprentissage automa-tique aptes à bénéficier de la récente explosion des volumes de données digitales.Premièrement nous considérons l’amélioration de l’efficacité des méthodes derécupération d’image. Nous proposons une approche d’apprentissage de métriques locales coordonnées (Coordinated Local Metric Learning, CLML) qui apprends des métriques locales de Mahalanobis, puis les intègre dans une représentation globale où la distance l2 peut être utilisée. Ceci permet de visualiser les données avec une unique représentation 2D, et l’utilisation de méthodes de récupération efficaces basées sur la distance l2. Notre approche peut être interprétée comme l’apprentissage d’une projection linéaire de descripteurs donnés par une méthode a noyaux de grande dimension définie explictement. Cette interprétation permet d’appliquer des outils existants pour l’apprentissage de métriques de Mahalanobis à l’apprentissage de métriques locales coordonnées. Nos expériences montrent que la CLML amé-liore les résultats en matière de récupération de visage obtenues par les approches classiques d’apprentissage de métriques locales et globales.Deuxièmement, nous présentons une approche exploitant les modèles de ré-seaux neuronaux convolutionnels (CNN) pour la reconnaissance faciale dans lespectre visible. L’objectif est l’amélioration de la reconnaissance faciale hétérogène, c’est à dire la reconnaissance faciale à partir d’images infra-rouges avec des images d’entraînement dans le spectre visible. Nous explorerons différentes stratégies d’apprentissage de métriques locales à partir des couches intermédiaires d’un CNN, afin de faire le rapprochement entre des images de sources différentes. Dans nos expériences, la profondeur de la couche optimale pour une tâche donnée est positivement corrélée avec le changement entre le domaine source (données d’entraînement du CNN) et le domaine cible. Les résultats montrent que nous pouvons utiliser des CNN entraînés sur des images du spectre visible pour obtenir des résultats meilleurs que l’état de l’art pour la reconnaissance faciale hétérogène (images et dessins quasi-infrarouges).Troisièmement, nous présentons les "tissus de neurones convolutionnels" (Convolutional Neural Fabrics) permettant l’exploration de l’espace discret et exponentiellement large des architectures possibles de réseaux neuronaux, de manière efficiente et systématique. Au lieu de chercher à sélectionner une seule architecture optimale, nous proposons d’utiliser un "tissu" d’architectures combinant un nombre exponentiel d’architectures en une seule. Le tissu est une représentation 3D connectant les sorties de CNNs à différentes couches, échelles et canaux avec un motif de connectivité locale, homogène et creux. Les seuls hyper-paramètres du tissu (le nombre de canaux et de couches) ne sont pas critiques pour la performance. La nature acyclique du tissu nous permet d’utiliser la rétro-propagation du gradient durant la phase d’apprentissage. De manière automatique, nous pouvons donc configurer le tissu de manière à implémenter l’ensemble de toutes les architectures possibles (un nombre exponentiel) et, plus généralement, des ensembles (combinaisons) de ces modèles. La complexité de calcul et de taille mémoire du tissu évoluent de manière linéaire alors qu’il permet d’exploiter un nombre exponentiel d’architectures en parallèle, en partageant les paramètres entre architectures. Nous présentons des résultats à l’état de l’art pour la classification d’images sur le jeu de données MNIST et CIFAR10, et pour la segmentation sémantique sur le jeu de données Part Labels. / In this dissertation, we propose methods and data driven machine learning solutions which address and benefit from the recent overwhelming growth of digital media content.First, we consider the problem of improving the efficiency of image retrieval. We propose a coordinated local metric learning (CLML) approach which learns local Mahalanobis metrics, and integrates them in a global representation where the l2 distance can be used. This allows for data visualization in a single view, and use of efficient ` 2 -based retrieval methods. Our approach can be interpreted as learning a linear projection on top of an explicit high-dimensional embedding of a kernel. This interpretation allows for the use of existing frameworks for Mahalanobis metric learning for learning local metrics in a coordinated manner. Our experiments show that CLML improves over previous global and local metric learning approaches for the task of face retrieval.Second, we present an approach to leverage the success of CNN models forvisible spectrum face recognition to improve heterogeneous face recognition, e.g., recognition of near-infrared images from visible spectrum training images. We explore different metric learning strategies over features from the intermediate layers of the networks, to reduce the discrepancies between the different modalities. In our experiments we found that the depth of the optimal features for a given modality, is positively correlated with the domain shift between the source domain (CNN training data) and the target domain. Experimental results show the that we can use CNNs trained on visible spectrum images to obtain results that improve over the state-of-the art for heterogeneous face recognition with near-infrared images and sketches.Third, we present convolutional neural fabrics for exploring the discrete andexponentially large CNN architecture space in an efficient and systematic manner. Instead of aiming to select a single optimal architecture, we propose a “fabric” that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyperparameters of the fabric (the number of channels and layers) are not critical for performance. The acyclic nature of the fabric allows us to use backpropagation for learning. Learning can thus efficiently configure the fabric to implement each one of exponentially many architectures and, more generally, ensembles of all of them. While scaling linearly in terms of computation and memory requirements, the fabric leverages exponentially many chain-structured architectures in parallel by massively sharing weights between them. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset Apprentissage de métriques locales Transfert d’apprentissage Réseaux neuronaux convolutionnels Apprentissage d’architectures Local metric learning Transfer learning Convolutional neural network Architecture learning 004
25	Transformace dat pomocí evolučních algoritmů / Evolutionary Algorithms for Data Transformation Švec, Ondřej January 2017 (has links) In this work, we propose a novel method for a supervised dimensionality reduc- tion, which learns weights of a neural network using an evolutionary algorithm, CMA-ES, optimising the success rate of the k-NN classifier. If no activation func- tions are used in the neural network, the algorithm essentially performs a linear transformation, which can also be used inside of the Mahalanobis distance. There- fore our method can be considered to be a metric learning algorithm. By adding activations to the neural network, the algorithm can learn non-linear transfor- mations as well. We consider reductions to low-dimensional spaces, which are useful for data visualisation, and demonstrate that the resulting projections pro- vide better performance than other dimensionality reduction techniques and also that the visualisations provide better distinctions between the classes in the data thanks to the locality of the k-NN classifier. 1
26	Prédiction structurée pour l’analyse de données séquentielles / Structured prediction for sequential data Lajugie, Rémi 18 September 2015 (has links) Dans cette thèse nous nous intéressons à des problèmes d’apprentissage automatique dans le cadre de sorties structurées avec une structure séquentielle. D’une part, nous considérons le problème de l’apprentissage de mesure de similarité pour deux tâches : (i) la détection de rupture dans des signaux multivariés et (ii) le problème de déformation temporelle entre paires de signaux. Les méthodes généralement utilisées pour résoudre ces deux problèmes dépendent fortement d’une mesure de similarité. Nous apprenons une mesure de similarité à partir de données totalement étiquetées. Nous présentons des algorithmes usuels de prédiction structuré, efficaces pour effectuer l’apprentissage. Nous validons notre approche sur des données réelles venant de divers domaines. D’autre part, nous nous intéressons au problème de la faible supervision pour la tâche d’alignement d’un enregistrement audio sur la partition jouée. Nous considérons la partition comme une représentation symbolique donnant (i) une information complète sur l’ordre des symboles et (ii) une information approximative sur la forme de l’alignement attendu. Nous apprenons un classifieur pour chaque symbole avec ces informations. Nous développons une méthode d’apprentissage fondée sur l’optimisation d’une fonction convexe. Nous démontrons la validité de l’approche sur des données musicales. / In this manuscript, we consider structured machine learning problems and consider more precisely the ones involving sequential structure. In a first part, we consider the problem of similarity measure learning for two tasks where sequential structure is at stake: (i) the multivariate change-point detection and (ii) the time warping of pairs of time series. The methods generally used to solve these tasks rely on a similarity measure to compare timestamps. We propose to learn a similarity measure from fully labelled data, i.e., signals already segmented or pairs of signals for which the optimal time warping is known. Using standard structured prediction methods, we present algorithmically efficient ways for learning. We propose to use loss functions specifically designed for the tasks. We validate our approach on real-world data. In a second part, we focus on the problem of weak supervision, in which sequential data are not totally labeled. We focus on the problem of aligning an audio recording with its score. We consider the score as a symbolic representation giving: (i) a complete information about the order of events or notes played and (ii) an approximate idea about the expected shape of the alignment. We propose to learn a classifier for each note using this information. Our learning problem is based onthe optimization of a convex function that takes advantage of the weak supervision and of the sequential structure of data. Our approach is validated through experiments on the task of audio-to-score on real musical data. Apprentissage Faible supervision Prédiction structurée Apprentissage de métrique Alignement musique sur partition Déformation temporelle Machine learning Weak supervision Structured prediction Metric learning Music to partition alignment Time warping 004
27	Automated Gait Analysis : Using Deep Metric Learning Engström, Isak January 2021 (has links) Sectors of security, safety, and defence require methods for identifying people on the individual level. Automation of these tasks has the potential of outperforming manual labor, as well as relieving workloads. The ever-extending surveillance camera networks, advances in human pose estimation from monocular cameras, together with the progress of deep learning techniques, pave the way for automated walking gait analysis as an identification method. This thesis investigates the use of 2D kinematic pose sequences to represent gait, monocularly extracted from a limited dataset containing walking individuals captured from five camera views. The sequential information of the gait is captured using recurrent neural networks. Techniques in deep metric learning are applied to evaluate two network models, with contrasting output dimensionalities, against deep-metric-, and non-deep-metric-based embedding spaces. The results indicate that the gait representation, network designs, and network learning structure show promise when identifying individuals, scaling particularly well to unseen individuals. However, with the limited dataset, the network models performed best when the dataset included the labels from both the individuals and the camera views simultaneously, contrary to when the data only contained the labels from the individuals without the information of the camera views. For further investigations, an extension of the data would be required to evaluate the accuracy and effectiveness of these methods, for the re-identification task of each individual. / <p>Examensarbetet är utfört vid Institutionen för teknik och naturvetenskap (ITN) vid Tekniska fakulteten, Linköpings universitet</p> machine learning deep learning deep metric learning artificial neural network recurrent neural network computer vision gait pose estimation Media and Communication Technology Medieteknik
28	Appariements collaboratifs des offres et demandes d’emploi / Collaborative Matching of Job Openings and Job Seekers Schmitt, Thomas 29 June 2018 (has links) Notre recherche porte sur la recommandation de nouvelles offres d'emploi venant d'être postées et n'ayant pas d'historique d'interactions (démarrage à froid). Nous adaptons les systèmes de recommandations bien connus dans le domaine du commerce électronique à cet objectif, en exploitant les traces d'usage de l'ensemble des demandeurs d'emploi sur les offres antérieures. Une des spécificités du travail présenté est d'avoir considéré des données réelles, et de s'être attaqué aux défis de l'hétérogénéité et du bruit des documents textuels. La contribution présentée intègre l'information des données collaboratives pour apprendre une nouvelle représentation des documents textes, requise pour effectuer la recommandation dite à froid d'une offre nouvelle. Cette représentation dite latente vise essentiellement à construire une bonne métrique. L'espace de recherche considéré est celui des réseaux neuronaux. Les réseaux neuronaux sont entraînés en définissant deux fonctions de perte. La première cherche à préserver la structure locale des informations collaboratives, en s'inspirant des approches de réduction de dimension non linéaires. La seconde s'inspire des réseaux siamois pour reproduire les similarités issues de la matrice collaborative. Le passage à l'échelle de l'approche et ses performances reposent sur l'échantillonnage des paires d'offres considérées comme similaires. L'intérêt de l'approche proposée est démontrée empiriquement sur les données réelles et propriétaires ainsi que sur le benchmark publique CiteULike. Enfin, l'intérêt de la démarche suivie est attesté par notre participation dans un bon rang au challenge international RecSys 2017 (15/100; un million d'utilisateurs pour un million d'offres). / Our research focuses on the recommendation of new job offers that have just been posted and have no interaction history (cold start). To this objective, we adapt well-knowns recommendations systems in the field of e-commerce by exploiting the record of use of all job seekers on previous offers. One of the specificities of the work presented is to have considered real data, and to have tackled the challenges of heterogeneity and noise of textual documents. The presented contribution integrates the information of the collaborative data to learn a new representation of text documents, which is required to make the so-called cold start recommendation of a new offer. The new representation essentially aims to build a good metric. The search space considered is that of neural networks. Neural networks are trained by defining two loss functions. The first seeks to preserve the local structure of collaborative information, drawing on non-linear dimension reduction approaches. The second is inspired by Siamese networks to reproduce the similarities from the collaborative matrix. The scaling up of the approach and its performance are based on the sampling of pairs of offers considered similar. The interest of the proposed approach is demonstrated empirically on the real and proprietary data as well as on the CiteULike public benchmark. Finally, the interest of the approach followed is attested by our participation in a good rank in the international challenge RecSys 2017 (15/100, with millions of users and millions of offers). Filtrage collaborative Système de recommandation Apprentissage de métrique Réseaux dde neurones Traitement des lagues naturelles Collaborative filtering Metric learning Recommender system Natural language processing Neutral network
29	Product Matching Using Image Similarity Forssell, Melker, Janér, Gustav January 2020 (has links) PriceRunner is an online shopping comparison company. To maintain up-todate prices, PriceRunner has to process large amounts of data every day. The processing of the data includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product such as: title, description, price and often one image of the product. PriceRunner has previously implemented a textual-based machine learning (ML) model, but is also looking for new approaches to complement the current product matching system. The objective of this master’s thesis is to investigate the potential of using an image-based ML model for product matching. Our method uses a similarity learning approach where the network learns to recognise the similarity between images. To achieve this, a siamese neural network was trained with the triplet loss function. The network is trained to map similar images closer together and dissimilar images further apart in a vector space. This approach is often used for face recognition, where there is an extensive amount of classes and a limited amount of images per class, and new classes are frequently added. This is also the case for the image data used in this thesis project. A general model was trained on images from the Clothing and Accessories hierarchy, one of the 16 toplevel hierarchies at PriceRunner, consisting of 17 product categories. The results varied between each product category. Some categories proved to be less suitable for image-based classification while others excelled. The model handles new classes relatively well without any, or with briefer, retraining. It was concluded that there is potential in using images to complement the current product matching system at PriceRunner. AI ML Triplet Loss SNN Machine Learning Artificiall Intelligence Similarity Learning Metric Learning Computer Vision Product Matching Image Matching Computer and Information Sciences Data- och informationsvetenskap
30	Pushing the boundary of Semantic Image Segmentation Jain, Shipra January 2020 (has links) The state-of-the-art object detection and image classification methods can perform impressively on more than 9k classes. In contrast, the number of classes in semantic segmentation datasets are fairly limited. This is not surprising , when the restrictions caused by the lack of labeled data and high computation demand are considered. To efficiently perform pixel-wise classification for c number of classes, segmentation models use cross-entropy loss on c-channel output for each pixel. The computational demand for such prediction turns out to be a major bottleneck for higher number of classes. The major goal of this thesis is to reduce the number of channels of the output prediction, thus allowing to perform semantic segmentation with very high number of classes. The reduction of dimension has been approached using metric learning for the semantic feature space. The metric learning provides us the mapping from pixel to embedding with minimal, still sufficient, number of dimensions. Our proposed approximation of groundtruth class probability for cross entropy loss helps the model to place the embeddings of same class pixels closer, reducing inter-class variabilty of clusters and increasing intra-class variability. The model also learns a prototype embedding for each class. In loss function, these class embeddings behave as positive and negative samples for pixel embeddings (anchor). We show that given a limited computational memory and resources, our approach can be used for training a segmentation model for any number of classes. We perform all experiments on one GPU and show that our approach performs similar and in some cases slightly better than deeplabv3+ baseline model for Cityscapes and ADE20K dataset. We also perform experiments to understand trade-offs in terms of memory usage, inference time and performance metrics. Our work helps in alleviating the problem of computational complexity, thus paving the way for image segmentation task with very high number of semantic classes. / De ledande djupa inlärningsmetoderna inom objektdetektion och bildklassificering kan hantera väl över 9000 klasser. Inom semantisk segmentering är däremot antalet klasser begränsat för vanliga dataset. Detta är inte förvånande då det behövs mycket annoterad data och beräkningskraft. För att effektivt kunna göra en pixelvis klassificering av c klasser, använder segmenteringsmetoder den s.k. korsentropin över c sannolikhets värden för varje pixel för att träna det djupa nätverket. Beräkningskomplexiteten från detta steg är den huvudsakliga flaskhalsen för att kunna öka antalet klasser. Det huvudsakliga målet av detta examensarbete är att minska antalet kanaler i prediktionen av nätverket för att kunna prediktera semantisk segmentering även vid ett mycket högt antal klasser. För att åstadkomma detta används metric learning för att träna slutrepresentationen av nätet. Metric learning metoden låter oss träna en representation med ett minimalt, men fortfarande tillräckligt antal dimensioner. Vi föreslår en approximation av korsentropin under träning som låter modellen placera representationer från samma klass närmare varandra, vilket reducerar interklassvarians och öka intraklarrvarians. Modellen lär sig en prototyprepresentation för varje klass. För inkärningskostnadsfunktionen ses dessa prototyper som positiva och negativa representationer. Vi visar att vår metod kan användas för att träna en segmenteringsmodell för ett godtyckligt antal klasser givet begränsade minnes- och beräkningsresurser. Alla experiment genomförs på en GPU. Vår metod åstadkommer liknande eller något bättre segmenteringsprestanda än den ursprungliga deeplabv3+ modellen på Cityscapes och ADE20K dataseten. Vi genomför också experiment för att analysera avvägningen mellan minnesanvändning, beräkningstid och segmenteringsprestanda. Vår metod minskar problemet med beräkningskomplexitet, vilket banar väg för segmentering av bilder med ett stort antal semantiska klasser. Deep Learning computer vision semantic segmentation metric learning contrastive learning Djup lärning datorsyn semantisk segmentering metrisk inlärning kontrastivt lärande Elektroteknik och elektronik

Search results