1 |
Visualization of interindividual differences in spinal dynamics in the presence of intraindividual variabilities
Dindorf, Carlo; Konradi, Jürgen; Wolf, Claudia; Taetz, Betram; Bleser, Gabriele; Huthwelker, Janine; Werthmann, Friederike; Bartaguiz, Eva; Drees, Philipp; Betz, Ulrich; Fröhlich, Michael. 07 July 2022
Surface topography systems enable the capture of
spinal dynamic movement. Visualizing possible unique
movement patterns appears to be difficult due to large intra-class
and small inter-class variabilities. Therefore, we investigated
a visualization approach using Siamese neural networks (SNN)
and checked whether individuals can be identified based
on dynamic spinal data. The presented visualization approach
seems promising for visualizing subjects in the presence of
intraindividual variability between different gait cycles as well
as day-to-day variability. Overall, the results indicate the possible
existence of a personal spinal ‘fingerprint’. The work forms the
basis for an objective comparison of subjects and the transfer of
the method to clinical use cases.
|
2 |
Re-identifikace graffiti tagů / Graffiti Tags Re-Identification
Pavlica, Jan. January 2020
This thesis focuses on the possibility of using current computer vision methods to re-identify graffiti tags, which are the most common type of graffiti. It examines the use of convolutional neural networks for this task. Various convolutional neural network models were tested; the most suitable was MobileNet trained with the triplet loss function, which achieved a mAP of 36.02%.
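The mAP figure reported above is a ranking metric. The following is a minimal sketch of how mean average precision is commonly computed for retrieval or re-identification, not the thesis's own evaluation code. The function names and the toy rankings are hypothetical, and the sketch assumes every relevant gallery item appears somewhere in each ranked list.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of 0/1 flags, ordered from best to worst match,
    where 1 marks a gallery item showing the same tag as the query.
    Assumption: all relevant items are present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_rankings):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)


# Toy example with two hypothetical queries.
rankings = [[1, 0, 1], [0, 1]]
map_score = mean_average_precision(rankings)
```

A score near 1 means matching tags are consistently ranked first; a lower score means true matches tend to appear further down the ranked list.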
|
3 |
Product Matching Using Image Similarity
Forssell, Melker; Janér, Gustav. January 2020
PriceRunner is an online shopping comparison company. To maintain up-to-date prices, PriceRunner has to process large amounts of data every day. The processing includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product such as title, description, price and often one image of the product. PriceRunner has previously implemented a text-based machine learning (ML) model, but is also looking for new approaches to complement the current product matching system. The objective of this master’s thesis is to investigate the potential of using an image-based ML model for product matching. Our method uses a similarity learning approach where the network learns to recognise the similarity between images. To achieve this, a siamese neural network was trained with the triplet loss function. The network is trained to map similar images closer together and dissimilar images further apart in a vector space. This approach is often used for face recognition, where there is a large number of classes, a limited number of images per class, and new classes are frequently added. This is also the case for the image data used in this thesis project. A general model was trained on images from the Clothing and Accessories hierarchy, one of the 16 top-level hierarchies at PriceRunner, consisting of 17 product categories. The results varied between product categories. Some categories proved to be less suitable for image-based classification while others excelled. The model handles new classes relatively well with no retraining or with only brief retraining. It was concluded that there is potential in using images to complement the current product matching system at PriceRunner.
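The idea of mapping similar images closer together and dissimilar images further apart can be illustrated with the standard hinge form of the triplet loss. This is a minimal NumPy sketch, not the thesis's implementation; the squared Euclidean distance, the margin value of 0.2 and the toy embeddings are assumptions chosen for illustration.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge triplet loss over a batch of embeddings.

    Each argument has shape (batch, dim). The loss is zero for a triplet
    once the negative is at least `margin` further from the anchor than
    the positive (in squared Euclidean distance, an assumed choice).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


# Toy 2-D embeddings (hypothetical). In the first triplet the negative is
# far away; in the second it is as close to the anchor as the positive.
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.1, 0.0], [1.0, 0.9]])
negative = np.array([[2.0, 2.0], [0.9, 1.0]])
loss = triplet_loss(anchor, positive, negative)
```

Only the second triplet contributes to the loss here, which is the behaviour that pushes hard negatives away while leaving already well-separated triplets untouched.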
|
4 |
Image generation through feature extraction and learning using a deep learning approach
Bruneel, Tibo. January 2023
With recent advancements, image generation has become increasingly feasible thanks to stronger generative artificial intelligence (AI) models. The ability to generate non-existent images that closely resemble real-world images is interesting for many use cases. Generated images could be used, for example, to augment, extend or replace real data sets for training AI models, thereby reducing the cost of data collection and similar processes. Deep learning, a sub-field of AI, has been at the forefront of such methodologies because it can capture and learn highly complex and feature-rich data. This work focuses on deep generative learning approaches within a forestry application, with the goal of generating tree log end images in order to enhance an AI model that uses such images. This approach would reduce data collection costs not only for this model, but also for many other information extraction models within the forestry field. This thesis includes research on the state of the art in deep generative modelling and experiments using a full pipeline from a deep generative modelling stage to a log end recognition model. In addition, a variant architecture and an image sampling algorithm are proposed, added to this pipeline, and evaluated. The experiments and findings show that the applied generative models learn features well but fall short of high-quality, realistic generation, producing blurry results. The variant approach resulted in slightly better feature learning with a trade-off in generation quality. The proposed sampling algorithm proved to work well on a qualitative basis. The problems found in the generative models propagated further into the training of the recognition model, making the improvement of another AI model based on purely generated data impossible at this point in the research.
The results of this research show that more work is needed on improving the application and the generation quality so that generated images resemble real-world data more closely and other models can be trained on artificial data. The variant approach brings little improvement, but its findings contribute to the field by showing its strengths and weaknesses, as do those of the proposed image sampling algorithm. Finally, this study provides a good starting point for research within this application, with many different directions and opportunities for future work.
|
5 |
Attribute Embedding for Variational Auto-Encoders : Regularization derived from triplet loss / Inbäddning av attribut för Variationsautokodare : Strukturering av det Latenta Rummet
E. L. Dahlin, Anton. January 2022
Techniques for imposing a structure on the latent space of neural networks have seen much development in recent years. Clustering techniques used for classification have been used to great success, and with this work we hope to bridge the gap between contrastive losses and generative models. We introduce an embedding loss derived from triplet loss to show that attributes and information can be clustered in specific dimensions in the latent space of Variational Auto-Encoders. This allows control over the embedded attributes via manipulation of these latent space dimensions. This work also takes steps towards the use of any data augmentation when applying triplet loss to Variational Auto-Encoders. In this work, three different Variational Auto-Encoders are trained on three different datasets to embed information in three different ways using this novel method. Our results show the method working to varying degrees depending on the implementation and the information embedded. Two experiments using image data and one using waveform audio show that the method is modality invariant.
|
6 |
Improving Zero-Shot Learning via Distribution Embeddings
Chalumuri, Vivek. January 2020
Zero-Shot Learning (ZSL) for image classification aims to recognize images from novel classes for which we have no training examples. A common approach to tackling such a problem is to transfer knowledge from seen to unseen classes using some auxiliary semantic information about class labels in the form of class embeddings. Most of the existing methods represent image features and class embeddings as point vectors, and such a vector representation limits the expressivity in terms of modeling the intra-class variability of the image classes. In this thesis, we propose three novel ZSL methods that represent image features and class labels as distributions and learn their corresponding parameters as distribution embeddings. As a result, the intra-class variability of image classes is better modeled. The first model is a Triplet model, where image features and class embeddings are projected as Gaussian distributions in a common space, and their associations are learned by metric learning. Next, we have a Triplet-VAE model, where two VAEs are trained with triplet-based distributional alignment for ZSL. The third model is a simple Probabilistic Classifier for ZSL, which is inspired by energy-based models. When evaluated on the common benchmark ZSL datasets, the proposed methods result in an improvement over the existing state-of-the-art methods for both traditional ZSL and the more challenging Generalized-ZSL (GZSL) settings.
|
7 |
GPS-Free UAV Geo-Localization Using a Reference 3D Database
Karlsson, Justus. January 2022
The goal of this thesis has been global geolocalization using only visual input and a 3D database for reference. In recent years Convolutional Neural Networks (CNNs) have seen huge success in the task of classifying images. The flattened tensors at the final layers of a CNN can be viewed as vectors describing different input image features. Two networks were trained so that satellite and aerial images taken from different views of the same location had feature vectors that were similar. The networks were also trained so that images taken from different locations had different feature vectors. After training, the position of a given aerial image can then be estimated by finding the satellite image with a feature vector that is the most similar to that of the aerial image. A previous method called Where-CNN was used as a baseline model. Batch-Hard triplet loss, the Adam optimizer, and a different CNN backbone were tested as possible augmentations to this method. The models were trained on 2640 different locations in Linköping and Norrköping. The models were then tested on a sequence of 4411 query images along a path in Jönköping. The search region had 1449 different locations constituting a total area of 24 km². In Top-1% accuracy, there was a significant improvement over the baseline, increasing from 61.62% accuracy to 88.62%. The environment was modeled as a Hidden Markov Model to filter the sequence of guesses. The Viterbi algorithm was then used to find the most probable path. This filtering procedure reduced the average error along the path from 2328.0 m to just 264.4 m for the best model. Here the baseline had an average error of 563.0 m after filtering. A few different 3D methods were also tested. One drawback was that no pretrained weights existed for these models, as opposed to the 2D models, which were pretrained on the ImageNet dataset. The best 3D model achieved a Top-1% accuracy of 70.41%.
It should be noted that the best 2D model without using any pretraining achieved a lower Top-1% accuracy of 49.38%. In addition, a 3D method for efficiently doing convolution on sparse 3D data was presented. Compared to the straightforward method, it was almost 2.5 times faster while still having comparable accuracy at individual query prediction. While there was a significant improvement over the baseline, it was not large enough to provide reliable and accurate localization for individual images. For global navigation, using the entire Earth as search space, the information in a 2D image might not be enough to be uniquely identifiable. However, the 3D CNN techniques tested did not improve on the results of the pretrained 2D models. The use of more data and experimentation with different 3D CNN architectures is a direction in which further research would be exciting.
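The Hidden Markov Model filtering step described in this abstract can be sketched with a log-domain Viterbi decoder. This is an illustrative sketch only: the three-location map, the transition matrix that favours staying put or moving to a neighbouring location, and the per-frame match probabilities are all hypothetical values, not taken from the thesis.

```python
import numpy as np


def viterbi(log_init, log_trans, log_emit):
    """Most probable state sequence of an HMM, computed in the log domain.

    log_init:  (S,)   log prior over the S candidate locations
    log_trans: (S, S) log_trans[i, j] = log P(next = j | current = i)
    log_emit:  (T, S) log likelihood of each query frame at each location
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score of a path that is in i at t-1 and j at t.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    # Backtrack from the best final state.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Hypothetical toy setup: three locations on a line, four query frames.
trans = np.array([[0.60, 0.39, 0.01],
                  [0.30, 0.40, 0.30],
                  [0.01, 0.39, 0.60]])
emit = np.array([[0.8, 0.1, 0.1],
                 [0.7, 0.2, 0.1],
                 [0.1, 0.2, 0.7],   # an isolated, implausible jump
                 [0.7, 0.2, 0.1]])
init = np.full(3, 1.0 / 3.0)

per_frame = np.argmax(emit, axis=1).tolist()   # unfiltered guesses
path = viterbi(np.log(init), np.log(trans), np.log(emit))
```

With these toy numbers the unfiltered guesses jump to the far location for a single frame, while the decoded path stays at the first location throughout, which mirrors how sequence filtering can suppress isolated retrieval errors.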
|