• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 14
  • Tagged with
  • 15
  • 15
  • 7
  • 5
  • 5
  • 5
  • 5
  • 5
  • 5
  • 4
  • 4
  • 4
  • 4
  • 3
  • 3
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Exploring Entity Relationship in Pairwise Ranking: Adaptive Sampler and Beyond

Yu, Lu 12 1900 (has links)
Living in the booming age of information, we have to rely on powerful information retrieval tools to seek the unique piece of desired knowledge from such a big data world, like using personalized search engine and recommendation systems. As one of the core components, ranking model can appear in almost everywhere as long as we need a relative order of desired/relevant entities. Based on the most general and intuitive assumption that entities without user actions (e.g., clicks, purchase, comments) are of less interest than those with user actions, the objective function of pairwise ranking models is formulated by measuring the contrast between positive (with actions) and negative (without actions) entities. This contrastive relationship is the core of pairwise ranking models. The construction of these positive-negative pairs has great influence on the model inference accuracy. Especially, it is challenging to explore the entity relationships in heterogeneous information network. In this thesis, we aim at advancing the development of the methodologies and principles of mining heterogeneous information network through learning entity relations from a pairwise learning to rank optimization perspective. More specifically we first show the connections of different relation learning objectives modified from different ranking metrics including both pairwise and list-wise objectives. We prove that most of popular ranking metrics can be optimized in the same lower bound. Secondly, we propose the class-imbalance problem imposed by entity relation comparison in ranking objectives, and prove that class-imbalance problem can lead to frequency 5 clustering and gradient vanishment problems. As a response, we indicate out that developing a fast adaptive sampling method is very essential to boost the pairwise ranking model. To model the entity dynamic dependency, we propose to unify the individual-level interaction and union-level interactions, and result in a multi-order attentive ranking model to improve the preference inference from multiple views.
2

Self-supervised Learning for Efficient Object Detection / Självövervakat lärande för effektiv Objektdetektering

Berta, Benjamin István January 2021 (has links)
Self-supervised learning has become a prominent approach in pre-training Convolutional Neural Networks for computer vision. These methods are able to achieve state-of-the-art representation learning with unlabeled datasets. In this thesis, we apply Self-supervised Learning to the object detection problem. Previous methods have used large networks that are not suitable for embedded applications, so our goal was to train lightweight networks that can reach the accuracy of supervised learning. We used MoCo as a baseline for pre-training a ResNet-18 encoder and finetuned it on the COCO object detection task using a RetinaNet object detector. We evaluated our method based on the COCO evaluation metric with several additions to the baseline method. Our results show that lightweight networks can be trained by self-supervised learning and reach the accuracy of the supervised learning pre-training. / Självledd inlärning har blivit ett framträdande tillvägagångssätt vid träning av ”Convolutional Neural Networks” för datorseende. Dessa metoder kan uppnå topp prestanda med representationsinlärning med omärkta datamängder. I det här examensarbetet tillämpar vi Självledd inlärning på objektdetekteringsproblemet. Tidigare metoder har använt stora nätverk som inte är lämpliga för inbyggda applikationer, så vårt mål var att träna lättviktsnätverk som kan nå noggrannheten av ett tränat nätverk. Vi använde MoCo som basnivå för träning av en ResNet-18-kodare och finjusterade den på COCO-objektdetekteringsuppgiften med hjälp av en RetinaNet-objektdetektor. Vi utvärderade vår metod baserat på COCO-utvärderingsmåttet med flera tillägg till baslinjemetoden. Våra resultat visar att lättviktsnätverk kan tränas genom självledd inlärning och uppnå samma precisionen som för ett tränat nätverk.
3

Lists of potential diagnoses that final-year medical students need to consider: a modified Delphi study / 卒業時の医学生が想起すべき鑑別疾患候補リスト

Miyachi, Yuka 24 January 2022 (has links)
京都大学 / 新制・論文博士 / 博士(医学) / 乙第13461号 / 論医博第2248号 / 新制||医||1055(附属図書館) / (主査)教授 古川 壽亮, 教授 松村 由美, 教授 永井 洋士 / 学位規則第4条第2項該当 / Doctor of Medical Science / Kyoto University / DFAM
4

CLIP-RS: A Cross-modal Remote Sensing Image Retrieval Based on CLIP, a Northern Virginia Case Study

Djoufack Basso, Larissa 21 June 2022 (has links)
Satellite imagery research used to be an expensive research topic for companies and organizations due to the limited data and compute resources. As the computing power and storage capacity grows exponentially, a large amount of aerial and satellite images are generated and analyzed everyday for various applications. Current technological advancement and extensive data collection by numerous Internet of Things (IOT) devices and platforms have amplified labeled natural images. Such data availability catalyzed the development and performance of current state-of-the-art image classification and cross-modal models. Despite the abundance of publicly available remote sensing images, very few remote sensing (RS) images are labeled and even fewer are multi-captioned.These scarcities limit the scope of fine tuned state of the art models to at most 38 classes, based on the PatternNet data, one of the largest publicly available labeled RS data. Recent state-of-the art image-to-image retrieval and detection models in RS have shown great results. Because the text-to-image retrieval of RS images is still emerging, it still faces some challenges in the retrieval of those images.These challenges are based on the inaccurate retrieval of image categories that were not present in the training dataset and the retrieval of images from descriptive input. Motivated by those shortcomings in current cross-modal remote sensing image retrieval, we proposed CLIP-RS, a cross-modal remote sensing image retrieval platform. Our proposed framework CLIP-RS is a framework that combines a fine-tuned implementation of a recent state of the art cross-modal and text-based image retrieval model, Contrastive Language Image Pre-training (CLIP) and FAISS (Facebook AI similarity search), a library for efficient similarity search. Our implementation is deployed on a Web App for inference task on text-to-image and image-to-image retrieval of RS images collected via the Mapbox GL JS API. We used the free tier option of the Mapbox GL JS API and took advantage of its raster tiles option to locate the retrieved results on a local map, a combination of the downloaded raster tiles. Other options offered on our platform are: image similarity search, locating an image in the map, view images' geocoordinates and addresses.In this work we also proposed two remote sensing fine-tuned models and conducted a comparative analysis of our proposed models with a different fine-tuned model as well as the zeroshot CLIP model on remote sensing data. / Master of Science / Satellite imagery research used to be an expensive research topic for companies and organizations due to the limited data and compute resources. As the computing power and storage capacity grows exponentially, a large amount of aerial and satellite images are generated and analyzed everyday for various applications. Current technological advancement and extensive data collection by numerous Internet of Things (IOT) devices and platforms have amplified labeled natural images. Such data availability catalyzed the devel- opment and performance of current state-of-the-art image classification and cross-modal models. Despite the abundance of publicly available remote sens- ing images, very few remote sensing (RS) images are labeled and even fewer are multi-captioned.These scarcities limit the scope of fine tuned state of the art models to at most 38 classes, based on the PatternNet data,one of the largest publicly avail- able labeled RS data.Recent state-of-the art image-to-image retrieval and detection models in RS have shown great results. Because the text-to-image retrieval of RS images is still emerging, it still faces some challenges in the re- trieval of those images.These challenges are based on the inaccurate retrieval of image categories that were not present in the training dataset and the re- trieval of images from descriptive input. Motivated by those shortcomings in current cross-modal remote sensing image retrieval, we proposed CLIP-RS, a cross-modal remote sensing image retrieval platform.Cross-modal retrieval focuses on data retrieval across different modalities and in the context of this work, we focus on textual and imagery modalities.Our proposed frame- work CLIP-RS is a framework that combines a fine-tuned implementation of a recent state of the art cross-modal and text-based image retrieval model, Contrastive Language Image Pre-training (CLIP) and FAISS (Facebook AI similarity search), a library for efficient similarity search. In deep learning, the concept of fine tuning consists of using weights from a different model or algorithm into a similar model with different domain specific application. Our implementation is deployed on a Web Application for inference tasks on text-to-image and image-to-image retrieval of RS images collected via the Mapbox GL JS API. We used the free tier option of the Mapbox GL JS API and took advantage of its raster tiles option to locate the retrieved results on a local map, a combination of the downloaded raster tiles. Other options offered on our platform are: image similarity search, locating an image in the map, view images' geocoordinates and addresses.In this work we also pro- posed two remote sensing fine-tuned models and conducted a comparative analysis of our proposed models with a different fine-tuned model as well as the zeroshot CLIP model on remote sensing data. Detection models in RS have shown great results. Because the text-to-image retrieval of RS images is still emerging, it still faces some challenges in the re- trieval of those images.These challenges are based on the inaccurate retrieval of image categories that were not present in the training dataset and the re- trieval of images from descriptive input. Motivated by those shortcomings in current cross-modal remote sensing image retrieval, we proposed CLIP-RS, a cross-modal remote sensing image retrieval platform.Cross-modal retrieval focuses on data retrieval across different modalities and in the context of this work, we focus on textual and imagery modalities.Our proposed frame- work CLIP-RS is a framework that combines a fine-tuned implementation of a recent state of the art cross-modal and text-based image retrieval model, Contrastive Language Image Pre-training (CLIP) and FAISS (Facebook AI similarity search), a library for efficient similarity search. In deep learning, the concept of fine tuning consists of using weights from a different model or algorithm into a similar model with different domain specific application. Our implementation is deployed on a Web Application for inference tasks on text-to-image and image-to-image retrieval of RS images collected via the Mapbox GL JS API. We used the free tier option of the Mapbox GL JS API and took advantage of its raster tiles option to locate the retrieved results on a local map, a combination of the downloaded raster tiles. Other options offered on our platform are: image similarity search, locating an image in the map, view images' geocoordinates and addresses.In this work we also pro- posed two remote sensing fine-tuned models and conducted a comparative analysis of our proposed models with a different fine-tuned model as well as the zeroshot CLIP model on remote sensing data.
5

Improving BERTScore for Machine Translation Evaluation Through Contrastive Learning

Yousuf, Oreen January 2022 (has links)
Since the advent of automatic evaluation, tasks within Natural Language Processing (NLP), including Machine Translation, have been able to better utilize both time and labor resources. Later, multilingual pre-trained models (MLMs)have uplifted many languages’ capacity to participate in NLP research. Contextualized representations generated from these MLMs are both influential towards several downstream tasks and have inspired practitioners to better make sense of them. We propose the adoption of BERTScore, coupled with contrastive learning, for machine translation evaluation in lieu of BLEU - the industry leading metric. While BERTScore computes a similarity score for each token in a candidate and reference sentence, it does away with exact matches in favor of computing token similarity using contextual embeddings. We improve BERTScore via contrastive learning-based fine-tuning on MLMs. We use contrastive learning to improve BERTScore across different language pairs in both high and low resource settings (English-Hausa, English-Chinese), across three models (XLM-R, mBERT, and LaBSE) and across three domains (news,religious, combined). We also investigated both the effects of pairing relatively linguistically similar low-resource languages (Somali-Hausa), and data size on BERTScore and the corresponding Pearson correlation to human judgments. We found that reducing the distance between cross-lingual embeddings via contrastive learning leads to BERTScore having a substantially greater correlation to system-level human evaluation than BLEU for mBERT and LaBSE in all language pairs in multiple domains.
6

Information Extraction for Test Identification in Repair Reports in the Automotive Domain

Jie, Huang January 2023 (has links)
The knowledge of tests conducted on a problematic vehicle is essential for enhancing the efficiency of mechanics. Therefore, identifying the tests performed in each repair case is of utmost importance. This thesis explores techniques for extracting data from unstructured repair reports to identify component tests. The main emphasis is on developing a supervised multi-class classifier to categorize data and extract sentences that describe repair diagnoses and actions. It has been shown that incorporating a category-aware contrastive learning objective can improve the repair report classifier’s performance. The proposed approach involves training a sentence representation model based on a pre-trained model using a category-aware contrastive learning objective. Subsequently, the sentence representation model is further trained on the classification task using a loss function that combines the cross-entropy and supervised contrastive learning losses. By applying this method, the macro F1-score on the test set is increased from 90.45 to 90.73. The attempt to enhance the performance of the repair report classifier using a noisy data classifier proves unsuccessful. The noisy data classifier is trained using a prompt-based fine-tuning method, incorporating open-ended questions and two examples in the prompt. This approach achieves an F1-score of 91.09 and the resulting repair report classification datasets are found easier to classify. However, they do not contribute to an improvement in the repair report classifier’s performance. Ultimately, the repair report classifier is utilized to aid in creating the input necessary for identifying component tests. An information retrieval method is used to conduct the test identification. The incorporation of this classifier and the existing labels when creating queries leads to an improvement in the mean average precision at the top 3, 5, and 10 positions by 0.62, 0.81, and 0.35, respectively, although with a slight decrease of 0.14 at the top 1 position.
7

Matching Trades with Confirmations via Contrastive Learning : Asymmetric Contrastive Learning on Text Data / Applicering av kontrastinlärningsmetoder för att para ihop affärer med konfirmationer

Hector, Markus January 2023 (has links)
In the banking world trades of securities are finalized every day, on behalf of the banks themselves or of their clients. When the trades have been booked by the front office the confirmations sent by the counterparty have to be checked and connected to the correct trade by hand, posing the question whether this process could not be automated using machine learning techniques. There is no straightforward solution to this problem since the confirmations differ between counterparties, and can contain different enriched information or even be in different formats. This thesis addresses the problem of matching trades with their corresponding confirmations via deep learning methods. A model is trained using contrastive learning methods on generated pairs of trades and confirmations, with the goal of matching the pairs in the latent space by using nearest neighbor classification. Accuracy is measured by dividing the correctly classified samples by the total number of samples in a testing batch. The model achieves an accuracy as high as 97.8% over 100 trade-confirmation samples with a 30-dimensional latent space, and it is shown that similar contrastive methods can indeed be used in order to solve this problem. / Banker handlar varje dag med värdpapper av olika slag, antingen för sin egen vinning eller för sina kunders. När en affär har blivit beslutad mellan två parter så bokförs denna i bägge parternas interna system. En konfirmation kommer sedan skickas från den andra parten som manuellt måste paras ihop med affären vilket väcker frågan om huruvida detta inte kan automatiseras med hjälp av maskininlärning. Det finns inte en uppenbar lösning på detta problemet då konfirmationsmeddelandena kan skiljer sig åt mellan olika parter och kan innehålla olika tillagd information eller till och med vara i olika format. En model tränas genom att använda kontrast-inlärning på genererade par av affärer och konfirmationer av affärer för att kunna para ihop paren i det latenta rummet genom att se vilka grannar som ligger närmast. Nogrannheten mäts genom att dela antalet korrekt klassificerade exempel med det totala antalet par i en grupp test-par. Modellen uppnår en noggrannhet så hög som 97.8% på 100 affärs-konfirmationspar med ett 30-dimensionellt latent rum, och det visas att kontrast-inlärning kan användas för att lösa problemet. Det är dock svårt att säga mycket om hur väl modellen kan generalisera de inlärda kunskaperna eftersom träningsdatan behövde genereras och därför saknar en del av komplexiteten av ett äkte data set.
8

Pretraining a Neural Network for Hyperspectral Images Using Self-Supervised Contrastive Learning / Förträning av ett neuralt nätverk för hyperspektrala bilder baserat på självövervakad kontrastiv inlärning

Syrén Grönfelt, Natalie January 2021 (has links)
Hyperspectral imaging is an expanding topic within the field of computer vision, that uses images of high spectral granularity. Contrastive learning is a discrim- inative approach to self-supervised learning, a form of unsupervised learning where the network is trained using self-created pseudo-labels. This work com- bines these two research areas and investigates how a pretrained network based on contrastive learning can be used for hyperspectral images. The hyperspectral images used in this work are generated from simulated RGB images and spec- tra from a spectral library. The network is trained with a pretext task based on data augmentations, and is evaluated through transfer learning and fine-tuning for a downstream task. The goal is to determine the impact of the pretext task on the downstream task and to determine the required amount of labelled data. The results show that the downstream task (a classifier) based on the pretrained network barely performs better than a classifier without a pretrained network. In the end, more research needs to be done to confirm or reject the benefit of a pretrained network based on contrastive learning for hyperspectral images. Also, the pretrained network should be tested on real-world hyperspectral data and trained with a pretext task designed for hyperspectral images.
9

Pushing the boundary of Semantic Image Segmentation

Jain, Shipra January 2020 (has links)
The state-of-the-art object detection and image classification methods can perform impressively on more than 9k classes. In contrast, the number of classes in semantic segmentation datasets are fairly limited. This is not surprising , when the restrictions caused by the lack of labeled data and high computation demand are considered. To efficiently perform pixel-wise classification for c number of classes, segmentation models use cross-entropy loss on c-channel output for each pixel. The computational demand for such prediction turns out to be a major bottleneck for higher number of classes. The major goal of this thesis is to reduce the number of channels of the output prediction, thus allowing to perform semantic segmentation with very high number of classes. The reduction of dimension has been approached using metric learning for the semantic feature space. The metric learning provides us the mapping from pixel to embedding with minimal, still sufficient, number of dimensions. Our proposed approximation of groundtruth class probability for cross entropy loss helps the model to place the embeddings of same class pixels closer, reducing inter-class variabilty of clusters and increasing intra-class variability. The model also learns a prototype embedding for each class. In loss function, these class embeddings behave as positive and negative samples for pixel embeddings (anchor). We show that given a limited computational memory and resources, our approach can be used for training a segmentation model for any number of classes. We perform all experiments on one GPU and show that our approach performs similar and in some cases slightly better than deeplabv3+ baseline model for Cityscapes and ADE20K dataset. We also perform experiments to understand trade-offs in terms of memory usage, inference time and performance metrics. Our work helps in alleviating the problem of computational complexity, thus paving the way for image segmentation task with very high number of semantic classes. / De ledande djupa inlärningsmetoderna inom objektdetektion och bildklassificering kan hantera väl över 9000 klasser. Inom semantisk segmentering är däremot antalet klasser begränsat för vanliga dataset. Detta är inte förvånande då det behövs mycket annoterad data och beräkningskraft. För att effektivt kunna göra en pixelvis klassificering av c klasser, använder segmenteringsmetoder den s.k. korsentropin över c sannolikhets värden för varje pixel för att träna det djupa nätverket. Beräkningskomplexiteten från detta steg är den huvudsakliga flaskhalsen för att kunna öka antalet klasser. Det huvudsakliga målet av detta examensarbete är att minska antalet kanaler i prediktionen av nätverket för att kunna prediktera semantisk segmentering även vid ett mycket högt antal klasser. För att åstadkomma detta används metric learning för att träna slutrepresentationen av nätet. Metric learning metoden låter oss träna en representation med ett minimalt, men fortfarande tillräckligt antal dimensioner. Vi föreslår en approximation av korsentropin under träning som låter modellen placera representationer från samma klass närmare varandra, vilket reducerar interklassvarians och öka intraklarrvarians. Modellen lär sig en prototyprepresentation för varje klass. För inkärningskostnadsfunktionen ses dessa prototyper som positiva och negativa representationer. Vi visar att vår metod kan användas för att träna en segmenteringsmodell för ett godtyckligt antal klasser givet begränsade minnes- och beräkningsresurser. Alla experiment genomförs på en GPU. Vår metod åstadkommer liknande eller något bättre segmenteringsprestanda än den ursprungliga deeplabv3+ modellen på Cityscapes och ADE20K dataseten. Vi genomför också experiment för att analysera avvägningen mellan minnesanvändning, beräkningstid och segmenteringsprestanda. Vår metod minskar problemet med beräkningskomplexitet, vilket banar väg för segmentering av bilder med ett stort antal semantiska klasser.
10

Evaluating the effects of data augmentations for specific latent features : Using self-supervised learning / Utvärdering av effekterna av datamodifieringar på inlärda representationer : Vid självövervakande maskininlärning

Ingemarsson, Markus, Henningsson, Jacob January 2022 (has links)
Supervised learning requires labeled data which is cumbersome to produce, making it costly and time-consuming. SimCLR is a self-supervising framework that uses data augmentations to learn without labels. This thesis investigates how well cropping and color distorting augmentations work for two datasets, MPI3D and Causal3DIdent. The representations learned are evaluated using representation similarity analysis. The data augmentations were meant to make the model learn invariant representations of the object shape in the images regarding it as content while ignoring unnecessary features and regarding them as style. As a result, 8 models were created, models A-H. A and E were trained using supervised learning as a benchmark for the remaining self-supervised models. B and C learned invariant features of style instead of learning invariant representations of shape. Model D learned invariant representations of shape. Although, it also regarded style-related factors as content. Model F, G, and H managed to learn invariant representations of shape with varying intensities while regarding the rest of the features as style. The conclusion was that models can learn invariant representations of features related to content using self-supervised learning with the chosen augmentations. However, the augmentation settings must be suitable for the dataset. / Övervakad maskininlärning kräver annoterad data, vilket är dyrt och tidskrävande att producera. SimCLR är ett självövervakande maskininlärningsramverk som använder datamodifieringar för att lära sig utan annoteringar. Detta examensarbete utvärderar hur väl beskärning och färgförvrängande datamodifieringar fungerar för två dataset, MPI3D och Causal3DIdent. De inlärda representationerna utvärderas med hjälp av representativ likhetsanalys. Syftet med examensarbetet var att få de självövervakande maskininlärningsmodellerna att lära sig oföränderliga representationer av objektet i bilderna. Meningen med datamodifieringarna var att påverka modellens lärande så att modellen tolkar objektets form som relevant innehåll, men resterande egenskaper som icke-relevant innehåll. Åtta modeller skapades (A-H). A och E tränades med övervakad inlärning och användes som riktmärke för de självövervakade modellerna. B och C lärde sig oföränderliga representationer som bör ha betraktas som irrelevant istället för att lära sig form. Modell D lärde sig oföränderliga representationer av form men också irrelevanta representationer. Modellerna F, G och H lyckades lära sig oföränderliga representationer av form med varierande intensitet, samtidigt som de resterande egenskaperna betraktades som irrelevant. Beskärning och färgförvrängande datamodifieringarna gör således att självövervakande modeller kan lära sig oföränderliga representationer av egenskaper relaterade till relevant innehåll. Specifika inställningar för datamodifieringar måste dock vara lämpliga för datasetet.

Page generated in 0.1153 seconds