• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 18
  • 1
  • Tagged with
  • 19
  • 19
  • 10
  • 9
  • 9
  • 8
  • 7
  • 7
  • 6
  • 6
  • 6
  • 6
  • 6
  • 5
  • 4
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

REFT: Resource-Efficient Federated Training Framework for Heterogeneous and Resource-Constrained Environments

Desai, Humaid Ahmed Habibullah 22 November 2023 (has links)
Federated Learning (FL) is a sub-domain of machine learning (ML) that enforces privacy by allowing the user's local data to reside on their device. Instead of having users send their personal data to a server where the model resides, FL flips the paradigm and brings the model to the user's device for training. Existing works share model parameters or use distillation principles to address the challenges of data heterogeneity. However, these methods ignore some of the other fundamental challenges in FL: device heterogeneity and communication efficiency. In practice, client devices in FL differ greatly in their computational power and communication resources. This is exacerbated by unbalanced data distribution, resulting in an overall increase in training times and the consumption of more bandwidth. In this work, we present a novel approach for resource-efficient FL called emph{REFT} with variable pruning and knowledge distillation techniques to address the computational and communication challenges faced by resource-constrained devices. Our variable pruning technique is designed to reduce computational overhead and increase resource utilization for clients by adapting the pruning process to their individual computational capabilities. Furthermore, to minimize bandwidth consumption and reduce the number of back-and-forth communications between the clients and the server, we leverage knowledge distillation to create an ensemble of client models and distill their collective knowledge to the server. Our experimental results on image classification tasks demonstrate the effectiveness of our approach in conducting FL in a resource-constrained environment. We achieve this by training Deep Neural Network (DNN) models while optimizing resource utilization at each client. Additionally, our method allows for minimal bandwidth consumption and a diverse range of client architectures while maintaining performance and data privacy. / Master of Science / In a world driven by data, preserving privacy while leveraging the power of machine learning (ML) is a critical challenge. Traditional approaches often require sharing personal data with central servers, raising concerns about data privacy. Federated Learning (FL), is a cutting-edge solution that turns this paradigm on its head. FL brings the machine learning model to your device, allowing it to learn from your data without ever leaving your device. While FL holds great promise, it faces its own set of challenges. Existing research has largely focused on making FL work with different types of data, but there are still other issues to be resolved. Our work introduces a novel approach called REFT that addresses two critical challenges in FL: making it work smoothly on devices with varying levels of computing power and reducing the amount of data that needs to be transferred during the learning process. Imagine your smartphone and your laptop. They all have different levels of computing power. REFT adapts the learning process to each device's capabilities using a proposed technique called Variable Pruning. Think of it as a personalized fitness trainer, tailoring the workout to your specific fitness level. Additionally, we've adopted a technique called knowledge distillation. It's like a student learning from a teacher, where the teacher shares only the most critical information. In our case, this reduces the amount of data that needs to be sent across the internet, saving bandwidth and making FL more efficient. Our experiments, which involved training machines to recognize images, demonstrate that REFT works well, even on devices with limited resources. It's a step forward in ensuring your data stays private while still making machine learning smarter and more accessible.
2

Efficient Recycling Of Non-Ferrous Materials Using Cross-Modal Knowledge Distillation

Brundin, Sebastian, Gräns, Adam January 2021 (has links)
This thesis investigates the possibility of utilizing data from multiple modalities to enable an automated recycling system to separate ferrous from non-ferrous debris. The two methods sensor fusion and hallucinogenic sensor fusion were implemented in a four-step approach of deep CNNs. Sensor fusion implies that multiple modalities are run simultaneously during the operation of the system.The individual outputs are further fused, and the joint performance expects to be superior to having only one of the sensors. In hallucinogenic sensor fusion, the goal is to achieve the benefits of sensor fusion in respect to cost and complexity even when one of the modalities is reduced from the system. This is achieved by leveraging data from a more complex modality onto a simpler one in a student/teacher approach. As a result, the teacher modality will train the student sensor to hallucinate features beyond its visual spectra. Based on the results of a performed prestudy involving multiple types of modalities, a hyperspectral sensor was deployed as the teacher to complement a simple RGB camera. Three studies involving differently composed datasets were further conducted to evaluate the effectiveness of the methods. The results show that the joint performance of a hyperspectral sensor and an RGB camera is superior to both individual dispatches. It can also be concluded that training a network with hyperspectral images can improve the classification accuracy when operating with only RGB data. However, the addition of a hyperspectral sensor might be considered as superfluous as this report shows that the standardized shapes of industrial debris enable a single RGB to achieve an accuracy above 90%. The material used in this thesis can also be concluded to be suboptimal for hyperspectral analysis. Compared to the vegetation scenes, only a limited amount of additional data could be obtained by including wavelengths besides the ones representing red, green and blue.
3

Enhancing Object Detection Methods by Knowledge Distillation for Automotive Driving in Real-World Settings

Kian, Setareh 07 August 2023 (has links)
No description available.
4

Compressing Deep Learning models for Natural Language Understanding

Ait Lahmouch, Nadir January 2022 (has links)
Uppgifter för behandling av naturliga språk (NLP) har under de senaste åren visat sig vara särskilt effektiva när man använder förtränade språkmodeller som BERT. Det enorma kravet på datorresurser som krävs för att träna sådana modeller gör det dock svårt att använda dem i verkligheten. För att lösa detta problem har komprimeringsmetoder utvecklats. I det här projektet studeras, genomförs och testas några av dessa metoder för komprimering av neurala nätverk för textbearbetning. I vårt fall var den mest effektiva metoden Knowledge Distillation, som består i att överföra kunskap från ett stort neuralt nätverk, som kallas läraren, till ett litet neuralt nätverk, som kallas eleven. Det finns flera varianter av detta tillvägagångssätt, som skiljer sig åt i komplexitet. Vi kommer att titta på två av dem i det här projektet. Den första gör det möjligt att överföra kunskap mellan ett neuralt nätverk och en mindre dubbelriktad LSTM, genom att endast använda resultatet från den större modellen. Och en andra, mer komplex metod som uppmuntrar elevmodellen att också lära sig av lärarmodellens mellanliggande lager för att utvinna kunskap. Det slutliga målet med detta projekt är att ge företagets datavetare färdiga komprimeringsmetoder för framtida projekt som kräver användning av djupa neurala nätverk för NLP. / Natural language processing (NLP) tasks have proven to be particularly effective when using pre-trained language models such as BERT. However, the enormous demand on computational resources required to train such models makes their use in the real world difficult. To overcome this problem, compression methods have emerged in recent years. In this project, some of these neural network compression approaches for text processing are studied, implemented and tested. In our case, the most efficient method was Knowledge Distillation, which consists in transmitting knowledge from a large neural network, called teacher, to a small neural network, called student. There are several variants of this approach, which differ in their complexity. We will see two of them in this project, the first one which allows a knowledge transfer between any neural network and another smaller bidirectional LSTM, using only the output of the larger model. And a second, more complex approach that encourages the student model to also learn from the intermediate layers of the teacher model for incremental knowledge extraction. The ultimate goal of this project is to provide the company’s data scientists with ready-to-use compression methods for their future projects requiring the use of deep neural networks for NLP.
5

Knowledge Distillation of DNABERT for Prediction of Genomic Elements / Kunskapsdestillation av DNABERT för prediktion av genetiska attribut

Palés Huix, Joana January 2022 (has links)
Understanding the information encoded in the human genome and the influence of each part of the DNA sequence is a fundamental problem of our society that can be key to unveil the mechanism of common diseases. With the latest technological developments in the genomics field, many research institutes have the tools to collect massive amounts of genomic data. Nevertheless, there is a lack of tools that can be used to process and analyse these datasets in a biologically reliable and efficient manner. Many deep learning solutions have been proposed to solve current genomic tasks, but most of the times the main research interest is in the underlying biological mechanisms rather than high scores of the predictive metrics themselves. Recently, state-of-the-art in deep learning has shifted towards large transformer models, which use an attention mechanism that can be leveraged for interpretability. The main drawbacks of these large models is that they require a lot of memory space and have high inference time, which may make their use unfeasible in practical applications. In this work, we test the appropriateness of knowledge distillation to obtain more usable and equally performing models that genomic researchers can easily fine-tune to solve their scientific problems. DNABERT, a transformer model pre-trained on DNA data, is distilled following two strategies: DistilBERT and MiniLM. Four student models with different sizes are obtained and fine-tuned for promoter identification. They are evaluated in three key aspects: classification performance, usability and biological relevance of the predictions. The latter is assessed by visually inspecting the attention maps of TATA-promoter predictions, which are expected to have a peak of attention at the well-known TATA motif present in these sequences. Results show that is indeed possible to obtain significantly smaller models that are equally performant in the promoter identification task without any major differences between the two techniques tested. The smallest distilled model experiences less than 1% decrease in all performance metrics evaluated (accuracy, F1 score and Matthews Correlation Coefficient) and an increase in the inference speed by 7.3x, while only having 15% of the parameters of DNABERT. The attention maps for the student models show that they successfully learn to mimic the general understanding of the DNA that DNABERT possesses.
6

Energy-efficient Neuromorphic Computing for Resource-constrained Internet of Things Devices

Liu, Shiya 03 November 2023 (has links)
Due to the limited computation and storage resources of Internet of Things (IoT) devices, many emerging intelligent applications based on deep learning techniques heavily depend on cloud computing for computation and storage. However, cloud computing faces technical issues with long latency, poor reliability, and weak privacy, resulting in the need for on-device computation and storage. Also, on-device computation is essential for many time-critical applications, which require real-time data processing and energy-efficient. Furthermore, the escalating requirements for on-device processing are driven by network bandwidth limitations and consumer anticipations concerning data privacy and user experience. In the realm of computing, there is a growing interest in exploring novel technologies that can facilitate ongoing advancements in performance. Of the various prospective avenues, the field of neuromorphic computing has garnered significant recognition as a crucial means to achieve fast and energy-efficient machine intelligence applications for IoT devices. The programming of neuromorphic computing hardware typically involves the construction of a spiking neural network (SNN) capable of being deployed onto the designated neuromorphic hardware. This dissertation presents a range of methodologies aimed at enhancing the precision and energy efficiency of SNNs. To be more precise, these advancements are achieved by incorporating four essential methods. The first method is the quantization of neural networks through knowledge distillation. This work introduces a quantization technique that effectively reduces the computational and storage resource requirements of a model while minimizing the loss of accuracy. To further enhance the reduction of quantization errors, the second method introduces a novel quantization-aware training algorithm specifically designed for training quantized spiking neural network (SNN) models intended for execution on the Loihi chip, a specialized neuromorphic computing chip. SNNs generally exhibit lower accuracy performance compared to deep neural networks (DNNs). The third approach introduces a DNN-SNN co-learning algorithm, which enhances the performance of SNN models by leveraging knowledge obtained from DNN models. The design of the neural architecture plays a vital role in enhancing the accuracy and energy efficiency of an SNN model. The fourth method presents a novel neural architecture search algorithm specifically tailored for SNNs on the Loihi chip. The method selects an optimal architecture based on gradients induced by the architecture at initialization across different data samples without the need for training the architecture. To demonstrate the effectiveness and performance across diverse machine intelligence applications, our methods are evaluated through (i) image classification, (ii) spectrum sensing, and (iii) modulation symbol detection. / Doctor of Philosophy / In the emerging Internet of Things (IoT), our everyday devices, from smart home gadgets to wearables, can autonomously make intelligent decisions. However, due to their limited computing power and storage, many IoT devices heavily depend on cloud computing, which brings along issues like slow response times, privacy concerns, and unreliable connections. Neuromorphic computing is a recognized and crucial approach for achieving fast and energy-efficient machine intelligence applications in IoT devices. Inspired by the human brain's neural networks, this cutting-edge approach allows devices to perform complex tasks efficiently and in real-time. The programming of this neuromorphic hardware involves creating spiking neural networks (SNNs). This dissertation presents several innovative methods to improve the precision and energy efficiency of these SNNs. Firstly, a technique called "quantization" reduces the computational and storage requirements of models without sacrificing accuracy. Secondly, a unique training algorithm is designed to enhance the performance of SNN models. Thirdly, a clever co-learning algorithm allows SNN models to learn from traditional deep neural networks (DNNs), further improving their accuracy. Lastly, a novel neural architecture search algorithm finds the best architecture for SNNs on the designated neuromorphic chip, without the need for extensive training. By making IoT devices smarter and more efficient, neuromorphic computing brings us closer to a world where our gadgets can perform intelligent tasks independently, enhancing convenience and privacy for users across the globe.
7

Knowledge Distillation for Semantic Segmentation and Autonomous Driving. : Astudy on the influence of hyperparameters, initialization of a student network and the distillation method on the semantic segmentation of urban scenes.

Sanchez Nieto, Juan January 2022 (has links)
Reducing the size of a neural network whilst maintaining a comparable performance is an important problem to be solved since the constrictions on resources of small devices make it impossible to deploy large models in numerous real-life scenarios. A prominent example is autonomous driving, where computer vision tasks such as object detection and semantic segmentation need to be performed in real time by mobile devices. In this thesis, the knowledge and spherical knowledge distillation techniques are utilized to train a small model (PSPNet50) under the supervision of a large model (PSPNet101) in order to perform semantic segmentation of urban scenes. The importance of the distillation hyperparameters is studied first, namely the influence of the temperature and the weights of the loss function on the performance of the distilled model, showing no decisive advantage over the individual training of the student. Thereafter, distillation is performed utilizing a pretrained student, revealing a good improvement in performance. Contrary to expectations, the pretrained student benefits from a high learning rate when training resumes under distillation, especially in the spherical knowledge distillation case, displaying a superior and more stable performance when compared to the regular knowledge distillation setting. These findings are validated by several experiments conducted using the Cityscapes dataset. The best distilled model achieves 87.287% pixel accuracy and a 42.0% mean Intersection-Over-Union value (mIoU) on the validation set, higher than the 86.356% pixel accuracy and 39.6% mIoU obtained by the baseline student. On the test set, the official evaluation obtained by submission to the Cityscapes website yields 42.213% mIoU for the distilled model and 41.085% for the baseline student. / Att minska storleken på ett neuralt nätverk med bibehållen prestanda är ett viktigt problem som måste lösas, eftersom de begränsade resurserna i små enheter gör det omöjligt att använda stora modeller i många verkliga situationer. Ett framträdande exempel är autonom körning, där datorseende uppgifter som objektsdetektering och semantisk segmentering måste utföras i realtid av mobila enheter. I den här avhandlingen används tekniker för destillation av kunskap och sfärisk kunskap för att träna en liten modell (PSPNet50) under övervakning av en stor modell (PSPNet101) för att utföra semantisk segmentering av stadsscener. Betydelsen av hyperparametrarna för destillation studeras först, nämligen temperaturens och förlustfunktionens vikter för den destillerade modellens prestanda, vilket inte visar någon avgörande fördel jämfört med individuell träning av eleven. Därefter utförs destillation med hjälp av en utbildad elev, vilket visar på en god förbättring av prestanda. Tvärtemot förväntningarna har den utbildade eleven en hög inlärningshastighet när utbildningen återupptas under destillation, särskilt i fallet med sfärisk kunskapsdestillation, vilket ger en överlägsen och stabilare prestanda jämfört med den vanliga kunskapsdestillationssituationen. Dessa resultat bekräftas av flera experiment som utförts med hjälp av datasetet Cityscapes. Den bästa destillerade modellen uppnår 87.287% pixelprecision och ett 42.0% medelvärde för skärning över union (mIoU) på valideringsuppsättningen, vilket är högre än de 86.356% pixelprecision och 39.6% mIoU som uppnåddes av grundstudenten. I testuppsättningen ger den officiella utvärderingen som gjordes på webbplatsen Cityscapes 42.213% mIoU för den destillerade modellen och 41.085% för grundstudenten.
8

Exploration of Knowledge Distillation Methods on Transformer Language Models for Sentiment Analysis / Utforskning av metoder för kunskapsdestillation på transformatoriska språkmodeller för analys av känslor

Liu, Haonan January 2022 (has links)
Despite the outstanding performances of the large Transformer-based language models, it proposes a challenge to compress the models and put them into the industrial environment. This degree project explores model compression methods called knowledge distillation in the sentiment classification task on Transformer models. Transformers are neural models having stacks of identical layers. In knowledge distillation for Transformer, a student model with fewer layers will learn to mimic intermediate layer vectors from a teacher model with more layers by designing and minimizing loss. We implement a framework to compare three knowledge distillation methods: MiniLM, TinyBERT, and Patient-KD. Student models produced by the three methods are evaluated by accuracy score on the SST-2 and SemEval sentiment classification dataset. The student models’ attention matrices are also compared with the teacher model to find the best student model for capturing dependencies in the input sentences. The comparison results show that the distillation method focusing on the Attention mechanism can produce student models with better performances and less variance. We also discover the over-fitting issue in Knowledge Distillation and propose a Two-Step Knowledge Distillation with Transformer Layer and Prediction Layer distillation to alleviate the problem. The experiment results prove that our method can produce robust, effective, and compact student models without introducing extra data. In the future, we would like to extend our framework to support more distillation methods on Transformer models and compare performances in tasks other than sentiment classification. / Trots de stora transformatorbaserade språkmodellernas enastående prestanda är det en utmaning att komprimera modellerna och använda dem i en industriell miljö. I detta examensarbete undersöks metoder för modellkomprimering som kallas kunskapsdestillation i uppgiften att klassificera känslor på Transformer-modeller. Transformers är neurala modeller med staplar av identiska lager. I kunskapsdestillation för Transformer lär sig en elevmodell med färre lager att efterlikna mellanliggande lagervektorer från en lärarmodell med fler lager genom att utforma och minimera förluster. Vi genomför en ram för att jämföra tre metoder för kunskapsdestillation: MiniLM, TinyBERT och Patient-KD. Elevmodeller som produceras av de tre metoderna utvärderas med hjälp av noggrannhetspoäng på datasetet för klassificering av känslor SST-2 och SemEval. Elevmodellernas uppmärksamhetsmatriser jämförs också med den från lärarmodellen för att ta reda på vilken elevmodell som är bäst för att fånga upp beroenden i de inmatade meningarna. Jämförelseresultaten visar att destillationsmetoden som fokuserar på uppmärksamhetsmekanismen kan ge studentmodeller med bättre prestanda och mindre varians. Vi upptäcker också problemet med överanpassning i kunskapsdestillation och föreslår en tvåstegs kunskapsdestillation med transformatorskikt och prediktionsskikt för att lindra problemet. Experimentresultaten visar att vår metod kan producera robusta, effektiva och kompakta elevmodeller utan att införa extra data. I framtiden vill vi utöka vårt ramverk för att stödja fler destillationmetoder på Transformer-modeller och jämföra prestanda i andra uppgifter än sentimentklassificering.
9

Deep Ensembles for Self-Training in NLP / Djupa Ensembler för Självträninig inom Datalingvistik

Alness Borg, Axel January 2022 (has links)
With the development of deep learning methods the requirement of having access to large amounts of data has increased. In this study, we have looked at methods for leveraging unlabeled data while only having access to small amounts of labeled data, which is common in real-world scenarios. We have investigated a method called self-training for leveraging the unlabeled data when training a model. It works by training a teacher model on the labeled data that then labels the unlabeled data for a student model to train on. A popular method in machine learning is ensembling which is a way of improving a single model by combining multiple models. With previous studies mainly focusing on self-training with image data and showing that ensembles can successfully be used for images, we wanted to see if the same applies to text data. We mainly focused on investigating how ensembles can be used as teachers for training a single student model. This was done by creating different ensemble models and comparing them against the individual members in the ensemble. The results showed that ensemble do not necessarily improves the accuracy of the student model over a single model but in certain cases when used correctly they can provide benefits. We found that depending on the dataset bagging BERT models can perform the same or better than a larger BERT model and this translates to the student model. Bagging multiple smaller models also has the benefit of being easier to scale and more computationally efficient to train in comparison to scaling a single model. / Med utvecklingen av metoder för djupinlärning har kravet på att ha tillgång till stora mängder data ökat som är vanligt i verkliga scenarier. I den här studien har vi tittat på metoder för att utnytja oannoterad data när vi bara har tillgång till små mängder annoterad data. Vi har undersökte en metod som kallas självträning för att utnytja oannoterd data när man tränar en modell. Det fungerar genom att man tränar en lärarmodell på annoterad data som sedan annoterar den oannoterade datan för en elevmodell att träna på. En populär metod inom maskininlärning är ensembling som är en teknik för att förbättra en ensam modell genom att kombinera flera modeller. Tidigare studier har främst inriktade på självträning med bilddata och visat att ensembler framgångsrikt kan användas för bild data, vill vi se om detsamma gäller för textdata. Vi fokuserade främst på att undersöka hur ensembler kan användas som lärare för att träna en enskild elevmodell. Detta gjordes genom att skapa olika ensemblemodeller och jämföra dem med de enskilda medlemmarna i ensemblen. Resultaten visade att ensembler inte nödvändigtvis förbättrar elevmodellens noggrannhet jämfört med en enda modell, men i vissa fall kan de ge fördelar när de används på rätt sätt. Vi fann att beroende på datasetet kan bagging av BERT-modeller prestera likvärdigt eller bättre än en större BERT-modell och detta översätts även till studentmodellen prestandard. Att använda bagging av flera mindre modeller har också fördelen av att de är lättare att skala up och mer beräkningseffektivt att träna i jämförelse med att skala up en enskild modell.
10

Distilling Multilingual Transformer Models for Efficient Document Retrieval : Distilling multi-Transformer models with distillation losses involving multi-Transformer interactions / Destillering av flerspråkiga transformatormodeller för effektiv dokumentsökning : Destillering av modeller med flera transformatorer med destilleringsförluster som involverar interaktioner mellan flera transformatorer

Liu, Xuecong January 2022 (has links)
Open Domain Question Answering (OpenQA) is a task concerning automatically finding answers to a query from a given set of documents. Language-agnostic OpenQA is an increasingly important research area in the globalised world, where the answers can be in a different language from the question. An OpenQA system generally consists of a document retriever to retrieve relevant passages and a reader to extract answers from the passages. Large Transformers, such as Dense Passage Retrieval (DPR) models, have achieved state-of-the-art performances in document retrievals, but they are computationally expensive in production. Knowledge Distillation (KD) is an effective way to reduce the size and increase the speed of Transformers while retaining their performances. However, most existing research focuses on distilling single Transformer models, instead of multi-Transformer models, as in the case of DPR. This thesis project uses MiniLM and DistilBERT distillation methods, two of the most successful methods to distil the BERT model, to individually distil the passage and query model of a fined-tuned DPR model comprised of two pretrained MPNet models. In addition, the project proposes and tests Embedding Similarity Loss (ESL), a distillation loss designed for the interaction between the passage and query models in DPR architecture. The results show that using ESL results in better students than using MiniLM or DistilBERT loss alone and that combining ESL with any of the other two losses increases their student models’ performances in most cases, especially when training on Information-Seeking Question Answering in Typologically Diverse Languages (TyDi QA) instead of The Stanford Question Answering Dataset 1.1 (SQuAD 1.1). The best resulting 6-layer student DPR model retained more than 90% of the recall and Mean Average Precision (MAP) in Cross-Lingual Transfer (XLT) tasks while reducing the inference time to 63.2%. In Generalised Cross-Lingual Transfer (G-XLT) tasks, it retained only around 42% of the recall and MAP using 53.8% of the inference time. / Domänlöst frågebesvarande är en uppgift som handlar om att automatiskt hitta svar på en fråga från en given uppsättning av dokument. Språkagnostiska domänlöst frågebesvarande är ett allt viktigare forskningsområde i den globaliserade världen, där svaren kan vara på ett annat språk än själva frågan. Ett domänlöst frågebesvarande-system består i allmänhet av en dokumenthämtare som plockar relevanta textavsnitt och en läsare som extraherar svaren från dessa textavsnitt. Stora transformatorer, såsom DPR-modeller (Dense Passage Retrieval), har uppnått toppresultat i dokumenthämtning, men de är beräkningsmässigt dyra i produktion. KD (Knowledge Distillation) är ett effektivt sätt att minska storleken och öka hastigheten hos transformatorer samtidigt som deras prestanda bibehålls. För det mesta är den existerande forskningen dock inriktad på att destillera enstaka transformatormodeller i stället för modeller med flera transformatorer, som i fallet med DPR. I det här examensarbetet används MiniLM- och DistilBERT-destilleringsmetoderna, två av de mest framgångsrika metoderna för att destillera BERT-modellen, för att individuellt destillera text- och frågemodellen i en finjusterad DPRmodell som består av två förinlärda MPNet-modeller. Dessutom föreslås och testas ESL (Embedding Similarity Loss), en destilleringsförlust som är utformad för interaktionen mellan text- och frågemodellerna i DPRarkitekturen. Resultaten visar att användning av ESL resulterar i bättre studenter än om man enbart använder MiniLM eller DistilBERT-förlusten och att kombinationen ESL med någon av de andra två förlusterna ökar deras studentmodellers prestanda i de flesta fall, särskilt när man tränar på TyDi QA (Typologically Diverse Languages) istället för SQuAD 1.1 (The Stanford Question Answering Dataset). Den bästa resulterande 6-lagriga student DPRmodellen behöll mer än 90% av återkallandet och MAP (Mean Average Precision) för XLT-uppgifterna (Cross-Lingual Transfer) samtidigt som tiden för inferens minskades till 63.2%. För G-XLT-uppgifterna (Generalised CrossLingual Transfer) bibehölls endast cirka 42% av återkallelsen och MAP med 53.8% av inferenstiden.

Page generated in 0.5269 seconds