51.
Examining the Privacy Aspects and Cost-Quality Balance of a Public Sector Conversational Interface
Meier Ström, Theo, Vesterlund, Marcus January 2024 (has links)
This thesis explores the implementation of a conversational user interface for Uppsala Municipality, aimed at optimising the balance between cost of usage and quality when using large language models for public services. The central issue addressed is the effective integration of large language models, such as OpenAI's GPT-4, to enhance municipal services without compromising user privacy and data security. The solution developed involves a prototype that utilises a model chooser and prompt tuner, allowing the interface to adapt the complexity of responses based on user input. This adaptive approach reduces costs while maintaining high response quality. The results indicate that the prototype not only manages costs effectively, but also adheres to standards of data privacy and security. Clear information on data use and transparency improved user trust and understanding. In addition, strategies were effectively implemented to handle sensitive and unexpected input, improving overall data security. Overall, the findings suggest that this approach to implementing conversational user interfaces in public services is viable, offering valuable insights into the cost-effective and secure integration of language models in the public sector. The success of the prototype highlights its potential to improve future municipal services, underscoring the importance of transparency and user engagement in public digital interfaces. / Den här masteruppsatsen undersöker implementeringen av ett konversationsgränssnitt för Uppsala kommun, med målet att optimera balansen mellan kostnad och kvalitet vid användning av stora språkmodeller för den offentliga sektorn. Den centrala frågan som besvaras är hur stora språkmodeller, såsom OpenAI:s GPT-4, kan integreras för att förbättra kommunala tjänster utan att kompromissa med användarnas integritet och datasäkerhet. Den utvecklade lösningen innefattar en prototyp som använder en modellväljare och promptjusterare, vilket gör det möjligt för gränssnittet att anpassa svarens komplexitet baserat på användarens meddelande. Detta tillvägagångssätt reducerar kostnaderna samtidigt som en hög svarskvalitet bibehålls. Resultaten visar att prototypen inte bara hanterar kostnaderna effektivt, utan också upprätthåller standarder för datasekretess och säkerhet. Tydlig information om dataanvändning och transparens förbättrade avsevärt användarnas förtroende och förståelse. Dessutom implementerades strategier effektivt för att hantera känslig och oväntad data, vilket förbättrade den övergripande datasäkerheten. Sammanfattningsvis tyder resultaten på att detta tillvägagångssätt för implementering av konversationsgränssnitt i offentliga tjänster är möjligt och erbjuder lärdomar om kostnadseffektiv och säker integration av språkmodeller i offentlig sektor. Prototypens framgång påvisar dess potential att förbättra framtida kommunala tjänster, men lyfter också vikten av transparens och användarengagemang i offentliga digitala gränssnitt.
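To make the cost-control mechanism described above concrete (routing each user message to a model tier that matches its complexity, and adapting the prompt accordingly), a minimal sketch follows. The complexity heuristic, threshold, and model names are illustrative assumptions, not the prototype's actual logic.

```python
# Hedged sketch of a model chooser and prompt tuner; the heuristic and
# model names are assumptions, not the thesis prototype's implementation.

def complexity_score(message: str) -> float:
    """Crude proxy for query complexity: length plus question density."""
    words = message.split()
    return len(words) / 50.0 + 0.5 * message.count("?")

def choose_model(message: str, threshold: float = 1.0) -> str:
    """Route simple queries to a cheaper model, complex ones to a stronger one."""
    return "gpt-4" if complexity_score(message) >= threshold else "gpt-3.5-turbo"

def tune_prompt(message: str, model: str) -> str:
    """Adapt the requested response complexity to the chosen tier."""
    style = "Answer briefly." if model == "gpt-3.5-turbo" else "Answer thoroughly."
    return f"You are a municipal service assistant. {style}\n\nUser: {message}"

msg = "What are the opening hours of the public library?"
model = choose_model(msg)
print(model)
print(tune_prompt(msg, model))
```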
52.
Leveraging Large Language Models for Actionable Insights in Facility Management : An Applied Study in Commercial Housing Real Estate / Utnyttjande av stora språkmodeller för handlingsbara insikter i fastighetsförvaltning : En tillämpad studie inom kommersiella bostadsfastigheter
Andrén, Björn January 2024 (has links)
Artificial intelligence is one of the long-term trends of the twenty-first century. Historically, the real estate industry has been slow to adopt new technology, but generative AI introduces a range of innovative applications that traditional AI has not addressed, creating a unique opportunity for the real estate industry to evolve and position itself at the forefront of technological advancements. Despite the promising potential of large language models, research applying the technology to real-world problems within the real estate sector is almost non-existent. Only a limited number of studies currently explore the area, and to the author's knowledge no applied studies of the technology have yet been made in Europe. The purpose of this study was thus to contribute an applied study of the technology within the context of facility management, exploring how generative AI can increase efficiency within facility management by utilizing large language models to analyse tenant matters. The execution consisted of partnering with a real estate company, developing proprietary frameworks and technology, and testing these on real-world data. A design-based research method was adapted to fit this study. In total, 822 tenant matters were analysed by a large language model (LLM). The findings show that a large language model can be utilized to analyse tenant matters, and that model outputs can be trusted and utilized to improve services for tenants. This study highlights the importance of original data quality, data selection, understanding data inputs, and contextualizing instructions for the large language model to achieve successful automated information extraction. It concludes that analysing tenant matters with generative AI makes it possible to identify and quantify how a real estate company functions, performs, and meets tenants' needs as a whole, not just from a facility management perspective. / Artificiell intelligens är en av de långsiktiga trenderna under tjugoförsta århundradet. Historiskt har fastighetsbranschen varit långsam med att anamma ny teknik, men generativ AI introducerar en rad innovativa tillämpningar som traditionell AI inte har adresserat. Detta skapar en unik möjlighet för fastighetsbranschen att utvecklas och positionera sig i framkanten av tekniska framsteg. Trots den lovande potentialen hos stora språkmodeller är forskning som tillämpar tekniken på verkliga problem inom branschen nästan obefintlig. Endast ett begränsat antal studier som utforskar området existerar för närvarande, och ingen tillämpad studie av tekniken har ännu gjorts i Europa, enligt författarens kännedom. Syftet med denna studie var således att bidra med en tillämpad studie av tekniken inom ramen för fastighetsförvaltning. Studien utforskar hur generativ AI kan öka effektiviteten inom fastighetsförvaltning genom att använda stora språkmodeller för att analysera hyresgästärenden. Genomförandet bestod av att samarbeta med ett fastighetsbolag, utveckla proprietära ramverk och teknik samt testa dessa på verkliga data. En designbaserad forskningsmetod justerades för att passa studien. Totalt analyserades 822 hyresgästärenden av en stor språkmodell (LLM). Resultaten visar att en stor språkmodell kan användas för att analysera hyresgästärenden, samt att modellens svar går att lita på och kan användas för att förbättra tjänster mot hyresgäster. Studien framhäver vikten av originaldatakvalitet, val av data, förståelse för datainmatning samt kontextualisering av instruktioner för att den stora språkmodellen ska uppnå framgångsrik automatisk informationsutvinning. Slutsatsen är att AI-analys av hyresgästärenden gör det möjligt att identifiera och kvantifiera hur ett fastighetsbolag som helhet fungerar, presterar och möter hyresgästernas behov, inte bara ur ett fastighetsförvaltningsperspektiv.
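As an illustration of what analysing tenant matters with an LLM can look like in code, here is a minimal sketch using the OpenAI Python client. The categories, prompt wording, and choice of client are assumptions; the thesis's proprietary frameworks are not public.

```python
# Hedged sketch of LLM-based tenant-matter analysis; categories and prompt
# are assumptions, not the thesis's proprietary framework.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["maintenance", "noise complaint", "billing", "other"]

def analyse_matter(text: str) -> dict:
    prompt = (
        f"Classify the tenant matter below into one of {CATEGORIES} "
        "and summarise it in one sentence. "
        'Reply as JSON: {"category": "...", "summary": "..."}\n\n' + text
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns valid JSON; real code would validate this.
    return json.loads(response.choices[0].message.content)
```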
53.
The future of IT Project Management & Delivery: NLP AI opportunities & challenges
Viznerova, Ester January 2023 (has links)
This thesis explores the opportunities and challenges of integrating recent Natural Language Processing (NLP) Artificial Intelligence (AI) advancements into IT project management and delivery (PM&D). Using a qualitative design based on a hermeneutic phenomenology strategy, the study employs a semi-systematic literature review and semi-structured interviews to delve into NLP AI's potential impacts in IT PM&D from both theoretical and practical standpoints. The results revealed numerous opportunities for NLP AI application across Project Performance Domains, enhancing areas such as stakeholder engagement, team productivity, project planning, performance measurement, project work, delivery, and risk management. However, challenges were identified in areas including system integration, value definition, team and stakeholder-related issues, environmental considerations, and ethical concerns. In-house and third-party model usage each presented a unique set of challenges, emphasizing cost implications, data privacy and security, result quality, and dependence issues. The research concludes that the immense potential of NLP AI in IT PM&D is tempered by these challenges, and calls for robust strategies, sound ethics, comprehensive training, new ROI evaluation frameworks, and responsible AI usage to manage these issues effectively. This thesis provides valuable insights to academics, practitioners, and decision-makers navigating the rapidly evolving landscape of NLP AI in IT PM&D.
54.
Large Language Models : Bedömning av ChatGPT:s potential som verktyg för kommentering av kod / Large Language Models : Assessment of ChatGPT's Potential as a Tool for Code Commenting
Svensson, Tom, Vuk, Dennis January 2023 (has links)
Användningen av Artificiell Intelligens (AI) är utbredd bland verksamma företag såväl som privatpersoner idag. Det har blivit en integrerad del av vårt samhälle som ofta går obemärkt förbi. Med allt från face recognition och självkörande bilar till automatisering inom arbetsrelaterade områden har AI onekligen påverkat omvärlden. I takt med att AI-modeller fortsätter att utvecklas tillkommer även farhågor om dess påverkan på jobb, tillhörande säkerhetsrisker och etiska dilemman. Uppsatsens litteratur hjälper till att skildra AI historiskt och i nutid, men även ge en uppfattning om vart den är på väg. Den AI-modell som i nuläget har väckt störst uppmärksamhet är ChatGPT. Dess potential tycks inte ha några gränser, och därmed uppstod relevansen i att öka kunskapen kring AI-modellen. Vidare gjordes en avgränsning, där fokusområdet var att undersöka hur ChatGPT kan generera kodkommentarer och potentiellt agera som ett hjälpmedel vid kommentering av källkod. I samband med avgränsningen och fokusområdet bildades även forskningsfrågan: "Large Language Models: Bedömning av ChatGPT:s potential som verktyg för kommentering av kod". För att besvara forskningsfrågan har avhandlingen varit baserad på en kvalitativ ansats, där urvalet av respondenter har varit programmerare. Den primära datainsamlingen har genomförts via två semistrukturerade intervjuer, varav den inledande innefattade initiala känslor kring ChatGPT och övergripande fakta om respektive intervjuobjekt. Vidare gjordes en observation för att få en inblick i hur AI-modellen används av programmerare, för att avslutningsvis göra en uppföljande intervju efter observationen i syfte att samla intervjuobjektens tankar efter användning av ChatGPT för att generera kodkommentarer. Baserat på den insamlade empirin kunde studien konstatera vissa begränsningar i den nuvarande modellen, inte minst behovet av tydliga instruktioner. Trots brister visar ChatGPT:s framställning potential att vara en betydande resurs för kommentering av kod i framtiden. Resultaten indikerar att modellen kan generera relativt passande kommentarer i de analyserade kodstyckena. Emellertid uttryckte deltagarna under de avslutande intervjuerna generellt sett att kommentarerna var redundanta och saknade betydande värde för att öka förståelsen av källkoden. Respondenterna diskuterade dock möjligheterna att använda ChatGPT i framtiden, men underströk behovet av förbättringar för att göra det till en tillförlitlig metod inom arbetsrelaterade situationer. / The usage of Artificial Intelligence (AI) is widespread among both companies and individuals today. It has become an integrated part of our society, often going unnoticed. From face recognition and self-driving cars to automation in work-related areas, AI has undeniably impacted the world. As AI models continue to evolve, concerns about their impact on jobs, associated security risks, and ethical dilemmas arise. The literature in this essay helps portray AI historically and in the present, and provides an insight into its future direction. The AI model that has currently garnered the most attention is ChatGPT. Its potential seems limitless, which prompted the relevance of increasing knowledge about the AI model. Furthermore, a delimitation was made, where the focus area was to investigate how ChatGPT can generate code comments and potentially act as a tool for commenting source code. As part of the research focus and scope, the research question was formulated: "Large Language Models: Assessment of ChatGPT's Potential as a Tool for Code Commenting." To answer the research question, the thesis adopted a qualitative approach, with programmers as the selected respondents. The primary data collection was conducted through two semi-structured interviews, where the initial interview involved capturing initial impressions of ChatGPT and gathering general information about the interviewees. Additionally, an observation was carried out to gain insights into how programmers utilize the AI model, followed by a post-observation interview to gather the interviewees' thoughts after using ChatGPT to generate code comments. Based on the collected empirical data, the study was able to conclude certain limitations in the current model, particularly the need for clear instructions. Despite these limitations, ChatGPT's performance demonstrates the potential to be a significant resource for code commenting in the future. The results indicate that the model can generate relatively suitable comments in the analyzed code snippets. However, during the concluding interviews, participants generally expressed that the comments were redundant and lacked significant value in enhancing the understanding of the source code. Nevertheless, the respondents discussed the possibilities of using ChatGPT in the future, while emphasizing the need for improvements to establish it as a reliable method in work-related situations.
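A minimal sketch of the kind of interaction the study observed (asking ChatGPT to comment source code) follows; the prompt wording is an assumption, though it reflects the study's finding that clear instructions are needed to avoid redundant comments.

```python
# Sketch of prompting ChatGPT to comment code; prompt wording is an assumption.
from openai import OpenAI

client = OpenAI()

def comment_code(source: str, language: str = "python") -> str:
    prompt = (
        f"Add concise comments to this {language} code. Comment only where a "
        "reader gains understanding; do not restate what a line obviously does."
        "\n\n" + source
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```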
55.
An initial investigation of Automatic Program Repair for Solidity Smart Contracts with Large Language Models / En första undersökning av automatisk lagning av solidity smarta kontrakt med stora språkmodeller
Cruz, Erik January 2023 (has links)
This thesis investigates how Large Language Models can be used to repair Solidity Smart Contracts automatically through the main contribution of this thesis, the Transformative Repair Tool. The Transformative Repair Tool achieves similar results to current state-of-the-art tools on the SmartBugs Curated Dataset and is the first published tool that uses Large Language Models to repair Solidity Smart Contracts. Moreover, the thesis explores different prompt strategies to repair Smart Contracts and assesses their performance. / Detta masterexamensarbete undersöker hur stora språkmodeller kan användas för att automatiskt laga smarta kontrakt i Solidity genom verktyget Transformative Repair Tool, som är masterexamensarbetets huvudsakliga bidrag. Transformative Repair Tool presterar i nivå med dagens bästa verktyg inom automatisk lagning av smarta kontrakt på datasetet SmartBugs Curated och är det första publicerade verktyget som använder stora språkmodeller för att reparera smarta kontrakt i Solidity. Dessutom utforskar rapporten olika textprompts och deras prestanda för att laga smarta kontrakt.
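The abstract does not detail the Transformative Repair Tool's pipeline, but a generic detect-prompt-recheck loop of the kind such tools use can be sketched as follows; the detector stub and repair prompt are assumptions, not the tool itself.

```python
# Hedged sketch of an LLM repair loop for Solidity contracts; the detector
# stub and prompt are assumptions, not the Transformative Repair Tool itself.
from openai import OpenAI

client = OpenAI()

def detect(contract: str) -> list[str]:
    """Placeholder: plug in a real analyser (e.g. one used on SmartBugs)."""
    return []

def repair(contract: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        findings = detect(contract)
        if not findings:
            return contract
        prompt = (
            "Fix these vulnerabilities in the following Solidity contract "
            f"without changing its intended behaviour: {findings}\n\n{contract}"
        )
        reply = client.chat.completions.create(
            model="gpt-4", messages=[{"role": "user", "content": prompt}]
        )
        contract = reply.choices[0].message.content
    return contract
```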
56.
DEEP LEARNING BASED METHODS FOR AUTOMATIC EXTRACTION OF SYNTACTIC PATTERNS AND THEIR APPLICATION FOR KNOWLEDGE DISCOVERY
Mdahsanul Kabir (16501281) 03 January 2024 (has links)
<p dir="ltr">Semantic pairs, which consist of related entities or concepts, serve as the foundation for comprehending the meaning of language in both written and spoken forms. These pairs enable to grasp the nuances of relationships between words, phrases, or ideas, forming the basis for more advanced language tasks like entity recognition, sentiment analysis, machine translation, and question answering. They allow to infer causality, identify hierarchies, and connect ideas within a text, ultimately enhancing the depth and accuracy of automated language processing.</p><p dir="ltr">Nevertheless, the task of extracting semantic pairs from sentences poses a significant challenge, necessitating the relevance of syntactic dependency patterns (SDPs). Thankfully, semantic relationships exhibit adherence to distinct SDPs when connecting pairs of entities. Recognizing this fact underscores the critical importance of extracting these SDPs, particularly for specific semantic relationships like hyponym-hypernym, meronym-holonym, and cause-effect associations. The automated extraction of such SDPs carries substantial advantages for various downstream applications, including entity extraction, ontology development, and question answering. Unfortunately, this pivotal facet of pattern extraction has remained relatively overlooked by researchers in the domains of natural language processing (NLP) and information retrieval.</p><p dir="ltr">To address this gap, I introduce an attention-based supervised deep learning model, ASPER. ASPER is designed to extract SDPs that denote semantic relationships between entities within a given sentential context. I rigorously evaluate the performance of ASPER across three distinct semantic relations: hyponym-hypernym, cause-effect, and meronym-holonym, utilizing six datasets. My experimental findings demonstrate ASPER's ability to automatically identify an array of SDPs that mirror the presence of these semantic relationships within sentences, outperforming existing pattern extraction methods by a substantial margin.</p><p dir="ltr">Second, I want to use the SDPs to extract semantic pairs from sentences. I choose to extract cause-effect entities from medical literature. This task is instrumental in compiling various causality relationships, such as those between diseases and symptoms, medications and side effects, and genes and diseases. Existing solutions excel in sentences where cause and effect phrases are straightforward, such as named entities, single-word nouns, or short noun phrases. However, in the complex landscape of medical literature, cause and effect expressions often extend over several words, stumping existing methods, resulting in incomplete extractions that provide low-quality, non-informative, and at times, conflicting information. To overcome this challenge, I introduce an innovative unsupervised method for extracting cause and effect phrases, PatternCausality tailored explicitly for medical literature. PatternCausality employs a set of cause-effect dependency patterns as templates to identify the key terms within cause and effect phrases. It then utilizes a novel phrase extraction technique to produce comprehensive and meaningful cause and effect expressions from sentences. Experiments conducted on a dataset constructed from PubMed articles reveal that PatternCausality significantly outperforms existing methods, achieving a remarkable order of magnitude improvement in the F-score metric over the best-performing alternatives. 
I also develop various PatternCausality variants that utilize diverse phrase extraction methods, all of which surpass existing approaches. PatternCausality and its variants exhibit notable performance improvements in extracting cause and effect entities in a domain-neutral benchmark dataset, wherein cause and effect entities are confined to single-word nouns or noun phrases of one to two words.</p><p dir="ltr">Nevertheless, PatternCausality operates within an unsupervised framework and relies heavily on SDPs, motivating me to explore the development of a supervised approach. Although SDPs play a pivotal role in semantic relation extraction, pattern-based methodologies remain unsupervised, and the multitude of potential patterns within a language can be overwhelming. Furthermore, patterns do not consistently capture the broader context of a sentence, leading to the extraction of false-positive semantic pairs. As an illustration, consider the hyponym-hypernym pattern <i>the w of u</i> which can correctly extract semantic pairs for a sentence like <i>the village of Aasu</i> but fails to do so for the phrase <i>the moment of impact</i>. The root cause of this limitation lies in the pattern's inability to capture the nuanced meaning of words and phrases in a sentence and their contextual significance. These observations have spurred my exploration of a third model, DepBERT which constitutes a dependency-aware supervised transformer model. DepBERT's primary contribution lies in introducing the underlying dependency structure of sentences to a language model with the aim of enhancing token classification performance. To achieve this, I must first reframe the task of semantic pair extraction as a token classification problem. The DepBERT model can harness both the tree-like structure of dependency patterns and the masked language architecture of transformers, marking a significant milestone, as most large language models (LLMs) predominantly focus on semantics and word co-occurrence while neglecting the crucial role of dependency architecture.</p><p dir="ltr">In summary, my overarching contributions in this thesis are threefold. First, I validate the significance of the dependency architecture within various components of sentences and publish SDPs that incorporate these dependency relationships. Subsequently, I employ these SDPs in a practical medical domain to extract vital cause-effect pairs from sentences. Finally, my third contribution distinguishes this thesis by integrating dependency relations into a deep learning model, enhancing the understanding of language and the extraction of valuable semantic associations.</p>
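The "the w of u" example above can be reproduced with an off-the-shelf dependency parser. The sketch below illustrates pattern-based SDP matching generally (it is not ASPER or DepBERT, which are learned models), and parser output can vary.

```python
# Sketch of matching the "the w of u" hyponym-hypernym pattern on a spaCy
# dependency parse; illustrates SDP matching generally, not ASPER or DepBERT.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def match_of_pattern(text: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) candidates from 'w of u' spans."""
    pairs = []
    for tok in nlp(text):
        if tok.dep_ == "prep" and tok.text == "of" and tok.head.pos_ == "NOUN":
            for child in tok.children:
                if child.dep_ == "pobj":
                    pairs.append((child.text, tok.head.text))
    return pairs

print(match_of_pattern("They visited the village of Aasu."))   # [('Aasu', 'village')]
print(match_of_pattern("At the moment of impact, it broke."))  # false positive
```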
57.
Leveraging Advanced Large Language Models To Optimize Network Device Configuration
Mark Bogdanov (18429435) 24 April 2024 (has links)
<p dir="ltr">Recent advancements in large language models such as ChatGPT and AU Large allow for the effective integration and application of LLMs into network devices such as switches and routers in terms of the ability to play a role in configuration and management. The given devices are an essential part of every network infrastructure, and the nature of physical networking topologies is complex, which leads to the need to ensure optimal network efficiency and security via meticulous and precise configurations.</p><p dir="ltr">The research explores the potential of an AI-driven interface that utilizes AU Large to streamline, enhance, and automate the configuration process of network devices while ensuring that the security of the whole process is guaranteed by running the entire system on-premise. Three core areas are of primary concern in the given study: the effectiveness of integrating the AU Large into network management systems, the impact on efficiency, accuracy, and error rates in network configurations, and the scalability and adaptability to more complex requirements and growing network environments.</p><p dir="ltr">The key performance metrics evaluated are the error rate in the generated configurations, scalability by looking at the performance as more network devices are added, and the ability to generate incredibly complex configurations accurately. The high-level results of the critical performance metrics show an evident correlation between increased device count and increased prompt complexity with a degradation in the performance of the AU Large model from Mistral AI.</p><p dir="ltr">This research has significant potential to alter preset network management practices by applying AI to make network configuration more efficient, reduce the scope for human error, and create an adaptable tool for diverse and complex networking environments. This research contributes to both AI and network management fields by highlighting a path toward the “future of network management.”</p>
58.
Natural Language Based AI Tools in Interaction Design Research : Using ChatGPT for Qualitative User Research Insight Analysis
Saare, Karmen January 2024 (has links)
This thesis investigates the use of Artificial Intelligence, specifically the Large Language Model (LLM) application ChatGPT, in the context of qualitative user research, with the goal of enhancing the user research interview analysis process. Through an empirical study where ChatGPT was used in the process of a typical user research insight analysis, the limitations and opportunities of the AI tool are examined. The study's results highlight the most significant insights from the empirical investigation, serving as examples to raise awareness of the implications of using ChatGPT for user interview analysis. The study concludes that ChatGPT has the potential to enhance the interpretation of primarily individual interviews by generating well-articulated summaries, provided their accuracy can be verified. Additionally, ChatGPT may be particularly useful in low-risk design projects where the consequences of potential misinterpretations are minimal. Finally, the study points out the importance of clearly articulated written instructions to ChatGPT for achieving the best results.
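In the spirit of the study's conclusion that summaries help only if their accuracy can be verified, a minimal sketch of verification-friendly prompting follows; the prompt wording is an assumption, not the study's actual instructions.

```python
# Sketch of interview insight analysis with verifiable summaries; the prompt
# wording is an assumption, not the study's actual instructions.
from openai import OpenAI

client = OpenAI()

def summarise_interview(transcript: str) -> str:
    prompt = (
        "Summarise the key insights from this user interview. After each "
        "insight, quote the transcript verbatim so the summary can be "
        "checked against the source.\n\n" + transcript
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```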
59.
Direct Preference Optimization for Improved Technical Writing Assistance : A Study of How Language Models Can Support the Writing of Technical Documentation at Saab / En studie i hur språkmodeller kan stödja skrivandet av teknisk dokumentation på Saab
Bengtsson, Hannes, Habbe, Patrik January 2024 (has links)
This thesis explores the potential of Large Language Models (LLMs) to assist in the technical documentation process at Saab. With the increasing complexity of and regulatory demands on such documentation, the objective is to investigate advanced natural language processing techniques as a means of streamlining the creation of technical documentation. Although many standards exist, this thesis focuses on ASD-STE100 Simplified Technical English (STE), a controlled language for technical documentation. STE's primary aim is to ensure that technical documents are understandable to individuals regardless of their native language or English proficiency. The study focuses on the implementation of Direct Preference Optimization (DPO) and Supervised Instruction Fine-Tuning (SIFT) to refine the capabilities of LLMs in producing clear and concise outputs that comply with STE. Through a series of experiments, we investigate the effectiveness of LLMs in interpreting and simplifying technical language, with a particular emphasis on adherence to the STE standard. The study utilizes a dataset of target data paired with synthetic source data generated by an LLM. We apply various model training strategies, including zero-shot prompting, supervised instruction fine-tuning, and direct preference optimization. We evaluate the various models' output using established quantitative metrics for text simplification and substitute human evaluators with company-internal software for evaluating adherence to company standards and STE. Our findings suggest that while LLMs can significantly contribute to the technical writing process, the choice of training method and the quality of data play crucial roles in the model's performance. The study shows how LLMs can improve productivity and reduce manual work, examines the remaining problems, and suggests directions for improving the automation of technical documentation.
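The DPO objective the thesis builds on has a compact form (Rafailov et al., 2023): the policy is trained to widen its preference margin over a frozen reference model. A minimal sketch follows, with dummy log-probabilities standing in for real model outputs.

```python
# Minimal DPO loss sketch (Rafailov et al., 2023); inputs are summed sequence
# log-probs of chosen/rejected completions under the policy and a frozen ref.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))), averaged."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Dummy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.0, -15.0]),
                torch.tensor([-10.5, -13.0]), torch.tensor([-11.0, -14.0]))
print(loss)  # scalar training loss
```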
60.
The shifting landscape of data : learning to tame distributional shifts
Ibrahim, Adam 05 1900 (has links)
Les modèles d'apprentissage automatique (ML) atteignent des performances remarquables sur les tâches pour lesquelles ils sont entraînés. Cependant, ils sont souvent sensibles aux changements dans la distribution des données, ce qui peut nuire à leur fiabilité. Cela peut se produire lorsque la distribution des données rencontrées au déploiement diffère de celle vue pendant l'entraînement, entraînant une dégradation considérable des performances. Pire encore, les attaquants peuvent également induire de tels changements afin d'induire les modèles d'apprentissage automatique en erreur. Enfin, cela peut même arriver si l'entraînement est effectué séquentiellement sur des distributions de données différentes. Ces changements de distribution sont omniprésents en ML, nuisant à l'équité, à la fiabilité, à la sécurité et à l'efficacité des modèles d'apprentissage automatique. Cette thèse se concentre sur la compréhension et l'amélioration de la robustesse et de l'adaptation des modèles de ML aux changements de distribution, englobant à la fois des travaux théoriques et expérimentaux.
Tout d'abord, nous étudions les limites fondamentales de l'optimisation différentiable à plusieurs objectifs. Une meilleure compréhension de ces limites est importante car les travaux sur les changements de distribution reposent souvent sur des formulations de la théorie des jeux. Nous fournissons de nouvelles bornes inférieures sur la vitesse de convergence d'une large classe de méthodes, ainsi que de nouvelles métriques de conditionnement qui aident à évaluer la difficulté d'optimiser des classes de jeux, et expliquent le potentiel de convergence rapide, même sans forte convexité ou forte concavité.
Deuxièmement, nous abordons le manque de robustesse aux attaques adversarielles contre plusieurs types d'attaques, une limitation courante des méthodes de pointe. Nous proposons une approche inspirée de la généralisation de domaine, utilisant l'extrapolation des risques (REx) pour promouvoir la robustesse à plusieurs attaques. Notre méthode atteint des performances supérieures aux bases de référence existantes, que les attaques aient été vues ou non lors de l'entraînement.
Enfin, nous nous intéressons aux défis du pré-entraînement continu pour les grands modèles de langage (LLM). Ces modèles sont confrontés à un compromis: soit ils oublient de manière catastrophique les connaissances antérieures lorsqu'ils sont mis à jour sur de nouvelles données, soit ils nécessitent un réentraînement complet coûteux en calcul. Nous démontrons qu'une combinaison de réchauffement et de re-décroissance du taux d'apprentissage, et de réutilisation des données précédemment utilisées permet aux LLM d'apprendre continuellement à partir de nouvelles distributions tout en préservant leurs performances sur les données auparavant apprises. Cette approche permet d'atteindre les performances d'un réentraînement complet, mais à une fraction du coût en calcul.
Dans l'ensemble, cette thèse apporte des considérations importantes pour améliorer la robustesse et l'adaptation aux changements de distribution. Ces contributions ouvrent des voies prometteuses pour relever les défis du ML du monde réel dans l'optimisation multiobjectif, la défense contre les adversaires et l'apprentissage continu des grands modèles de langage. / Machine learning (ML) models achieve remarkable performance on tasks they are trained for. However, they often are sensitive to shifts in the data distribution, which may lead to unexpected behaviour. This can happen when the data distribution encountered during deployment differs from that used for training, leading to considerable degradation of performance. Worse, attackers may also induce such shifts to fool machine learning models. Finally, this can even happen when training sequentially on different data distribution. These distributional shifts are pervasive in ML, hindering the fairness, reliability, safety and efficiency of machine learning models. This thesis is focused on understanding and improving the robustness and adaptation of ML models to distributional shifts, encompassing both theoretical and experimental work.
First, we investigate the fundamental limits of differentiable multiobjective optimisation. This investigation is important because works on distributional shifts often rely on game theoretical formulations. We provide new lower bounds on the speed of convergence of a large class of methods, along with novel condition numbers that help assess the difficulty of optimising classes of games, and explain the potential for fast convergence even without strong convexity or strong concavity.
Second, we address the lack of adversarial robustness against multiple attack types, a common limitation of state-of-the-art methods. We propose a domain generalisation-inspired approach, using Risk Extrapolation (REx) to promote robustness across a range of attacks. Our method achieves performance superior to existing baselines for both seen and novel types of attacks.
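Risk Extrapolation penalises the variance of risks across training domains; reading each attack type as a domain, as the paragraph above suggests, yields a one-line objective. The weighting below is an assumption.

```python
# Sketch of a V-REx-style objective over per-attack risks (Krueger et al.,
# 2021); treating attacks as domains and the weight beta are assumptions.
import torch

def vrex_loss(per_attack_losses: torch.Tensor, beta: float = 10.0) -> torch.Tensor:
    """Mean risk plus a penalty on the variance of risks across attack types."""
    return per_attack_losses.mean() + beta * per_attack_losses.var()

# e.g. mean adversarial losses under three attack types
print(vrex_loss(torch.tensor([0.9, 1.1, 1.4])))
```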
Finally, we tackle the challenges of continual pretraining for large language models (LLMs). These models face a trade-off: either they catastrophically forget previous knowledge when updated on new data, or they require computationally expensive full retraining. We demonstrate that a combination of learning rate re-warming, re-decaying, and the replay of previous data allows LLMs to continually learn from new distributions while preserving past knowledge. This approach matches the performance of full retraining, but at a fraction of the computational cost.
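The recipe in this paragraph (re-warm and re-decay the learning rate when a new distribution arrives, and replay a fraction of old data) can be sketched directly; the cosine schedule shape and the 5% replay fraction are assumptions, not the thesis's reported settings.

```python
# Sketch of LR re-warming/re-decaying with replay for continual pretraining;
# the cosine shape and 5% replay fraction are assumptions.
import math
import random

def rewarmed_cosine_lr(step, warmup_steps, total_steps, max_lr, min_lr):
    """Linear re-warmup to max_lr, then cosine re-decay to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

def mix_batch(new_batch, old_data, replay_frac=0.05):
    """Replace a small fraction of each new-distribution batch with replay."""
    n_old = int(len(new_batch) * replay_frac)  # assumes len(old_data) >= n_old
    return random.sample(old_data, n_old) + new_batch[: len(new_batch) - n_old]
```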
Overall, this thesis contributes impactful considerations towards improving robustness and adaptation to distributional shifts. These contributions open promising avenues for addressing real-world ML challenges across multiobjective optimisation, adversarial defense, and continual learning of large language models.