71

An initial investigation of Automatic Program Repair for Solidity Smart Contracts with Large Language Models

Cruz, Erik January 2023 (has links)
This thesis investigates how Large Language Models can be used to repair Solidity Smart Contracts automatically through its main contribution, the Transformative Repair Tool. The Transformative Repair Tool achieves results similar to current state-of-the-art tools on the SmartBugs Curated Dataset and is the first published tool that uses Large Language Models to repair Solidity Smart Contracts. Moreover, the thesis explores different prompt strategies for repairing Smart Contracts and assesses their performance.
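As a rough illustration of the repair loop such a tool might run (the thesis's implementation is not reproduced here; the model name, prompt wording, and vulnerability-report format below are assumptions), a minimal sketch:

```python
# Minimal sketch of an LLM-driven repair loop for a Solidity contract.
# Assumptions: the OpenAI client as the backend, a plain-text
# vulnerability report, and one repair attempt per prompt strategy.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def repair(source: str, report: str, strategy: str) -> str:
    """Ask the model for a patched contract under a given prompt strategy."""
    prompt = (
        f"{strategy}\n\n"
        f"Vulnerability report:\n{report}\n\n"
        f"Contract:\n```solidity\n{source}\n```\n"
        "Return only the full repaired contract."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Different prompt strategies can then be compared, e.g. a bare repair
# request versus one that names the vulnerability class (reentrancy, etc.).
```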
72

DEEP LEARNING BASED METHODS FOR AUTOMATIC EXTRACTION OF SYNTACTIC PATTERNS AND THEIR APPLICATION FOR KNOWLEDGE DISCOVERY

Mdahsanul Kabir (16501281) 03 January 2024 (has links)
<p dir="ltr">Semantic pairs, which consist of related entities or concepts, serve as the foundation for comprehending the meaning of language in both written and spoken forms. These pairs enable to grasp the nuances of relationships between words, phrases, or ideas, forming the basis for more advanced language tasks like entity recognition, sentiment analysis, machine translation, and question answering. They allow to infer causality, identify hierarchies, and connect ideas within a text, ultimately enhancing the depth and accuracy of automated language processing.</p><p dir="ltr">Nevertheless, the task of extracting semantic pairs from sentences poses a significant challenge, necessitating the relevance of syntactic dependency patterns (SDPs). Thankfully, semantic relationships exhibit adherence to distinct SDPs when connecting pairs of entities. Recognizing this fact underscores the critical importance of extracting these SDPs, particularly for specific semantic relationships like hyponym-hypernym, meronym-holonym, and cause-effect associations. The automated extraction of such SDPs carries substantial advantages for various downstream applications, including entity extraction, ontology development, and question answering. Unfortunately, this pivotal facet of pattern extraction has remained relatively overlooked by researchers in the domains of natural language processing (NLP) and information retrieval.</p><p dir="ltr">To address this gap, I introduce an attention-based supervised deep learning model, ASPER. ASPER is designed to extract SDPs that denote semantic relationships between entities within a given sentential context. I rigorously evaluate the performance of ASPER across three distinct semantic relations: hyponym-hypernym, cause-effect, and meronym-holonym, utilizing six datasets. My experimental findings demonstrate ASPER's ability to automatically identify an array of SDPs that mirror the presence of these semantic relationships within sentences, outperforming existing pattern extraction methods by a substantial margin.</p><p dir="ltr">Second, I want to use the SDPs to extract semantic pairs from sentences. I choose to extract cause-effect entities from medical literature. This task is instrumental in compiling various causality relationships, such as those between diseases and symptoms, medications and side effects, and genes and diseases. Existing solutions excel in sentences where cause and effect phrases are straightforward, such as named entities, single-word nouns, or short noun phrases. However, in the complex landscape of medical literature, cause and effect expressions often extend over several words, stumping existing methods, resulting in incomplete extractions that provide low-quality, non-informative, and at times, conflicting information. To overcome this challenge, I introduce an innovative unsupervised method for extracting cause and effect phrases, PatternCausality tailored explicitly for medical literature. PatternCausality employs a set of cause-effect dependency patterns as templates to identify the key terms within cause and effect phrases. It then utilizes a novel phrase extraction technique to produce comprehensive and meaningful cause and effect expressions from sentences. Experiments conducted on a dataset constructed from PubMed articles reveal that PatternCausality significantly outperforms existing methods, achieving a remarkable order of magnitude improvement in the F-score metric over the best-performing alternatives. 
I also develop various PatternCausality variants that utilize diverse phrase extraction methods, all of which surpass existing approaches. PatternCausality and its variants exhibit notable performance improvements in extracting cause and effect entities in a domain-neutral benchmark dataset, wherein cause and effect entities are confined to single-word nouns or noun phrases of one to two words.</p><p dir="ltr">Nevertheless, PatternCausality operates within an unsupervised framework and relies heavily on SDPs, motivating me to explore the development of a supervised approach. Although SDPs play a pivotal role in semantic relation extraction, pattern-based methodologies remain unsupervised, and the multitude of potential patterns within a language can be overwhelming. Furthermore, patterns do not consistently capture the broader context of a sentence, leading to the extraction of false-positive semantic pairs. As an illustration, consider the hyponym-hypernym pattern <i>the w of u</i> which can correctly extract semantic pairs for a sentence like <i>the village of Aasu</i> but fails to do so for the phrase <i>the moment of impact</i>. The root cause of this limitation lies in the pattern's inability to capture the nuanced meaning of words and phrases in a sentence and their contextual significance. These observations have spurred my exploration of a third model, DepBERT which constitutes a dependency-aware supervised transformer model. DepBERT's primary contribution lies in introducing the underlying dependency structure of sentences to a language model with the aim of enhancing token classification performance. To achieve this, I must first reframe the task of semantic pair extraction as a token classification problem. The DepBERT model can harness both the tree-like structure of dependency patterns and the masked language architecture of transformers, marking a significant milestone, as most large language models (LLMs) predominantly focus on semantics and word co-occurrence while neglecting the crucial role of dependency architecture.</p><p dir="ltr">In summary, my overarching contributions in this thesis are threefold. First, I validate the significance of the dependency architecture within various components of sentences and publish SDPs that incorporate these dependency relationships. Subsequently, I employ these SDPs in a practical medical domain to extract vital cause-effect pairs from sentences. Finally, my third contribution distinguishes this thesis by integrating dependency relations into a deep learning model, enhancing the understanding of language and the extraction of valuable semantic associations.</p>
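To make the notion of an SDP concrete, here is a minimal sketch of extracting the dependency path between two tokens with spaCy; it illustrates the kind of pattern the thesis works with, not ASPER or DepBERT themselves:

```python
# Minimal sketch: the syntactic dependency path between two tokens, using
# spaCy (assumes `pip install spacy` plus the en_core_web_sm model).
import spacy

nlp = spacy.load("en_core_web_sm")

def sdp(a, b):
    """Dependency relations on the path from token a to token b."""
    anc_a = [a] + list(a.ancestors)              # a, a.head, ..., root
    anc_b = [b] + list(b.ancestors)
    lca = next(t for t in anc_a if t in anc_b)   # lowest common ancestor
    up = [t.dep_ for t in anc_a[:anc_a.index(lca)]]    # climbed from a
    down = [t.dep_ for t in anc_b[:anc_b.index(lca)]]  # climbed from b
    return up + [lca.lemma_] + list(reversed(down))

doc = nlp("The village of Aasu lies on the coast.")
print(sdp(doc[1], doc[3]))   # path between "village" and "Aasu"
# -> ['village', 'prep', 'pobj'], the shape behind "the w of u"
```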
73

Tailored Query Resolution for Medical Data Interaction: Integrating LangChain4j, LLMs, and Retrieval Augmented Generation : Utilizing Real Time Embedding Techniques

Tegsten, Samuel January 2024 (has links)
Current artificial intelligence tools, including machine learning and large language models, display an inability to interact with medical data in real time and raise privacy concerns related to user data management. This study illustrates the development of a system prototype using LangChain4j, an open-source project offering a multitude of AI tools, including embedding tools, retrieval-augmented generation, and unified APIs for large language model providers. It was used to process medical data from a Neo4j database and enabled real-time interaction with that data. All content was generated locally to address privacy concerns, with Apache Kafka used for data distribution. The system prototype was evaluated on response time, resource consumption, and accuracy. Among the models assessed, LLaMA 3 emerged as the top performer in accuracy, successfully identifying 42.87% of all attributes with a correctness rate of 89.81%. Meanwhile, Phi3 exhibited superior outcomes in both resource consumption and response time. The embedding process, while enabling the selection of visible data, imposed limitations on general usability. In summary, this thesis advances data interaction using AI by developing a prototype that enables real-time interaction with medical data. It achieves high accuracy and efficient resource utilization while addressing limitations in current AI tools related to real-time processing and privacy concerns.
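The prototype itself is built on LangChain4j in Java; as a language-neutral illustration of the retrieve-then-generate pattern it describes (the Cypher query, embedding model, and `local_llm` stub below are assumptions, not the thesis's code), a minimal Python sketch:

```python
# Minimal sketch of embedding-based RAG over records pulled from Neo4j.
# Assumes `pip install neo4j sentence-transformers numpy`; the Cypher
# query, embedding model, and local_llm stub are illustrative only.
import numpy as np
from neo4j import GraphDatabase
from sentence_transformers import SentenceTransformer

def local_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a locally hosted model here")

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "pass"))
encoder = SentenceTransformer("all-MiniLM-L6-v2")

with driver.session() as session:
    docs = [r["text"] for r in session.run(
        "MATCH (p:Patient) RETURN p.summary AS text")]

doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def answer(question: str, k: int = 3) -> str:
    q = encoder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q)[-k:]           # cosine similarity, top-k
    context = "\n".join(docs[i] for i in top)
    return local_llm(f"Context:\n{context}\n\nQuestion: {question}")
```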
74

Implementation of Retrieval-Augmented Generation to automate analysis of sustainability reports : Utilizing language models as support to evaluate companies' reports of their activities' effects on biodiversity

Wilmi, Wiljam, Roslund, Niklas January 2024 (has links)
The importance of sustainability reporting can be observed in the attention directed towards the subject by companies, media, and authorities, and in the increasing regulation through new directives and legislation. Manually analyzing companies' sustainability reports is a time-consuming process. An automated analysis of sustainability reports would save both time and money when important insights related to large companies' impact on their environment and surroundings are brought to light. This study aims to explore the possibilities of automating an existing manual method for analyzing sustainability reports. The prototype developed applies modern language models and methods from machine learning to realize this vision. Compared to reference results from the manual evaluation method, the study's implementation achieves up to 96% precision for the majority class and up to 55% precision for the minority class across the evaluated language models. The conclusion is that an automated version of the existing manual analysis method can be constructed with sufficient resources, and even further improved given the rapid advances in technology and language models. The results are encouraging for the potential of a more sophisticated method to be developed in future work.
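Per-class precision figures like the 96%/55% split above can be computed directly; a minimal sketch with scikit-learn (the label vectors are made-up stand-ins, not the study's data):

```python
# Minimal sketch: per-class precision against manually produced labels.
from sklearn.metrics import precision_score

y_manual = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # manual method (reference)
y_model  = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0]   # language-model output

# average=None returns one precision value per class (here: 0 and 1).
per_class = precision_score(y_manual, y_model, average=None)
print(dict(zip([0, 1], per_class)))
```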
75

Leveraging Advanced Large Language Models To Optimize Network Device Configuration

Mark Bogdanov (18429435) 24 April 2024 (has links)
<p dir="ltr">Recent advancements in large language models such as ChatGPT and AU Large allow for the effective integration and application of LLMs into network devices such as switches and routers in terms of the ability to play a role in configuration and management. The given devices are an essential part of every network infrastructure, and the nature of physical networking topologies is complex, which leads to the need to ensure optimal network efficiency and security via meticulous and precise configurations.</p><p dir="ltr">The research explores the potential of an AI-driven interface that utilizes AU Large to streamline, enhance, and automate the configuration process of network devices while ensuring that the security of the whole process is guaranteed by running the entire system on-premise. Three core areas are of primary concern in the given study: the effectiveness of integrating the AU Large into network management systems, the impact on efficiency, accuracy, and error rates in network configurations, and the scalability and adaptability to more complex requirements and growing network environments.</p><p dir="ltr">The key performance metrics evaluated are the error rate in the generated configurations, scalability by looking at the performance as more network devices are added, and the ability to generate incredibly complex configurations accurately. The high-level results of the critical performance metrics show an evident correlation between increased device count and increased prompt complexity with a degradation in the performance of the AU Large model from Mistral AI.</p><p dir="ltr">This research has significant potential to alter preset network management practices by applying AI to make network configuration more efficient, reduce the scope for human error, and create an adaptable tool for diverse and complex networking environments. This research contributes to both AI and network management fields by highlighting a path toward the “future of network management.”</p>
76

Towards Automatic Generation of Personality-Adapted Speech and Emotions for a Conversational Companion Robot

Galatolo, Alessio January 2022 (has links)
Previous work in Human-Robot Interaction has demonstrated the potential benefits of designing highly anthropomorphic robots. This includes physical appearance but also whether they can express emotions, behave in a congruent manner, etc. This work explores the creation of a robot that is able to express a given personality consistently throughout a dialogue while also manifesting congruent emotional expressions. Personality defines many aspects of a person's character and can influence how one speaks, behaves, reacts to events, etc. Here, we focus our attention on language and on how it changes depending on one particular personality trait: extraversion. To this end, we tested different language models to automate the process of generating language according to a particular personality. We also compared large language models such as GPT-3 to smaller, specifically trained ones, to analyse how size correlates with performance in this task. We initially evaluated these methods through a fairly small user study to confirm the correct manipulation of personality in a text-only context. Results suggest that personality manipulation and how well it is understood depend highly on the context of a dialogue, with a more 'personal' dialogue being more successful in manifesting personality. The performance of GPT-3 is comparable to that of the smaller, specifically trained models, with the main difference lying in the perceived fluency of the generations. We then conducted a follow-up study using a robot capable of showing different facial expressions to manifest different emotions, the Furhat robot. We integrated the generations from our language models into the robot together with an emotion classification method used to guide its facial expressions. Whilst the output of our models did trigger different emotional expressions, resulting in robots that differed both in their language and nonverbal behaviour, the resulting perception of these robots' personality only approached significance (p ∼ 0.08). In this study, GPT-3 performed very similarly to much smaller models, with the difference in fluency also much smaller than before. We did not see any particular change in the perception of the robots in terms of either likeability or uncanniness.
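The abstract does not name the emotion classifier; as a sketch of the text-to-expression step using an off-the-shelf model from the Hugging Face hub (the model choice and the gesture names are assumptions, not the thesis's setup):

```python
# Minimal sketch: classify the emotion of a generated utterance and map it
# to a robot facial expression. Model choice and gesture names are
# illustrative assumptions.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

# Hypothetical mapping from emotion label to a Furhat-style gesture name.
EXPRESSION = {
    "joy": "BigSmile", "sadness": "ExpressSad", "anger": "ExpressAnger",
    "fear": "ExpressFear", "surprise": "Surprise", "neutral": "Neutral",
    "disgust": "ExpressDisgust",
}

utterance = "I can't wait to hear all about your trip!"
label = classifier(utterance)[0]["label"]
print(label, "->", EXPRESSION.get(label, "Neutral"))
```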
77

INTEGRATION OF UAV AND LLM IN AGRICULTURAL ENVIRONMENT

Sudeep Reddy Angamgari (20431028) 16 December 2024 (has links)
<p dir="ltr">Unmanned Aerial Vehicles (UAVs) are increasingly applied in agricultural tasks such as crop monitoring, especially with AI-driven enhancements significantly increasing their autonomy and ability to execute complex operations without human interventions. However, existing UAV systems lack efficiency, intuitive user interfaces using natural language processing for command input, and robust security which is essential for real-time operations in dynamic environments. In this paper, we propose a novel solution to create a secure, efficient, and user-friendly interface for UAV control by integrating Large Language Model (LLM) with the case study on agricultural environment. In particular, we designed a four-stage approach that allows only authorized user to issue voice commands to the UAV. The command is issued to the LLM controller processed by LLM using API and generates UAV control code. Additionally, we focus on optimizing UAV battery life and enhancing scene interpretation of the environment. We evaluate our approach using AirSim and an agricultural setting built in Unreal Engine, testing under various conditions, including variable weather and wind factors. Our experimental results confirm our method's effectiveness, demonstrating improved operational efficiency and adaptability in diverse agricultural scenarios.</p>
78

Applied Retrieval Augmented Generation Within Service Desk Automation

Cederlund, Oscar January 2024 (has links)
Background. New ways of modeling abstract concepts have been enabled by the recent boom in generative machine learning brought on by the transformer architecture. By modeling abstract concepts as high-dimensional vectors, their semantic meaning can be inferred and compared, which allows for methods such as embedding-based retrieval and lays the groundwork for retrieval-augmented generation. Large language models can augment their parametric generative capabilities by introducing non-parametric information through retrieval processes. Objectives. Previous studies have explored different uses of embedding-based retrieval and retrieval-augmented generation; this study examines the impact of these methods when used as an aid for support technicians. Methods. By developing and deploying a proof-of-concept system using embedding-based retrieval and retrieval-augmented generation to Södra IT's service desk, the thesis could monitor system performance. The system generates instructional solutions to support tickets and presents them to the technician. The thesis investigates the system's perceived performance based on input from the participating IT technicians, along with the retention of generated solutions and their quality. Results. With 75.4% of the system's generated solutions classified as reasonable solutions to ticket problems, the system was deployed to the service desk. After an evaluation period during which the technicians had been working with the system, the solutions showed a retention rate of 38.4%. These results were validated by a survey conducted at the service desk, gathering input from the technicians, which showed a high degree of user engagement but varying opinions on the system's helpfulness. Conclusions. Despite the varying opinions on the usefulness of the system among the technicians, the numbers from the production test show that a significant number of tickets were solved with the help of the system. Still, the system depends heavily on seamless integration with the technicians' workflow and on the quality of the tickets submitted by requesters.
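A minimal sketch of the embedding-based retrieval step for a service desk: index historical tickets and fetch the closest solved ones for a new ticket. The ticket texts and model choice are illustrative; the thesis's own stack is not reproduced here:

```python
# Minimal sketch: retrieve the most similar solved tickets for a new one.
# Assumes `pip install sentence-transformers faiss-cpu`; data is made up.
import faiss
from sentence_transformers import SentenceTransformer

solved = [
    "VPN drops every 30 minutes on laptop",
    "Outlook cannot connect to Exchange after password change",
    "Printer on floor 3 shows 'driver unavailable'",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(solved, normalize_embeddings=True)

index = faiss.IndexFlatIP(vecs.shape[1])   # inner product = cosine here
index.add(vecs)

query = model.encode(
    ["User's mail client fails to sync after changing password"],
    normalize_embeddings=True,
)
scores, ids = index.search(query, k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.2f}  {solved[i]}")   # nearest past tickets as context
```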
79

CHEMICAL SPACE INVADERS: ENHANCING EXPLORATION OF MODULARLY CONSTRUCTED CHEMICAL SPACES USING CONTEXT AWARE AI AGENTS

Matthew Muhoberac (19820007) 10 October 2024 (has links)
<p dir="ltr">Chemical science can be imagined as a universe of information in which individual galaxies, solar systems, stars, and planets are compounds, reactions, biomolecules, etc. which need to be discovered, researched, and documented. The problem with this is that the universe of chemical science is potentially vaster than the one in which we live, and we are exploring it in a relatively inefficient manner. There is a scene in one of my favorite television shows, Futurama, which paints a picture of traditional chemical exploration. Taking place in the 30<sup>th</sup> century, the main character Fry loses his robot friend Bender in outer space and resorts to using a giant telescope in the Himalayan mountains to randomly search through points in space to try to find him. After days of searching nonstop, he gives up noting that it is an impossible task because space is so vast in size, and he is searching so inefficiently. While human exploration of chemistry may not be as inefficient, there are a lot of steps which are driven by trial and error and educated guesswork which ultimately introduce major inefficiencies into scientific discovery. While we don’t live in the 30<sup>th</sup> century yet, we do have access to 21<sup>st</sup>century technology which can assist in exploring chemistry in a more directed manner. This mainly involves using machine learning, search algorithms, and generative powered exploratory AI to serve as a force multiplier which can serve to assist human chemists in chemical exploration. To shamelessly compare this with another space-based sci-fi reference, this would be akin to deploying hundreds or thousands of automated space probes to search unexplored planets, akin to how the empire found the rebellion on Hoth in the Empire Strikes Back.</p><p dir="ltr">The journey to integrate AI with chemical exploration starts with the important concept of standardization and how to apply it to chemically relevant data. To easily organize, store, and access relevant aspects of small molecules, macromolecules, chemical reactions, biological assays, etc. it is imperative that data be represented in a standard format which accurately portrays necessary chemical information. This becomes especially relevant as humans aggregate more and more chemical data. In this thesis, we tackle a subset of standardization in Chapter 2 involving benchmarking sets for comparative evaluation of docking software. One major reason why standardization is so important is that standardization promotes ease of access to relevant data, regardless of if this access is attempted by human or computational means. While improving data access for humans is beneficial, computationally it is a game changer when datamining training data for machine learning (ML) applications. Having standardized data readily available for computational access allows for software to rapidly access and preprocess relevant data boosts efficiency in ML model training. In Chapter 4 of this thesis, the central database of the CIPHER close-loop system is standardized and integrated with a REST API, allowing for rapid data acquisition via a structured URL call. 
Having database standardization and a mechanism for easy data mining makes a database “ML ready” and promotes the database for ML applications.</p><p dir="ltr">Build upon data standardization and training ML models for chemical applications, the next step of this journey revolves around a concept known as a “chemical space” and how chemists can approximate and sample chemical spaces in a directed manner. In the context of this thesis, a chemical space can be visualized in the following manner. Start by envisioning any chemical relationship between some inputs and outputs as an unknown mathematical function. For example, if one is measuring the assay response of a specific drug at a certain concentration, the input would be the concentration, and the output would be the assay response. Then the bounds of this space are set by determining the range of input values and this forms a chemical space which corresponds to the chemical problem. Chemists sample these spaces every day when they go into the lab, run experiments, and analyze their data. While the example described above is relatively simple in scope, even if the relationship is very complex techniques such as ML can be used to approximate the relationship. An example of this approximation is shown in Chapter 3 of this thesis, where normalizing flow architecture is used to bias a vector space representation of molecules with chemical properties, creating a space which correlates compound and property and can be sampled to provided compounds with specific values of trained chemical properties. Training individual models is important, but to truly emulate certain chemical processes multiple models may need to be combined with physical instrumentation to efficiently sample and validate a chemical space. Chapter 4 of this thesis expands upon this concept by integrating a variety of ML modules with high-throughput (HT) bioassay instrumentation to create a “close loop” system designed around discovering, synthesizing, and validating non-addictive analgesics.</p><p dir="ltr">The final step of this journey is to integrate these systems which sample chemical spaces with AI, allowing for automated exploration of these spaces in a directed manner. There are several AI frameworks which can be used separately or combined to accomplish this task, but the framework that is the focus of this thesis is AI agents. AI agents are entities which use some form of AI to serve as a logical processing center which drives their exploration through a problem space. This can be a simple algorithm, some type of heuristic model, or an advance form of generative AI such as an LLM. Additionally, these agents generally have access to certain tools which serve as a medium for interaction with physical or computational environments, such as controlling a robotic arm or searching a database. Finally, these agents generally have a notion of past actions and observations, commonly referred to as memory, which allows agents to recall important information as they explore. Chapter 5 of this thesis details a custom agentic framework which is tailored towards complex scientific applications. This framework builds agents from source documentation around a specific user defined scope, provides them with access to literature and documentation in the form of embeddings, has custom memory for highly targeted retention, and allows form agents to communicate with one another to promote collaborative problem solving. 
Chapter 6 of this thesis showcase an application of a simpler agentic framework to an automated lipidomic workflow which performs comparative analysis on 5xFAD vs. WT mice brain tissue. The group of AI agents involved in this system generate mass spectrometry worklists, filter data into categories for analysis, perform comparative analysis, and allow for the user to dynamically create plots which can be used to answer specific statistical questions. In addition to performing all these operational and statistical analysis functions, the system includes an agent which uses document embeddings trained on curated technical manuals and protocols to answer user questions via a chatbot style interface. Overall, the system showcases how AI can effectivity be applied to relevant chemical problems to enhance speed, bolster accuracy, and improve usability.</p>
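As a sketch of what "rapid data acquisition via a structured URL call" looks like in practice (the endpoint, query parameters, and response fields below are hypothetical placeholders, not the CIPHER API):

```python
# Minimal sketch: pulling standardized records from a REST API for ML
# preprocessing. URL, parameters, and JSON fields are hypothetical.
import requests

BASE = "https://example.org/api/v1"   # placeholder host

def fetch_compounds(target: str, limit: int = 100) -> list[dict]:
    """Fetch standardized compound records for a given assay target."""
    resp = requests.get(
        f"{BASE}/compounds",
        params={"target": target, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. [{"smiles": "...", "assay_response": 0.72}]

records = fetch_compounds("mu-opioid-receptor")
features = [r["smiles"] for r in records]          # model inputs
labels = [r["assay_response"] for r in records]    # training targets
```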
80

The shifting landscape of data : learning to tame distributional shifts

Ibrahim, Adam 05 1900 (has links)
Machine learning (ML) models achieve remarkable performance on the tasks they are trained for. However, they are often sensitive to shifts in the data distribution, which may lead to unexpected behaviour. This can happen when the data distribution encountered during deployment differs from that used for training, leading to considerable degradation of performance. Worse, attackers may also induce such shifts to fool machine learning models. Finally, this can even happen when training sequentially on different data distributions. These distributional shifts are pervasive in ML, hindering the fairness, reliability, safety, and efficiency of machine learning models. This thesis focuses on understanding and improving the robustness and adaptation of ML models to distributional shifts, encompassing both theoretical and experimental work. First, we investigate the fundamental limits of differentiable multiobjective optimisation. This investigation matters because work on distributional shifts often relies on game-theoretical formulations. We provide new lower bounds on the speed of convergence of a large class of methods, along with novel condition numbers that help assess the difficulty of optimising classes of games and explain the potential for fast convergence even without strong convexity or strong concavity. Second, we address the lack of adversarial robustness against multiple attack types, a common limitation of state-of-the-art methods. We propose a domain-generalisation-inspired approach, using Risk Extrapolation (REx) to promote robustness across a range of attacks. Our method achieves performance superior to existing baselines for both seen and novel types of attacks. Finally, we tackle the challenges of continual pretraining for large language models (LLMs). These models face a trade-off: either they catastrophically forget previous knowledge when updated on new data, or they require computationally expensive full retraining. We demonstrate that a combination of learning-rate re-warming, re-decaying, and the replay of previous data allows LLMs to continually learn from new distributions while preserving past knowledge. This approach matches the performance of full retraining at a fraction of the computational cost. Overall, this thesis contributes important considerations towards improving robustness and adaptation to distributional shifts. These contributions open promising avenues for addressing real-world ML challenges across multiobjective optimisation, adversarial defense, and continual learning of large language models.
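A minimal sketch of the continual-pretraining recipe in the final contribution: re-warm and re-decay the learning rate at the start of each new data stage, and replay a slice of earlier data. All hyperparameters below, including the replay fraction, are illustrative assumptions, not the thesis's exact values:

```python
# Minimal sketch: LR re-warming plus re-decaying per data stage, with
# replay of previously seen data. Hyperparameters are illustrative.
import math
import random

def lr_at(step, stage_start, warmup=1000, stage_len=100_000,
          peak=3e-4, floor=3e-5):
    """Cosine decay, re-warmed from the start of the current data stage."""
    t = step - stage_start
    if t < warmup:                       # linear re-warming
        return peak * t / warmup
    frac = min(1.0, (t - warmup) / max(1, stage_len - warmup))
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * frac))

def sample_example(new_data, old_data, replay_frac=0.05):
    """Mostly new-distribution data, with a replayed slice of the old."""
    pool = old_data if random.random() < replay_frac else new_data
    return random.choice(pool)
```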
