About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Can an LLM find its way around a Spreadsheet?

Lee, Cho Ting 05 June 2024
Spreadsheets are routinely used in business and scientific contexts, and one of the most vexing challenges data analysts face is performing data cleaning prior to analysis and evaluation. The ad-hoc and arbitrary nature of data cleaning problems, such as typos, inconsistent formatting, missing values, and a lack of standardization, often creates the need for highly specialized pipelines. We ask whether an LLM can find its way around a spreadsheet and how to support end-users in taking their free-form data processing requests to fruition. Just as RAG retrieves context to answer users' queries, we demonstrate how we can retrieve elements from a code library to compose data processing pipelines. Through comprehensive experiments, we demonstrate the quality of our system and how it is able to continuously augment its vocabulary by saving new code and pipelines back to the code library for future retrieval. / Master of Science / Spreadsheets are frequently utilized in both business and scientific settings, and one of the most challenging tasks that must be accomplished before analysis and evaluation can take place is the cleansing of the data. The ad-hoc and arbitrary nature of issues in data quality, such as typos, inconsistent formatting, missing values, and lack of standardization, often creates the need for highly specialized data cleaning pipelines. Within the scope of this thesis, we investigate whether a large language model (LLM) can navigate its way around a spreadsheet, as well as how to assist end-users in bringing their free-form data processing requests to fruition. Just as Retrieval-Augmented Generation (RAG) retrieves context to answer user queries, we demonstrate how we can retrieve elements from a Python code reference to compose data processing pipelines. Through comprehensive experiments, we showcase the quality of our system and how it is capable of continuously improving its code-writing ability by saving new code and pipelines back to the code library for future retrieval.
2

Investigating the use of LLMs for automated test generation: challenges, benefits, and suitability

Hurani, Muaz, Idris, Hamzeh January 2024
This thesis investigates the application of Large Language Models (LLMs) in automated test generation for software development, focusing on their challenges, benefits, and suitability for businesses. The study employs a mixed-methods approach, combining a literature review with empirical evaluations through surveys, interviews, and focus groups involving software developers and testers. Key findings indicate that LLMs enhance the efficiency and speed of test case generation, offering substantial improvements in test coverage and reducing development costs. However, the integration of LLMs poses several challenges, including technical complexities, the need for extensive customization, and concerns about the quality and reliability of the generated test cases. Additionally, ethical issues such as data biases and the potential impact on job roles were highlighted. The results show that while LLMs excel in generating test cases for routine tasks, their effectiveness diminishes in complex scenarios requiring deep domain knowledge and intricate system interactions. The study concludes that with proper training, continuous feedback, and iterative refinement, LLMs can be effectively integrated into existing workflows to complement traditional testing methods.
3

Evaluating the Impact of Hallucinations on User Trust and Satisfaction in LLM-based Systems

Oelschlager, Richard January 2024
Hallucinations in LLMs refer to instances where the models generate outputs that are unrelated, incorrect, or misleading based on the input provided. This thesis investigates the impact of hallucinations in large language model (LLM)-based systems on user trust and satisfaction, a critical issue as AI becomes increasingly integrated into everyday applications. Hallucinations in LLMs—instances where the model generates incorrect or misleading information—pose significant challenges for user reliability and overall system effectiveness. Given the expanding role of AI in sectors requiring high trust levels, such as healthcare and finance, understanding and mitigating these errors is paramount.

To address this issue, a controlled experiment was designed to systematically assess how hallucinations affect user trust and satisfaction. Participants interacted with an AI system designed to exhibit varying levels of hallucinatory behavior. Quantitative measures of trust and satisfaction were collected through standardized questionnaires pre- and post-interaction, accompanied by statistical analyses to evaluate changes in user perception.

The results clearly demonstrate that hallucinations significantly diminish user trust and satisfaction, confirming the hypothesis that the accuracy of AI outputs is crucial for user reliance. These findings not only contribute to the academic discourse on human-AI interaction, but also have practical implications for AI developers and policymakers focusing on creating and regulating reliable AI technologies.

This study bridges a crucial knowledge gap and provides a foundation for future research aimed at developing more robust and trustworthy AI systems. Readers engaged in AI development, implementation, and policymaking will find the insights particularly relevant, encouraging further exploration into strategies that could enhance user trust in AI technologies.
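The pre-/post-interaction questionnaire design described in this abstract lends itself to a paired analysis. The sketch below computes a paired t-statistic over hypothetical 7-point trust ratings; the numbers are fabricated for illustration and are not the study's data.

```python
import math

def paired_t(pre, post):
    """Paired t-statistic for pre/post scores from the same participants."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences (Bessel's correction).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical 7-point trust ratings before and after a hallucination-heavy session.
pre = [6, 5, 6, 7, 5, 6, 6, 5]
post = [3, 4, 2, 4, 3, 3, 4, 3]
t = paired_t(pre, post)  # a large negative t indicates a drop in trust
```

In practice the study's statistical analysis may differ; this only illustrates the pre/post comparison the abstract outlines.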
4

Analyzing Large Language Models For Classifying Sexual Harassment Stories With Out-of-Vocabulary Word Substitution

Seung Yeon Paik (18419409) 25 April 2024
Sexual harassment is regarded as a serious issue in society, with a particularly negative impact on young children and adolescents. Online sexual harassment has recently gained prominence as a significant number of communications have taken place online. Online sexual harassment can happen anywhere in the world because of the global nature of the internet, which transcends geographical barriers and allows people to communicate electronically. It can occur in a wide variety of environments, such as through work mail or chat apps in the workplace, on social media, in online communities, and in games (Chawki & El Shazly, 2013). However, non-native English speakers in particular may vary in their understanding or interpretation of text-based sexual harassment due to cultural differences and language barriers (Welsh, Carr, MacQuarrie, & Huntley, 2006). To bridge this gap, previous studies have proposed large language models to detect and classify online sexual harassment, prompting a need to explore how language models comprehend the nuanced aspects of sexual harassment data. Before exploring the role of language models, it is critical to recognize the current gaps in knowledge that these models could potentially address in order to comprehend and interpret the complex nature of sexual harassment.

Large Language Models (LLMs) have attracted significant attention recently due to their exceptional performance on a broad spectrum of tasks. However, these models are characterized by being very sensitive to input data (Fujita et al., 2022; Wei, Wang, et al., 2022). Thus, the purpose of this study is to examine how various LLMs interpret data that falls under the domain of sexual harassment and how they comprehend it after replacing out-of-vocabulary words.

This research examines the impact of out-of-vocabulary words on the performance of LLMs in classifying sexual harassment behaviors in text. The study compares the story classification abilities of cutting-edge LLMs before and after the replacement of out-of-vocabulary words. Through this investigation, the study provides insights into the flexibility and contextual awareness of LLMs when managing delicate narratives in the context of sexual harassment stories, as well as raising awareness of sensitive social issues.
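The out-of-vocabulary substitution step can be illustrated with a toy example. The tiny vocabulary and synonym map below are hypothetical, not the study's actual resources; a real pipeline would draw substitutes from a thesaurus or embedding neighborhood.

```python
# Illustrative sketch of out-of-vocabulary (OOV) word substitution before
# classification. VOCAB and SYNONYMS are invented for this example.

VOCAB = {"he", "sent", "me", "unwanted", "messages", "at", "work"}
SYNONYMS = {"dms": "messages", "creepy": "unwanted"}  # OOV -> in-vocab stand-in

def substitute_oov(text, unk="[UNK]"):
    """Replace each OOV token with a known synonym, else a placeholder."""
    out = []
    for tok in text.lower().split():
        if tok in VOCAB:
            out.append(tok)
        else:
            out.append(SYNONYMS.get(tok, unk))
    return " ".join(out)

print(substitute_oov("He sent me creepy dms at work"))
# -> "he sent me unwanted messages at work"
```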
5

Augmenting Large Language Models with Humor Theory To Understand Puns

Ryan Rony Dsilva (18429846) 25 April 2024
This research explores the application of large language models (LLMs) to the comprehension of puns. Leveraging the expansive capabilities of LLMs, this study delves into the domain of pun classification by examining it through the prism of two humor theories: the Computational Model of Humor and the Benign Violation theory, which is an extension of the N+V Theory. The computational model posits that for a phrase to qualify as a pun, it must possess both ambiguity and distinctiveness, characterized by a word that can be interpreted in two plausible ways, each interpretation being supported by at least one unique word. The Benign Violation theory, on the other hand, posits that puns work by breaching one linguistic rule while conforming to another, thereby creating a "benign violation." Using these frameworks, this research scrutinizes a curated collection of English-language puns to assess the validity and effectiveness of each theory in accurately classifying them. We undertake controlled experiments on the dataset, selectively removing a condition specific to one theory and then evaluating the puns against the criteria of the other theory to see how well it classifies the altered inputs. This approach allows us to uncover deeper insights into the processes that facilitate the recognition of puns and to explore the practical implications of applying humor theories. The findings of our experiments, detailed in the subsequent sections, shed light on how altering specific conditions impacts the ability of LLMs to classify puns accurately according to each theory, with each component of a theory influencing the result to a different extent, thereby contributing to our understanding of humor mechanics through the eyes of LLMs.
6

Automatic text summarization of French judicial data with pre-trained language models, evaluated by content and factuality metrics

Adler, Malo January 2024
During an investigation carried out by a police officer or a gendarme, interview reports are written whose length can reach several pages. The high-level goal of this thesis is to study various automatic and reliable text summarization methods to help with this time-consuming task. One challenge comes from the specific, French and judicial data that we wish to summarize; another challenge comes from the need for reliable and factual models. First, this thesis focuses on automatic summarization evaluation, in terms of both content (how well the summary captures essential information of the source text) and factuality (to what extent the summary only includes information from or coherent with the source text). Factuality evaluation, in particular, is of crucial interest when using LLMs for judicial purposes, because of their hallucination risks. Notably, we propose a light variation of SelfCheckGPT, which has a stronger correlation with human judgment (0.743) than the widespread BARTScore (0.542) on our study dataset. Other paradigms, such as Question-Answering, are also studied in this thesis, but underperform in comparison. Then, extractive summarization methods are explored and compared, including one based on graphs via the TextRank algorithm, and one based on greedy optimization. The latter (overlap rate: 0.190, semantic similarity: 0.513) clearly outperforms the base TextRank (overlap rate: 0.172, semantic similarity: 0.506). An improvement of TextRank with a threshold mechanism is also proposed, leading to a non-negligible improvement (overlap rate: 0.180, semantic similarity: 0.513). Finally, abstractive summarization, with pre-trained LLMs based on a Transformer architecture, is studied. In particular, several general-purpose and multilingual models (Llama-2, Mistral and Mixtral) were objectively compared on a summarization dataset of judicial procedures from the French police.
Results show that the performance of these models is highly related to their size: Llama-2 7B struggles to adapt to uncommon data (overlap rate: 0.083, BARTScore: -3.099), while Llama-2 13B (overlap rate: 0.159, BARTScore: -2.718) and Llama-2 70B (overlap rate: 0.191, BARTScore: -2.479) have proven quite versatile and efficient. To improve the performance of the smallest models, empirical prompt-engineering and parameter-efficient fine-tuning are explored. Notably, our fine-tuned version of Mistral 7B reaches performance comparable to that of much larger models (overlap rate: 0.185, BARTScore: -2.060), without the need for empirical prompt-engineering, and with a linguistic style closer to what is expected.
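The extractive methods compared in this abstract can be illustrated with a dependency-free TextRank sketch. The word-overlap similarity follows the original TextRank formulation, and the `threshold` parameter mirrors the thresholded variant the thesis proposes; the sentences and parameter values below are illustrative only.

```python
import math

def similarity(s1, s2):
    """Word-overlap similarity, length-normalized as in the TextRank paper."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, threshold=0.0, d=0.85, iters=50):
    """Score sentences by power iteration; edges at or below threshold are dropped."""
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)] for i, a in enumerate(sentences)]
    sim = [[v if v > threshold else 0.0 for v in row] for row in sim]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out_weight = sum(sim[j])
                if out_weight > 0 and sim[j][i] > 0:
                    rank += sim[j][i] / out_weight * scores[j]
            new.append((1 - d) + d * rank)
        scores = new
    return scores

sentences = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "rockets fly to space",
]
scores = textrank(sentences, threshold=0.1)  # isolated sentences score lowest
```

A summary is then the top-k sentences by score; the thresholded variant prunes weak edges so that loosely related sentences do not reinforce each other.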
7

Exploring artificial intelligence bias : a comparative study of societal bias patterns in leading AI-powered chatbots.

Udała, Katarzyna Agnieszka January 2023
The development of artificial intelligence (AI) has revolutionised the way we interact with technology and each other, both in society and in professional careers. Although they come with great potential for productivity and automation, AI systems have been found to exhibit biases that reflect and perpetuate existing societal inequalities. With the recent rise of artificial intelligence tools exploiting large language model (LLM) technology, such as ChatGPT, Bing Chat and Bard AI, this research project aims to investigate the extent of AI bias in said tools and explore its ethical implications. By reviewing and analysing responses to carefully crafted prompts generated by three different AI chatbot tools, the author intends to determine whether the content generated by these tools indeed exhibits patterns of bias related to various social identities, as well as to compare the extent to which such bias is present across all three tools. This study will contribute to the growing body of literature on AI ethics and inform efforts to develop more equitable and inclusive AI systems. By exploring the ethical dimensions of AI bias in selected LLMs, this research will shed light on the broader societal implications of AI and the role of technology in shaping our future.
8

Characterizing, classifying and transforming language model distributions

Kniele, Annika January 2023
Large Language Models (LLMs) have become ever larger in recent years, typically demonstrating improved performance as the number of parameters increases. This thesis investigates how the probability distributions output by language models differ depending on the size of the model. For this purpose, three features for capturing the differences between the distributions are defined, namely the difference in entropy, the difference in probability mass in different slices of the distribution, and the difference in the number of tokens covering the top-p probability mass. The distributions are then put into different distribution classes based on how they differ from the distributions of the differently-sized model. Finally, the distributions are transformed to be more similar to the distributions of the other model. The results suggest that classifying distributions before transforming them, and adapting the transformations based on which class a distribution is in, improves the transformation results. It is also shown that letting a classifier choose the class label for each distribution yields better results than using random labels. Furthermore, the findings indicate that transforming the distributions using entropy and the number of tokens in the top-p probability mass makes the distributions more similar to the targets, while transforming them based on the probability mass of individual slices of the distributions makes the distributions more dissimilar.
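Two of the three features this abstract defines, the entropy difference and the number of tokens covering the top-p probability mass, are straightforward to compute from a pair of next-token distributions. The `small` and `large` distributions below are invented for illustration and are not the thesis's data.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def top_p_count(probs, p=0.9):
    """Smallest number of tokens whose cumulative probability reaches p."""
    total, count = 0.0, 0
    for prob in sorted(probs, reverse=True):
        total += prob
        count += 1
        if total >= p:
            break
    return count

# Hypothetical next-token distributions from a smaller and a larger model.
small = [0.25, 0.25, 0.25, 0.25]  # flat: high entropy, wide top-p set
large = [0.7, 0.2, 0.05, 0.05]    # peaked: low entropy, narrow top-p set
entropy_diff = entropy(small) - entropy(large)  # positive: small model is flatter
topp_diff = top_p_count(small) - top_p_count(large)
```

The third feature, probability mass per slice of the sorted distribution, is a cumulative-sum over fixed index ranges and follows the same pattern.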
9

Exploring Generative AI for Enhanced Guided Buying Efficiency : A Case Study at Battery Manufacturing Firm

Gupta, Sparsh January 2024
The rapidly evolving domain of artificial intelligence has given rise to generative AI technology, which, unlike traditional machine learning, is capable of learning patterns from data and generating new, meaningful outputs. These models have applications in various domains, including customer service, content creation, and personalized recommendations. Understanding the implementation of generative AI is essential for business leaders to harness its potential and drive innovation. This thesis focuses on the application of generative AI for guided buying within the context of Company X, aiming to address the challenges and potential solutions in streamlining the purchase of goods and services. The research methodology draws on elements of the grounded theory approach, using focus group discourse for the empirical analysis. By exploring the impact of generative AI on procurement processes and an organization's orientation to guided buying, the study contributes to enhancing the strategic capabilities of the organization within the competitive industrial landscape. The results indicate three dimensions for the effective introduction of generative AI into procurement practices: 1) Operational Stakeholders, 2) Generative AI Robustness, and 3) Information Management. Overall, the thesis contributes to the broader academic effort to understand how to integrate generative AI technologies into various enterprise functions, specifically within Supply Chain and Procurement.
10

Advanced Large Language Models in Practice: A Study of ChatGPT-4 and Google Bard in Disinformation Management

Ahmadi, Aref, Barakzai, Ahmad Naveed January 2023
This study explores the capabilities and limitations of advanced large language models (LLMs), with a particular focus on ChatGPT-4 and Google Bard. It begins with a historical background on artificial intelligence and how that development led to the creation of these models. A critical analysis of their performance in language processing and problem solving follows. By evaluating their effectiveness in handling news content and social media, as well as in performing creative tasks such as puzzles, the study highlights their abilities in linguistic processing and the challenges they face in understanding nuance and exercising creative thinking.

The study found that LLMs have an advanced ability to understand and respond to complex language structures. This ability is not without limitations, however, especially for tasks that require careful judgment to distinguish truth from falsehood. This observation highlights a critical aspect of the current capacity of LLMs: they are effective in many areas, but still face challenges in handling the finer nuances of human language and thinking. The results also emphasize the importance of human oversight when using artificial intelligence (AI), pointing to the need for realistic expectations of AI's capabilities and underscoring the importance of responsible AI development, in which careful attention to ethical aspects is central. A combination of human intelligence and AI is proposed as a solution for handling complex challenges, contributing to a deeper understanding of the dynamics of advanced language models and their role in the broader development and application of AI.
