21 |
Automatic text summarization of French judicial data with pre-trained language models, evaluated by content and factuality metrics
Adler, Malo. January 2024
During an investigation carried out by a police officer or a gendarme, interview reports are written, which can run to several pages. The high-level goal of this thesis is to study automatic and reliable text summarization methods to help with this time-consuming task. One challenge comes from the specific nature of the data to be summarized, which is both French and judicial; another comes from the need for reliable and factual models. First, this thesis focuses on the evaluation of automatic summarization, in terms of both content (how well the summary captures the essential information of the source text) and factuality (to what extent the summary only includes information from, or consistent with, the source text). Factuality evaluation, in particular, is of crucial interest when using LLMs for judicial purposes, because of their risk of hallucination. Notably, we propose a light variation of SelfCheckGPT, which has a stronger correlation with human judgment (0.743) than the widespread BARTScore (0.542) on our study dataset. Other paradigms, such as question answering, are also studied in this thesis but underperform in comparison. Then, extractive summarization methods are explored and compared, including one based on graphs via the TextRank algorithm, and one based on greedy optimization. The latter (overlap rate: 0.190, semantic similarity: 0.513) clearly outperforms the base TextRank (overlap rate: 0.172, semantic similarity: 0.506). A variant of TextRank with a threshold mechanism is also proposed, yielding a non-negligible improvement (overlap rate: 0.180, semantic similarity: 0.513). Finally, abstractive summarization with pre-trained LLMs based on the Transformer architecture is studied. In particular, several general-purpose and multilingual models (Llama-2, Mistral and Mixtral) were compared on a summarization dataset of judicial procedures from the French police.
Results show that the performance of these models is strongly related to their size: Llama-2 7B struggles to adapt to uncommon data (overlap rate: 0.083, BARTScore: -3.099), while Llama-2 13B (overlap rate: 0.159, BARTScore: -2.718) and Llama-2 70B (overlap rate: 0.191, BARTScore: -2.479) prove quite versatile and effective. To improve the performance of the smallest models, empirical prompt engineering and parameter-efficient fine-tuning are explored. Notably, our fine-tuned version of Mistral 7B reaches performance comparable to that of much larger models (overlap rate: 0.185, BARTScore: -2.060), without the need for empirical prompt engineering, and with a linguistic style closer to what is expected.
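The graph-based extractive approach mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify how the threshold mechanism works, so the sketch assumes it means dropping graph edges whose sentence similarity falls below a cutoff. The bag-of-words cosine similarity, the function names, and the default parameter values are likewise assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a deliberately simple stand-in for real preprocessing."""
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_summary(sentences, k=2, threshold=0.1, damping=0.85, iterations=50):
    """Pick the k highest-ranked sentences, returned in their original order.

    Assumption: edges with similarity below `threshold` are removed from the graph.
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]

    # Weighted similarity graph, pruned by the threshold.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(vectors[i], vectors[j])
                if sim >= threshold:
                    weights[i][j] = sim
    out_sums = [sum(row) for row in weights]

    # Power iteration of the weighted PageRank update.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

With a cutoff like this, sentences that share little vocabulary with the rest of the report receive no incoming weight and only keep the baseline score, so they are unlikely to be selected.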
|
22 |
Introducing Generative Artificial Intelligence in Tech Organizations : Developing and Evaluating a Proof of Concept for Data Management powered by a Retrieval Augmented Generation Model in a Large Language Model for Small and Medium-sized Enterprises in Tech / Introducering av Generativ Artificiell Intelligens i Tech Organisationer : Utveckling och utvärdering av ett Proof of Concept för datahantering förstärkt av en Retrieval Augmented Generation Model tillsammans med en Large Language Model för små och medelstora företag inom Tech
Lithman, Harald; Nilsson, Anders. January 2024
In recent years, generative AI has made significant strides, likely leaving an irreversible mark on contemporary society. The launch of OpenAI's ChatGPT 3.5 in 2022 demonstrated the capabilities of the technology, highlighting its performance and accessibility. This has led to a demand for implementation solutions across various industries and among companies eager to leverage the new opportunities that generative AI brings. This thesis explores the common operational challenges faced by a small-scale tech enterprise and, with these challenges identified, examines the opportunities that contemporary generative AI solutions may offer. Furthermore, the thesis investigates what type of generative technology is suitable for adoption and how it can be implemented responsibly and sustainably. The authors approach this topic through a literature review combined with 14 interviews involving several AI researchers as well as employees and executives of a small-scale tech enterprise, which served as the case company. The information was processed using multiple inductive thematic analyses to establish a solid foundation for the investigation, which led to the development of a Proof of Concept. The authors' findings and conclusions emphasize the importance of having a clear purpose for the implementation of generative technology. Moreover, the authors predict that a sustainable and responsible implementation can create the conditions necessary for the case company to grow. When the authors investigated potential operational challenges at the case company, it became clear that the most significant issue arose from unstructured and partially absent documentation.
The authors conclude that a data management system powered by a retrieval model in an LLM presents a potential path forward for significant value creation, as this solution enables data retrieval from unstructured project data and also mitigates a major inherent issue with the technology, namely hallucinations. Furthermore, in terms of implementation circumstances, both empirical and theoretical findings suggest that responsible use of generative technology requires training; hence, the authors have developed an educational framework named "KLART". The authors further argue that sustainable implementation requires transparent systems, since transparency increases understanding, which in turn affects trust and secure use. The findings also indicate that sustainability is strongly linked to the user-friendliness of the AI service, leading the authors to emphasize the importance of human-centered design (HCD) when developing and maintaining AI services. Finally, the authors argue for the value of automation, as it allows for continuous data and system updates that can potentially reduce maintenance. In summary, this thesis aims to contribute to an understanding of how small-scale tech enterprises can implement generative AI technology sustainably to enhance their competitive edge through innovation and data-driven decision-making.
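The retrieval idea described in the abstract above can be sketched as follows. This is a hypothetical illustration, not the authors' Proof of Concept: the bag-of-words similarity, the function names `retrieve` and `build_prompt`, and the prompt wording are all assumptions, and no call to an actual LLM is made. The point is only to show how retrieved project documents can be placed in the prompt so that the model is asked to answer from them rather than from memory, which is the mechanism usually credited with reducing hallucinations.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Term counts over lowercase word tokens (a simple stand-in for embeddings)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vector = bag_of_words(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that restricts the answer to the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

In a real system the term-count vectors would typically be replaced by learned embeddings and a vector index, but the overall flow of retrieving first and then prompting with the retrieved text stays the same.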
|