11. Automatic generation of definitions: Exploring if GPT is useful for defining words
Eriksson, Fanny (January 2023)
When reading a text, it is common to get stuck on unfamiliar words that are difficult to understand from the local context. In these cases, we use dictionaries or similar online resources to find the general meaning of the word. However, maintaining a manually written dictionary is highly resource-intensive because the language is constantly developing, so using generative language models to produce definitions could be a more efficient option. To explore this possibility, this thesis conducts an online survey to examine whether GPT could be useful for defining words. It also investigates how well the Swedish language model GPT-SW3 (3.5 b) defines words compared to the model text-davinci-003, and how prompts should be formatted when defining words with these models. The results indicate that text-davinci-003 generates high-quality definitions, and according to Student's t-test, its definitions received significantly higher ratings from participants than definitions taken from Svensk ordbok (SO). Furthermore, GPT-SW3 (3.5 b) received the lowest ratings, indicating that it takes more investment to keep up with the large models developed by OpenAI. Regarding prompt formatting, the most appropriate prompt format for defining words is highly dependent on the model: text-davinci-003 performed well in a zero-shot setting, while GPT-SW3 (3.5 b) required a few-shot setting. Considering both the high quality of the definitions generated by text-davinci-003 and the practical advantages of generating definitions automatically, GPT could be a useful method for defining words.
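The difference between the zero-shot and few-shot prompt formats compared above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `build_definition_prompt`, the instruction wording, and the example words are hypothetical and are not the prompts used in the thesis.

```python
def build_definition_prompt(word, examples=()):
    """Return a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    `examples` holds (word, definition) pairs shown before the target word.
    The instruction text is a hypothetical template, not the thesis's prompt.
    """
    blocks = ["Define the word in one short sentence."]
    for example_word, definition in examples:
        blocks.append(f"Word: {example_word}\nDefinition: {definition}")
    # The target word ends with an empty "Definition:" for the model to complete.
    blocks.append(f"Word: {word}\nDefinition:")
    return "\n\n".join(blocks)


zero_shot = build_definition_prompt("ephemeral")
few_shot = build_definition_prompt(
    "ephemeral",
    examples=[
        ("lucid", "Expressed clearly; easy to understand."),
        ("arid", "Having little or no rain; very dry."),
    ],
)
print(zero_shot)
print("---")
print(few_shot)
```

The only difference between the two settings is whether solved examples precede the target word, which is what lets a smaller model infer the expected answer format.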

12. Generating SQL from Natural Language in Few-Shot and Zero-Shot Scenarios
Asplund, Liam (January 2024)
Making information stored in databases more accessible to users inexperienced in Structured Query Language (SQL) by converting natural language to SQL queries has long been a prominent research area in both the database and natural language processing (NLP) communities. Numerous approaches have been proposed for this task, such as encoder-decoder frameworks, semantic grammars, and, more recently, large language models (LLMs). When adapting LLMs to generate SQL queries from natural language questions, three methods are notable: pretraining, transfer learning, and in-context learning (ICL). ICL is particularly advantageous when hardware is limited, time is a concern, and large amounts of task-specific labeled data are unavailable. This study evaluates two ICL strategies, zero-shot and few-shot prompting, using the Mistral-7B-Instruct LLM. In the few-shot scenarios, demonstration examples were chosen with two techniques: random selection and Jaccard similarity. The zero-shot scenarios served as a baseline for the few-shot scenarios to overcome. As anticipated, few-shot prompting with Jaccard similarity outperformed the other two methods, few-shot prompting with random selection came second, and zero-shot prompting performed worst. Evaluation results based on execution accuracy and exact-match accuracy confirm that selecting demonstration examples by similarity enhances the model's knowledge of the database schema and table names used during inference, leading to more accurate SQL queries than selecting examples for diversity.
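The Jaccard-similarity selection of demonstration examples described above can be sketched as follows. This is a hedged, minimal sketch rather than the study's implementation: the simple word tokenizer, the prompt layout, the helper names, and the example pool and schema are all assumptions made for illustration. Jaccard similarity here is |A ∩ B| / |A ∪ B| over the two questions' token sets.

```python
import re


def tokens(text):
    """Lower-cased word tokens as a set (a deliberately simple tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(question, pool, k=2):
    """Return the k (question, sql) pairs whose questions are most similar to `question`."""
    q = tokens(question)
    ranked = sorted(pool, key=lambda pair: jaccard(q, tokens(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(question, schema, examples):
    """Schema first, then the demonstrations, then the unanswered target question."""
    parts = [f"Database schema:\n{schema}"]
    for q, sql in examples:
        parts.append(f"Question: {q}\nSQL: {sql}")
    parts.append(f"Question: {question}\nSQL:")
    return "\n\n".join(parts)


# Hypothetical labeled pool and schema.
pool = [
    ("How many singers are there?", "SELECT COUNT(*) FROM singer;"),
    ("List the names of all concerts.", "SELECT name FROM concert;"),
    ("How many concerts are there?", "SELECT COUNT(*) FROM concert;"),
]
schema = "singer(id, name)\nconcert(id, name, stadium_id)\nstadium(id, name)"
question = "How many stadiums are there?"
chosen = select_examples(question, pool, k=2)
prompt = build_prompt(question, schema, chosen)
print(prompt)
```

Replacing `select_examples` with `random.sample(pool, k)` gives the random-selection baseline, and passing an empty list of examples gives the zero-shot prompt.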

13. Assisted Prompt Engineering: Making Text-to-Image Models Available Through Intuitive Prompt Applications
Björnler, Zimone (January 2024)
This thesis explores the application of prompt engineering combined with human-AI interaction (HAII) to make text-to-image (TTI) models more accessible and intuitive for non-expert users. The research focuses on developing an application with an intuitive interface that enables users to generate images without extensive knowledge of prompt engineering. A pre-post study was conducted to evaluate the application, demonstrating significant improvements in user satisfaction and ease of use. The findings suggest that such tailored interfaces can make AI technologies more accessible, empowering users to engage creatively with minimal technical barriers. This study contributes to the fields of media technology and AI by showing how simplifying prompt engineering can enhance the accessibility of generative AI tools.
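One way an interface can hide prompt engineering from non-experts is to collect structured choices in a form and assemble the prompt behind the scenes. The sketch below is hypothetical: the abstract does not describe the application's prompt template, and the `ImageRequest` fields and phrasing are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRequest:
    """Choices a user makes in a form instead of writing a prompt (hypothetical fields)."""
    subject: str
    style: str = ""
    mood: str = ""
    extras: list[str] = field(default_factory=list)


def compose_prompt(req: ImageRequest) -> str:
    """Assemble the form choices into a comma-separated text-to-image prompt."""
    parts = [req.subject.strip()]
    if req.style:
        parts.append(f"in the style of {req.style}")
    if req.mood:
        parts.append(f"{req.mood} mood")
    parts.extend(extra.strip() for extra in req.extras if extra.strip())  # drop blank entries
    return ", ".join(parts)


prompt = compose_prompt(
    ImageRequest("a lighthouse on a cliff", style="watercolor", mood="calm",
                 extras=["soft lighting", " "])
)
print(prompt)  # a lighthouse on a cliff, in the style of watercolor, calm mood, soft lighting
```

Keeping the form fields separate from the prompt wording means the template can be tuned for a particular TTI model without changing what users see.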

14. The impact of task specification on code generated via ChatGPT
Lundblad, Jonathan; Thörn, Edwin; Thörn, Linus (January 2023)
ChatGPT has made large language models more accessible and made it possible to write code using natural language prompts. This study conducted an experiment comparing prompt engineering techniques, referred to as task specification methods, and investigated their impact on code generation in terms of correctness and variety. The hypotheses focused on whether the baseline method differed statistically significantly in code correctness from the other methods. Code was evaluated against a software requirement specification that measures functional and syntactical correctness. Additionally, code variance was measured to identify patterns in code generation. The results show a statistically significant difference in some code correctness criteria between the baseline and the other task specification methods, and the code variance measurements indicate variety in the generated solutions. Future work could include using another large language model, different programming tasks and programming languages, and other prompt engineering techniques.
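The abstract does not say how code variance was computed. As one possible, hedged illustration, the variety among several solutions generated for the same task can be summarized as their mean pairwise text similarity using Python's standard `difflib.SequenceMatcher`; this metric and the sample solutions are assumptions, not necessarily what the study used.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(solutions: list[str]) -> float:
    """Average SequenceMatcher ratio over every pair of solutions.

    1.0 means all solutions are textually identical; lower values mean more variety.
    With fewer than two solutions there is nothing to compare, so 1.0 is returned.
    """
    pairs = list(combinations(solutions, 2))
    if not pairs:
        return 1.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


# Hypothetical solutions generated for the same task.
solutions = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b\n",
    "def add(x, y):\n    result = x + y\n    return result\n",
]
similarity = mean_pairwise_similarity(solutions)
print(f"mean similarity {similarity:.2f}, variety {1 - similarity:.2f}")
```

A purely textual measure treats renamed variables as differences; a stricter variant could compare normalized syntax trees instead.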

15. Matching ECSF Prescribed Cyber Security Skills with the Swedish Job Market: Evaluating the Effectiveness of a Language Model
Ahmad, Al Ghaith; Abd ULRAHMAN, Ibrahim (January 2023)
Background: As the demand for cybersecurity professionals continues to rise, it is crucial to identify the key skills necessary to thrive in this field. This research project sheds light on the cybersecurity skills landscape by analyzing the recommendations of the European Cybersecurity Skills Framework (ECSF), examining the most in-demand skills in the Swedish job market, and identifying the skills common to both. The project uses the large language model ChatGPT to classify common cybersecurity skills and evaluates its accuracy against human classification.

Objective: The primary objective of this research is to examine the alignment between the ECSF and the specific skill demands of the Swedish cybersecurity job market. The study aims to identify common skills and to evaluate the effectiveness of a language model (ChatGPT) in categorizing jobs by ECSF profile. Additionally, it seeks to provide insights for educational institutions and policymakers aiming to strengthen workforce development in the cybersecurity sector.

Methods: The research begins with a review of the ECSF to understand its recommendations and its methodology for defining cybersecurity skills, and to delineate the cybersecurity profiles and their corresponding key skills as outlined by the ECSF. Next, a Python-based web crawler gathers cybersecurity job announcements from the Swedish Employment Agency's website. These data are analyzed to identify the cybersecurity skills most frequently required by employers in Sweden. The language model (ChatGPT) is used to classify these positions according to ECSF profiles. Concurrently, two human agents manually categorize the jobs to serve as a benchmark for evaluating the language model's accuracy, allowing a comprehensive assessment of its performance.

Results: The study reviews and cites the skills recommended by the ECSF, offering a European perspective on key cybersecurity skills (Tables 4 and 5). It also identifies the most in-demand skills in the Swedish job market, as illustrated in Figure 6. The research reveals how the ECSF-prescribed skills of different profiles match those sought in the Swedish cybersecurity market. The skills of the profiles 'Cybersecurity Implementer' and 'Cybersecurity Architect' emerge as particularly critical, together representing over 58% of market demand. The research further highlights skills shared across profiles (Table 7).

Conclusion: This study highlights the alignment between the ECSF recommendations and the evolving demands of the Swedish cybersecurity job market. Through a review of ECSF-prescribed skills and an examination of the Swedish job landscape, it identifies crucial areas of alignment. Notably, the skills associated with the 'Cybersecurity Implementer' and 'Cybersecurity Architect' profiles are central, collectively constituting over 58% of market demand. This emphasizes the need for educational programs to adapt to industry requirements. The study also advances understanding of the language model's effectiveness in job categorization. The findings have implications for workforce development strategies and educational policies in the cybersecurity domain, underscoring the role of informed skills development in meeting the evolving needs of the cybersecurity workforce.
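The skill-frequency analysis and the comparison of model labels against the human benchmark described above can be sketched with the standard library. In this hedged sketch, skill matching is a naive case-insensitive substring search, a single reconciled human label per job is assumed, and the job ads, skills, and labels are hypothetical; the thesis's actual crawler output and matching rules are not reproduced here.

```python
from collections import Counter


def skill_frequencies(ads: list[str], skills: list[str]) -> Counter:
    """Count the job ads that mention each skill (naive case-insensitive substring match)."""
    counts = Counter()
    for ad in ads:
        text = ad.lower()
        for skill in skills:
            if skill.lower() in text:
                counts[skill] += 1
    return counts


def agreement(model_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of jobs where the model's ECSF profile equals the human benchmark label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        return 0.0
    return sum(m == h for m, h in zip(model_labels, human_labels)) / len(human_labels)


def profile_share(labels: list[str], profiles: set[str]) -> float:
    """Share of classified jobs whose profile is in `profiles`."""
    return sum(label in profiles for label in labels) / len(labels) if labels else 0.0


# Hypothetical data, not from the thesis.
ads = [
    "Experience with network security and incident response.",
    "Cloud security architecture and threat modelling.",
    "On-call incident response for our SOC.",
]
skills = ["incident response", "threat modelling", "network security"]
model_labels = ["Cybersecurity Implementer", "Cybersecurity Architect", "Cyber Incident Responder"]
human_labels = ["Cybersecurity Implementer", "Cybersecurity Architect", "Cybersecurity Implementer"]

print(skill_frequencies(ads, skills).most_common())
print(agreement(model_labels, human_labels))
print(profile_share(human_labels, {"Cybersecurity Implementer", "Cybersecurity Architect"}))
```

With two human annotators, their labels would first need to be reconciled (or each compared separately) before computing agreement with the model.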

16. An Empirical Study on Using Codex for Automated Program Repair
Zhao, Pengyu (January 2023)
This thesis explores the potential of Codex, a pre-trained large language model (LLM), for automated program repair (APR) by assessing its performance on the Defects4J benchmark of real-world Java bugs. The study aims to provide a comprehensive understanding of Codex's capabilities and limitations in generating patches that are syntactically and semantically equivalent to the developer fixes, and to evaluate its ability to handle defects of different importance and complexity. Additionally, we compare the performance of Codex with other LLMs in the APR domain. To achieve these objectives, we employ a systematic methodology that includes prompt engineering, Codex parameter adjustment, code extraction, patch verification, and abstract syntax tree (AST) comparison. We successfully verified 528 bugs in Defects4J, the highest number among the studies compared, and achieved 53.98% plausible and 26.52% correct patches. Furthermore, we introduce the elle-elle-aime framework, which extends RepairThemAll for Codex-based APR and can be adapted to evaluate other LLMs, such as ChatGPT and GPT-4. The findings provide insights into the factors that affect Codex's performance on APR, helping to create new prompt strategies and techniques that improve research productivity.
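Two steps of the methodology above, code extraction and scoring, can be sketched in Python. This is a hedged illustration: elle-elle-aime may extract code differently, "plausible" is taken here to mean the patch passes the bug's test suite, and "correct" is a flag assumed to come from a separate equivalence check (such as AST comparison or manual review) that is not implemented here. The denominator passed to `patch_rates` is also an assumption.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a Markdown fence

# First fenced block in a completion, with an optional language tag such as "java".
CODE_BLOCK = re.compile(re.escape(FENCE) + r"[\w+-]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_patch(completion: str) -> str:
    """Return the code inside the first fenced block, or the whole completion if there is none."""
    match = CODE_BLOCK.search(completion)
    return match.group(1).strip() if match else completion.strip()


def patch_rates(results: list[dict], total_bugs: int) -> tuple[float, float]:
    """Return (plausible rate, correct rate).

    Assumed semantics: "plausible" means the patch passes the bug's tests; "correct"
    is a flag from a separate equivalence check. A patch only counts as correct
    if it is also plausible.
    """
    plausible = sum(1 for r in results if r["plausible"])
    correct = sum(1 for r in results if r["plausible"] and r["correct"])
    return plausible / total_bugs, correct / total_bugs


completion = "Here is the fix:\n" + FENCE + "java\nreturn a + b;\n" + FENCE + "\nDone."
print(extract_patch(completion))  # return a + b;

results = [
    {"plausible": True, "correct": True},
    {"plausible": True, "correct": False},
    {"plausible": False, "correct": False},
    {"plausible": False, "correct": False},
]
print(patch_rates(results, total_bugs=len(results)))  # (0.5, 0.25)
```

Separating plausibility (tests pass) from correctness (equivalent to the developer fix) matters because a patch can satisfy a weak test suite without actually fixing the bug.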

17. Code Generation from Large API Specifications with Open Large Language Models: Increasing Relevance of Code Output in Initial Autonomic Code Generation from Large API Specifications with Open Large Language Models
Lyster Golawski, Esbjörn; Taylor, James (January 2024)
Background. In software systems defined by extensive API specifications, autonomic code generation can streamline the coding process by replacing repetitive, manual tasks such as creating REST API endpoints. Using large language models (LLMs) to generate source code comprehensively on the first try requires refined prompting strategies to keep the output relevant, a challenge that grows as API specifications become larger.

Objectives. This study aims to develop and validate a prompting orchestration solution for LLMs that generates more relevant, non-duplicated code than a single comprehensive prompt, without refactoring previously generated code. Additionally, the study evaluates the practical value of the generated code for developers at Ericsson who are familiar with the target application that uses the same API specification.

Methods. Using a prototyping approach, we develop a solution that, with locally hosted LLMs, produces more relevant, non-duplicated code for the target API at Ericsson than a single prompt. We perform a controlled experiment running the developed solution and a single prompt to collect their outputs. Using the results, we interview Ericsson developers about the value of the AI-generated code.

Results. The study identified a prompting orchestration method that, in the best-case scenario, generated 427 relevant lines of code (LOC) on average, compared to 66 LOC with a single comprehensive prompt. Additionally, 66% of the developers interviewed preferred using the AI-generated code as a starting point over starting from scratch when developing applications for Ericsson, and 66% preferred starting from the AI-generated code over code generated from the same API specification via Swagger CodeGen.

Conclusions. With the right prompting orchestration method, locally hosted LLMs can generate more relevant code from large API specifications than with a single comprehensive prompt, without refactoring the generated code. At present, the value of the generated code lies in its use as a good starting point for further software development.
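The abstract does not detail the orchestration method, so the following is only a hypothetical sketch of one way to split a large specification: build one prompt per operation in an OpenAPI-style `paths` object and tell the model which endpoints already exist, so it neither duplicates nor refactors earlier output. The prompt wording, the function name, and the sample spec are assumptions, and no particular LLM client is shown.

```python
# OpenAPI 3 path-item keys that denote operations; other keys (e.g. "parameters") are skipped.
OPERATION_KEYS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoint_prompts(spec: dict, already_generated: set[str]) -> list[str]:
    """Build one focused prompt per operation, skipping operations that already have code.

    `already_generated` is updated in place so later prompts list the new endpoints too.
    The prompt wording is hypothetical.
    """
    prompts = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in OPERATION_KEYS:
                continue
            key = f"{method.upper()} {path}"
            if key in already_generated:
                continue  # never ask the model to regenerate existing code
            existing = ", ".join(sorted(already_generated)) or "none"
            prompts.append(
                f"Write only the handler for {key}.\n"
                f"Summary: {operation.get('summary', '')}\n"
                f"Do not rewrite these existing endpoints: {existing}."
            )
            already_generated.add(key)
    return prompts


# Hypothetical, heavily trimmed specification.
spec = {
    "paths": {
        "/users": {
            "parameters": [],
            "get": {"summary": "List users"},
            "post": {"summary": "Create a user"},
        },
        "/users/{id}": {"get": {"summary": "Get one user"}},
    }
}
done = {"GET /users"}
for prompt in endpoint_prompts(spec, done):
    print(prompt, end="\n\n")
```

Each prompt stays small regardless of the specification's total size, which is the property a single comprehensive prompt lacks.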

18. Investigating an Age-Inclusive Medical AI Assistant with Large Language Models: User Evaluation with Older Adults
Magnus, Thulin (January 2024)
The integration of large language models (LLMs) such as GPT-4 and Gemini into healthcare, particularly elderly care, represents a significant opportunity for artificial intelligence in medical settings. This thesis investigates how effectively these models understand and respond to the healthcare needs of older adults. A framework was developed to evaluate their performance, consisting of specifically designed medical scenarios that simulate real-life interactions, prompting strategies to elicit responses, and a comprehensive user evaluation assessing technical performance and contextual understanding. The analysis reveals that while GPT-4 and Gemini exhibit high technical proficiency, their contextual performance varies considerably, especially in personalization and in handling complex, empathy-driven interactions. In simpler tasks, the models respond appropriately, but they struggle with more complex scenarios that require deep medical reasoning and personalized communication. Despite these challenges, the research highlights the potential of LLMs to significantly improve healthcare delivery for older adults by providing timely and relevant medical information. However, a truly effective implementation requires further development of the models' ability to engage in meaningful dialogue and to understand the nuanced needs of an aging population. The findings underscore the necessity of actively involving older adults in the development of AI technologies so that these models are tailored to their specific needs, including greater contextual and demographic awareness. Future efforts should incorporate user feedback from older people and apply user-centered design principles to improve accessibility and usability. Such improvements would better support the diverse needs of aging populations in healthcare settings, improving care delivery for both patients and doctors while preserving the essential human touch in medical interactions.

19. Bridging Language & Data: Optimizing Text-to-SQL Generation in Large Language Models
Wretblad, Niklas; Gordh Riseby, Fredrik (January 2024)
Text-to-SQL, which involves translating natural language into Structured Query Language (SQL), is crucial for enabling broad access to structured databases without expert knowledge. However, designing models for this task is challenging due to numerous factors, including the presence of "noise", such as ambiguous questions and syntactical errors. This thesis provides an in-depth analysis of the distribution and types of noise in the widely used BIRD-Bench benchmark and of the impact of noise on models. While BIRD-Bench was created to model dirty and noisy database values, it was not designed to contain noise and errors in the questions and gold queries. A manual evaluation showed that noise in questions and gold queries is highly prevalent in the financial domain of the dataset, and further analysis of the other domains indicates noise in other parts as well. Incorrect gold SQL queries, which in turn produce incorrect gold answers, significantly undermine the benchmark's reliability. Surprisingly, when models were evaluated against corrected SQL queries, zero-shot baselines surpassed the performance of state-of-the-art prompting methods. The thesis then introduces the concept of classifying noise in natural language questions, aiming to prevent noisy questions from entering text-to-SQL models and to annotate noise in existing datasets. Experiments using GPT-3.5 and GPT-4 on a manually annotated dataset demonstrated the viability of this approach, with classifiers achieving up to 0.81 recall and 80% accuracy. Additionally, the thesis explores the use of LLMs for automatically correcting faulty SQL queries, which showed a 100% success rate for specific query corrections, highlighting the potential of LLMs to improve dataset quality. We conclude that informative noise labels and reliable benchmarks are crucial for developing new text-to-SQL methods that can handle varying types of noise.
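Because an incorrect gold query produces an incorrect gold answer, execution-based scoring can mark a correct prediction as wrong. The minimal sketch below uses Python's built-in `sqlite3` module with a hypothetical `account` table and compares the rows returned by a predicted query and a gold query while ignoring row order. It is one common way to score execution accuracy, not necessarily BIRD-Bench's official evaluator.

```python
import sqlite3


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same rows, ignoring row order (duplicates count)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute cannot match
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


# Hypothetical toy database standing in for a benchmark database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 50.0), (2, 150.0), (3, 300.0)])

# Question: "How many accounts have a balance above 100?"
predicted = "SELECT COUNT(*) FROM account WHERE balance > 100"
noisy_gold = "SELECT COUNT(*) FROM account WHERE balance >= 50"  # wrong threshold
corrected_gold = "SELECT COUNT(*) FROM account WHERE balance > 100"

print(execution_match(conn, predicted, noisy_gold))      # False: penalized by the faulty gold query
print(execution_match(conn, predicted, corrected_gold))  # True
```

The same correct prediction flips from a miss to a hit once the gold query is fixed, which is how gold-query noise distorts reported accuracy.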

20. Tailored Query Resolution for Medical Data Interaction: Integrating LangChain4j, LLMs, and Retrieval Augmented Generation: Utilizing Real Time Embedding Techniques
Tegsten, Samuel (January 2024)
Current artificial intelligence tools, including machine learning and large language models, are unable to interact with medical data in real time and raise privacy concerns related to user data management. This study describes the development of a system prototype using LangChain4j, an open-source project offering a multitude of AI tools, including embedding tools, retrieval-augmented generation, and unified APIs for large language model providers. LangChain4j was used to process medical data from a Neo4j database and to enable real-time interaction with that data. All content was generated locally to address privacy concerns, and Apache Kafka was used for data distribution. The prototype was evaluated on response time, resource consumption, and accuracy. Among the models assessed, LLaMA 3 performed best in accuracy, successfully identifying 42.87% of all attributes with a correctness rate of 89.81%. Meanwhile, Phi3 achieved the best results in both resource consumption and response time. The embedding process, while enabling the selection of visible data, limited general usability. In summary, this thesis advances AI-based data interaction by developing a prototype that enables real-time interaction with medical data. It achieves high accuracy and efficient resource utilization while addressing limitations of current AI tools related to real-time processing and privacy.
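The retrieval step behind retrieval-augmented generation can be illustrated in Python. This sketch does not use LangChain4j's API: it substitutes a toy term-frequency "embedding" and cosine similarity for a real embedding model, and the patient records are hypothetical text snippets standing in for data read from Neo4j. The resulting prompt would be sent to a locally hosted model, which is not shown.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': term frequencies. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Ground the question in the retrieved records only."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical records, e.g. flattened from graph-database nodes.
docs = [
    "Patient 17: blood pressure 120/80 measured 2024-03-01",
    "Patient 17: allergy penicillin",
    "Patient 42: blood pressure 140/90 measured 2024-02-11",
]
query = "What is the blood pressure of patient 42?"
print(build_prompt(query, docs, k=1))
```

Limiting the prompt to retrieved records is also what restricts which data the model can see, which matches the abstract's note that embedding enables the selection of visible data.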