1 |
Responsible AI in Educational Chatbots: Seamless Integration and Content Moderation Strategies / Ansvarsfull AI i pedagogiska chatbots: strategier för sömlös integration och moderering av innehåll
Eriksson, Hanna. January 2024.
With the increasing integration of artificial intelligence (AI) technologies into educational settings, it becomes important to ensure responsible and effective use of these systems. This thesis addresses two critical challenges within AI-driven educational applications: the effortless integration of different Large Language Models (LLMs) and the mitigation of inappropriate content. An AI assistant chatbot was developed, allowing teachers to design custom chatbots and set rules for them, enhancing students’ learning experiences. Evaluation of LangChain as a framework for LLM integration, alongside various prompt engineering techniques including zero-shot, few-shot, zero-shot chain-of-thought, and prompt chaining, revealed LangChain’s suitability for this task and highlighted prompt chaining as the most effective method for mitigating inappropriate content in this use case. Looking ahead, future research could focus on further exploring prompt engineering capabilities and strategies to ensure uniform learning outcomes for all students, as well as leveraging LangChain to enhance the adaptability and accessibility of educational applications.
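The prompt-chaining approach the thesis found most effective can be sketched as a two-step chain: a first prompt moderates the student message, and an answering prompt runs only if the message passes. This is a minimal illustration assuming a generic `llm(prompt) -> str` completion function; the prompt wording, the rule format, and `fake_llm` are invented for the example, not taken from the thesis.

```python
MODERATION_PROMPT = (
    "You are a moderator for a school chatbot. Classify the student message as "
    "APPROPRIATE or INAPPROPRIATE given these teacher rules:\n{rules}\n"
    "Message: {message}\nAnswer with one word."
)

ANSWER_PROMPT = (
    "You are a helpful tutor. Follow these teacher rules:\n{rules}\n"
    "Answer the student message: {message}"
)

def chained_reply(llm, rules, message):
    """Step 1 moderates, step 2 answers; the chain refuses before answering."""
    verdict = llm(MODERATION_PROMPT.format(rules=rules, message=message))
    if "INAPPROPRIATE" in verdict.upper():
        return "I can't help with that. Let's stay on topic."
    return llm(ANSWER_PROMPT.format(rules=rules, message=message))

def fake_llm(prompt):
    """Offline stand-in for a real model so the chain runs without an API key."""
    if prompt.startswith("You are a moderator"):
        return "INAPPROPRIATE" if "cheat" in prompt.lower() else "APPROPRIATE"
    return "Here is an explanation..."
```

Because the moderation verdict is produced before the answering prompt ever runs, inappropriate requests never reach the answering step, which is what makes chaining stricter than a single combined prompt.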
|
2 |
Prompt Engineering: Toward a Rhetoric and Poetics for Neural Network Augmented Authorship in Composition and Rhetoric
Foley, Christopher. 01 January 2024.
My dissertation introduces the notion of "augmented authorship" and applications for prompt engineering with generative neural networks inspired by Gregory Ulmer's theories of electracy (2003) to the interdisciplinary fields that teach writing and rhetoric. With the goal of inspiring the general practice of electracy, I introduce prompt engineering as practice in flash reason (Ulmer 2008; 2012), a new collective prudence emerging from the apparatus of electracy. By situating electracy and flash reason as threshold concepts in writing studies, and by aligning principles of electracy with ACRL and NCTE digital literacy frameworks, I demonstrate how prompt engineering across modalities can help students meet digital literacy goals, before providing accessible models or "relays" in the form of AI-coauthored texts, course modules, and aesthetic models deployed in the game world Roblox.
|
3 |
A Multimodal Framework for Automated Content Moderation of Children's Videos
Ahmed, Syed Hammad. 01 January 2024.
Online video platforms receive hundreds of hours of uploads every minute, making manual moderation of inappropriate content impossible. The most vulnerable consumers of malicious video content are children from ages 1-5 whose attention is easily captured by bursts of color and sound. Prominent video hosting platforms like YouTube have taken measures to mitigate malicious content, but these videos often go undetected by current automated content moderation tools that are focused on removing explicit or copyrighted content. Scammers attempting to monetize their content may craft malicious children's videos that are superficially similar to educational videos, but include scary and disgusting characters, violent motions, loud music, and disturbing noises. A robust classification of malicious videos requires audio representations in addition to video features. However, recent content moderation approaches rarely employ multimodal architectures that explicitly consider non-speech audio cues. Additionally, there is a dearth of comprehensive datasets for content moderation tasks which include these audio-visual feature annotations. This dissertation addresses these challenges and makes several contributions to the problem of content moderation for children’s videos. The first contribution is identifying a set of malicious features that are harmful to preschool children but remain unaddressed and publishing a labeled dataset (Malicious or Benign) of cartoon video clips that include these features. We provide a user-friendly web-based video annotation tool which can easily be customized and used for video classification tasks with any number of ground truth classes. The second contribution is adapting state-of-the-art Vision-Language models to apply content moderation techniques on the MOB benchmark. 
We perform prompt engineering and an in-depth analysis of how context-specific language prompts affect the content moderation performance of different CLIP (Contrastive Language-Image Pre-training) variants. This dissertation introduces new benchmark natural language prompt templates for cartoon videos that can be used with Vision-Language models. Finally, we introduce a multimodal framework that includes the audio modality for more robust content moderation of children's cartoon videos and extend our dataset to include audio labels. We present ablations to demonstrate the enhanced performance of adding audio. The audio modality and prompt learning are incorporated while keeping the backbone modules of each modality frozen. Experiments were conducted on a multimodal version of the MOB (Malicious or Benign) dataset in both supervised and few-shot settings.
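The template-based zero-shot classification described above can be sketched as follows: each class label is expanded through several natural-language templates, the expansions are embedded and averaged, and a video frame is assigned to the label whose averaged text embedding is most similar. The templates and the `fake_embed` stub are illustrative assumptions; a real run would use the text and image encoders of a CLIP variant.

```python
import numpy as np

# Hypothetical prompt templates for cartoon-frame moderation.
TEMPLATES = [
    "a cartoon frame showing {label} content",
    "a screenshot from a children's video with {label} scenes",
]

def class_text_embedding(embed, label):
    """Average the embeddings of all template expansions for one label."""
    vecs = [embed(t.format(label=label)) for t in TEMPLATES]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

def classify(embed, frame_vec, labels):
    """Assign the frame to the label with the highest cosine similarity."""
    frame_vec = frame_vec / np.linalg.norm(frame_vec)
    sims = [frame_vec @ class_text_embedding(embed, lb) for lb in labels]
    return labels[int(np.argmax(sims))]

def fake_embed(text):
    """Toy 2-D embedding standing in for a CLIP text encoder."""
    return np.array([1.0, 0.1]) if "malicious" in text else np.array([0.1, 1.0])
```

Averaging over several templates is what makes the prompt wording robust: no single phrasing dominates the class representation.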
|
4 |
Sind Sprachmodelle in der Lage, die Arbeit von Software-Testern zu übernehmen? Automatisierte JUnit-Testgenerierung durch Large Language Models / Are language models capable of taking over the work of software testers? Automated JUnit test generation with large language models
Schäfer, Nils. 20 September 2024.
This bachelor's thesis examines the quality of language models in the context of generating unit tests for Java applications. Its goal is to analyze to what extent JUnit tests can be generated automatically using language models, and to derive from this how well they can take over and replace the work of software testers. To this end, an automated test-creation system is designed and implemented as a Python command-line tool that generates test cases via requests to the language model. To measure its quality, the generated tests are adopted without manual intervention. As the basis for the evaluation, tests are generated for three Java Maven projects of differing complexity. The subsequent analysis follows a fixed evaluation procedure that assesses test code coverage and success rate and compares them with manual tests. The results show that language models are able to generate JUnit tests with satisfactory test coverage, but exhibit an insufficient success rate compared to manual tests. Due to quality deficiencies in the generated test code, they cannot fully replace the work of software testers. However, they offer a way to take over test-creation processes that conclude with a manual review, thereby reducing the testers' workload.

Contents:
List of Figures
List of Tables
List of Listings
List of Abbreviations
1 Introduction
1.1 Problem Statement
1.2 Objectives
2 Fundamentals
2.1 Software Development Lifecycle
2.2 Large Language Models
2.2.1 Term and Introduction
2.2.2 Generative Pre-trained Transformer
2.3 Prompt Engineering
2.3.1 Prompt Elements
2.3.2 Prompt Techniques
2.4 Unit Testing
2.4.1 Fundamentals
2.4.2 Java with JUnit 5
2.5 SonarQube
3 Design
3.1 Prerequisites
3.2 Requirements Analysis
3.3 Choice of the Large Language Model
3.4 Prompt Design
3.5 Program Flow Plan
4 Implementation
4.1 Functionality
4.1.1 User Query
4.1.2 Java File Discovery in the Project
4.1.3 Prompt Creation
4.1.4 API Request for Test Generation
4.1.5 Test Verification with Repair Rounds
4.1.6 Logging
4.2 Integration of SonarQube, Plugins, and Dependencies
4.3 Test Run
5 Execution and Analysis
5.1 Execution
5.2 Evaluation of the Tests
5.2.1 Line Coverage
5.2.2 Branch Coverage
5.2.3 Overall Coverage
5.2.4 Success Rate
5.3 Test Code Analysis
5.4 Comparison with Manual Test Results
5.5 Interpretation of the Results
6 Conclusion
6.1 Conclusions
6.2 Outlook
Bibliography
A Appendix: Source Code
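The per-class prompting and the "repair rounds" mentioned in the outline could look roughly like the sketch below: ask the model for a JUnit test class, and if the result fails to compile, feed the compiler error back for another attempt. The prompt wording, the round limit, and both callback interfaces are assumptions for illustration, not the thesis's actual tool.

```python
PROMPT_TEMPLATE = (
    "Write a JUnit 5 test class for the following Java class. "
    "Return only compilable Java code, covering every public method.\n\n{source}"
)

def build_prompt(java_source):
    return PROMPT_TEMPLATE.format(source=java_source)

def generate_with_repair(llm, compile_check, java_source, rounds=2):
    """Ask for tests; on compile failure, feed the error back for a repair round.

    llm(prompt) -> str returns candidate test code;
    compile_check(code) -> (ok, error_message) simulates the build step.
    """
    prompt = build_prompt(java_source)
    for _ in range(rounds + 1):
        code = llm(prompt)
        ok, error = compile_check(code)
        if ok:
            return code
        prompt = f"The previous test failed to compile:\n{error}\nFix it:\n{code}"
    return None  # give up after the allowed repair rounds
```

Adopting the output without manual edits, as the thesis does, is what makes the compile check the sole quality gate before evaluation.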
|
5 |
Components of reusable prompts for humanity-centered automation
Wilbers, S., Günther, N., van de Sand, R., Prell, B., Reiff-Stephan, J. 18 February 2025.
The increasing integration of large language models (LLMs) into cyber-physical production systems necessitates solutions that are both efficient and aligned with human values and needs. Designing effective and reusable prompts is crucial for creating cyber-physical systems (CPS) that are effective, flexible, reliable, user-friendly, and aligned with humanity-centered automation. This paper introduces essential components of reusable prompts: Versioning, Model Selection, Purpose Definition, Variables, Examples, and Output Structuring. By applying the proposed components, developers might enable CPS employing LLMs to operate with more predictable inputs and outputs, enabling better control over results and facilitating the chaining of multiple prompts or collaboration between different LLMs.
These components emerged from extensive experimentation with various LLMs and prompt configurations. The resulting framework supports the maintenance of prompt collections similar to codebases and thus enhances traceability, maintainability, and human oversight. Such structured components for prompt design support humanity-centered automation by ensuring that technological advances serve human values and improve interaction with complex systems.
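The six components could be captured as a reusable prompt record along the following lines; the field names mirror the paper's component list, but the record layout and rendering format are assumptions of this sketch, not the paper's specification.

```python
from dataclasses import dataclass, field

@dataclass
class ReusablePrompt:
    version: str                                    # Versioning: track revisions
    model: str                                      # Model Selection: target LLM
    purpose: str                                    # Purpose Definition (with slots)
    variables: dict = field(default_factory=dict)   # Variables: default slot values
    examples: list = field(default_factory=list)    # Examples: few-shot (in, out) pairs
    output_format: str = "plain text"               # Output Structuring

    def render(self, **values):
        """Fill the slots and assemble the final prompt text."""
        body = self.purpose.format(**{**self.variables, **values})
        shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in self.examples)
        tail = f"{body}\nRespond as {self.output_format}."
        return f"{shots}\n{tail}" if shots else tail
```

Keeping such records in version control, one per prompt, is what lets a prompt collection be maintained like a codebase, as the paper suggests.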
|
6 |
Utveckling av en anonymiseringsprototyp för säker interaktion med chatbotar / Development of an anonymization prototype for secure interaction with chatbots
Hanna, John Nabil; Berjlund, William. January 2024.
This study presents a prototype for anonymizing sensitive information in text documents, with the aim of enabling secure interactions with large language models (LLMs) such as ChatGPT. The prototype offers a platform where users can upload documents to anonymize specific sensitive words. After anonymization, users can pose questions to ChatGPT based on the anonymized content. The prototype restores the anonymized parts in the responses from ChatGPT before they are displayed to the user, ensuring that sensitive information remains protected throughout the entire interaction. The study uses the Design Science Research in Information Systems (DSRIS) method. The prototype is developed in Java and tested with fabricated documents, while survey responses were collected to evaluate the user experience. The results show that the prototype's functionalities work well and protect sensitive information during interaction with ChatGPT. The prototype has been evaluated using survey responses that also highlight opportunities for improvement. In conclusion, the study demonstrates that it is possible to effectively anonymize text documents while obtaining accurate and useful feedback from ChatGPT. Despite some limitations in the user interface due to the timeframe, the study shows potential for secure data handling with ChatGPT.
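The anonymize-then-restore round trip at the heart of such a prototype can be sketched in a few lines. The placeholder scheme is an assumption about the design, and the prototype itself was written in Java; Python is used here only for brevity.

```python
def anonymize(text, sensitive):
    """Replace each sensitive word with a placeholder; return text and mapping."""
    mapping = {}
    for i, word in enumerate(sensitive, start=1):
        placeholder = f"<TERM_{i}>"
        if word in text:
            text = text.replace(word, placeholder)
            mapping[placeholder] = word
    return text, mapping

def restore(text, mapping):
    """Put the original words back into the model's response."""
    for placeholder, word in mapping.items():
        text = text.replace(placeholder, word)
    return text
```

The mapping never leaves the local system; only the placeholder text is sent to the LLM, which is what keeps the sensitive terms protected end to end.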
|
7 |
Detecting Plagiarism with ChatGPT Using Prompt Engineering / Upptäcka Plagiering med ChatGPT med Hjälp av Promptkonstruktion
Biörck, Johann; Eriksson, Sofia. January 2023.
Prompt engineering is the craft of designing prompts in order to get desired answers from language models such as ChatGPT. This thesis investigates how ChatGPT, specifically GPT-4, can be used to detect plagiarism in simple programming exercises. We used a dataset containing seven different original solutions for programming tasks. Every programming task also contained solutions that plagiarized the original as well as solutions that did not. After testing various prompts on a subset of the dataset, four prompts were tested on the majority of the dataset. Three of the prompts produced results so unreliable that simply guessing whether a solution was plagiarized would often have been more accurate. The fourth prompt was more accurate, although still not accurate enough to recommend using ChatGPT to identify plagiarism.
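One way such a plagiarism-check prompt could be phrased, together with a parser for the model's verdict, is sketched below; the wording is illustrative and does not reproduce any of the four prompts tested in the thesis.

```python
def plagiarism_prompt(original, submission):
    """Build a comparison prompt for one original/submission pair."""
    return (
        "You are a programming teacher checking for plagiarism.\n"
        "Original solution:\n" + original + "\n\n"
        "Student submission:\n" + submission + "\n\n"
        "Answer PLAGIARIZED or NOT_PLAGIARIZED, then one sentence of reasoning."
    )

def parse_verdict(response):
    """True if the model's first word judged the submission plagiarized."""
    first = response.strip().split()[0].upper().rstrip(",.")
    return first == "PLAGIARIZED"
```

Constraining the answer to a fixed first token, as here, is a common way to make free-text model output machine-checkable when measuring accuracy across a dataset.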
|
8 |
Token Budget Minimisation of Large Language Model based Program Repair
Hidvégi, Dávid. January 2023.
Automated Program Repair (APR) is gaining popularity in the field of software engineering. APR reduces the time and effort needed to find and fix software bugs, with the goal of completely automating bug fixing without any human input. The first part of this study focuses on replicating ChatRepair, an advanced APR tool, and benchmarking it on 6 projects of Defects4J 2.0. The evaluation revealed three enhancement options: Data Augmentation, Prompt Engineering, and Response Parsing. The second part of the study entails the design and implementation of a new tool, called RapidCapr, based on the newly found features and the structure of ChatRepair. RapidCapr was then benchmarked on the same dataset as ChatRepair. RapidCapr outperformed ChatRepair in terms of efficiency, producing a comparable number of plausible patches while using 3 to 7 times fewer tokens and 11 to 16 times fewer requests, depending on the stop condition. Regarding performance, RapidCapr exceeded ChatRepair by generating 15% more plausible and 10% more fixed patches while using 7% to 63% fewer tokens, depending on the stop condition. Importantly, the novel approach introduced in this study offers a dual advantage: it significantly reduces the cost associated with conversation-based APR while concurrently enhancing repair performance.
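The token-budget idea can be sketched as a repair loop that stops requesting new patches once a budget is spent. The 4-characters-per-token estimate and the loop structure are assumptions for illustration, not RapidCapr's actual implementation.

```python
def estimate_tokens(text):
    """Rough token estimate; a real tool would use the model's tokenizer."""
    return max(1, len(text) // 4)

def repair_within_budget(llm, is_plausible, prompt, budget):
    """Request candidate patches until one is plausible or the budget runs out.

    llm(prompt) -> str returns a candidate patch;
    is_plausible(patch) -> bool stands in for running the test suite.
    """
    spent = 0
    while spent < budget:
        patch = llm(prompt)
        spent += estimate_tokens(prompt) + estimate_tokens(patch)
        if is_plausible(patch):
            return patch, spent
    return None, spent
```

Charging both the prompt and the response against the budget reflects how conversational APR costs accumulate: every retry resends context, so fewer, better-targeted requests dominate the savings.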
|
9 |
Exploring the Effects of Prompt Engineering and Interaction Quality Feedback on ChatGPT-3.5 Performance in the Realm of Voice Assistants: An Empirical Study on Enhancing Response Accuracy and System Efficiency
Höggren, Felix; Victor, Chicinas. January 2024.
This Bachelor thesis investigates the influence of prompt engineering and the integration of an Interaction Quality (IQ) feedback loop on the performance of ChatGPT-3.5 as a voice assistant. By analysing empirical data across multiple configurations, this study explores how these interventions affect response accuracy and efficiency. Findings suggest that prompt engineering tends to enhance system performance, though the benefits of the IQ feedback loop remain less clear and require further investigation. This study contributes to the field by delineating the potential for targeted modifications to improve dialogue system outputs in real-time applications.
|
10 |
Prompt engineering and its usability to improve modern psychology chatbots / Prompt engineering och dess användbarhet för att förbättra psykologichatbottar
Nordgren, Isak; E. Svensson, Gustaf. January 2023.
As advancements in chatbots and Large Language Models (LLMs) such as GPT-3.5 and GPT-4 continue, their applications in diverse fields, including psychology, expand. This study investigates the effectiveness of LLMs optimized through prompt engineering, aiming to enhance their performance in psychological applications. To this end, two distinct versions of a GPT-3.5-based chatbot were developed: a version similar to the base model, and a version equipped with a more extensive system prompt detailing expected behavior. A panel of professional psychologists evaluated these models based on a predetermined set of questions, providing insight into their potential future use as psychological tools. Our results indicate that an overly prescriptive system prompt can unintentionally limit the versatility of the chatbot, making a careful balance in instruction specificity essential. Furthermore, while our study suggests that current LLMs such as GPT-3.5 are not capable of fully replacing human psychologists, they can provide valuable assistance in tasks such as basic question answering, consolation and validation, and triage. These findings provide a foundation for future research into the effective integration of LLMs in psychology and contribute valuable insights into the promising field of AI-assisted psychological services.
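The two configurations compared above can be expressed as chat-completion message lists, as sketched here; both system prompts are invented illustrations of the base-like and extensive variants, not the thesis's actual prompts.

```python
# Hypothetical base-like system prompt (minimal instruction).
BASE_SYSTEM = "You are a supportive assistant."

# Hypothetical extensive system prompt detailing expected behavior.
EXTENSIVE_SYSTEM = (
    "You are a supportive assistant for psychological self-help. "
    "Validate feelings, ask open questions, never diagnose, and refer "
    "users in crisis to professional help."
)

def make_conversation(system_prompt, user_message):
    """Build the message list an OpenAI-style chat endpoint expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
```

Holding everything constant except the system prompt, as this structure makes easy, is what lets an evaluation panel attribute behavioral differences to instruction specificity alone.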
|