Detecting Plagiarism with ChatGPT Using Prompt Engineering / Upptäcka Plagiering med ChatGPT med Hjälp av Promptkonstruktion

Biörck, Johann, Eriksson, Sofia January 2023
Prompt engineering is the craft of designing prompts in order to get desired answers from language models such as ChatGPT. This thesis investigates how ChatGPT, specifically GPT-4, can be used to detect plagiarism in simple programming exercises. We used a dataset containing seven different original solutions for programming tasks. Every programming task also contained solutions that were plagiarizing the original as well as solutions that did not plagiarize the original. After testing various different prompts on a subset of the dataset, four different prompts were tested on the majority of the dataset. Three of the prompts produced unreliable results to the point that simply guessing whether or not the task solutions were plagiarized would have frequently been more accurate. The fourth prompt was more accurate although still not accurate enough for it to be recommended to use ChatGPT in order to identify plagiarism. / Promptkonstruktion (prompt engineering) är konsten att skapa instruktioner som ger bästa möjliga svar från språkmodeller (language models) såsom ChatGPT. Denna avhandling undersöker hur ChatGPT kan användas för att upptäcka plagiat i enkla programmeringsuppgifter. Vi använde ett dataset som innehåller sju olika originallösningar på enkla programmeringsuppgifter. Varje programmeringsuppgift har plagierade lösningar som löser samma uppgift och icke-plagierade lösningar som också löser samma uppgift. Efter att ha testat olika instruktioner med ChatGPT på en liten delmängd av datasetet, testades fyra olika instruktioner på majoriteten av datasetet. Tre av instruktionerna gav opålitliga resultat till den grad att det ofta skulle gett ett bättre resultat att gissa om lösningarna var plagierade eller inte. Den fjärde instruktionen gav bättre resultat, men fortfarande inte tillräckligt bra för att rekommendera att använda ChatGPT för att identifiera plagiat.

Token Budget Minimisation of Large Language Model based Program Repair

Hidvégi, Dávid January 2023
Automated Program Repair (APR) is gaining popularity in the field of software engineering. APR reduces the time and effort needed to find and fix software bugs, with the goal of completely automating bug fixing without any human input. The first part of this study focuses on replicating ChatRepair, an advanced APR tool, and benchmarking it on 6 projects from Defects4J 2.0. The evaluation revealed three enhancement options: Data Augmentation, Prompt Engineering, and Response Parsing. The second part of the study entails the design and implementation of a new tool, called RapidCapr, based on the newly found features and the structure of ChatRepair. Subsequently, RapidCapr was benchmarked on the same data set as ChatRepair. RapidCapr outperformed ChatRepair in terms of efficiency by producing a comparable number of plausible patches using 7 times fewer tokens. Regarding performance, RapidCapr exceeded ChatRepair by generating 15% more plausible and 10% more fixed patches while using 63% fewer tokens. Importantly, the novel approach introduced in this study offers a dual advantage: it significantly reduces the cost of conversation-based APR while concurrently enhancing repair performance. / Automatisk programreparation (APR) ökar i popularitet inom mjukvaruutvecklingsområdet. APR minskar den tid och ansträngning som krävs för att hitta och åtgärda mjukvarubuggar, med målet att helt automatisera buggfixering utan något mänskligt ingripande. Den första delen av denna studie fokuserar på att replikera ChatRepair, ett avancerat APR-verktyg, och att utvärdera det på 6 projekt från Defects4J 2.0. Utvärderingen avslöjade tre förbättringsalternativ: Dataaugmentering, Prompt Engineering och Responsanalys. Den andra delen av studien innefattar design och implementation av ett nytt verktyg, kallat RapidCapr, baserat på de nyligen funna funktionerna och strukturen hos ChatRepair. 
Därefter utvärderades RapidCapr på samma datamängd som ChatRepair. RapidCapr presterade bättre än ChatRepair i fråga om effektivitet genom att producera en jämförbar mängd möjliga patchar och åtgärdade patchar med 3 till 7 gånger färre ”tokens” och 11 till 16 gånger färre anrop, beroende på stoppvillkor. När det gäller prestanda överträffade RapidCapr ChatRepair genom att generera 15% fler möjliga patchar och 10% fler åtgärdade patchar samtidigt som det använde 7% till 63% färre ”tokens”, beroende på stoppvillkor. Viktigt att notera är att det nya tillvägagångssättet som introduceras i denna studie erbjuder en dubbel fördel: det minskar betydligt kostnaderna för konversationsbaserad automatisk programreparation (APR) samtidigt som det förbättrar reparationsprestandan.

Prompt engineering and its usability to improve modern psychology chatbots / Prompt engineering och dess användbarhet för att förbättra psykologichatbottar

Nordgren, Isak, E. Svensson, Gustaf January 2023
As advancements in chatbots and Large Language Models (LLMs) such as GPT-3.5 and GPT-4 continue, their applications in diverse fields, including psychology, expand. This study investigates the effectiveness of LLMs optimized through prompt engineering, aiming to enhance their performance in psychological applications. To this end, two distinct versions of a GPT-3.5-based chatbot were developed: a version similar to the base model, and a version equipped with a more extensive system prompt detailing expected behavior. A panel of professional psychologists evaluated these models based on a predetermined set of questions, providing insight into their potential future use as psychological tools. Our results indicate that an overly prescriptive system prompt can unintentionally limit the versatility of the chatbot, making a careful balance in instruction specificity essential. Furthermore, while our study suggests that current LLMs such as GPT-3.5 are not capable of fully replacing human psychologists, they can provide valuable assistance in tasks such as basic question answering, consolation and validation, and triage. These findings provide a foundation for future research into the effective integration of LLMs in psychology and contribute valuable insights into the promising field of AI-assisted psychological services. / I takt med att framstegen inom chatbots och stora språkmodeller (LLMs) som GPT-3.5 och GPT-4 fortsätter utvidgas deras potentiella tillämpningar inom olika områden, inklusive psykologi. Denna studie undersöker effektiviteten av LLMs optimerade genom prompt engineering, med målet att förbättra deras prestanda inom psykologiska tillämpningar. I detta syfte utvecklades två distinkta versioner av en chatbot baserad på GPT-3.5: en version som liknar bas-modellen, och en version utrustad med en mer omfattande systemprompt som detaljerar förväntat beteende. 
En panel av professionella psykologer utvärderade dessa modeller baserat på en förbestämd uppsättning frågor, vilket ger inblick i deras potentiella framtida användning som psykologiska verktyg. Våra resultat tyder på att en överdrivet beskrivande systemprompt kan ofrivilligt begränsa chatbotens mångsidighet, vilket kräver en noggrann balans i specificiteten av prompten. Vidare antyder vår studie att nuvarande LLMs som GPT-3.5 inte kan ersätta mänskliga psykologer helt och hållet, men att de kan ge värdefull hjälp i uppgifter som grundläggande frågebesvaring, tröst och bekräftelse, samt triage. Dessa resultat ger en grund för framtida forskning om effektiv integration av LLMs inom psykologi och bidrar med värdefulla insikter till det lovande fältet av AI-assisterade psykologtjänster.

Artificial Intelligence-driven web development and agile project management using OpenAI API and GPT technology : A detailed report on technical integration and implementation of GPT models in CMS with API and agile web development for quality user-centered AI chat service experience

Tosic, Damjan January 2023
This graduation report explores the integration of Artificial Intelligence (AI) tools, specifically OpenAI's Generative Pre-trained Transformer (GPT) technology, into web development processes using WordPress (WP) to develop an AI-driven chat service. The focus of the project is on ImagineX AB, a private company that offers the educational service ChatGPT Utbildning, aimed at teaching professionals to use ChatGPT effectively. The project is motivated by the rapid growth and adoption of AI tools such as ChatGPT, underpinned by the observed increase in its user base and its integration into significant platforms such as Microsoft's Bing and Office packages. Despite their promising potential, the application of such AI tools in web development remains underexplored and untested in several respects. The graduation report presents the implementation of a GPT model-driven chat service on the ChatGPT Utbildning WP website, enabling visitors to interact with the well-known AI tool directly. This feature serves a dual purpose: enhancing user engagement and providing an instant demonstration of the utility of ChatGPT. The agile project management methodology is, in general, divided into four phases: preliminary work, design solutions, develop solution, and delivery. The design and development phases are iterative; in this project, there are two design iterations and three development iterations, called “cycles”. The project plan was fulfilled with no deviation. Tests and continuous improvements were carried out throughout development, with specific, planned efforts in each phase and cycle. The result is two optimized chatbots in their respective well-designed chat boxes, with full chat functionality driven by the OpenAI API and the GPT-3.5/GPT-4 models, user tested and then published on the ChatGPT Utbildning website. Additionally, insights into agile management solutions in relation to AI tools have been produced. 
The detailed construction and in-depth discussion contribute to a broad understanding of AI implementation in web development, providing practical insights into the application of ChatGPT in a real-world setting under agile project management. Furthermore, the report underscores the transformative potential of AI tools in shaping web solutions and web development, and in propelling innovation in the field. The report also discusses technology, ethics, society, and the implications for future web development. / Rapporten ämnar redogöra integreringen av artificiell intelligens (AI) instrument, särskilt OpenAI's Generative Pre-trained Transformer (GPT) teknologi, inom ramen för webbutvecklingsprocesser, inklusive agil projektledning, med användning av WordPress (WP), i syfte att utveckla en AI-drivande chattjänst. Fokus för projektet är på företaget ImagineX AB, en privat aktör som erbjuder en utbildningstjänst benämnd ChatGPT Utbildning med mål att undervisa yrkesverksamma i effektivt bruk av ChatGPT. Motivationen för projektet härstammar från den snabbt växande tillväxten och adoptionen av AI-instrument som ChatGPT, vilket stärks av den observerade tillväxten av användarbasen och dess integrering i betydande plattformar, såsom Microsofts Bing och Office-paket. Trots den lovande potential som dessa AI-instrument innehar, finns det fortfarande delar inom webbutveckling där användningen av sådana verktyg förblir ouppklarade och otillräckligt utforskade. Rapporten visar implementeringen av en GPT-modell-drivande chattjänst på ChatGPT Utbildning WP-webbplatsen, vilket möjliggör direkt interaktion för besökare med det framstående AI-instrumentet. Denna funktion har ett tvåfaldigt ändamål - att förhöja användarengagemang och att ge en omedelbar demonstration av ChatGPT:s användbarhet. 
Den använda smidiga projektledningsmetodiken är typiskt uppdelad i fyra faser: preliminärt arbete, designlösningar, utvecklingslösningar samt leverans - design- och utvecklingsfaser är iterativa vilket omfattar två designiterationer och tre utvecklingsiterationer refererade till som "cykler". Projektplanen har följts utan avvikelser. Testning och kontinuerliga förbättringar har genomförts under hela utvecklingsprocessen, med specifika och planerade insatser i varje fas och cykel. Resultatet manifesteras i två optimerade chattrobotar inom respektive välutformade chattfönster, med fullständig chattfunktionalitet som drivs av OpenAI API samt GPT-3.5/GPT-4 modellerna - vilka har användartestats och därefter publicerats på ChatGPT Utbildning webbplatsen. Ytterligare insikter rörande agil projektledning i relation till AI-frågor erhålls också. Den detaljerade konstruktionen och den djupgående diskussionen bidrar till en omfattande förståelse för AI-implementering inom webbutveckling och ger praktiska insikter om tillämpningen av ChatGPT i en realistisk inställning med smidig projektledning. Vidare framhäver det den transformerande potentialen hos AI-instrument för att utforma webblösningar och webbutveckling, vilket främjar innovation inom området. Rapporten avslutas med diskussioner kring teknik, etik, samhälle och implikationer för framtida webbutveckling.

Automatic generation of definitions : Exploring if GPT is useful for defining words

Eriksson, Fanny January 2023
When reading a text, it is common to get stuck on unfamiliar words that are difficult to understand from the local context. In these cases, we use dictionaries or similar online resources to find the general meaning of the word. However, maintaining a handwritten dictionary is highly resource-demanding because language is constantly developing, and using generative language models to produce definitions could therefore be a more efficient option. To explore this possibility, this thesis uses an online survey to examine whether GPT could be useful for defining words. It also investigates how well the Swedish language model GPT-SW3 (3.5 b) defines words compared to the model text-davinci-003, and how prompts should be formatted when defining words with these models. The results indicate that text-davinci-003 generates high-quality definitions; according to Student's t-test, its definitions received significantly higher ratings from participants than definitions taken from Svensk ordbok (SO). Furthermore, GPT-SW3 (3.5 b) received the lowest ratings, indicating that more investment is needed to keep up with the large models developed by OpenAI. Regarding prompt formatting, the most appropriate format for defining words depends heavily on the model: text-davinci-003 performed well in a zero-shot setting, while GPT-SW3 (3.5 b) required a few-shot setting. Considering both the high quality of the definitions generated by text-davinci-003 and the practical advantages of generating definitions automatically, GPT could be a useful method for defining words.
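The difference between the zero-shot and few-shot settings named in this abstract can be illustrated with a minimal sketch. The prompt wording, the function names, and the example word/definition pairs below are hypothetical and are not taken from the thesis; the sketch only shows how a few-shot prompt prepends worked demonstrations that a zero-shot prompt omits.

```python
from typing import Sequence, Tuple


def zero_shot_prompt(word: str) -> str:
    """Ask for a definition with no worked examples (hypothetical wording)."""
    return f'Define the word "{word}".\nDefinition:'


def few_shot_prompt(word: str, examples: Sequence[Tuple[str, str]]) -> str:
    """Prepend (word, definition) demonstrations before the target word."""
    shots = [f"Word: {w}\nDefinition: {d}" for w, d in examples]
    # The final block leaves the definition open for the model to complete.
    shots.append(f"Word: {word}\nDefinition:")
    return "\n\n".join(shots)


if __name__ == "__main__":
    # Illustrative demonstration pairs, invented for this sketch.
    demos = [
        ("lamp", "a device that gives off light"),
        ("river", "a large natural stream of water"),
    ]
    print(zero_shot_prompt("glacier"))
    print()
    print(few_shot_prompt("glacier", demos))
```

Either string would then be sent to the model under test; which format works better is, per the abstract, model dependent.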

The impact of task specification on code generated via ChatGPT

Lundblad, Jonathan, Thörn, Edwin, Thörn, Linus January 2023
ChatGPT has made large language models more accessible and made it possible to code using natural language prompts. This study conducted an experiment comparing prompt engineering techniques called task specification and investigated their impact on code generation in terms of correctness and variety. The hypotheses of this study focused on whether the baseline method had a statistically significant difference in code correctness compared to the other methods. Code is evaluated using a software requirement specification that measures functional and syntactical correctness. Additionally, code variance is measured to identify patterns in code generation. The results show that there is a statistically significant difference in some code correctness criteria between the baseline and the other task specification methods, and the code variance measurements indicate variety in the generated solutions. Future work could include using another large language model, different programming tasks and programming languages, and other prompt engineering techniques.

An Empirical Study on Using Codex for Automated Program Repair

Zhao, Pengyu January 2023
This thesis explores the potential of Codex, a pre-trained Large Language Model (LLM), for Automated Program Repair (APR) by assessing its performance on the Defects4J benchmark, which includes real-world Java bugs. The study aims to provide a comprehensive understanding of Codex’s capabilities and limitations in generating syntactically and semantically equivalent patches for defects, as well as evaluating its ability to handle defects with different levels of importance and complexity. Additionally, we aim to compare the performance of Codex with other LLMs in the APR domain. To achieve these objectives, we employ a systematic methodology that includes prompt engineering, Codex parameter adjustment, code extraction, patch verification, and Abstract Syntax Tree (AST) comparison. We successfully verified 528 bugs in Defects4J, which is the highest number compared with other studies, and achieved 53.98% plausible and 26.52% correct patches. Furthermore, we introduce the elle-elle-aime framework, which extends RepairThemAll for Codex-based APR and is adaptable for evaluating other LLMs, such as ChatGPT and GPT-4. The findings of this empirical study provide valuable insights into the factors that impact Codex’s performance on APR, helping to create new prompt strategies and techniques that improve research productivity. / Denna avhandling utforskar potentialen hos Codex, en förtränad LLM, för APR genom att utvärdera dess prestanda på Defects4J-benchmarket som inkluderar verkliga Java-buggar. Studien syftar till att ge en omfattande förståelse för Codex förmågor och begränsningar när det gäller att generera syntaktiskt och semantiskt ekvivalenta patchar för defekter samt att utvärdera dess förmåga att hantera defekter med olika nivåer av betydelse och komplexitet. Dessutom är vårt mål att jämföra prestanda hos Codex med andra LLM inom APR-området. 
För att uppnå dessa mål använder vi en systematisk metodik som inkluderar prompt engineering, justering av Codex-parametrar, kodextraktion, patchverifiering och jämförelse av AST. Vi verifierade framgångsrikt 528 buggar i Defects4J, vilket representerar det högsta antalet bland andra studier, och uppnådde 53,98% plausibla och 26,52% korrekta patchar. Vidare introducerar vi elle-elle-aime ramverket, som utvidgar RepairThemAll för Codex-baserad APR och är anpassningsbart för att utvärdera andra LLM, såsom ChatGPT och GPT-4. Resultaten av denna empiriska studie ger värdefulla insikter i de faktorer som påverkar Codex prestanda på APR och hjälper till att skapa nya promptstrategier och tekniker som förbättrar forskningsproduktiviteten.
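The AST comparison step listed in this abstract can be sketched in a few lines. The thesis works on Java patches from Defects4J; the sketch below uses Python's standard `ast` module on Python snippets purely as a stand-in, and the function name `same_ast` is hypothetical. It shows the general idea only: two patches that differ in whitespace, comments, or redundant parentheses parse to the same tree, while a real code change does not.

```python
import ast


def same_ast(source_a: str, source_b: str) -> bool:
    """Return True when both snippets parse to structurally identical ASTs."""
    tree_a = ast.parse(source_a)
    tree_b = ast.parse(source_b)
    # ast.dump omits line/column attributes by default, and comments never
    # reach the tree, so layout differences do not affect the comparison.
    return ast.dump(tree_a) == ast.dump(tree_b)


if __name__ == "__main__":
    reference_patch = "total = price * quantity"
    generated_patch = "total = (price   *   quantity)  # fixed operator"
    print(same_ast(reference_patch, generated_patch))
    print(same_ast(reference_patch, "total = price + quantity"))
```

A structural match of this kind is a syntactic check only; patches that are semantically equivalent but written differently would still need tests or manual review.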

Matching ESCF Prescribed Cyber Security Skills with the Swedish Job Market : Evaluating the Effectiveness of a Language Model

Ahmad, Al Ghaith, Abd ULRAHMAN, Ibrahim January 2023
Background: As the demand for cybersecurity professionals continues to rise, it is crucial to identify the key skills necessary to thrive in this field. This research project sheds light on the cybersecurity skills landscape by analyzing the recommendations provided by the European Cybersecurity Skills Framework (ECSF), examining the most required skills in the Swedish job market, and investigating the common skills identified through the findings. The project uses the large language model ChatGPT to classify common cybersecurity skills and evaluates its accuracy compared to human classification. Objective: The primary objective of this research is to examine the alignment between the ECSF and the specific skill demands of the Swedish cybersecurity job market. This study aims to identify common skills and evaluate the effectiveness of a Language Model (ChatGPT) in categorizing jobs based on ECSF profiles. Additionally, it seeks to provide valuable insights for educational institutions and policymakers aiming to enhance workforce development in the cybersecurity sector. Methods: The research begins with a review of the ECSF to understand its recommendations and methodology for defining cybersecurity skills, and to delineate the cybersecurity profiles along with their corresponding key cybersecurity skills as outlined by the ECSF. Subsequently, a Python-based web crawler was implemented to gather data on cybersecurity job announcements from the Swedish Employment Agency's website. This data is analyzed to identify the cybersecurity skills most frequently sought by employers in Sweden. The Language Model (ChatGPT) is used to classify these positions according to ECSF profiles. Concurrently, two human agents manually categorize the jobs to serve as a benchmark for evaluating the accuracy of the Language Model. This allows for a comprehensive assessment of its performance. 
Results: The study thoroughly reviews and cites the recommended skills outlined by the ECSF, offering a comprehensive European perspective on key cybersecurity skills (Tables 4 and 5). Additionally, it identifies the most in-demand skills in the Swedish job market, as illustrated in Figure 6. The research reveals the matching between ECSF-prescribed skills in different profiles and those sought after in the Swedish cybersecurity market. The skills of the profiles 'Cybersecurity Implementer' and 'Cybersecurity Architect' emerge as particularly critical, representing over 58% of the market demand. This research further highlights shared skills across various profiles (Table 7). Conclusion: This study highlights the matching between the European Cybersecurity Skills Framework (ECSF) recommendations and the evolving demands of the Swedish cybersecurity job market. Through a review of ECSF-prescribed skills and a thorough examination of the Swedish job landscape, this research identifies crucial areas of alignment. Significantly, the skills associated with 'Cybersecurity Implementer' and 'Cybersecurity Architect' profiles emerge as central, collectively constituting over 58% of market demand. This emphasizes the urgent need for educational programs to adapt and harmonize with industry requisites. Moreover, the study advances our understanding of the Language Model's effectiveness in job categorization. The findings hold significant implications for workforce development strategies and educational policies within the cybersecurity domain, underscoring the pivotal role of informed skills development in meeting the evolving needs of the cybersecurity workforce.

Bridging Language & Data : Optimizing Text-to-SQL Generation in Large Language Models / Från ord till SQL : Optimering av text-till-SQL-generering i stora språkmodeller

Wretblad, Niklas, Gordh Riseby, Fredrik January 2024
Text-to-SQL, which involves translating natural language into Structured Query Language (SQL), is crucial for enabling broad access to structured databases without expert knowledge. However, designing models for such tasks is challenging due to numerous factors, including the presence of ’noise,’ such as ambiguous questions and syntactical errors. This thesis provides an in-depth analysis of the distribution and types of noise in the widely used BIRD-Bench benchmark and of the impact of noise on models. While BIRD-Bench was created to model dirty and noisy database values, it was not created to contain noise and errors in the questions and gold queries. A manual evaluation showed that noise in questions and gold queries is highly prevalent in the financial domain of the dataset, and a further analysis of the other domains indicates the presence of noise in other parts as well. The presence of incorrect gold SQL queries, which then generate incorrect gold answers, has a significant impact on the benchmark’s reliability. Surprisingly, when models were evaluated on corrected SQL queries, zero-shot baselines surpassed the performance of state-of-the-art prompting methods. The thesis then introduces the concept of classifying noise in natural language questions, aiming to prevent noisy questions from entering text-to-SQL models and to annotate noise in existing datasets. Experiments using GPT-3.5 and GPT-4 on a manually annotated dataset demonstrated the viability of this approach, with classifiers achieving up to 0.81 recall and 80% accuracy. Additionally, the thesis explored the use of LLMs for automatically correcting faulty SQL queries. This showed a 100% success rate for specific query corrections, highlighting the potential of LLMs for improving dataset quality. We conclude that informative noise labels and reliable benchmarks are crucial to developing new Text-to-SQL methods that can handle varying types of noise.
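The recall and accuracy figures reported for the noise classifiers follow the standard definitions for binary classification. A minimal sketch of how such figures are computed is given below; the function name and the toy labels are hypothetical and do not reproduce the thesis data. Here `True` marks a question annotated (or predicted) as noisy.

```python
from typing import Sequence, Tuple


def recall_and_accuracy(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float]:
    """Compute recall on the positive (noisy) class and overall accuracy."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)
    correct = sum(1 for t, p in pairs if t == p)
    # Recall: share of truly noisy questions that the classifier flagged.
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    # Accuracy: share of all questions labelled correctly.
    accuracy = correct / len(pairs)
    return recall, accuracy


if __name__ == "__main__":
    # Toy annotations invented for illustration only.
    human = [True, True, True, False, False]
    model = [True, True, False, False, True]
    print(recall_and_accuracy(human, model))
```

Reporting both numbers matters for a filter of this kind: high recall means few noisy questions slip through, while accuracy also penalises clean questions that are wrongly flagged.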

Introducing Generative Artificial Intelligence in Tech Organizations : Developing and Evaluating a Proof of Concept for Data Management powered by a Retrieval Augmented Generation Model in a Large Language Model for Small and Medium-sized Enterprises in Tech / Introducering av Generativ Artificiell Intelligens i Tech Organisationer : Utveckling och utvärdering av ett Proof of Concept för datahantering förstärkt av en Retrieval Augmented Generation Model tillsammans med en Large Language Model för små och medelstora företag inom Tech

Lithman, Harald, Nilsson, Anders January 2024
In recent years, generative AI has made significant strides, likely leaving an irreversible mark on contemporary society. The launch of OpenAI's ChatGPT 3.5 in 2022 manifested the greatness of the innovative technology, highlighting its performance and accessibility. This has led to a demand for implementation solutions across various industries and companies eager to leverage these new opportunities generative AI brings. This thesis explores the common operational challenges faced by a small-scale Tech Enterprise and, with these challenges identified, examines the opportunities that contemporary generative AI solutions may offer. Furthermore, the thesis investigates what type of generative technology is suitable for adoption and how it can be implemented responsibly and sustainably. The authors approach this topic through 14 interviews involving several AI researchers and the employees and executives of a small-scale Tech Enterprise, which served as a case company, combined with a literature review.  The information was processed using multiple inductive thematic analyses to establish a solid foundation for the investigation, which led to the development of a Proof of Concept. The findings and conclusions of the authors emphasize the high relevance of having a clear purpose for the implementation of generative technology. Moreover, the authors predict that a sustainable and responsible implementation can create the conditions necessary for the specified small-scale company to grow.  When the authors investigated potential operational challenges at the case company it was made clear that the most significant issue arose from unstructured and partially absent documentation. 
The authors conclude that a data management system powered by a Retrieval model in an LLM presents a potential path forward for significant value creation, as this solution enables data retrieval from unstructured project data and also mitigates a major inherent issue with the technology, namely hallucinations. Furthermore, in terms of implementation circumstances, both empirical and theoretical findings suggest that responsible use of generative technology requires training; hence, the authors have developed an educational framework named "KLART". Moving forward, the authors describe that sustainable implementation necessitates transparent systems, as this increases understanding, which in turn affects trust and secure use. The findings also indicate that sustainability is strongly linked to the user-friendliness of the AI service, leading the authors to emphasize the importance of HCD while developing and maintaining AI services. Finally, the authors argue for the value of automation, as it allows for continuous data and system updates that can potentially reduce maintenance. In summary, this thesis aims to contribute to an understanding of how small-scale Tech Enterprises can implement generative AI technology sustainably to enhance their competitive edge through innovation and data-driven decision-making.

