1 |
Prompt engineering and its usability to improve modern psychology chatbots / Prompt engineering och dess användbarhet för att förbättra psykologichatbottar
Nordgren, Isak, E. Svensson, Gustaf January 2023
As advancements in chatbots and Large Language Models (LLMs) such as GPT-3.5 and GPT-4 continue, their applications in diverse fields, including psychology, expand. This study investigates the effectiveness of LLMs optimized through prompt engineering, aiming to enhance their performance in psychological applications. To this end, two distinct versions of a GPT-3.5-based chatbot were developed: a version similar to the base model, and a version equipped with a more extensive system prompt detailing expected behavior. A panel of professional psychologists evaluated these models based on a predetermined set of questions, providing insight into their potential future use as psychological tools. Our results indicate that an overly prescriptive system prompt can unintentionally limit the versatility of the chatbot, making a careful balance in instruction specificity essential. Furthermore, while our study suggests that current LLMs such as GPT-3.5 are not capable of fully replacing human psychologists, they can provide valuable assistance in tasks such as basic question answering, consolation and validation, and triage. These findings provide a foundation for future research into the effective integration of LLMs in psychology and contribute valuable insights into the promising field of AI-assisted psychological services.
|
2 |
Snort Rule Generation for Malware Detection Using the GPT2 Transformer
Laryea, Ebenezer Nii Afotey 04 July 2022
Natural Language machine learning methods are applied to rules generated to identify malware at the network level. These rules use a computer-based signature specification "language" called Snort. Using Natural Language processing techniques and other machine learning methods, new rules are generated based on a training set of existing Snort rule signatures for a specific type of malware family. The performance is then measured, in terms of the detection of existing types of malware and the number of "false positive" triggering events.
|
3 |
GPT i Lokal Miljö : Ett webbläsartillägg för analys av medicinska journaler
Alizade, Nasir January 2024
Given the rising digitalization and increasing popularity of large language models (LLMs) like ChatGPT, this study explores the development and implementation of a browser extension to analyze medical records from 1177 Vårdguiden. The goal is to enhance the management of medical records by integrating a local instance of an advanced language model, allowing for secure and private processing of patient data. The project focused on evaluating available language models, developing the browser extension, and conducting performance measurements. Identified challenges included ensuring the model’s ability to generate accurate diagnoses and summaries and addressing technical limitations in system performance. The selected model, GPT-SW3-356M-Struct, was found to be capable of the task, although there were some limitations in the accuracy and detail of the generated responses. Based on the results, local processing of medical data using an LLM improves data protection and user trust. However, further work is necessary to enhance the model’s performance and usability. Future research should focus on improving the accuracy of medical analyses by refining training data, testing other models, and conducting additional user tests to ensure the extension meets user needs. In conclusion, while integrating LLMs into browser extensions offers promising opportunities for medical data analysis, it requires ongoing development and optimization to fully realize its potential in healthcare.
|
4 |
ICT and economic growth : a dynamic non-parametric approach
Wang, Bin January 2010
One of the important issues for policymakers is how to improve output and/or productivity growth associated with information and communication technology (ICT) adoption; total factor productivity (TFP) growth related to ICT appeared in the US in the 1990s but not in the UK (Jorgenson and Stiroh, 2000; Oliner and Sichel, 2000). The general agreement is that ICT can raise output and/or productivity growth via an increase in productivity growth in the ICT-producing sectors due to rapid technological progress, through capital deepening driven by high levels of investment in ICT equipment, and via increases in efficiency in ICT-using sectors that successfully adopt this new technology through ICT spillover effects (David, 1990). Due to the small size of ICT-producing industries and the relatively low level of ICT investment in the UK (Colecchia and Schreyer, 2001; Daveri, 2002; Vijselaar and Albers, 2002), the utilization of ICT spillover effects was crucial to improving output and/or productivity growth for the UK. However, while many previous studies concluded that ICT spillover effects existed in the US, they had mixed results as to whether ICT spillover effects existed in the UK (Schreyer, 2000; Basu et al., 2003; Inklaar et al., 2005; Jorgenson et al., 2005). The objective of this thesis is to contribute to the existing literature by investigating the existence of ICT spillover effects in the US and the UK and exploring the reasons for the different effects between them. This thesis argues that the mixed findings in the previous studies are due to neglect of the general-purpose technology (GPT) theory and weaknesses in methodology. Thus, the first step is to build a new framework for measuring ICT spillover effects that solves the problems of the existing studies. The main neglect of the GPT theory is the lack of guidance for the proxy of co-invention related to ICT investments and for the length of lag.
The new framework addresses this neglect because it uses efficiency as a proxy for co-invention and captures the length of lag by the years with negative return on ICT capital. The methodology employed in the previous studies was inappropriate mainly because of the small sample size taken in the ICT study, the two-stage approach used to explore the effect of the environmental variables on efficiency, and the linear and concavity assumptions on the frontiers, which do not take account of ICT as a GPT. The new framework uses a Bayesian technique, a one-stage approach and non-parametric frontiers to avoid these three drawbacks. In addition, the new framework introduces the persistent level of inefficiency, using a first-order autoregressive (i.e. AR(1)) structure of inefficiency itself, as one of the factors that influence ICT spillover effects. In order to model the new framework, which takes into account the non-parametric frontiers for capturing negative return on ICT capital, an AR(1) structure of inefficiency, the small sample size and the factors that influence ICT spillover effects, this thesis has developed two non-parametric dynamic stochastic frontier analysis (SFA) models with an AR(1) structure and performed the analysis via Bayesian inference. The first model was a semi-parametric dynamic stochastic frontier with a time-variant non-parametric frontier at the basic level along with a time-invariant linear function for the technical inefficiency at the higher level. The second model relaxed the time-invariant linear functional form for technical inefficiency at the higher level. The results of the new framework showed strong ICT spillover effects in the US with a lag of about 6-8 years during 1982-83 to 1988-89, and relatively weaker ICT spillover effects in the UK. This is evidenced by the fact that the UK was still in the process of organizational adjustment up to 2000 due to a longer lag. Thus, in the 1990s, there was a lack of TFP growth in the UK.
Regarding the different ICT spillover effects between the US and the UK, the results from the new framework suggested that the different persistent levels of inefficiency in the two countries were important, in addition to the different levels of ICT investment noted in previous studies (Inklaar, O'Mahony and Timmer, 2003). JEL Classifications: C51, E13, O30, O33
|
5 |
The Impact of AI-generated Code on Web Development: A Comparative Study of ChatGPT and GitHub Copilot
Fajkovic, Edvin, Rundberg, Erik January 2023
Background. Machine learning and artificial intelligence are advancing faster than ever, and code generation is becoming a hot topic that is starting to gain traction in the industry. This raises the question: is it possible to create a complete website from scratch using only code generated by AI? Objectives. To determine whether it is possible to create complete websites from start to finish with code-generating tools. The tools in question are OpenAI’s ChatGPT and GitHub’s Copilot. Methods. A design-based research study was conducted in which the two tools were evaluated on the task of recreating a wireframe as closely as possible, in terms of efficiency, accuracy, maintainability, and ease of use. The code was then analyzed both manually with a code review and using the tools SonarQube, ESLint, and Pylint. Results. Both tools delivered code of similar quality, and both managed to create the websites according to the wireframe with minor styling differences. We found that it is easier to create a website from scratch using OpenAI's ChatGPT than with GitHub's Copilot, even though Copilot uses OpenAI's Codex model, which focuses on code generation. Conclusion. Code-generating AI is not advanced enough to create systems from scratch in a time-efficient way without introducing bugs and security risks.
|
6 |
Recommendation of Text Properties for Short Texts with the Use of Machine Learning : A Comparative Study of State-of-the-Art Techniques Including BERT and GPT-2 / Rekommendation av textegenskaper för korta texter med hjälp av maskininlärning : En jämförande studie av de senaste teknikerna inklusive BERT och GPT-2
Zapata, Luciano January 2023
Text mining has gained considerable attention due to the extensive usage of electronic documents. The significant increase in electronic document usage has created a necessity to process and analyze them effectively. Rule-based systems have traditionally been used to evaluate short pieces of text, but they have limitations, including the need for significant manual effort to create and maintain rules and a high risk of complex bugs. As a result, text classification has emerged as a promising solution for extracting meaning from short texts, which are defined as texts limited by a specific character count or word count. This study investigates the feasibility and effectiveness of text classification in classifying short pieces of text according to their appropriate text properties, based on users’ intentions in the text. The study focuses on comparing two transformer models, GPT-2 and BERT, in their ability to classify short texts. While other studies have compared these models in intention classification of text, this study is unique in its examination of their performance on short pieces of text in this specific context. This study uses user-labelled data to fine-tune the models, which are then tested on a test dataset from the same source. The comparative analysis of the models indicates that BERT generally outperforms GPT-2 in classifying users’ intentions based on the appropriate text properties, with an F1-score of 0.68 compared to GPT-2’s F1-score of 0.51. However, GPT-2 performed better on certain closely related classes, suggesting that both models capture interesting features of these classes. Furthermore, the results demonstrated that some classes were accurately classified despite being context-dependent and positioned within longer sentences, indicating that the models likely capture features of these classes and facilitate their classification. Both models show promising potential as classification models for short texts based on users’ intentions and their associated text properties. However, further research may be necessary to improve their accuracy. Suggestions for enhancing their performance include utilizing more recent versions of GPT, such as GPT-3 or GPT-4, optimizing hyperparameters, adjusting preprocessing methods, and adopting alternative approaches to handle data imbalance. Additionally, testing the models on datasets from diverse domains with more intricate contexts could provide greater insight into their limitations.
|
7 |
Skillnad i bedömning av text beroende på om texten uppges vara skriven av en människa eller genererad av ChatGPT / Difference in evaluation of text depending on if the text is stated to be written by a human or generated by ChatGPT.
Jonsson, Greta January 2023
AI is increasingly being used by individuals to write entire texts. This makes it more difficult for readers to determine who or what has written the text. In the future, it may be necessary to disclose who or what has written the text. What does this mean for how we relate to the text? Do we evaluate the text equally regardless of the author? Previous research shows that people have a certain negative bias towards AI in different situations. This study investigates whether there is a difference in how people evaluate text depending on whether the text is stated to be written by a human or generated by ChatGPT. In the study, an experiment was conducted with a between-group design. The participants were 20 students from the Media Technology program at KTH who were asked to read and evaluate texts. Half of the participants were informed that the texts were written by a human, while the remaining participants were informed that the texts were generated by ChatGPT. The participants evaluated the texts by responding to twelve statements about the text's quality in a questionnaire. Data was analyzed using the Mann-Whitney U-test to identify any differences in evaluations. The results showed significant differences for six out of 48 evaluations. These were in the evaluations of the statements: mature, personal, emotional, liked and good. Although not all evaluations showed significant differences, the mean values show that all texts were perceived as more emotional and personal by participants who were informed that the texts were written by a human. Additionally, all texts were perceived as more analytical and intelligent by participants who were informed that the texts were generated by AI. This suggests that there is a certain bias among people towards AI.
|
8 |
Prompt Engineering: Toward a Rhetoric and Poetics for Neural Network Augmented Authorship in Composition and Rhetoric
Foley, Christopher 01 January 2024
My dissertation introduces the notion of "augmented authorship" and applications for prompt engineering with generative neural networks inspired by Gregory Ulmer's theories of electracy (2003) to the interdisciplinary fields that teach writing and rhetoric. With the goal of inspiring the general practice of electracy, I introduce prompt engineering as practice in flash reason (Ulmer 2008; 2012), a new collective prudence emerging from the apparatus of electracy. By situating electracy and flash reason as threshold concepts in writing studies, and by aligning principles of electracy with ACRL and NCTE digital literacy frameworks, I demonstrate how prompt engineering across modalities can help students meet digital literacy goals, before providing accessible models or "relays" in the form of AI-coauthored texts, course modules, and aesthetic models deployed in the game world Roblox.
|
9 |
Automatická analýza a syntéza písňových textů / Computational analysis and synthesis of song lyrics
Březinová, Patrícia January 2021
We explore a dataset of almost half a million English song lyrics through three different processes - automatic evaluation, visualization, and generation. We create our own rhyme detector, using the EM algorithm with several improvements and adjustable parameters. This may, in some cases, replace human evaluators that cannot be used, for example, after each iteration of the lyrics generator to evaluate its improvement. By creating a web-page visualization of the results with interesting matrix rhyme highlighting, we make our evaluation accessible to the public. We discuss interesting genre differences discovered by applying our automatic evaluation on the entire dataset. Finally, we explore lyrics generation using state-of-the-art GPT-2.
|
10 |
Generative Language Models for Automated Programming Feedback
Hedberg Segeholm, Lea, Gustafsson, Erik January 2023
In recent years, Generative Language Models have exploded into the mainstream with household names like BERT and ChatGPT, proving that text generation could have the potential to solve a variety of tasks. As the number of students enrolled in programming classes has increased significantly, providing adequate feedback for everyone has become a pressing logistical issue. In this work, we evaluate the ability of near state-of-the-art Generative Language Models to provide such feedback on an automated basis. Our results show that the latest publicly available model, GPT-3.5, has a significant aptitude for finding errors in code, while the older GPT-3 is noticeably more uneven in its analysis. It is our hope that future, potentially fine-tuned models could help fill the role of providing early feedback for beginners, thus significantly alleviating the pressure put upon instructors.
|