1 |
Responsible AI in Educational Chatbots: Seamless Integration and Content Moderation Strategies / Ansvarsfull AI i pedagogiska chatbots: strategier för sömlös integration och moderering av innehåll
Eriksson, Hanna. January 2024
With the increasing integration of artificial intelligence (AI) technologies into educational settings, it becomes important to ensure responsible and effective use of these systems. This thesis addresses two critical challenges within AI-driven educational applications: the effortless integration of different Large Language Models (LLMs) and the mitigation of inappropriate content. An AI assistant chatbot was developed, allowing teachers to design custom chatbots and set rules for them, enhancing students’ learning experiences. Evaluation of LangChain as a framework for LLM integration, alongside various prompt engineering techniques including zero-shot, few-shot, zero-shot chain-of-thought, and prompt chaining, revealed LangChain’s suitability for this task and highlighted prompt chaining as the most effective method for mitigating inappropriate content in this use case. Looking ahead, future research could focus on further exploring prompt engineering capabilities and strategies to ensure uniform learning outcomes for all students, as well as leveraging LangChain to enhance the adaptability and accessibility of educational applications.
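To make the idea of prompt chaining for moderation concrete, the sketch below splits the work into two sequential prompts: a first call classifies the student's message against the teacher's rules, and a second call answers only if the first one allows it. This is a hypothetical illustration, not the thesis's implementation. The names `moderated_reply`, `fake_llm`, and `REFUSAL`, as well as the prompt wording, are assumptions, and `fake_llm` is a stub standing in for a real model client (for example one wired up through LangChain).

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a model reply.
LLM = Callable[[str], str]

REFUSAL = "Sorry, I can't help with that in this class."


def moderated_reply(call_llm: LLM, rules: str, message: str) -> str:
    """Two-step prompt chain: classify first, answer only if allowed."""
    # Step 1: ask the model for a one-word verdict on the message.
    verdict = call_llm(
        f"Teacher rules:\n{rules}\n\n"
        f"Student message:\n{message}\n\n"
        "Does the message comply with the rules? Answer ALLOWED or BLOCKED."
    )
    if verdict.strip().upper() != "ALLOWED":
        return REFUSAL
    # Step 2: the verdict from step 1 gates a separate answering prompt.
    return call_llm(
        f"You are a teaching assistant. Follow these rules:\n{rules}\n\n"
        f"Answer the student:\n{message}"
    )


def fake_llm(prompt: str) -> str:
    """Stub used only for demonstration; replace with a real model call."""
    if "Answer ALLOWED or BLOCKED" in prompt:
        return "BLOCKED" if "cheat" in prompt.lower() else "ALLOWED"
    return "Photosynthesis converts light into chemical energy."
```

Keeping the verdict and the answer in separate calls means the answering prompt never runs for blocked input, which is the property that distinguishes chaining from a single combined prompt.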
|
2 |
Enhancing Document Accessibility and User Interaction through Large Language Model: A Comparative Study for Educational Content: A Comparative Analysis of LLM and Traditional Site Search
Umar, Fatima. January 2024
This research integrates LLMs with RAG (Retrieval-Augmented Generation) to develop a conversational interface allowing users to post queries and ask questions from a website. It compares the LLM RAG method with traditional site search functionality to determine which method users perceive as better, specifically regarding response quality and response time. The perceived results for response quality and response time were evaluated under the null hypothesis that there is no difference between the two methods. The study showed that the LLM RAG method was perceived as better in terms of response quality, and those results were significant. However, for response time, the traditional site search method was perceived as better, but the results were not significant, so the null hypothesis could not be rejected. Overall, the integration of LLMs with RAG frameworks promises to enhance information retrieval systems on digital platforms.
|
3 |
[pt] CONSULTANDO BANCOS DE DADOS COM LINGUAGEM NATURAL: O USO DE MODELOS DE LINGUAGEM GRANDES PARA TAREFAS DE TEXTO-PARA-SQL / [en] QUERYING DATABASES WITH NATURAL LANGUAGE: THE USE OF LARGE LANGUAGE MODELS FOR TEXT-TO-SQL TASKS
EDUARDO ROGER SILVA NASCIMENTO. 23 May 2024
[pt] A tarefa chamada brevemente de Texto-para-SQL envolve a geração de uma consulta SQL com base em um banco de dados relacional e uma pergunta em linguagem natural. Embora os rankings de benchmarks conhecidos indiquem que Modelos de Linguagem Grandes (LLMs) se destacam nessa tarefa, eles são avaliados em bancos de dados com esquemas bastante simples. Esta dissertação investiga inicialmente o desempenho de modelos Texto-para-SQL baseados em LLMs em um banco de dados disponível ao público (Mondial) com um esquema conceitual complexo e um conjunto de 100 perguntas em Linguagem Natural (NL). Executando sob GPT-3.5 e GPT-4, os resultados deste primeiro experimento mostram que as ferramentas baseadas em LLM têm desempenho significativamente inferior ao relatado nesses benchmarks e enfrentam dificuldades com a vinculação de esquemas e joins, sugerindo que o esquema relacional pode não ser adequado para LLMs. Essa dissertação propõe então o uso de visões e descrições de dados amigáveis ao LLM para melhorar a precisão na tarefa Texto-para-SQL. Em um segundo experimento, usando a estratégia com melhor performance, custo e benefício do experimento anterior e outro conjunto com 100 perguntas sobre um banco de dados do mundo real, os resultados mostram que a abordagem proposta é suficiente para melhorar consideravelmente a precisão da estratégia de prompt. Esse trabalho conclui com uma discussão dos resultados obtidos e sugere abordagens adicionais para simplificar a tarefa de Texto-para-SQL. / [en] The Text-to-SQL task involves generating an SQL query based on a
given relational database and a Natural Language (NL) question. While the
leaderboards of well-known benchmarks indicate that Large Language Models
(LLMs) excel in this task, they are evaluated on databases with fairly simple
schemas. This dissertation first investigates the performance of LLM-based
Text-to-SQL models on a complex and openly available database (Mondial)
with a large schema and a set of 100 NL questions. Running under GPT-3.5
and GPT-4, the results of this first experiment show that the performance of
LLM-based tools is significantly lower than that reported in the benchmarks
and that these tools struggle with schema linking and joins, suggesting that
the relational schema may not be suitable for LLMs. This dissertation then
proposes using LLM-friendly views and data descriptions for better accuracy
in the Text-to-SQL task. In a second experiment, using the strategy with the
best performance and cost-benefit trade-off from the previous experiment and another
set with 100 questions over a real-world database, the results show that the
proposed approach is sufficient to considerably improve the accuracy of the
prompt strategy. This work concludes with a discussion of the results obtained
and suggests further approaches to simplify the Text-to-SQL task.
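The idea of LLM-friendly views can be sketched with SQLite: a view pre-joins the underlying tables and exposes self-describing column names, so a generated query needs neither schema linking across tables nor an explicit join. The tables, columns, view name, and values below are hypothetical and purely illustrative. They are not the Mondial schema and not the views proposed in the dissertation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized schema with terse names (illustrative values).
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, ctry TEXT, pop INTEGER);
    INSERT INTO country VALUES ('SE', 'Sweden'), ('BR', 'Brazil');
    INSERT INTO city VALUES (1, 'Stockholm', 'SE', 1000),
                            (2, 'Rio de Janeiro', 'BR', 2000);

    -- LLM-friendly view: the join is done once, columns are self-describing.
    CREATE VIEW city_with_country AS
    SELECT city.name    AS city_name,
           country.name AS country_name,
           city.pop     AS city_population
    FROM city JOIN country ON city.ctry = country.code;
""")

# A query a model could plausibly generate against the view: no join needed.
generated_sql = (
    "SELECT city_name FROM city_with_country "
    "WHERE country_name = 'Brazil'"
)
rows = conn.execute(generated_sql).fetchall()
```

In such a setup the prompt would describe only the view and its columns, which shifts the join logic from the model to the database designer.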
|
4 |
Investigating the impact of Generative AI on newcomers' understanding of Software Projects
Larsen, Knud Ronau; Edvall, Magnus. January 2024
Context: In both commercial and open-source software development, newcomers often join the development process in the advanced stages of the software development lifecycle. Newcomers frequently face barriers impeding their ability to make early contributions, often caused by a lack of understanding. For this purpose, we have developed an LLM-based tool called SPAC-B that facilitates project-specific question-answering to aid newcomers' understanding of software projects. Objective: Investigate the LLM-based tool's ability to assist newcomers in understanding software projects by measuring its accuracy and conducting an experiment. Method: In this study, a case study is conducted to investigate the accuracy of the tool, measured in relevance, completeness, and correctness. Furthermore, an experiment is performed among software developers to test the tool's ability to help newcomers formulate better plans for open-source issues. Results: SPAC-B achieved an accuracy of 4.60 in relevance, 4.30 in completeness, and 4.28 in correctness on a scale from 1 to 5. It improved the combined mean score of the plans of the 10 participants in our experiments from 1.90 to 2.70, and 8 out of 10 participants found the tool helpful. Conclusions: SPAC-B has demonstrated high accuracy and helpfulness, but further research is needed to confirm if these results can be generalized to a larger population and other contexts of use.
|
5 |
Stora språkmodeller för bedömning av applikationsrecensioner: Implementering och undersökning av stora språkmodeller för att sammanfatta, extrahera och analysera nyckelinformation från användarrecensioner / Large Language Models for application review data: Implementation survey of Large Language Models (LLM) to summarize, extract, and analyze key information from user reviews
von Reybekiel, Algot; Wennström, Emil. January 2024
Manuell granskning av användarrecensioner för att extrahera relevant information kan vara en tidskrävande process. Denna rapport har undersökt om stora språkmodeller kan användas för att sammanfatta, extrahera och analysera nyckelinformation från recensioner, samt hur en sådan applikation kan konstrueras. Det visade sig att olika modeller presterade olika bra beroende på mätvärden och viktning mellan recall och precision. Vidare visade det sig att fine-tuning av språkmodeller som Llama 3 förbättrade prestationen vid klassifikation av användbara recensioner och ledde, enligt vissa mätvärden, till högre prestation än större språkmodeller som Chat-Bison. För engelskt översatta recensioner hade Llama 3:8b:Instruct, Chat-Bison samt den fine-tunade versionen av Llama 3:8b ett F4-makro-score på 0.89, 0.90 och 0.91 respektive. Ytterligare ett resultat är att de större modellerna Chat-Bison, Text-Bison och Gemini, presterade bättre i fallet för generering av sammanfattande texter, än de mindre modeller som testades vid inmatning av flertalet recensioner åt gången. Generellt sett presterade språkmodellerna också bättre om recensioner först översattes till engelska innan bearbetning, snarare än då recensionerna var skrivna i originalspråk där majoriteten av recensionerna var skrivna på svenska. En annan lärdom från förbearbetning av recensioner är att antal anrop till dessa språkmodeller kan minimeras genom att filtrera utifrån ordlängd och betyg. Utöver språkmodeller visade resultaten att användningen av vektordatabaser och embeddings kan ge en större överblick över användbara recensioner genom vektordatabasers inbyggda förmåga att hitta semantiska likheter och samla liknande recensioner i kluster. / Manually reviewing user reviews to extract relevant information can be a time-consuming process. This report investigates whether large language models can be used to summarize, extract, and analyze key information from reviews, and how such an application can be constructed.
It was discovered that different models exhibit varying degrees of performance depending on the metrics and the weighting between recall and precision. Furthermore, fine-tuning of language models such as Llama 3 was found to improve performance in classifying useful reviews and, according to some metrics, led to higher performance than larger language models like Chat-Bison. Specifically, for reviews translated into English, Llama 3:8b:Instruct, Chat-Bison, and the fine-tuned Llama 3:8b had F4 macro scores of 0.89, 0.90, and 0.91, respectively. A further finding is that the larger models, Chat-Bison, Text-Bison, and Gemini, performed better than the smaller models that were tested when multiple reviews were input at a time for summary text generation. In general, language models performed better if reviews were first translated into English before processing than when they were processed in their original language, which for most reviews was Swedish. Another insight from the pre-processing phase is that the number of API calls to these language models can be minimized by filtering based on word length and rating. In addition to findings related to language models, the results also demonstrated that the use of vector databases and embeddings can provide a greater overview of reviews by leveraging the databases' built-in ability to identify semantic similarities and cluster similar reviews together.
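The F4 macro score mentioned in the abstract presumably refers to the macro-averaged F-beta score with beta = 4, which weights recall far more heavily than precision: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below computes it from label lists under that assumption. The function names `f_beta` and `macro_f_beta` are hypothetical, and the report's exact evaluation procedure is not stated in the abstract.

```python
from typing import Hashable, Sequence


def f_beta(precision: float, recall: float, beta: float = 4.0) -> float:
    """F-beta score; beta > 1 favors recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], beta: float = 4.0
) -> float:
    """Unweighted mean of the per-class F-beta scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f_beta(precision, recall, beta))
    return sum(scores) / len(scores)
```

With beta = 4, a classifier that finds every useful review at the cost of some false positives scores much higher than one that is precise but misses half of them, which fits a screening task where missed reviews are the costlier error.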
|