21 |
Similarity Learning and Stochastic Language Models for Tree-Represented Music
Bernabeu Briones, José Francisco 20 July 2017 (has links)
Similarity computation is a difficult issue in music information retrieval tasks because it tries to emulate the special ability that humans show for pattern recognition in general, and particularly in the presence of noisy data. A number of works have addressed the problem of what the best representation for symbolic music is in this context. The tree representation, using rhythm to define the tree structure and pitch information for leaf and node labelling, has proven effective in melodic similarity computation. In this dissertation we try to build a system that can classify and generate melodies using the information from the tree encoding, capturing the inherent dependencies inside this kind of structure and improving on current methods in terms of accuracy and running time. In this way, we try to find more efficient methods, which is key to using the tree structure on large datasets. First, we study the possibilities of tree edit similarity for classifying melodies, using a new approach for estimating the weights of the edit operations. Once the possibilities of that approach are studied, an alternative is explored: a grammatical inference approach is used to infer tree languages. The inference of these languages gives us the possibility of using them to classify new trees (melodies).
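The tree edit similarity mentioned above can be sketched compactly. Below is a minimal weighted edit distance over ordered, labeled melody trees (internal nodes subdivide duration, leaves carry pitch); it is a simplified top-down variant with placeholder weights, not the dissertation's learned-weight scheme or the full general tree edit distance.

```python
# Simplified top-down weighted edit distance between ordered melody trees.
# Weights are placeholders; the dissertation estimates them from data.
class Node:
    def __init__(self, label, children=()):
        self.label = label          # pitch label; internal nodes subdivide duration
        self.children = list(children)

def size(t):
    return 1 + sum(size(c) for c in t.children)

def tree_dist(a, b, w_sub=1.0, w_del=1.0, w_ins=1.0):
    # substitute the root labels, then align the child forests with sequence DP
    cost = 0.0 if a.label == b.label else w_sub
    ca, cb = a.children, b.children
    D = [[0.0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        D[i][0] = D[i - 1][0] + w_del * size(ca[i - 1])   # delete whole subtree
    for j in range(1, len(cb) + 1):
        D[0][j] = D[0][j - 1] + w_ins * size(cb[j - 1])   # insert whole subtree
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            D[i][j] = min(D[i - 1][j] + w_del * size(ca[i - 1]),
                          D[i][j - 1] + w_ins * size(cb[j - 1]),
                          D[i - 1][j - 1] + tree_dist(ca[i - 1], cb[j - 1],
                                                      w_sub, w_del, w_ins))
    return cost + D[len(ca)][len(cb)]

# two bars that differ in one pitch: distance is one substitution
m1 = Node("*", [Node("C"), Node("E")])
m2 = Node("*", [Node("C"), Node("G")])
print(tree_dist(m1, m2))  # 1.0
```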
|
22 |
The impacts of code structure analysis, powered by the language model FastText
Ivarsson, Gabriel, Håkansson, Noah January 2023 (has links)
The goal of this study was to investigate how the use of language models in the context of code structure analysis could impact how developers manage code structure. To do this, a prototype tool, GOSPLAT (GoLang Static Package Language-model Analysis Tool), was created. The objective was to identify, in a qualitative manner, themes concerning both the strengths and shortcomings of GOSPLAT, as well as the perceived need for, and willingness to adopt, such a tool in a company setting. The methods used for this case study were primarily interviews and observations: the researchers observed subjects using the tool and investigated further by conducting interviews in which the subjects could talk more freely about their experiences. Both project managers and developers in a company participated in the case study. The results were mixed, with the solution showing promising results for improvements in code quality as well as limitations where it might have misled the developer. However, throughout the study, all subjects were firm in their belief in a tool like GOSPLAT, showing genuine interest in incorporating such a tool into their workflow. In conclusion, a genuine need for tools like GOSPLAT was found to exist, and improvement areas were identified to enhance their effectiveness.
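GOSPLAT's internals are not described in this abstract; purely as an illustration of how a FastText model might power package-structure analysis, the sketch below embeds identifier tokens per Go package and compares package vocabularies, where unusually high similarity could hint at blurred package boundaries. The corpus, package names, and threshold idea are all hypothetical.

```python
# Illustrative only: train FastText on identifier tokens and compare packages.
from gensim.models import FastText

# hypothetical corpus: one "sentence" of identifier tokens per source file
corpus = [
    ["parse", "config", "load", "yaml", "validate"],
    ["handler", "request", "response", "route", "middleware"],
    ["config", "env", "load", "default", "override"],
]
model = FastText(sentences=corpus, vector_size=64, window=3,
                 min_count=1, epochs=20)

pkg_config = ["parse", "config", "load", "yaml"]      # hypothetical package vocab
pkg_settings = ["config", "env", "load", "default"]   # hypothetical package vocab
sim = model.wv.n_similarity(pkg_config, pkg_settings)
print(f"package similarity: {sim:.2f}")  # high similarity may suggest overlap
```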
|
23 |
Comparative Analysis of Language Models: hallucinations in ChatGPT : Prompt Study / Jämförande analys av språkmodeller: hallucinationer i ChatGPT : Prompt Studie
Hanna, Elias, Levic, Alija January 2023 (has links)
This thesis examines the percentage of hallucinations in the output of two large language models (LLMs), ChatGPT 3.5 and ChatGPT 4, for a set of prompts. The work was motivated by two factors. First, on the release of ChatGPT 4, its parent company OpenAI claimed it to be much more capable than its predecessor, ChatGPT 3.5, which raised questions regarding the capabilities of the LLM. Second, ChatGPT 3.5 had exhibited hallucinations (producing material that is factually wrong, deceptive, or untrue) in response to different prompts, as shown by other studies. The intended audience is members of the computer science community, such as researchers, software developers, and policymakers. The aim was to highlight the potential capabilities of large language models and provide insights into their dependability. The study used a quasi-experimental design and a systematic literature review. Our hypothesis predicted that hallucinations would be more prevalent in ChatGPT 3.5 than in ChatGPT 4, based on the fact that OpenAI trained ChatGPT 4 on more material than ChatGPT 3.5. We experimented on both LLMs, and our findings supported the hypothesis. Furthermore, we reviewed the literature and found studies that also agree that ChatGPT 4 outperforms ChatGPT 3.5. The thesis concludes with suggestions for future work, such as using extensive datasets and comparing the performance of models beyond ChatGPT 3.5 and ChatGPT 4.
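As a rough illustration of the comparison methodology, the sketch below computes hallucination rates from hand-labeled model outputs and tests the difference with a two-proportion z-test. The counts and the choice of test are illustrative assumptions, not the thesis's data or analysis.

```python
# Compare hallucination rates of two models, assuming each response has been
# hand-labeled as hallucinated or not. Counts below are placeholders.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(h1, n1, h2, n2):
    """z-test for H0: both models hallucinate at the same rate."""
    p1, p2 = h1 / n1, h2 / n2
    pooled = (h1 + h2) / (n1 + n2)
    z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided
    return p1, p2, z, p_value

# placeholder counts: 18/50 hallucinated for GPT-3.5, 9/50 for GPT-4
p1, p2, z, p = two_proportion_z(h1=18, n1=50, h2=9, n2=50)
print(f"GPT-3.5: {p1:.0%}, GPT-4: {p2:.0%}, z={z:.2f}, p={p:.3f}")
```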
|
24 |
[pt] SUMARIZAÇÃO AUTOMÁTICA DE MULTIPLAS AVALIAÇÕES UTILIZANDO AJUSTE FINO DE MODELOS DE LINGUAGEM TRANSFORMERS / [en] UNSUPERVISED MULTI-REVIEW SUMMARIZATION USING FINE-TUNED TRANSFORMER LANGUAGE MODELS
LUCAS ROBERTO DA SILVA 05 July 2021 (has links)
[en] Automatic summarization is the task of generating concise, correct, and factually consistent summaries. The task can be applied to different textual styles, including news, academic publications, and product or place reviews. This dissertation addresses the summarization of multiple reviews. This type of application stands out for its unsupervised nature and for the need to deal with the redundancy of the information present in the reviews. Automatic summarization work is evaluated using the ROUGE metric, which is based on the comparison of n-grams between the reference text and the generated summary. The lack of supervised data motivated the creation of the MeanSum architecture, the first neural network architecture based on an unsupervised model for this task. It is based on an auto-encoder and has been extended by other works, but none of them explored the effects of using attention mechanisms and auxiliary tasks during model training. The present work is divided into two parts. The first deals with an experiment in which we propose extensions to the MeanSum architecture to accommodate attention mechanisms and auxiliary sentiment classification tasks; in the same experiment, we explore the use of synthetic data to adapt supervised models to unsupervised tasks. In the second part, the earlier results are used to carry out a study on fine-tuning pre-trained Transformer language models. These models proved to be a promising alternative for tackling the unsupervised nature of the problem, outperforming previous work by +4 ROUGE.
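Since ROUGE is central to the evaluation discussed above, here is a minimal ROUGE-N recall computation illustrating the n-gram overlap idea; real evaluations use the official ROUGE toolkit, which adds stemming, stopword handling, and F-measures.

```python
# Minimal ROUGE-N recall: overlapping n-grams / n-grams in the reference.
from collections import Counter

def rouge_n_recall(reference: str, summary: str, n: int = 2) -> float:
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ref, hyp = ngrams(reference), ngrams(summary)
    overlap = sum(min(count, hyp[gram]) for gram, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

ref = "the hotel staff was friendly and the rooms were clean"
hyp = "the rooms were clean and the staff was friendly"
print(f"ROUGE-2 recall: {rouge_n_recall(ref, hyp, 2):.2f}")  # 0.67
```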
|
25 |
The Influence of Political Media on Large Language Models: Impacts on Information Synthesis, Reasoning, and Demographic Representation
Shaw, Alexander Glenn 16 August 2023 (has links) (PDF)
This thesis investigates the impact of finetuning the LLaMA 33B language model on partisan news datasets, revealing negligible changes and underscoring the enduring influence of pretraining data on model opinions. Training nine models across nine distinct news datasets spanning three topics and two ideologies, the study found consistent demographic representation, predominantly favoring liberal, college-educated, high-income, and non-religious demographics. Interestingly, a depolarizing effect emerged from partisan news finetuning, suggesting that intense exposure to topic-specific information might lead to depolarization irrespective of ideological alignment. Despite exposure to contrasting viewpoints, LLaMA 33B maintained its common-sense reasoning ability, showing minimal variance on evaluation metrics such as HellaSwag accuracy, ARC accuracy, and TruthfulQA MC1 and MC2. These results might indicate robustness in common-sense reasoning, or a deficiency in synthesizing diverse contextual information. Ultimately, this thesis demonstrates the resilience of high-performing language models like LLaMA 33B against targeted ideological bias, showing continued functionality and reasoning ability even when subjected to highly partisan information environments.
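A finetuning run of the kind described could look roughly like the sketch below, assuming a HuggingFace-style stack; the checkpoint name, data file, and hyperparameters are placeholders, as the abstract does not specify the training setup.

```python
# Sketch of causal-LM finetuning on a partisan-news corpus. Everything named
# here (checkpoint, data file, hyperparameters) is a stand-in, not the
# thesis's configuration; a 33B model also needs multi-GPU memory in practice.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "huggyllama/llama-30b"    # stand-in checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token          # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# hypothetical file: one news article per JSON line under a "text" key
ds = load_dataset("json", data_files="partisan_news.jsonl")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-news-ft",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```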
|
26 |
Leveraging Large Language Models Trained on Code for Symbol Binding
Robinson, Joshua 09 August 2022 (has links) (PDF)
While large language models like GPT-3 have achieved impressive results in the zero-, one-, and few-shot settings, they still significantly underperform on some tasks relative to the state of the art (SOTA). For many tasks it would be useful to have answer options explicitly listed out in a multiple-choice format, decreasing computational cost and allowing the model to reason about the relative merits of possible answers. We argue that the reason this has not helped models like GPT-3 close the gap with the SOTA is that these models struggle with symbol binding: associating each answer option with a symbol that represents it. To ameliorate this situation we introduce index prompting, a way of leveraging language models trained on code to successfully answer multiple-choice formatted questions. When used with the OpenAI Codex model, our method improves accuracy by about 18% on average in the few-shot setting relative to GPT-3 across 8 datasets representing 4 common NLP tasks. It also achieves a new single-model state of the art on ANLI R3, ARC (Easy), and StoryCloze, suggesting that GPT-3's latent "understanding" has previously been underestimated.
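One plausible instantiation of index prompting is sketched below: the question and its options are rendered as code so that the model answers by completing an index expression instead of reproducing an option verbatim. The exact prompt format here is an assumption and may differ from the thesis's.

```python
def index_prompt(question: str, choices: list[str]) -> str:
    """Render a multiple-choice question as a Python snippet so a
    code-trained LM can answer by emitting an index, sidestepping
    symbol binding. One plausible format, not the thesis's exact one."""
    lines = [f'question = "{question}"', "choices = ["]
    lines += [f'    "{c}",' for c in choices]
    lines += ["]", "# the index of the correct answer", "answer = choices["]
    return "\n".join(lines)

print(index_prompt("What is the capital of France?",
                   ["London", "Paris", "Rome"]))
# A code-trained model's next token should be the index, e.g. "1".
```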
|
27 |
PROMPT-ASSISTED RELATION FUSION IN KNOWLEDGE GRAPH ACQUISITION
Xiaonan Jing (14230196) 08 December 2022 (has links)
Knowledge Base (KB) systems have been studied for decades, and various approaches to acquiring accurate and scalable KBs have been explored. Recently, many studies have focused on Knowledge Graphs (KGs), which use a simple triple representation. A triple consists of a head entity, a predicate, and a tail entity; the head and tail entities are connected by the predicate, which indicates a certain relation between them. Three main research fields can be identified in KG acquisition. First, relation extraction aims at extracting triples from raw data. Second, entity linking addresses mapping mentions of the same entity together. Last, knowledge fusion integrates heterogeneous sources into one. This dissertation focuses on relation fusion, a sub-process of knowledge fusion. More specifically, it investigates whether the currently popular prompt-based learning method can assist with relation fusion. A framework for acquiring a KG from a real-world dataset is proposed. The framework contains a Preprocessing module, which annotates raw sentences and links known entities to the triples; a Prompting module, which generates and processes prompts for prediction with Pretrained Language Models (PLMs); and a Relation Fusion module, which creates predicate representations, clusters embeddings, and derives cluster labels. A series of experiments with comparison prompting groups is conducted. The results indicate that prompt-based learning, if applied appropriately, can help with grouping similar predicates. The framework proposed in this dissertation can be used effectively to assist human experts with the creation of relation types during knowledge acquisition.
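The Relation Fusion step of creating predicate representations and clustering them might be sketched as follows; the encoder choice (BERT), mean pooling, and cluster count are illustrative assumptions rather than the dissertation's exact setup.

```python
# Embed predicate mentions with a pretrained LM, cluster the embeddings, and
# treat clusters as fused relation types. Model and pooling are assumptions.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

predicates = ["was born in", "is a native of", "works for", "is employed by"]
with torch.no_grad():
    batch = tok(predicates, padding=True, return_tensors="pt")
    hidden = enc(**batch).last_hidden_state          # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    embs = (hidden * mask).sum(1) / mask.sum(1)      # mean-pool real tokens only

labels = KMeans(n_clusters=2, n_init=10).fit_predict(embs.numpy())
print(dict(zip(predicates, labels)))  # birth-place vs. employment clusters
```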
|
28 |
CALaMo: a Constructionist perspective on the Analysis of linguistic behaviour of Language Models
Pannitto, Ludovica 17 May 2023 (has links)
In recent years, Neural Language Models (NLMs) have consistently demonstrated increasing linguistic abilities. However, the extent to which such networks can actually learn grammar remains an object of investigation, and experimental results are often inconclusive. Notably, the mainstream evaluation framework in which NLMs are tested seems largely based on Generative Grammar and nativist principles, and a shared constructionist approach on the matter has not yet emerged: this is at odds with the fact that usage-based theories are actually better suited to inspect the behaviour of such models. The main contribution of this thesis is the introduction of CALaMo, a novel framework for evaluating Neural Language Models' linguistic abilities, using a constructionist approach. We especially aim at formalizing the relationship between the computational modelling phase and the underlying linguistic theory, thus allowing a more refined and informed discussion of settings and results. We focus on two specific areas that, we believe, are currently not easily tractable within the mainstream evaluation framework. The first scenario deals with language acquisition from child-directed data. Our main experimental result shows how it is possible to follow schematization paths during the acquisition process of the model, and how this relates to core hypotheses in constructionist theories. The second scenario deconstructs the mainstream view of the Neural Model as an average idealized speaker by proposing a way to simulate and analyze a population of artificial individuals. We show how the amount of "shared linguistic knowledge" across speakers is highly dependent on the specific linguistic background of each individual. Overall, we believe our framework opens the path for future discussion on the role of computational modelling in usage-based linguistic theory and vice versa, and provides a new formal methodology to both fields of study.
|
29 |
ChatGPT: A Good Computer Engineering Student? : An Experiment on its Ability to Answer Programming Questions from Exams
Loubier, Michael January 2023 (has links)
The release of ChatGPT has set new standards for what an artificial intelligence chatbot should be, and it has even shown potential in answering university-level exam questions in different subjects. This research focuses on evaluating its capabilities on programming subjects. To achieve this, coding questions taken from software engineering exams (N = 23) were posed to the AI in an experiment. Statistical analysis was then performed to determine how good a student ChatGPT is, examining its answers' correctness, degree of completion, diversity of response, speed of response, extraneity, number of errors, length of response, and confidence levels. GPT-3.5 is the version analyzed. The experiment used questions from three different programming subjects. The results showed a 93% rate of correct answer generation, demonstrating competence. However, the AI occasionally produced unnecessary lines of code that were not asked for, which were treated as extraneity. The confidence levels given by ChatGPT, which were always high, did not always align with response quality, showing the subjectiveness of the AI's self-assessment. Answer diversity was also a concern: most answers were repeatedly written in nearly the same way, and when there was diversity in the answers, it also produced much more extraneous code. If ChatGPT were blind-tested on a software engineering exam containing a good number of coding questions, unnecessary lines of code and comments could be what gives it away as an AI. Nonetheless, ChatGPT was found to have great potential as a learning tool: it can offer explanations, debugging help, and coding guidance just as any other tool or person could. It is not perfect, though, so it should be used with caution.
|
30 |
Innovating the Study of Self-Regulated Learning: An Exploration through NLP, Generative AI, and LLMs
Gamieldien, Yasir 12 September 2023 (has links)
This dissertation explores the use of natural language processing (NLP) and large language models (LLMs) to analyze student self-regulated learning (SRL) strategies in responses to exam wrappers. Exam wrappers are structured reflection activities that prompt students to practice SRL after they get their graded exams back. The dissertation consists of three manuscripts that compare traditional qualitative analysis with NLP-assisted approaches using transformer-based models, including GPT-3.5, a state-of-the-art LLM. The dataset comprises 3,800 student responses from an engineering physics course. The first manuscript develops two NLP-assisted codebooks for identifying learning strategies related to SRL in exam wrapper responses and evaluates the agreement between them and traditional qualitative analysis. The second manuscript applies a novel NLP technique called zero-shot learning (ZSL) to classify student responses into the codes developed in the first manuscript and assesses the accuracy of this method on a subset of the full dataset. The third manuscript uses the results of the second to identify the distribution of, and differences in, learning strategies and SRL constructs among students with different exam performance profiles. The dissertation demonstrates the potential of NLP and LLMs to enhance qualitative research by providing scalable, robust, and efficient methods for analyzing large corpora of textual data. It also contributes to the understanding of SRL in engineering education by revealing the common learning strategies, impediments, and SRL constructs that students report using while preparing for exams in a first-year engineering physics course. The dissertation suggests implications, limitations, and directions for future research on NLP, LLMs, and SRL. / Doctor of Philosophy / This dissertation is about using artificial intelligence (AI) to help researchers and teachers understand how students learn from their exams. Exams are not only a way to measure what students know, but also a chance for students to reflect on how they studied and what they can do better next time. One way students can reflect is by using exam wrappers, which are short questions that students answer after they get their graded exams back. This dissertation uses a type of AI called natural language processing (NLP), which can analyze text and find patterns and meanings in it, together with a powerful AI tool called GPT-3.5, which can generate text and answer questions. The dissertation has three manuscripts: they compare the traditional way of analyzing exam wrappers, which is done by hand, with the new way of using NLP and GPT-3.5; evaluate a specific, promising NLP method; and use this method to gain a deeper understanding of students' self-regulated learning (SRL) while preparing for exams. The data comes from 3,800 exam wrappers from a physics course for engineering students. The first manuscript develops a way of using NLP and GPT-3.5 to find out what learning strategies and goals students talk about in their exam wrappers and compares it to more traditional methods of analysis. The second manuscript tests how accurate a specific NLP technique is at finding these strategies and goals. The third manuscript looks at how different students use different strategies and goals, depending on how well they did on the exams, using the NLP technique from the second manuscript.
I found that NLP and GPT-3.5 can aid in analyzing exam wrappers faster and can provide nuanced insights when compared with manual approaches. The dissertation also shows which learning strategies and goals engineering students discuss most as they prepare for exams, and it gives some suggestions, challenges, and ideas for future research on AI and learning from exams.
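A minimal sketch of the ZSL classification step follows. The dissertation applies ZSL with transformer-based models including GPT-3.5; this sketch swaps in an open NLI-based zero-shot pipeline to illustrate the same idea, with hypothetical codes and a made-up student response.

```python
# Zero-shot classification of an exam-wrapper response into SRL codes,
# using an open NLI-based pipeline in place of the dissertation's models.
# The candidate codes and the response text are hypothetical examples.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

codes = ["rereading notes", "practice problems", "help seeking",
         "planning and time management"]
response = ("Next time I will start earlier and work through the old "
            "exams instead of just rereading the textbook.")

result = classifier(response, candidate_labels=codes, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```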
|