  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Improving Context Awareness of Transformer Networks using Retrieval-Augmented Generation

Do, Anh, Tran, Saga January 2024
The Thermo-Calc software is a key tool in the research process for many material engineers. However, integrating multiple modules in Thermo-Calc requires the user to write code in a Python-based language, which can be challenging for novice programmers. This project aims to enable the generation of such code from user prompts by using existing generative AI models. In particular, we use a retrieval-augmented generation architecture applied to LLaMA and Mistral models. We use Code LLaMA-Instruct models with 7, 13, and 34 billion parameters, and a Mistral-Instruct model with 7 billion parameters. These models are all based on LLaMA 2. We also use a LLaMA 3-Instruct model with 8 billion parameters. All these models are instruction-tuned, which suggests that they have the capability to interpret natural language and identify appropriate options for a command-line program such as Python. In our testing, the LLaMA 3-Instruct model performed best, achieving 53% on the industry benchmark HumanEval and 49% on our internal adequacy assessment at pass@1, which is the expected probability of getting a correct solution when generating a single response. This indicates that roughly every other answer the model generates is correct. Due to GPU memory limitations, we had to apply quantisation to process the 13 and 34 billion parameter models. Our results revealed a mismatch between model size and optimal levels of quantisation, indicating that reduced precision adversely affects the performance of these models. Our findings suggest that a properly customised large language model can greatly reduce the coding effort of novice programmers, thereby improving productivity in material research.
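The pass@1 figure quoted above can be illustrated with the commonly used unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n samples are generated per problem and c of them pass. This is a minimal sketch: the abstract does not say how the authors computed their scores, and the function names and sample counts below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of generated samples, c: number of correct samples,
    k: number of attempts allowed. Assumed formula, not taken from the thesis.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical example: two problems with 10 samples each, 5 and 0 correct.
score = mean_pass_at_k([(10, 5), (10, 0)], k=1)
```

For k = 1 the estimator reduces to c / n per problem, so a benchmark-wide pass@1 near 50% corresponds to about half of the single-shot answers being correct.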
2

Context matters : Classifying Swedish texts using BERT's deep bidirectional word embeddings

Holmer, Daniel January 2020
When classifying texts using a linear classifier, the texts are commonly represented as feature vectors. Previous methods to represent features as vectors have been unable to capture the context of individual words in the texts, in theory leading to a poor representation of natural language. Bidirectional Encoder Representations from Transformers (BERT) uses a multi-headed self-attention mechanism to create deep bidirectional feature representations that can model the whole context of all words in a sequence. A BERT model uses a transfer learning approach, where it is pre-trained on a large amount of data and can be further fine-tuned for several downstream tasks. This thesis uses one multilingual and two dedicated Swedish BERT models to classify Swedish texts as being of either easy-to-read or standard complexity in their respective domains. The performance of the different models on the text classification task is then compared both with feature representation methods used in earlier studies and with the other BERT models. The results show that all models performed better on the classification task than the previous methods of feature representation. Furthermore, the dedicated Swedish models show better performance than the multilingual model, with the Swedish model pre-trained on more diverse data outperforming the other.
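To illustrate why context-free feature vectors can represent language poorly, the sketch below builds bag-of-words count vectors for two sentences with opposite meanings. Bag-of-words is used here only as an assumed example of an earlier representation method; the abstract does not name the methods the thesis compares against, and the sentences and function name are hypothetical.

```python
from collections import Counter


def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs, ignoring word order."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

# Shared vocabulary, sorted so that vector positions are stable.
vocabulary = sorted(set(sentence_a.split()) | set(sentence_b.split()))

vec_a = bag_of_words(sentence_a, vocabulary)
vec_b = bag_of_words(sentence_b, vocabulary)

# Both sentences map to the same vector although their meanings differ.
same_representation = vec_a == vec_b
```

A linear classifier fed these vectors cannot tell the two sentences apart, whereas a contextual encoder produces representations that depend on each word's neighbours.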
3

Study of methods for assessing the output of the Ural-Asbest enterprise using a computer vision system : master's thesis

Иванов, С. С., Ivanov, S. S. January 2024
The object of the study is a computer vision system for quality control of the outgoing products of the mining industry. The subject of the study is semantic segmentation methods, deep neural networks, feature encoders, and loss functions. The purpose of the work is to study modern machine learning methods and deep neural network architectures for solving the problem of assessing the output of an open pit mine. The study included a review of approaches to image segmentation using neural networks, and the development and implementation of experiments to compare the effectiveness of different deep neural network architectures in the problem of assessing an open pit mine. The work demonstrates the effectiveness of the approach using the transformer architecture and shows the possibilities of applying the model in further solving the problem. Practical application area: the proposed approach can be used to improve the markup of the original data set, as well as to provide an independent assessment that helps an expert determine the quality of the outgoing product.
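Comparisons of segmentation architectures are often reported with overlap scores between a predicted mask and a reference mask. The sketch below computes intersection over union (IoU) and the Dice coefficient for binary masks. Both metrics are assumptions chosen for illustration; the abstract does not state which metrics or loss functions the thesis uses, and the masks shown are hypothetical.

```python
from typing import Sequence


def iou_and_dice(pred: Sequence[int], target: Sequence[int]) -> tuple[float, float]:
    """Return (IoU, Dice) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


# Hypothetical 4-pixel masks: one pixel overlaps, each mask marks two pixels.
iou, dice = iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0])
```

A Dice-based loss is typically formed as one minus a differentiable version of this coefficient, which is one reason overlap scores appear both as metrics and as training objectives in segmentation work.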
