Global ETD Search

81	Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches de Curtò i Díaz, Joaquim 23 January 2024 Tesis por compendio / [ES] El advenimiento de los Large Language Models (LLMs) marca una fase transformadora en el campo de la Inteligencia Artificial (IA), significando el cambio hacia sistemas inteligentes y autónomos capaces de una comprensión y toma de decisiones complejas. Esta tesis profundiza en las capacidades multifacéticas de los LLMs, explorando sus posibles aplicaciones en la optimización de decisiones, la comprensión de escenas y tareas avanzadas de resumen de video en diversos contextos. En el primer segmento de la tesis, el foco está en la comprensión semántica de escenas de Vehículos Aéreos No Tripulados (UAVs). La capacidad de proporcionar instantáneamente datos de alto nivel y señales visuales sitúa a los UAVs como plataformas ideales para realizar tareas complejas. El trabajo combina el potencial de los LLMs, los Visual Language Models (VLMs), y los sistemas de detección de objetos de última generación para ofrecer descripciones de escenas matizadas y contextualmente precisas. Se presenta una implementación práctica eficiente y bien controlada usando microdrones en entornos complejos, complementando el estudio con métricas de legibilidad estandarizadas propuestas para medir la calidad de las descripciones mejoradas por los LLMs. Estos avances podrían impactar significativamente en sectores como el cine, la publicidad y los parques temáticos, mejorando las experiencias de los usuarios de manera exponencial. El segundo segmento arroja luz sobre el problema cada vez más crucial de la toma de decisiones bajo incertidumbre. Utilizando el problema de Multi-Armed Bandits (MAB) como base, el estudio explora el uso de los LLMs para informar y guiar estrategias en entornos dinámicos. 
Se postula que el poder predictivo de los LLMs puede ayudar a elegir el equilibrio correcto entre exploración y explotación basado en el estado actual del sistema. A través de pruebas rigurosas, la estrategia informada por los LLMs propuesta demuestra su adaptabilidad y su rendimiento competitivo frente a las estrategias convencionales. A continuación, la investigación se centra en el estudio de las evaluaciones de bondad de ajuste de las Generative Adversarial Networks (GANs) utilizando la Signature Transform. Al proporcionar una medida eficiente de similitud entre las distribuciones de imágenes, el estudio arroja luz sobre la estructura intrínseca de las muestras generadas por los GANs. Un análisis exhaustivo utilizando medidas estadísticas como las pruebas de Kruskal-Wallis proporciona una comprensión más amplia de la convergencia de los GANs y la bondad de ajuste. En la sección final, la tesis introduce un nuevo benchmark para la síntesis automática de vídeos, enfatizando la integración armoniosa de los LLMs y la Signature Transform. Se propone un enfoque innovador basado en los componentes armónicos capturados por la Signature Transform. Las medidas son evaluadas extensivamente, demostrando ofrecer una precisión convincente que se correlaciona bien con el concepto humano de un buen resumen. Este trabajo de investigación establece a los LLMs como herramientas poderosas para abordar tareas complejas en diversos dominios, redefiniendo la optimización de decisiones, la comprensión de escenas y las tareas de resumen de video. No solo establece nuevos postulados en las aplicaciones de los LLMs, sino que también establece la dirección para futuros trabajos en este emocionante y rápidamente evolucionante campo. / [CA] L'adveniment dels Large Language Models (LLMs) marca una fase transformadora en el camp de la Intel·ligència Artificial (IA), significat el canvi cap a sistemes intel·ligents i autònoms capaços d'una comprensió i presa de decisions complexes. 
Aquesta tesi profunditza en les capacitats multifacètiques dels LLMs, explorant les seues possibles aplicacions en l'optimització de decisions, la comprensió d'escenes i tasques avançades de resum de vídeo en diversos contexts. En el primer segment de la tesi, el focus està en la comprensió semàntica d'escenes de Vehicles Aeris No Tripulats (UAVs). La capacitat de proporcionar instantàniament dades d'alt nivell i senyals visuals situa els UAVs com a plataformes ideals per a realitzar tasques complexes. El treball combina el potencial dels LLMs, els Visual Language Models (VLMs), i els sistemes de detecció d'objectes d'última generació per a oferir descripcions d'escenes matisades i contextualment precises. Es presenta una implementació pràctica eficient i ben controlada usant microdrons en entorns complexos, complementant l'estudi amb mètriques de llegibilitat estandarditzades proposades per a mesurar la qualitat de les descripcions millorades pels LLMs. Aquests avenços podrien impactar significativament en sectors com el cinema, la publicitat i els parcs temàtics, millorant les experiències dels usuaris de manera exponencial. El segon segment arroja llum sobre el problema cada vegada més crucial de la presa de decisions sota incertesa. Utilitzant el problema dels Multi-Armed Bandits (MAB) com a base, l'estudi explora l'ús dels LLMs per a informar i guiar estratègies en entorns dinàmics. Es postula que el poder predictiu dels LLMs pot ajudar a triar l'equilibri correcte entre exploració i explotació basat en l'estat actual del sistema. A través de proves rigoroses, l'estratègia informada pels LLMs proposada demostra la seua adaptabilitat i el seu rendiment competitiu front a les estratègies convencionals. A continuació, la recerca es centra en l'estudi de les avaluacions de bondat d'ajust de les Generative Adversarial Networks (GANs) utilitzant la Signature Transform. 
En proporcionar una mesura eficient de similitud entre les distribucions d'imatges, l'estudi arroja llum sobre l'estructura intrínseca de les mostres generades pels GANs. Una anàlisi exhaustiva utilitzant mesures estadístiques com les proves de Kruskal-Wallis proporciona una comprensió més àmplia de la convergència dels GANs i la bondat d'ajust. En la secció final, la tesi introdueix un nou benchmark per a la síntesi automàtica de vídeos, enfatitzant la integració harmònica dels LLMs i la Signature Transform. Es proposa un enfocament innovador basat en els components harmònics capturats per la Signature Transform. Les mesures són avaluades extensivament, demostrant oferir una precisió convincent que es correlaciona bé amb el concepte humà d'un bon resum. Aquest treball de recerca estableix els LLMs com a eines poderoses per a abordar tasques complexes en diversos dominis, redefinint l'optimització de decisions, la comprensió d'escenes i les tasques de resum de vídeo. No solament estableix nous postulats en les aplicacions dels LLMs, sinó que també estableix la direcció per a futurs treballs en aquest emocionant i ràpidament evolucionant camp. / [EN] The advent of Large Language Models (LLMs) marks a transformative phase in the field of Artificial Intelligence (AI), signifying the shift towards intelligent and autonomous systems capable of complex understanding and decision-making. This thesis delves deep into the multifaceted capabilities of LLMs, exploring their potential applications in decision optimization, scene understanding, and advanced summarization tasks in diverse contexts. In the first segment of the thesis, the focus is on Unmanned Aerial Vehicles' (UAVs) semantic scene understanding. The capability of instantaneously providing high-level data and visual cues positions UAVs as ideal platforms for performing complex tasks. 
The work combines the potential of LLMs, Visual Language Models (VLMs), and state-of-the-art detection pipelines to offer nuanced and contextually accurate scene descriptions. A well-controlled, efficient practical implementation of microdrones in challenging settings is presented, supplementing the study with proposed standardized readability metrics to gauge the quality of LLM-enhanced descriptions. This could significantly impact sectors such as film, advertising, and theme parks, enhancing user experiences manifold. The second segment brings to light the increasingly crucial problem of decision-making under uncertainty. Using the Multi-Armed Bandit (MAB) problem as a foundation, the study explores the use of LLMs to inform and guide strategies in dynamic environments. It is postulated that the predictive power of LLMs can aid in choosing the correct balance between exploration and exploitation based on the current state of the system. Through rigorous testing, the proposed LLM-informed strategy showcases its adaptability and its competitive performance against conventional strategies. Next, the research transitions into studying the goodness-of-fit assessments of Generative Adversarial Networks (GANs) utilizing the Signature Transform. By providing an efficient measure of similarity between image distributions, the study sheds light on the intrinsic structure of the samples generated by GANs. A comprehensive analysis using statistical measures, such as the Kruskal-Wallis test, provides a more extensive understanding of GAN convergence and goodness of fit. In the final section, the thesis introduces a novel benchmark for automatic video summarization, emphasizing the harmonious integration of LLMs and the Signature Transform. An innovative approach grounded in the harmonic components captured by the Signature Transform is put forth. 
The measures are extensively evaluated, proving to offer compelling accuracy that correlates well with the concept of a good summary. This research work establishes LLMs as powerful tools in addressing complex tasks across diverse domains, redefining decision optimization, scene understanding, and summarization tasks. It not only breaks new ground in the applications of LLMs but also sets the direction for future work in this exciting and rapidly evolving field. / De Curtò I Díaz, J. (2023). Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202200 / Compendio Autonomous Systems Artificial Intelligence Large Language Models Visual Language Models Unmanned Aerial Vehicles Semantic Scene Understanding Multi-Armed Bandits Signature Transform Generative Adversarial Networks (GANs)
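The exploration/exploitation balance mentioned in the abstract of entry 81 can be illustrated with a conventional baseline. The sketch below is epsilon-greedy on a Bernoulli bandit; it is not the thesis's LLM-informed strategy, and the function name, arm means, and parameter values are assumptions chosen only for illustration.

```python
import random


def epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit.

    Returns (pull counts per arm, estimated mean per arm, total reward).
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print(counts, [round(e, 3) for e in estimates], total)
```

A fixed epsilon is the simplest choice; a strategy of the kind the abstract describes would instead adjust the exploration rate from the current state of the system.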
82	AI-Enhanced Methods in Autonomous Systems: Large Language Models, DL Techniques, and Optimization Algorithms de Zarzà i Cubero, Irene 23 January 2024 Tesis por compendio / [ES] La proliferación de sistemas autónomos y su creciente integración en la vida humana cotidiana han abierto nuevas fronteras de investigación y desarrollo. Dentro de este ámbito, la presente tesis se adentra en las aplicaciones multifacéticas de los LLMs (Large Language Models), técnicas de DL (Deep Learning) y algoritmos de optimización en el ámbito de estos sistemas autónomos. A partir de los principios de los métodos potenciados por la Inteligencia Artificial (IA), los estudios englobados en este trabajo convergen en la exploración y mejora de distintos sistemas autónomos que van desde sistemas de platooning de camiones en sistemas de comunicaciones Beyond 5G (B5G), Sistemas Multi-Agente (SMA), Vehículos Aéreos No Tripulados (UAV), estimación del área de incendios forestales, hasta la detección temprana de enfermedades como el glaucoma. Un enfoque de investigación clave, perseguido en este trabajo, gira en torno a la implementación innovadora de controladores PID adaptativos en el platooning de vehículos, facilitada a través de la integración de los LLMs. Estos controladores PID, cuando se infunden con capacidades de IA, ofrecen nuevas posibilidades en términos de eficiencia, fiabilidad y seguridad de los sistemas de platooning. Desarrollamos un modelo de DL que emula un controlador PID adaptativo, mostrando así su potencial en las redes y radios habilitadas para IA. Simultáneamente, nuestra exploración se extiende a los sistemas multi-agente, proponiendo una Teoría Coevolutiva Extendida (TCE) que amalgama elementos de la dinámica coevolutiva, el aprendizaje adaptativo y las recomendaciones de estrategias basadas en LLMs. Esto permite una comprensión más matizada y dinámica de las interacciones estratégicas entre agentes heterogéneos en los SMA. 
Además, nos adentramos en el ámbito de los vehículos aéreos no tripulados (UAVs), proponiendo un sistema para la comprensión de vídeos que crea una log de la historia basada en la descripción semántica de eventos y objetos presentes en una escena capturada por un UAV. El uso de los LLMs aquí permite razonamientos complejos como la predicción de eventos con mínima intervención humana. Además, se aplica una metodología alternativa de DL para la estimación del área afectada durante los incendios forestales. Este enfoque aprovecha una nueva arquitectura llamada TabNet, integrada con Transformers, proporcionando así una estimación precisa y eficiente del área. En el campo de la salud, nuestra investigación esboza una metodología exitosa de detección temprana del glaucoma. Utilizando un enfoque de entrenamiento de tres etapas con EfficientNet en imágenes de retina, logramos una alta precisión en la detección de los primeros signos de esta enfermedad. A través de estas diversas aplicaciones, el foco central sigue siendo la exploración de metodologías avanzadas de IA dentro de los sistemas autónomos. Los estudios dentro de esta tesis buscan demostrar el poder y el potencial de las técnicas potenciadas por la IA para abordar problemas complejos dentro de estos sistemas. Estas investigaciones en profundidad, análisis experimentales y soluciones desarrolladas arrojan luz sobre el potencial transformador de las metodologías de IA en la mejora de la eficiencia, fiabilidad y seguridad de los sistemas autónomos, contribuyendo en última instancia a la futura investigación y desarrollo en este amplio campo. / [CA] La proliferació de sistemes autònoms i la seua creixent integració en la vida humana quotidiana han obert noves fronteres de recerca i desenvolupament. Dins d'aquest àmbit, la present tesi s'endinsa en les aplicacions multifacètiques dels LLMs (Large Language Models), tècniques de DL (Deep Learning) i algoritmes d'optimització en l'àmbit d'aquests sistemes autònoms. 
A partir dels principis dels mètodes potenciats per la Intel·ligència Artificial (IA), els estudis englobats en aquest treball convergeixen en l'exploració i millora de diferents sistemes autònoms que van des de sistemes de platooning de camions en sistemes de comunicacions Beyond 5G (B5G), Sistemes Multi-Agent (SMA), Vehicles Aeris No Tripulats (UAV), estimació de l'àrea d'incendis forestals, fins a la detecció precoç de malalties com el glaucoma. Un enfocament de recerca clau, perseguit en aquest treball, gira entorn de la implementació innovadora de controladors PID adaptatius en el platooning de vehicles, facilitada a través de la integració dels LLMs. Aquests controladors PID, quan s'infonen amb capacitats d'IA, ofereixen noves possibilitats en termes d'eficiència, fiabilitat i seguretat dels sistemes de platooning. Desenvolupem un model de DL que emula un controlador PID adaptatiu, mostrant així el seu potencial en les xarxes i ràdios habilitades per a IA. Simultàniament, la nostra exploració s'estén als sistemes multi-agent, proposant una Teoria Coevolutiva Estesa (TCE) que amalgama elements de la dinàmica coevolutiva, l'aprenentatge adaptatiu i les recomanacions d'estratègies basades en LLMs. Això permet una comprensió més matissada i dinàmica de les interaccions estratègiques entre agents heterogenis en els SMA. A més, ens endinsem en l'àmbit dels Vehicles Aeris No Tripulats (UAVs), proposant un sistema per a la comprensió de vídeos que crea un registre de la història basat en la descripció semàntica d'esdeveniments i objectes presents en una escena capturada per un UAV. L'ús dels LLMs aquí permet raonaments complexos com la predicció d'esdeveniments amb mínima intervenció humana. A més, s'aplica una metodologia alternativa de DL per a l'estimació de l'àrea afectada durant els incendis forestals. Aquest enfocament aprofita una nova arquitectura anomenada TabNet, integrada amb Transformers, proporcionant així una estimació precisa i eficient de l'àrea. 
En el camp de la salut, la nostra recerca esbossa una metodologia exitosa de detecció precoç del glaucoma. Utilitzant un enfocament d'entrenament de tres etapes amb EfficientNet en imatges de retina, aconseguim una alta precisió en la detecció dels primers signes d'aquesta malaltia. A través d'aquestes diverses aplicacions, el focus central continua sent l'exploració de metodologies avançades d'IA dins dels sistemes autònoms. Els estudis dins d'aquesta tesi busquen demostrar el poder i el potencial de les tècniques potenciades per la IA per a abordar problemes complexos dins d'aquests sistemes. Aquestes investigacions en profunditat, anàlisis experimentals i solucions desenvolupades llançen llum sobre el potencial transformador de les metodologies d'IA en la millora de l'eficiència, fiabilitat i seguretat dels sistemes autònoms, contribuint en última instància a la futura recerca i desenvolupament en aquest ampli camp. / [EN] The proliferation of autonomous systems, and their increasing integration with day-to-day human life, have opened new frontiers of research and development. Within this scope, the current thesis dives into the multifaceted applications of Large Language Models (LLMs), Deep Learning (DL) techniques, and Optimization Algorithms within the realm of these autonomous systems. Drawing from the principles of AI-enhanced methods, the studies encapsulated within this work converge on the exploration and enhancement of different autonomous systems ranging from B5G Truck Platooning Systems, Multi-Agent Systems (MASs), Unmanned Aerial Vehicles, Forest Fire Area Estimation, to the early detection of diseases like Glaucoma. A key research focus, pursued in this work, revolves around the innovative deployment of adaptive PID controllers in vehicle platooning, facilitated through the integration of LLMs. These PID controllers, when infused with AI capabilities, offer new possibilities in terms of efficiency, reliability, and security of platooning systems. 
We developed a DL model that emulates an adaptive PID controller, thereby showcasing its potential in AI-enabled radio and networks. Simultaneously, our exploration extends to multi-agent systems, proposing an Extended Coevolutionary (EC) Theory that amalgamates elements of coevolutionary dynamics, adaptive learning, and LLM-based strategy recommendations. This allows for a more nuanced and dynamic understanding of the strategic interactions among heterogeneous agents in MASs. Moreover, we delve into the realm of Unmanned Aerial Vehicles (UAVs), proposing a system for video understanding that employs a language-based world-state history of events and objects present in a scene captured by a UAV. The use of LLMs here enables open-ended reasoning such as event forecasting with minimal human intervention. Furthermore, an alternative DL methodology is applied for the estimation of the affected area during forest fires. This approach leverages a novel architecture called TabNet, integrated with Transformers, thus providing accurate and efficient area estimation. In the field of healthcare, our research outlines a successful early detection methodology for glaucoma. Using a three-stage training approach with EfficientNet on retinal images, we achieved high accuracy in detecting early signs of this disease. Across these diverse applications, the core focus remains: the exploration of advanced AI methodologies within autonomous systems. The studies within this thesis seek to demonstrate the power and potential of AI-enhanced techniques in tackling complex problems within these systems. These in-depth investigations, experimental analyses, and developed solutions shed light on the transformative potential of AI methodologies in improving the efficiency, reliability, and security of autonomous systems, ultimately contributing to future research and development in this expansive field. / De Zarzà I Cubero, I. (2023). 
AI-Enhanced Methods in Autonomous Systems: Large Language Models, DL Techniques, and Optimization Algorithms [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202201 / Compendio Autonomous Systems Large Language Models Deep Learning Optimization Vehicle Platooning Multi-Agent Systems B5G Networks Unmanned Aerial Vehicles AI-Enhanced Control Systems Coevolutionary Algorithms Socratic Video Understanding Forest Fire Estimation Glaucoma AI in Healthcare Diagnostics
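As background for the PID controllers discussed in the abstract of entry 82, here is a minimal fixed-gain discrete PID sketch applied to gap keeping behind a leader vehicle. It is not the thesis's adaptive or DL-emulated controller: the gains, the time step, and the simplified dynamics (the controller output is treated directly as the closing speed) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID controller with fixed gains (illustrative only)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the first call, to avoid a start-up kick.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_gap(target_gap=10.0, initial_gap=20.0, steps=600, dt=0.05):
    """Follower closes the gap to a leader moving at constant speed.

    Assumed toy dynamics: the PID output is the follower's speed relative
    to the leader, so the gap shrinks by output * dt each step.
    """
    pid = PID(kp=2.0, ki=0.5, kd=0.1)  # hypothetical gains
    gap = initial_gap
    for _ in range(steps):
        error = gap - target_gap
        closing_speed = pid.step(error, dt)
        gap -= closing_speed * dt
    return gap


if __name__ == "__main__":
    print(f"final gap: {simulate_gap():.3f} m")
```

An adaptive scheme of the kind the abstract describes would replace the constant `kp`, `ki`, and `kd` with values updated at run time.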
83	Direct Preference Optimization for Improved Technical Writing Assistance : A Study of How Language Models Can Support the Writing of Technical Documentation at Saab / En studie i hur språkmodeller kan stödja skrivandet av teknisk dokumentation på Saab Bengtsson, Hannes, Habbe, Patrik January 2024 This thesis explores the potential of Large Language Models (LLMs) to assist in the technical documentation process at Saab. With the increasing complexity and regulatory demands on such documentation, the objective is to investigate advanced natural language processing techniques as a means of streamlining the creation of technical documentation. Although many standards exist, this thesis focuses on ASD-STE100, Simplified Technical English (STE), a controlled language for technical documentation. STE's primary aim is to ensure that technical documents are understandable to individuals regardless of their native language or English proficiency. The study focuses on the implementation of Direct Preference Optimization (DPO) and Supervised Instruction Fine-Tuning (SIFT) to refine the capabilities of LLMs in producing clear and concise outputs that comply with STE. Through a series of experiments, we investigate the effectiveness of LLMs in interpreting and simplifying technical language, with a particular emphasis on adherence to STE. The study uses a dataset of target data paired with synthetic source data generated by an LLM. We evaluate zero-shot performance and apply two training strategies: supervised instruction fine-tuning and direct preference optimization. We evaluate the models' output using established quantitative metrics for text simplification, and we replace human evaluators with company-internal software for evaluating adherence to company standards and STE. 
Our findings suggest that while LLMs can significantly contribute to the technical writing process, the choice of training methods and the quality of data play crucial roles in the model's performance. This study shows how LLMs can improve productivity and reduce manual work. It also looks at the problems and suggests ways to make technical documentation automation better in the future. Large Language Models LLM Natural Language Processing NLP Technical Writing Simplified Technical English Direct Preference Optimization DPO Supervised Instruction Fine-tuning LoRA AI
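For readers unfamiliar with Direct Preference Optimization, named in the abstract of entry 83, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. The thesis's own training code is not shown in the abstract; the function name, the `beta` value, and the example log-probabilities are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                                - (log pi(y_l|x) - log pi_ref(y_l|x))]))
    Inputs are summed token log-probabilities of each full response.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical log-probabilities: the policy favours the chosen
    # (for example, STE-compliant) rewrite more than the reference does.
    print(dpo_loss(-12.0, -20.0, -15.0, -18.0))
```

The loss falls as the policy raises the likelihood of the preferred output relative to the reference model, which is how preference pairs steer the model without a separate reward model.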
84	Разработка инструмента для автоматического выявления уязвимостей в исходном коде на основе глубоких нейронных сетей : магистерская диссертация / Development of a tool for automatic detection of vulnerabilities in the source code based on deep neural networks Русинова, З. Р., Rusinova, Z. R. January 2024 This work is devoted to the development of an automatic code testing tool that effectively detects and classifies vulnerabilities using deep learning methods, in particular natural language processing methods. The paper reviews existing machine learning approaches and methods, analyzes and selects datasets and machine learning algorithms for the task, and describes the infrastructure for running experiments and tracking their results. In the course of the study, binary classification models, multiclass classification models for determining CWE identifiers, and large language models for generating descriptions of detected vulnerabilities were studied. A new approach was also developed to localize vulnerabilities at the level of individual lines of program code using the SHAP and LIME explainability methods for machine learning models. / Данная работа посвящена разработке инструмента автоматического тестирования программного кода, который позволяет эффективно обнаруживать и классифицировать уязвимости с помощью методов глубокого обучения, в частности, методов обработки естественного языка. В работе представлен обзор существующих подходов и методов машинного обучения, проведен анализ и подбор наборов данных и алгоритмов машинного обучения для решения поставленной задачи, описана инфраструктура для проведения исследований и отслеживания их результатов. В ходе исследования изучены модели бинарной классификации, модели многоклассовой классификации для определения идентификаторов CWE, большие языковые модели для генерации описаний обнаруженных уязвимостей. 
Также был разработан новый подход для локализации уязвимостей на уровне строк программного кода с использованием методов объяснимости моделей машинного обучения SHAP и LIME. MASTER'S THESIS MACHINE LEARNING MACHINE LEARNING MODELS VULNERABILITY DETECTION IN CODE LARGE LANGUAGE MODELS МАШИННОЕ ОБУЧЕНИЕ
85	Исследование возможности применения базисных моделей для прогнозирования временных рядов : магистерская диссертация / Study of the possibility of using basic models for forecasting time series Семерикова, К. А., Semerikova, K. A. January 2024 This work presents a comparative analysis of foundation models and classical methods and draws a conclusion on whether foundation models can be applied in practice to time series forecasting. The first part of the work gives a theoretical analysis of the available literature on the topic and examines the main features of modern foundation models. The second chapter presents the design of the experiments. It lists the classical forecasting methods used in the study and describes their automated training with the AutoGluon framework. The foundation models selected were Chronos, TimeGPT and Lag-Llama. Chronos was used only in zero-shot mode, while TimeGPT and Lag-Llama were additionally evaluated after fine-tuning. Reference datasets from the Monash Repository were selected for the evaluation. The third chapter interprets the results and formulates recommendations for the use of foundation models in forecasting. / В данной работе был проведен сравнительный анализ базисных моделей с классическими методами, сформирован вывод о возможности практического применения базисных моделей для прогнозирования временных рядов. В первой части работы был проведен теоретический анализ доступной литературы по теме исследования, изучены основные особенности современных базисных моделей. Во второй главе работы была представлена схема проводившихся экспериментов. Приведен перечень классических методов прогнозирования, используемых в исследовании, описан процесс их автоматизированного обучения с применением фреймворка AutoGluon. 
Среди базисных моделей были выбраны: Chronos, TimeGPT и Lag-Llama. Модель Chronos использовалась только в режиме без примеров, а модели Time-GPT и Lag-Llama, помимо этого, оценивались после точной настройки. Для проведения оценки были выбраны эталонные наборы данных из Monash Repository. В третьей главе работы была проведена интерпретация полученных результатов, сформулированы рекомендации по использованию базисных моделей в прогнозировании. MASTER'S THESIS BASIC MODELS TIME SERIES FORECASTING MULTIMODAL MODELS PROBABILISTIC TIME SERIES FORECASTING LARGE LANGUAGE MODELS TRANSFORMER БАЗИСНЫЕ МОДЕЛИ ТРАНСФОРМЕР
86	Генератор драматургических текстов : магистерская диссертация / Generator of dramatic texts Данилов, Е. М., Danilov, E. M. January 2024 The object of development is code based on large language models for generating dramatic texts. The object of the study is a dramatic text generated by a large language model. The subject of the study is methods based on large language models for generating dramatic text. The purpose of the work is to create a generator of dramatic text that takes into account the features of the dramatic style of narration, without the direct participation of the author. The work considers the features of dramatic text, as well as the problems that arise during its generation. To solve them, a method of hierarchical plot generation with explicit narrative structures and characters was used, which helps to generate more coherent stories, especially for long texts such as theater scripts. To preserve the style, a set of written prefixes taken from the ancient Greek tragedy "Medea" by Euripides (431 BC) was used. The evaluation was carried out using the NLI-score metric and by surveying people involved in script writing. An algorithm for generating and evaluating scripts was written in the Python programming language. / Объектом разработки является код на основе больших языковых моделей для генерации драматургических текстов. Объектом исследования является драматургический текст, сгенерированный большой языковой моделью машинного обучения. Предметом исследования являются методы на основе больших языковых моделей для генерации драматургического текста. Цель работы создание генератора драматургического текста, с учетом особенностей драматургического стиля повествования, без непосредственного участия автора. В работе рассмотрены особенности драматургического текста, а также проблемы, возникающие при его генерации. 
Для их решения использовался метод иерархической генерации сюжета, с явными структурами повествования и персонажами, что помогает генерировать более связные истории, особенно при создании таких длинных текстов, как театральные сценарии. Для сохранения стиля использовался набор прописанных префиксов, взятых из древнегреческой трагедии «Medea» Еврипида (431 г. до н. э.). Оценка проводилась с использованием метрики NLI-score и с помощью анкетирования людей, связанных с написанием сценариев. На языке программирования Python написан алгоритм для генерации и оценки сценариев. MASTER'S THESIS LARGE LANGUAGE MODELS NATURAL LANGUAGE GENERATION NATURAL LANGUAGE ASSESSMENT SEMANTIC SIMILARITY ASSESSMENT QUERY ENGINEERING ИНЖЕНЕРИЯ ЗАПРОСОВ
87	Дообучение больших языковых моделей для решения специализированных задач : магистерская диссертация / LLM Tuning for Specific Tasks Молчанова, Т. А., Molchanova, T. A. January 2024 В выпускной квалификационной работе рассмотрены методы дообучения больших языковых моделей для решения специализированных задач. В качестве специализированной задачи был выбран мультиязычный перевод в сфере информационной безопасности. Для дообучения и оценки моделей был собран датасет из 1001 тройки параллельных предложений на русском, английском и испанском языках из документов компаний Trellix, IBM, Kaspersky и Dr. Web. В качестве моделей для дообучения были выбраны Mistral Instruct 7B и Llama Chat 7B. Дообучение моделей проводилось методами zero-shot, few-shot и PEFT ввиду ограничений исследования, заключающихся в использовании одного устройства с одной видеокартой объёмом 12-24 ГБ. Оценка качества переводов полученных моделей рассчитывалась на основе метрики BLEU. / This work compares LLM tuning methods for specialized tasks. Multilingual translation in the domain of information security was chosen as the specialized task. To tune and evaluate the models, a dataset of 1001 triples of parallel sentences in Russian, English and Spanish was collected from the documentation of Trellix, IBM, Kaspersky and Dr. Web. The models chosen for tuning were Mistral Instruct 7B and Llama Chat 7B. As for the tuning techniques, zero-shot, few-shot and PEFT were used, because the study was limited to a single device with one GPU with 12-24 GB of memory. The translation quality of the resulting models was measured with the BLEU metric. MASTER'S THESIS LANGUAGE MODELLING LARGE LANGUAGE MODELS TRANSFORMERS MODEL TUNING MODEL QUANTIZATION MACHINE TRANSLATION MULTILINGUAL MACHINE TRANSLATION ТРАНСФОРМЕРЫ ДООБУЧЕНИЕ МОДЕЛЕЙ КВАНТИЗАЦИЯ МОДЕЛЕЙ МАШИННЫЙ ПЕРЕВОД
88	Går det att lita på ChatGPT? En kvalitativ studie om studenters förtroende för ChatGPT i lärandesammanhang [Can ChatGPT Be Trusted? A Qualitative Study of Students' Trust in ChatGPT in a Learning Context] Härnström, Alexandra, Bergh, Isak Eljas January 2023 The world's technological development is advancing rapidly, especially when it comes to "smart" machines and algorithms with the ability to adapt to their surroundings. This is partly due to the enormous amount of available data and partly thanks to increased storage capacity. In November 2022, one of the latest AI-based programs was released: the chatbot ChatGPT. Within two months, ChatGPT had over 100 million users. This web-based software can converse with users in real time by answering text-based questions. By quickly, and often accurately, answering users' questions in a human-like and convincing manner, the service has attracted a lot of attention in a short period of time. Several studies show that a large number of people lack a general trust in AI. Some studies argue that the responses generated by ChatGPT cannot always be assumed to be completely accurate and should therefore be followed up with extensive fact-checking, as they may otherwise contribute to the spread of false information. Since trust in AI has been shown to be an important factor in how well the technology develops and is integrated, a lack of trust in services like ChatGPT can be a hindrance to effective usage. Although increased productivity has been observed when companies adopt AI technology, it has not been integrated to the same extent within higher education as an aid for students. Determining the level of trust that students have in ChatGPT in an educational context can yield information that helps with the integration of such AI technology. However, there is a lack of specific research on students' trust in ChatGPT in an educational context. This study therefore aims to fill this knowledge gap by conducting a survey. Our research question is: "What trust do students have in ChatGPT in a learning context?" The survey was conducted through semi-structured interviews with eight students who had used ChatGPT in an educational context. The interviews generated qualitative data that was analyzed using thematic analysis, and the results showed that students' trust in ChatGPT in an educational context depends on several factors. During the analysis, six themes were identified as relevant for answering the research question: Experiences, Usage, ChatGPT's character, External influences, Organizations, and Future trust. Artificial intelligence; Generative AI; LLM; NLP; ChatGPT; GPT-3; GPT-3.5; Trust; Educational context; Language technology; Large language models; Information retrieval; Computer Sciences
89	Contextual cues for deep learning models of code Shrivastava, Disha 09 1900 Source code provides an exciting application area for deep learning methods, encompassing tasks like program synthesis, repair, and analysis, as well as tasks at the intersection of code and natural language. Although deep learning models for code, particularly large language models, have recently seen significant success, they can struggle to generalize to unseen code. This can lead to inaccuracies, especially when working with repositories that contain proprietary software or work-in-progress code. The main focus of this thesis is to effectively harness useful signals from the available context to improve the performance of deep learning models of code at a given task. By incorporating these contextual cues, the model's generalization capabilities are amplified, providing additional insights not evident from the original input and directing its focus toward essential details. Furthermore, the use of contextual cues aids in adapting to new tasks and boosts performance on existing ones by making more context-aware predictions. To achieve this, we present a general framework comprising two stages: (a) Context Enhancement, which involves enriching the input with support context obtained through the identification and selection of relevant contextual cues, and (b) Prediction using the Enhanced Context, where we leverage the support context combined with the input to make accurate predictions. The thesis presents four articles that propose diverse approaches for these stages. The first article breaks the standard problem of programming by examples into two stages: (a) finding programs that satisfy individual examples (per-example solutions), and (b) combining these per-example solutions by leveraging their program execution states to find a program that satisfies all given examples. The second article proposes an approach for selecting targeted information from the current file and using it to adapt the code completion model to an unseen, local context. The third article builds upon the second by leveraging contextual cues from the entire code repository using a set of prompt proposals that govern the location and content of the context that should be taken from the repository. We propose a framework to select the most relevant prompt proposal context, which is then used to prompt a large language model of code to generate predictions for the tokens in the rest of the line following the cursor in a file. The fourth article extends the third by proposing a framework that learns to combine multiple diverse contexts from the repository. We show that smaller models of code trained this way perform better than or on par with significantly larger models that are not trained with repository context. Deep Learning; Program Synthesis; Code Completion; Machine Learning for Code; Software Engineering; Information Retrieval; Large Language Models
90	Topological regularization and relative latent representations / Topologisk regularisering och relativa latenta representationer García Castellanos, Alejandro January 2023 This Master's Thesis examines the application of topological regularization techniques and relative latent representations to zero-shot model stitching. Building upon the prior work of Moschella et al. (2022), which introduces relative latent representations to enhance the similarities between the latent spaces of different models, we incorporate the approach of Hofer et al. (2021), which combines Topological Data Analysis (TDA) and Machine Learning techniques for topological densification of class distributions in the latent space. The main research objective is to investigate the impact of topological regularization on zero-shot stitching performance when employing relative latent representations. Theoretical foundations for the relative transformation are established based on the intertwiner groups of activation functions. Empirical analyses are conducted to validate the assumptions underlying the construction of the relative transformation in the latent space. Moreover, experiments are performed on a Large Language Model trained on multilingual Amazon Reviews datasets to evaluate the effectiveness of zero-shot stitching when using the topological densification technique and the relative transformation. The findings indicate that the proposed methodologies can enhance the performance of multilingual model stitching. Specifically, enforcing the relative transformation to preserve the distributions of H0 homology death times proves beneficial. Additionally, the presence of similar topological features plays a crucial role in achieving higher model compatibility. However, a more in-depth exploration of the geometric properties of the latent space after the relative transformation is necessary to further improve the topological densification technique. Overall, this work contributes to the emerging field of Topological Machine Learning and provides valuable insights for researchers in the transfer learning and representation learning domains. Algebraic Topology; Large Language Models; Relative Representation; Representation Learning; Model Stitching; Topological Data Analysis; Zero-shot; Computer and Information Sciences
