• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 75
  • 2
  • 1
  • 1
  • Tagged with
  • 94
  • 94
  • 94
  • 63
  • 46
  • 42
  • 34
  • 30
  • 27
  • 26
  • 25
  • 19
  • 18
  • 16
  • 15
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Advances in deep learning with limited supervision and computational resources

Almahairi, Amjad 12 1900 (has links)
Les réseaux de neurones profonds sont la pierre angulaire des systèmes à la fine pointe de la technologie pour une vaste gamme de tâches, comme la reconnaissance d'objets, la modélisation du langage et la traduction automatique. Mis à part le progrès important établi dans les architectures et les procédures de formation des réseaux de neurones profonds, deux facteurs ont été la clé du succès remarquable de l'apprentissage profond : la disponibilité de grandes quantités de données étiquetées et la puissance de calcul massive. Cette thèse par articles apporte plusieurs contributions à l'avancement de l'apprentissage profond, en particulier dans les problèmes avec très peu ou pas de données étiquetées, ou avec des ressources informatiques limitées. Le premier article aborde la question de la rareté des données dans les systèmes de recommandation, en apprenant les représentations distribuées des produits à partir des commentaires d'évaluation de produits en langage naturel. Plus précisément, nous proposons un cadre d'apprentissage multitâches dans lequel nous utilisons des méthodes basées sur les réseaux de neurones pour apprendre les représentations de produits à partir de textes de critiques de produits et de données d'évaluation. Nous démontrons que la méthode proposée peut améliorer la généralisation dans les systèmes de recommandation et atteindre une performance de pointe sur l'ensemble de données Amazon Reviews. Le deuxième article s'attaque aux défis computationnels qui existent dans l'entraînement des réseaux de neurones profonds à grande échelle. Nous proposons une nouvelle architecture de réseaux de neurones conditionnels permettant d'attribuer la capacité du réseau de façon adaptative, et donc des calculs, dans les différentes régions des entrées. Nous démontrons l'efficacité de notre modèle sur les tâches de reconnaissance visuelle où les objets d'intérêt sont localisés à la couche d'entrée, tout en maintenant une surcharge de calcul beaucoup plus faible que les architectures standards des réseaux de neurones. Le troisième article contribue au domaine de l'apprentissage non supervisé, avec l'aide du paradigme des réseaux antagoniste génératifs. Nous introduisons un cadre fléxible pour l'entraînement des réseaux antagonistes génératifs, qui non seulement assure que le générateur estime la véritable distribution des données, mais permet également au discriminateur de conserver l'information sur la densité des données à l'optimum global. Nous validons notre cadre empiriquement en montrant que le discriminateur est capable de récupérer l'énergie de la distribution des données et d'obtenir une qualité d'échantillons à la fine pointe de la technologie. Enfin, dans le quatrième article, nous nous attaquons au problème de l'apprentissage non supervisé à travers différents domaines. Nous proposons un modèle qui permet d'apprendre des transformations plusieurs à plusieurs à travers deux domaines, et ce, à partir des données non appariées. Nous validons notre approche sur plusieurs ensembles de données se rapportant à l'imagerie, et nous montrons que notre méthode peut être appliquée efficacement dans des situations d'apprentissage semi-supervisé. / Deep neural networks are the cornerstone of state-of-the-art systems for a wide range of tasks, including object recognition, language modelling and machine translation. In the last decade, research in the field of deep learning has led to numerous key advances in designing novel architectures and training algorithms for neural networks. However, most success stories in deep learning heavily relied on two main factors: the availability of large amounts of labelled data and massive computational resources. This thesis by articles makes several contributions to advancing deep learning, specifically in problems with limited or no labelled data, or with constrained computational resources. The first article addresses sparsity of labelled data that emerges in the application field of recommender systems. We propose a multi-task learning framework that leverages natural language reviews in improving recommendation. Specifically, we apply neural-network-based methods for learning representations of products from review text, while learning from rating data. We demonstrate that the proposed method can achieve state-of-the-art performance on the Amazon Reviews dataset. The second article tackles computational challenges in training large-scale deep neural networks. We propose a conditional computation network architecture which can adaptively assign its capacity, and hence computations, across different regions of the input. We demonstrate the effectiveness of our model on visual recognition tasks where objects are spatially localized within the input, while maintaining much lower computational overhead than standard network architectures. The third article contributes to the domain of unsupervised learning with the generative adversarial networks paradigm. We introduce a flexible adversarial training framework, in which not only the generator converges to the true data distribution, but also the discriminator recovers the relative density of the data at the optimum. We validate our framework empirically by showing that the discriminator is able to accurately estimate the true energy of data while obtaining state-of-the-art quality of samples. Finally, in the fourth article, we address the problem of unsupervised domain translation. We propose a model which can learn flexible, many-to-many mappings across domains from unpaired data. We validate our approach on several image datasets, and we show that it can be effectively applied in semi-supervised learning settings.
72

Fast Simulations of Radio Neutrino Detectors : Using Generative Adversarial Networks and Artificial Neural Networks

Holmberg, Anton January 2022 (has links)
Neutrino astronomy is expanding into the ultra-high energy (>1017eV) frontier with the use of in-ice detection of Askaryan radio emission from neutrino-induced particle showers. There are already pilot arrays for validating the technology and the next few years will see the planning and construction of IceCube-Gen2, an upgrade to the current neutrino telescope IceCube. This thesis aims to facilitate that planning by providing faster simulations using deep learning surrogate models. Faster simulations could enable proper optimisation of the antenna stations providing better sensitivity and reconstruction of neutrino properties. The surrogates are made for two parts of the end-to-end simulations: the signal generation and the signal propagation. These two steps are the most time-consuming parts of the simulations. The signal propagation is modelled with a standard fully connected neural network whereas for the signal generation a conditional Wasserstein generative adversarial network is used. There are multiple reasons for using these types of models. For both problems the neural networks provide the speed necessary as well as being differentiable -both important factors for optimisation. Generative adversarial networks are used in the signal generation because of the inherent stochasticity in the particle shower development that leads to the Askaryan radio signal. A more standard neural network is used for the signal propagation as it is a regression task. Promising results are obtained for both tasks. The signal propagation surrogate model can predict the parameters of interest at the desired accuracy, except for the travel time which needs further optimisation to reduce the uncertainty from 0.5 ns to 0.1 ns. The signal generation surrogate model predicts the Askaryan emission well for the limited parameter space of hadronic showers and within 5° of the Cherenkov cone. The two models provide a first step and a proof of concept. It is believed that the models can reach the required accuracies with more work.
73

Building Information Extraction and Refinement from VHR Satellite Imagery using Deep Learning Techniques

Bittner, Ksenia 26 March 2020 (has links)
Building information extraction and reconstruction from satellite images is an essential task for many applications related to 3D city modeling, planning, disaster management, navigation, and decision-making. Building information can be obtained and interpreted from several data, like terrestrial measurements, airplane surveys, and space-borne imagery. However, the latter acquisition method outperforms the others in terms of cost and worldwide coverage: Space-borne platforms can provide imagery of remote places, which are inaccessible to other missions, at any time. Because the manual interpretation of high-resolution satellite image is tedious and time consuming, its automatic analysis continues to be an intense field of research. At times however, it is difficult to understand complex scenes with dense placement of buildings, where parts of buildings may be occluded by vegetation or other surrounding constructions, making their extraction or reconstruction even more difficult. Incorporation of several data sources representing different modalities may facilitate the problem. The goal of this dissertation is to integrate multiple high-resolution remote sensing data sources for automatic satellite imagery interpretation with emphasis on building information extraction and refinement, which challenges are addressed in the following: Building footprint extraction from Very High-Resolution (VHR) satellite images is an important but highly challenging task, due to the large diversity of building appearances and relatively low spatial resolution of satellite data compared to airborne data. Many algorithms are built on spectral-based or appearance-based criteria from single or fused data sources, to perform the building footprint extraction. The input features for these algorithms are usually manually extracted, which limits their accuracy. Based on the advantages of recently developed Fully Convolutional Networks (FCNs), i.e., the automatic extraction of relevant features and dense classification of images, an end-to-end framework is proposed which effectively combines the spectral and height information from red, green, and blue (RGB), pan-chromatic (PAN), and normalized Digital Surface Model (nDSM) image data and automatically generates a full resolution binary building mask. The proposed architecture consists of three parallel networks merged at a late stage, which helps in propagating fine detailed information from earlier layers to higher levels, in order to produce an output with high-quality building outlines. The performance of the model is examined on new unseen data to demonstrate its generalization capacity. The availability of detailed Digital Surface Models (DSMs) generated by dense matching and representing the elevation surface of the Earth can improve the analysis and interpretation of complex urban scenarios. The generation of DSMs from VHR optical stereo satellite imagery leads to high-resolution DSMs which often suffer from mismatches, missing values, or blunders, resulting in coarse building shape representation. To overcome these problems, a methodology based on conditional Generative Adversarial Network (cGAN) is developed for generating a good-quality Level of Detail (LoD) 2 like DSM with enhanced 3D object shapes directly from the low-quality photogrammetric half-meter resolution satellite DSM input. Various deep learning applications benefit from multi-task learning with multiple regression and classification objectives by taking advantage of the similarities between individual tasks. Therefore, an observation of such influences for important remote sensing applications such as realistic elevation model generation and roof type classification from stereo half-meter resolution satellite DSMs, is demonstrated in this work. Recently published deep learning architectures for both tasks are investigated and a new end-to-end cGAN-based network is developed, which combines different models that provide the best results for their individual tasks. To benefit from information provided by multiple data sources, a different cGAN-based work-flow is proposed where the generative part consists of two encoders and a common decoder which blends the intensity and height information within one network for the DSM refinement task. The inputs to the introduced network are single-channel photogrammetric DSMs with continuous values and pan-chromatic half-meter resolution satellite images. Information fusion from different modalities helps in propagating fine details, completes inaccurate or missing 3D information about building forms, and improves the building boundaries, making them more rectilinear. Lastly, additional comparison between the proposed methodologies for DSM enhancements is made to discuss and verify the most beneficial work-flow and applicability of the resulting DSMs for different remote sensing approaches.
74

Image generation through feature extraction and learning using a deep learning approach

Bruneel, Tibo January 2023 (has links)
With recent advancements, image generation has become more and more possible with the introduction of stronger generative artificial intelligence (AI) models. The idea and ability of generating non-existing images that highly resemble real world images is interesting for many use cases. Generated images could be used, for example, to augment, extend or replace real data sets for training AI models, therefore being capable of minimising costs on data collection and similar processes. Deep learning, a sub-field within the AI field has been on the forefront of such methodologies due to its nature of being able to capture and learn highly complex and feature-rich data. This work focuses on deep generative learning approaches within a forestry application, with the goal of generating tree log end images in order to enhance an AI model that uses such images. This approach would not only reduce costs of data collection for this model, but also many other information extraction models within the forestry field. This thesis study includes research on the state of the art within deep generative modelling and experiments using a full pipeline from a deep generative modelling stage to a log end recognition model. On top of this, a variant architecture and image sampling algorithm are proposed to add in this pipeline and evaluate its performance. The experiments and findings show that the applied generative model approaches show good feature learning, but lack the high-quality and realistic generation, resulting in more blurry results. The variant approach resulted in slightly better feature learning with a trade-off in generation quality. The proposed sampling algorithm proved to work well on a qualitative basis. The problems found in the generative models propagated further into the training of the recognition model, making the improvement of another AI model based on purely generated data impossible at this point in the research. The results of this research show that more work is needed on improving the application and generation quality to make it resemble real world data more, so that other models can be trained on artificial data. The variant approach does not improve much and its findings contribute to the field by proving its strengths and weaknesses, as with the proposed image sampling algorithm. At last this study provides a good starting point for research within this application, with many different directions and opportunities for future work.
75

Automatic Question Paraphrasing in Swedish with Deep Generative Models / Automatisk frågeparafrasering på svenska med djupa generativa modeller

Lindqvist, Niklas January 2021 (has links)
Paraphrase generation refers to the task of automatically generating a paraphrase given an input sentence or text. Paraphrase generation is a fundamental yet challenging natural language processing (NLP) task and is utilized in a variety of applications such as question answering, information retrieval, conversational systems etc. In this study, we address the problem of paraphrase generation of questions in Swedish by evaluating two different deep generative models that have shown promising results on paraphrase generation of questions in English. The first model is a Conditional Variational Autoencoder (C-VAE) and the other model is an extension of the first one where a discriminator network is introduced into the model to form a Generative Adversarial Network (GAN) architecture. In addition to these models, a method not based on machine-learning was implemented to act as a baseline. The models were evaluated using both quantitative and qualitative measures including grammatical correctness and equivalence to source question. The results show that the deep generative models outperformed the baseline across all quantitative metrics. Furthermore, from the qualitative evaluation it was shown that the deep generative models outperformed the baseline at generating grammatically correct sentences, but there was no noticeable difference in terms of equivalence to the source question between the models. / Parafrasgenerering syftar på uppgiften att, utifrån en given mening eller text, automatiskt generera en parafras, det vill säga en annan text med samma betydelse. Parafrasgenerering är en grundläggande men ändå utmanande uppgift inom naturlig språkbehandling och används i en rad olika applikationer som informationssökning, konversionssystem, att besvara frågor givet en text etc. I den här studien undersöker vi problemet med parafrasgenerering av frågor på svenska genom att utvärdera två olika djupa generativa modeller som visat lovande resultat på parafrasgenerering av frågor på engelska. Den första modellen är en villkorsbaserad variationsautokodare (C-VAE). Den andra modellen är också en C-VAE men introducerar även en diskriminator vilket gör modellen till ett generativt motståndarnätverk (GAN). Förutom modellerna presenterade ovan, implementerades även en icke maskininlärningsbaserad metod som en baslinje. Modellerna utvärderades med både kvantitativa och kvalitativa mått inklusive grammatisk korrekthet och likvärdighet mellan parafras och originalfråga. Resultaten visar att de djupa generativa modellerna presterar bättre än baslinjemodellen på alla kvantitativa mätvärden. Vidare, visade the kvalitativa utvärderingen att de djupa generativa modellerna kunde generera grammatiskt korrekta frågor i större utsträckning än baslinjemodellen. Det var däremot ingen större skillnad i semantisk ekvivalens mellan parafras och originalfråga för de olika modellerna.
76

Ensembles of Single Image Super-Resolution Generative Adversarial Networks / Ensembler av generative adversarial networks för superupplösning av bilder

Castillo Araújo, Victor January 2021 (has links)
Generative Adversarial Networks have been used to obtain state-of-the-art results for low-level computer vision tasks like single image super-resolution, however, they are notoriously difficult to train due to the instability related to the competing minimax framework. Additionally, traditional ensembling mechanisms cannot be effectively applied with these types of networks due to the resources they require at inference time and the complexity of their architectures. In this thesis an alternative method to create ensembles of individual, more stable and easier to train, models by using interpolations in the parameter space of the models is found to produce better results than those of the initial individual models when evaluated using perceptual metrics as a proxy of human judges. This method can be used as a framework to train GANs with competitive perceptual results in comparison to state-of-the-art alternatives. / Generative Adversarial Networks (GANs) har använts för att uppnå state-of-the- art resultat för grundläggande bildanalys uppgifter, som generering av högupplösta bilder från bilder med låg upplösning, men de är notoriskt svåra att träna på grund av instabiliteten relaterad till det konkurrerande minimax-ramverket. Dessutom kan traditionella mekanismer för att generera ensembler inte tillämpas effektivt med dessa typer av nätverk på grund av de resurser de behöver vid inferenstid och deras arkitekturs komplexitet. I det här projektet har en alternativ metod för att samla enskilda, mer stabila och modeller som är lättare att träna genom interpolation i parameterrymden visat sig ge bättre perceptuella resultat än de ursprungliga enskilda modellerna och denna metod kan användas som ett ramverk för att träna GAN med konkurrenskraftig perceptuell prestanda jämfört med toppmodern teknik.
77

Navigating the Metric Zoo: Towards a More Coherent Model For Quantitative Evaluation of Generative ML Models

Dozier, Robbie 26 August 2022 (has links)
No description available.
78

Generation of layouts for living spaces using conditional generative adversarial networks : Designing floor plans that respect both a boundary and high-level requirements / Generering av layouts för boendeytor med conditional generative adversarial networks : Design av planritningar som respekterar både en gräns och krav på hög nivå

Chen, Anton January 2022 (has links)
Architectural design is a complex subject involving many different aspects that need to be considered. Drafting a floor plan from a blank slate can require iterating over several designs in the early phases of planning, and it is likely an even more daunting task for non-architects to tackle. This thesis investigates the opportunities of using conditional generative adversarial networks to generate floor plans for living spaces. The pix2pixHD method is used to learn a mapping between building boundaries and color-mapped floor plan layouts from the RPLAN dataset consisting of over 80k images. Previous work has mainly focused on either preserving an input boundary or generating layouts based on a set of conditions. To give potential users more control over the generation process, it would be useful to generate floor plans that respect both an input boundary and some high-level client requirements. By encoding requirements about desired room types and their locations in colored centroids, and stacking this image with the boundary input, we are able to train a model to synthesize visually plausible floor plan images that adapt to the given conditions. This model is compared to another model trained on only the building boundary images that acts as a baseline. Results from visual inspection, image properties, and expert evaluation show that the model trained with centroid conditions generates samples with superior image quality to the baseline model. Feeding additional information to the networks is therefore not only a way to involve the user in the design process, but it also has positive effects on the model training. The results from this thesis demonstrate that floor plan generation with generative adversarial networks can respect different kinds of conditions simultaneously, and can be a source of inspiration for future work seeking to make computer-aided design a more collaborative process between users and models. / Arkitektur och design är komplexa områden som behöver ta hänsyn till ett flertal olika aspekter. Att skissera en planritning helt från början kan kräva flera iterationer av olika idéer i de tidiga stadierna av planering, och det är troligtvis en ännu mer utmanande uppgift för en icke-arkitekt att angripa. Detta examensarbete syftar till att undersöka möjligheterna till att använda conditional generative adversarial networks för att generera planritningar för boendeytor. Pix2pixHD-metoden används för att lära en modell ett samband mellan gränsen av en byggnad och en färgkodad planritning från datasamlingen RPLAN bestående av över 80 tusen bilder. Tidigare arbeten har främst fokuserat på att antingen bevara en given byggnadsgräns eller att generera layouts baserat på en mängd av villkor. För att ge potentiella slutanvändare mer kontroll över genereringsprocessen skulle det vara användbart att generera planritningar som respekterar både en given byggnadsgräns och några klientbehov på en hög nivå. Genom att koda krav relaterade till önskade rumstyper och deras placering som färgade centroider, och sedan kombinera denna bild med byggnadsgränsen, kan vi träna en modell som kan framställa visuellt rimliga bilder på planritningar som kan anpassa sig till de givna villkoren. Denna modell jämförs med en annan modell som tränas endast på byggnadsgränser och som kan agera som en baslinje. Resultat från inspektion av genererade bilder och deras egenskaper, samt expertevaluering visar att modellen som tränas med centroidvillkor genererar bilder med högre bildkvalitet jämfört med baslinjen. Att ge mer information till modellen kan därmed både involvera användaren mer i designprocessen och bidra till positiva effekter på träningen av modellen. Resultaten från detta examensarbete visar att generering av planritningar med generative adversarial networks kan respektera olika typer av villkor samtidigt, och kan vara en källa till inspiration för framtida arbete som syftar till att göra datorstödd design en mer kollaborativ process mellan användare och modeller.
79

Generating Extreme Value Distributions in Finance using Generative Adversarial Networks / Generering av Extremvärdesfördelningar inom Finans med hjälp av Generativa Motstridande Nätverk

Nord-Nilsson, William January 2023 (has links)
This thesis aims to develop a new model for stress-testing financial portfolios using Extreme Value Theory (EVT) and General Adversarial Networks (GANs). The current practice of risk management relies on mathematical or historical models, such as Value-at-Risk and expected shortfall. The problem with historical models is that the data which is available for very extreme events is limited, and therefore we need a method to interpolate and extrapolate beyond the available range. EVT is a statistical framework that analyzes extreme events in a distribution and allows such interpolation and extrapolation, and GANs are machine-learning techniques that generate synthetic data. The combination of these two areas can generate more realistic stress-testing scenarios to help financial institutions manage potential risks better. The goal of this thesis is to develop a new model that can handle complex dependencies and high-dimensional inputs with different kinds of assets such as stocks, indices, currencies, and commodities and can be used in parallel with traditional risk measurements. The evtGAN algorithm shows promising results and is able to mimic actual distributions, and is also able to extrapolate data outside the available data range. / Detta examensarbete handlar om att utveckla en ny modell för stresstestning av finansiella portföljer med hjälp av extremvärdesteori (EVT) och Generative Adversarial Networks (GAN). Dom modeller för riskhantering som används idag bygger på matematiska eller historiska modeller, som till exempel Value-at-Risk och Expected Shortfall. Problemet med historiska modeller är att det finns begränsat med data för mycket extrema händelser. EVT är däremot en del inom statistisk som analyserar extrema händelser i en fördelning, och GAN är maskininlärningsteknik som genererar syntetisk data. Genom att kombinera dessa två områden kan mer realistiska stresstestscenarier skapas för att hjälpa finansiella institutioner att bättre hantera potentiella risker. Målet med detta examensarbete är att utveckla en ny modell som kan hantera komplexa beroenden i högdimensionell data med olika typer av tillgångar, såsom aktier, index, valutor och råvaror, och som kan användas parallellt med traditionella riskmått. Algoritmen evtGAN visar lovande resultat och kan imitera verkliga fördelningar samt extrapolera data utanför tillgänglig datamängd.
80

Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches

de Curtò i Díaz, Joaquim 23 January 2024 (has links)
Tesis por compendio / [ES] El advenimiento de los Large Language Models (LLMs) marca una fase transformadora en el campo de la Inteligencia Artificial (IA), significando el cambio hacia sistemas inteligentes y autónomos capaces de una comprensión y toma de decisiones complejas. Esta tesis profundiza en las capacidades multifacéticas de los LLMs, explorando sus posibles aplicaciones en la optimización de decisiones, la comprensión de escenas y tareas avanzadas de resumen de video en diversos contextos. En el primer segmento de la tesis, el foco está en la comprensión semántica de escenas de Vehículos Aéreos No Tripulados (UAVs). La capacidad de proporcionar instantáneamente datos de alto nivel y señales visuales sitúa a los UAVs como plataformas ideales para realizar tareas complejas. El trabajo combina el potencial de los LLMs, los Visual Language Models (VLMs), y los sistemas de detección objetos de última generación para ofrecer descripciones de escenas matizadas y contextualmente precisas. Se presenta una implementación práctica eficiente y bien controlada usando microdrones en entornos complejos, complementando el estudio con métricas de legibilidad estandarizadas propuestas para medir la calidad de las descripciones mejoradas por los LLMs. Estos avances podrían impactar significativamente en sectores como el cine, la publicidad y los parques temáticos, mejorando las experiencias de los usuarios de manera exponencial. El segundo segmento arroja luz sobre el problema cada vez más crucial de la toma de decisiones bajo incertidumbre. Utilizando el problema de Multi-Armed Bandits (MAB) como base, el estudio explora el uso de los LLMs para informar y guiar estrategias en entornos dinámicos. Se postula que el poder predictivo de los LLMs puede ayudar a elegir el equilibrio correcto entre exploración y explotación basado en el estado actual del sistema. A través de pruebas rigurosas, la estrategia informada por los LLMs propuesta demuestra su adaptabilidad y su rendimiento competitivo frente a las estrategias convencionales. A continuación, la investigación se centra en el estudio de las evaluaciones de bondad de ajuste de las Generative Adversarial Networks (GANs) utilizando la Signature Transform. Al proporcionar una medida eficiente de similitud entre las distribuciones de imágenes, el estudio arroja luz sobre la estructura intrínseca de las muestras generadas por los GANs. Un análisis exhaustivo utilizando medidas estadísticas como las pruebas de Kruskal-Wallis proporciona una comprensión más amplia de la convergencia de los GANs y la bondad de ajuste. En la sección final, la tesis introduce un nuevo benchmark para la síntesis automática de vídeos, enfatizando la integración armoniosa de los LLMs y la Signature Transform. Se propone un enfoque innovador basado en los componentes armónicos capturados por la Signature Transform. Las medidas son evaluadas extensivamente, demostrando ofrecer una precisión convincente que se correlaciona bien con el concepto humano de un buen resumen. Este trabajo de investigación establece a los LLMs como herramientas poderosas para abordar tareas complejas en diversos dominios, redefiniendo la optimización de decisiones, la comprensión de escenas y las tareas de resumen de video. No solo establece nuevos postulados en las aplicaciones de los LLMs, sino que también establece la dirección para futuros trabajos en este emocionante y rápidamente evolucionante campo. / [CA] L'adveniment dels Large Language Models (LLMs) marca una fase transformadora en el camp de la Intel·ligència Artificial (IA), significat el canvi cap a sistemes intel·ligents i autònoms capaços d'una comprensió i presa de decisions complexes. Aquesta tesi profunditza en les capacitats multifacètiques dels LLMs, explorant les seues possibles aplicacions en l'optimització de decisions, la comprensió d'escenes i tasques avançades de resum de vídeo en diversos contexts. En el primer segment de la tesi, el focus està en la comprensió semàntica d'escenes de Vehicles Aeris No Tripulats (UAVs). La capacitat de proporcionar instantàniament dades d'alt nivell i senyals visuals situa els UAVs com a plataformes ideals per a realitzar tasques complexes. El treball combina el potencial dels LLMs, els Visual Language Models (VLMs), i els sistemes de detecció d'objectes d'última generació per a oferir descripcions d'escenes matisades i contextualment precises. Es presenta una implementació pràctica eficient i ben controlada usant microdrons en entorns complexos, complementant l'estudi amb mètriques de llegibilitat estandarditzades proposades per a mesurar la qualitat de les descripcions millorades pels LLMs. Aquests avenços podrien impactar significativament en sectors com el cinema, la publicitat i els parcs temàtics, millorant les experiències dels usuaris de manera exponencial. El segon segment arroja llum sobre el problema cada vegada més crucial de la presa de decisions sota incertesa. Utilitzant el problema dels Multi-Armed Bandits (MAB) com a base, l'estudi explora l'ús dels LLMs per a informar i guiar estratègies en entorns dinàmics. Es postula que el poder predictiu dels LLMs pot ajudar a triar l'equilibri correcte entre exploració i explotació basat en l'estat actual del sistema. A través de proves rigoroses, l'estratègia informada pels LLMs proposada demostra la seua adaptabilitat i el seu rendiment competitiu front a les estratègies convencionals. A continuació, la recerca es centra en l'estudi de les avaluacions de bondat d'ajust de les Generative Adversarial Networks (GANs) utilitzant la Signature Transform. En proporcionar una mesura eficient de similitud entre les distribucions d'imatges, l'estudi arroja llum sobre l'estructura intrínseca de les mostres generades pels GANs. Una anàlisi exhaustiva utilitzant mesures estadístiques com les proves de Kruskal-Wallis proporciona una comprensió més àmplia de la convergència dels GANs i la bondat d'ajust. En la secció final, la tesi introdueix un nou benchmark per a la síntesi automàtica de vídeos, enfatitzant la integració harmònica dels LLMs i la Signature Transform. Es proposa un enfocament innovador basat en els components harmònics capturats per la Signature Transform. Les mesures són avaluades extensivament, demostrant oferir una precisió convincent que es correlaciona bé amb el concepte humà d'un bon resum. Aquest treball de recerca estableix els LLMs com a eines poderoses per a abordar tasques complexes en diversos dominis, redefinint l'optimització de decisions, la comprensió d'escenes i les tasques de resum de vídeo. No solament estableix nous postulats en les aplicacions dels LLMs, sinó que també estableix la direcció per a futurs treballs en aquest emocionant i ràpidament evolucionant camp. / [EN] The advent of Large Language Models (LLMs) marks a transformative phase in the field of Artificial Intelligence (AI), signifying the shift towards intelligent and autonomous systems capable of complex understanding and decision-making. This thesis delves deep into the multifaceted capabilities of LLMs, exploring their potential applications in decision optimization, scene understanding, and advanced summarization tasks in diverse contexts. In the first segment of the thesis, the focus is on Unmanned Aerial Vehicles' (UAVs) semantic scene understanding. The capability of instantaneously providing high-level data and visual cues positions UAVs as ideal platforms for performing complex tasks. The work combines the potential of LLMs, Visual Language Models (VLMs), and state-of-the-art detection pipelines to offer nuanced and contextually accurate scene descriptions. A well-controlled, efficient practical implementation of microdrones in challenging settings is presented, supplementing the study with proposed standardized readability metrics to gauge the quality of LLM-enhanced descriptions. This could significantly impact sectors such as film, advertising, and theme parks, enhancing user experiences manifold. The second segment brings to light the increasingly crucial problem of decision-making under uncertainty. Using the Multi-Armed Bandit (MAB) problem as a foundation, the study explores the use of LLMs to inform and guide strategies in dynamic environments. It is postulated that the predictive power of LLMs can aid in choosing the correct balance between exploration and exploitation based on the current state of the system. Through rigorous testing, the proposed LLM-informed strategy showcases its adaptability and its competitive performance against conventional strategies. Next, the research transitions into studying the goodness-of-fit assessments of Generative Adversarial Networks (GANs) utilizing the Signature Transform. By providing an efficient measure of similarity between image distributions, the study sheds light on the intrinsic structure of the samples generated by GANs. A comprehensive analysis using statistical measures, such as the test Kruskal-Wallis, provides a more extensive understanding of the GAN convergence and goodness of fit. In the final section, the thesis introduces a novel benchmark for automatic video summarization, emphasizing the harmonious integration of LLMs and Signature Transform. An innovative approach grounded in the harmonic components captured by the Signature Transform is put forth. The measures are extensively evaluated, proving to offer compelling accuracy that correlates well with the concept of a good summary. This research work establishes LLMs as powerful tools in addressing complex tasks across diverse domains, redefining decision optimization, scene understanding, and summarization tasks. It not only breaks new ground in the applications of LLMs but also sets the direction for future work in this exciting and rapidly evolving field. / De Curtò I Díaz, J. (2023). Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202200 / Compendio

Page generated in 0.0317 seconds