91 |
Synthesis of Tabular Financial Data using Generative Adversarial Networks / Syntes av tabulär finansiell data med generativa motstridande nätverk
Karlsson, Anton, Sjöberg, Torbjörn January 2020
Digitalization has led to large amounts of available customer data and possibilities for data-driven innovation. However, the data needs to be handled carefully to protect the privacy of the customers. Generative Adversarial Networks (GANs) are a promising recent development in generative modeling. They can be used to create synthetic data that facilitates analysis while ensuring that customer privacy is maintained. Prior research on GANs has shown impressive results on image data. In this thesis, we investigate the viability of using GANs within the financial industry. We investigate two state-of-the-art GAN models for synthesizing tabular data, TGAN and CTGAN, along with a simpler GAN model that we call WGAN. A comprehensive evaluation framework is developed to facilitate comparison of the synthetic datasets. The results indicate that GANs are able to generate quality synthetic datasets that preserve the statistical properties of the underlying data and enable a viable and reproducible subsequent analysis. However, all of the investigated models had problems reproducing numerical data.
|
92 |
GAN-Based Synthesis of Brain Tumor Segmentation Data : Augmenting a dataset by generating artificial images
Foroozandeh, Mehdi January 2020
Machine learning applications within medical imaging often suffer from a lack of data, as a consequence of restrictions that hinder the free distribution of patient information. In this project, GANs (generative adversarial networks) are used to generate data synthetically, in an effort to circumvent this issue. The GAN framework PGAN is trained on the brain tumor segmentation dataset BraTS to generate new, synthetic brain tumor masks with the same visual characteristics as the real samples. The image-to-image translation network SPADE is subsequently trained on the image pairs in the real dataset, to learn a transformation from segmentation masks to brain MR images, and is in turn used to map the artificial segmentation masks generated by PGAN to corresponding artificial MR images. The images generated by these networks form a new, synthetic dataset, which is used to augment the original dataset. Different quantities of real and synthetic data are then evaluated in three different brain tumor segmentation tasks, where the image segmentation network U-Net is trained on this data to segment (real) MR images into the classes in question. The final segmentation performance of each training instance is evaluated over test data from the real dataset with the Weighted Dice Loss metric. The results indicate a slight increase in performance across all segmentation tasks evaluated in this project when some quantity of synthetic images is included. However, the differences were largest when the experiments were restricted to using only 20 % of the real data, and less significant when the full dataset was made available. A majority of the generated segmentation masks appear visually convincing to an extent (although somewhat noisy with regard to the intra-tumoral classes), while a relatively large proportion appear heavily noisy and corrupted. However, the translation of segmentation masks to MR images via SPADE proved more reliable and consistent.
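The abstract names a Weighted Dice Loss metric without giving its definition. As a rough illustration only, the sketch below assumes one common form: a per-class Dice coefficient averaged with class weights, with the loss taken as one minus that average. The function names, the class weights, and the convention for classes absent from both masks are assumptions made for this example and are not taken from the thesis.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one class label over two flat label sequences."""
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_hits, target_hits))
    total = sum(pred_hits) + sum(target_hits)
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def weighted_dice(pred, target, class_weights):
    """Weighted average of per-class Dice scores (weights are hypothetical)."""
    weight_sum = sum(class_weights.values())
    weighted = sum(w * dice_score(pred, target, c) for c, w in class_weights.items())
    return weighted / weight_sum


def weighted_dice_loss(pred, target, class_weights):
    """Loss form assumed here: one minus the weighted Dice score."""
    return 1.0 - weighted_dice(pred, target, class_weights)


# Toy masks with three classes (0 = background, 1 and 2 = tumor sub-regions).
predicted = [0, 1, 1, 2]
reference = [0, 1, 2, 2]
weights = {0: 1.0, 1: 1.0, 2: 2.0}
print(weighted_dice_loss(predicted, reference, weights))
```

A lower loss means closer agreement between predicted and reference masks; giving a class a larger weight makes errors on that class count more.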
|
93 |
Understanding, improving, and generalizing generative models
Jolicoeur-Martineau, Alexia 08 1900
Generative models are powerful tools to generate samples (e.g., images, music, text) from an unknown distribution given a finite set of examples. Generative models are hard to train successfully, but they have the potential to revolutionize arts, science, and business. These models can generate samples from various data types (e.g., text, images, audio, videos, 3D). In the future, we can envision generative models being used to create movies or episodes from a TV show given a script (possibly also generated by a generative model).
One of the most successful methods for generating images is Generative Adversarial Networks (GANs). This approach consists of a game between two players, the Discriminator and the Generator. The goal of the Discriminator is to classify an image as real or fake, while the Generator attempts to fool the Discriminator into thinking that the fake images it generates are real. Through this game, GANs are able to generate very high-quality samples, such as photo-realistic images. Humans are still generally able to distinguish real images (from the training dataset) from fake images (generated by GANs), but the gap is lessening as GANs become better over time. The biggest weakness of GANs is that they have trouble generating diverse data representative of the full range of the data distribution. Thus, there is still much progress to be made before GANs reach their full potential.
New methods performing better than GANs are also appearing. One prime example is score-based diffusion models. This thesis focuses on generative models that seemed promising at the time for continuous data generation: GANs and score-based diffusion models.
I seek to improve generative models so that they reach their full potential (Objective 1: Improving) and to understand these approaches better on a theoretical level (Objective 2: Theoretical understanding). I also want to generalize these approaches beyond their original setting (Objective 3: Generalizing), allowing the discovery of new connections between different concepts/fields.
My first contribution is to propose using a relativistic discriminator, which estimates the probability that a given real sample is more realistic than a randomly sampled fake sample. Relativistic GANs form a new class of GAN loss functions that are much more stable with respect to optimization hyperparameters. My second contribution is to take a more rigorous look at relativistic GANs and prove that they are proper statistical divergences. My third contribution is to devise an adversarial variant of denoising score matching, which leads to higher-quality data with score-based diffusion models. My fourth contribution is to significantly improve the speed of score-based diffusion models through a carefully devised Stochastic Differential Equation (SDE) solver.
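The relativistic discriminator described above can be illustrated with a small sketch. It assumes the simplest pairwise form, in which the estimated probability is a sigmoid of the difference between the critic's raw scores for a real and a fake sample, with log-losses for both players. The function names and the numeric scores are made up for this example; the thesis may define further variants.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relativistic_probability(critic_real, critic_fake):
    """Estimated probability that the real sample is more realistic than the fake one."""
    return sigmoid(critic_real - critic_fake)


def discriminator_loss(critic_real, critic_fake):
    """The discriminator wants real samples to score higher than fake ones."""
    return -math.log(relativistic_probability(critic_real, critic_fake))


def generator_loss(critic_real, critic_fake):
    """The generator wants the opposite ordering, so the arguments are swapped."""
    return -math.log(relativistic_probability(critic_fake, critic_real))


# Hypothetical raw critic scores for one real and one generated sample.
score_real, score_fake = 2.0, -1.0
print(relativistic_probability(score_real, score_fake))
print(discriminator_loss(score_real, score_fake))
print(generator_loss(score_real, score_fake))
```

Because only the difference between the two scores enters the loss, the generator can lower its loss both by raising the score of fake samples and by lowering the relative score of real ones, which is the feature that distinguishes this family from a standard discriminator.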
|
94 |
Generating Extreme Value Distributions in Finance using Generative Adversarial Networks / Generering av Extremvärdesfördelningar inom Finans med hjälp av Generativa Motstridande Nätverk
Nord-Nilsson, William January 2023
This thesis aims to develop a new model for stress-testing financial portfolios using Extreme Value Theory (EVT) and Generative Adversarial Networks (GANs). The current practice of risk management relies on mathematical or historical models, such as Value-at-Risk and expected shortfall. The problem with historical models is that the data available for very extreme events is limited, and therefore we need a method to interpolate and extrapolate beyond the available range. EVT is a statistical framework that analyzes extreme events in a distribution and allows such interpolation and extrapolation, and GANs are machine-learning techniques that generate synthetic data. The combination of these two areas can generate more realistic stress-testing scenarios to help financial institutions manage potential risks better. The goal of this thesis is to develop a new model that can handle complex dependencies and high-dimensional inputs with different kinds of assets, such as stocks, indices, currencies, and commodities, and that can be used in parallel with traditional risk measurements. The evtGAN algorithm shows promising results: it is able to mimic actual distributions and to extrapolate data outside the available data range.
|
95 |
Advanced Algorithms for X-ray CT Image Reconstruction and Processing
Madhuri Mahendra Nagare (17897678) 05 February 2024
X-ray computed tomography (CT) is one of the most widely used imaging modalities for medical diagnosis. Improving the quality of clinical CT images while keeping the X-ray dosage of patients low has been an active area of research. Recently, there have been two major technological advances in commercial CT systems. The first is the use of Deep Neural Networks (DNN) to denoise and sharpen CT images, and the second is the use of photon counting detectors (PCD), which provide higher spectral and spatial resolution compared to conventional energy-integrating detectors. While both techniques have the potential to improve the quality of CT images significantly, there are still challenges to improving the quality further.
A denoising or sharpening algorithm for CT images must retain a favorable texture, which is critically important for radiologists. However, commonly used methodologies in DNN training produce over-smooth images lacking texture. The lack of texture is a systematic error leading to a biased estimator.
In the first portion of this thesis, we propose three algorithms to reduce the bias and thereby retain the favorable texture. The first method proposes a novel approach to designing a loss function that penalizes bias in the image more while training a DNN, producing more texture and detail in results. Our experiments verify that the proposed loss function outperforms the commonly used mean squared error loss function. The second algorithm proposes a novel approach to designing training pairs for a DNN-based sharpener. While conventional sharpeners employ noise-free ground truth, producing over-smooth images, the proposed Noise Preserving Sharpening Filter (NPSF) adds appropriately scaled noise to both the input and the ground truth to keep the noise texture in the sharpened result similar to that of the input. Our evaluations show that the NPSF can sharpen noisy images while producing the desired noise level and texture. The above two algorithms merely control the amount of texture retained and are not designed to produce texture that matches a target texture. A Generative Adversarial Network (GAN) can produce the target texture. However, naive application of GANs can introduce inaccurate or even unreal image detail. Therefore, we propose a Texture Matching GAN (TMGAN) that uses parallel generators to separate anatomical features from the generated texture, which allows the GAN to be trained to match the target texture without directly affecting the underlying CT image. We demonstrate that TMGAN generates enhanced image quality while also producing texture that is desirable for clinical application.
In the second portion of this research, we propose a novel algorithm for the optimal statistical processing of photon-counting detector data for CT reconstruction. Current reconstruction and material decomposition algorithms for photon counting CT are not able to utilize simultaneously both the measured spectral information and advanced prior models. We propose a modular framework based on Multi-Agent Consensus Equilibrium (MACE) to obtain material decomposition and reconstructions using the PCD data. Our method employs a detector agent that uses PCD measurements to update an estimate, along with a prior agent that enforces both physical and empirical knowledge about the material-decomposed sinograms. Importantly, the modular framework allows the two agents to be designed and optimized independently. Our evaluations on simulated data show promising results.
|
96 |
Generative Adversarial Networks for Image-to-Image Translation on Street View and MR Images
Karlsson, Simon, Welander, Per January 2018
Generative Adversarial Networks (GANs) are a deep learning method developed for synthesizing data. One application is image-to-image translation, which could prove valuable when training deep neural networks for image classification tasks. Two areas where deep learning methods are used are automotive vision systems and medical imaging. Automotive vision systems are expected to handle a broad range of scenarios, which demands highly diverse training data. In the medical field the scenarios are fewer, but collecting training data is difficult, time-consuming, and expensive. This thesis evaluates different GAN models by comparing synthetic MR images produced by the models against ground truth images. A perceptual study is also performed by an expert in the field. The study shows that the implemented GAN models can synthesize visually realistic MR images. It also shows that models producing more visually realistic synthetic images do not necessarily score better on quantitative error measurements against ground truth data. Along with the investigations on medical images, the thesis explores the possibilities of generating synthetic street view images at different resolutions and under different light and weather conditions. Different GAN models have been compared, implemented with our own adjustments, and evaluated. The results show that it is possible to create visually realistic images for different translations and image resolutions.
|
97 |
[pt] SINTETIZAÇÃO DE IMAGENS ÓTICAS MULTIESPECTRAIS A PARTIR DE DADOS SAR/ÓTICOS USANDO REDES GENERATIVAS ADVERSARIAS CONDICIONAIS / [en] SYNTHESIS OF MULTISPECTRAL OPTICAL IMAGES FROM SAR/OPTICAL MULTITEMPORAL DATA USING CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS
JOSE DAVID BERMUDEZ CASTRO 08 April 2021
Optical images from Earth Observation are often affected by the presence of clouds. In order to reduce these effects, different reconstruction techniques have been proposed in recent years. A common alternative is to explore data from active sensors, such as Synthetic Aperture Radar (SAR), as they are nearly independent of atmospheric conditions and solar lighting. On the other hand, SAR images are more difficult to interpret than optical images, requiring specific treatment. Recently, conditional Generative Adversarial Networks (cGANs) have been widely used to learn mapping functions that relate data of different domains. This work proposes a method based on cGANs to synthesize optical data from data of other sources: data of multiple sensors, multitemporal data, and data at multiple resolutions. The working hypothesis is that the quality of the generated images benefits from the amount of data used as conditioning variables for the cGAN. The proposed solution was evaluated on two databases. As conditioning data we used co-registered SAR data at one or two dates produced by the Sentinel 1 sensor, and optical images produced by the Sentinel 2 and LANDSAT satellite series, respectively. The experimental results demonstrated that the proposed solution is able to synthesize realistic optical data. The quality of the synthesized images was measured in two ways: firstly, based on the classification accuracy of the generated images and, secondly, on the spectral similarity of the synthesized images with reference images. The experiments confirmed the hypothesis that the proposed method tends to produce better results as we explore more conditioning data for the cGANs.
|
98 |
Explaining Generative Adversarial Network Time Series Anomaly Detection using Shapley Additive Explanations
Cher Simon (18324174) 10 July 2024
Anomaly detection is an active research field that is widely applied in commercial applications to detect unusual patterns or outliers. Time series anomaly detection provides valuable insights into mission and safety-critical applications using ever-growing temporal data, including continuous streaming time series data from the Internet of Things (IoT), sensor networks, healthcare, stock prices, computer metrics, and application monitoring. While Generative Adversarial Networks (GANs) demonstrate promising results in time series anomaly detection, the opaque nature of generative deep learning models lacks explainability and hinders broader adoption. Understanding the rationale behind model predictions and providing human-interpretable explanations are vital for increasing confidence and trust in machine learning (ML) frameworks such as GANs. This study conducted a structured and comprehensive assessment of post-hoc local explainability in GAN-based time series anomaly detection using SHapley Additive exPlanations (SHAP). Using publicly available benchmarking datasets approved by Purdue’s Institutional Review Board (IRB), this study evaluated state-of-the-art GAN frameworks, identifying their advantages and limitations for time series anomaly detection. This study demonstrated a systematic approach to quantifying the extent of GAN-based time series anomaly explainability, providing insights for businesses considering adopting generative deep learning models. The presented results show that GANs capture complex time series temporal distributions and are applicable for anomaly detection. The analysis from this study shows that SHAP can identify the significance of contributing features within time series data and derive post-hoc explanations to quantify GAN-detected time series anomalies.
|
99 |
Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches
de Curtò i Díaz, Joaquim 23 January 2024
Thesis by compendium / The advent of Large Language Models (LLMs) marks a transformative phase in the field of Artificial Intelligence (AI), signifying the shift towards intelligent and autonomous systems capable of complex understanding and decision-making. This thesis delves deep into the multifaceted capabilities of LLMs, exploring their potential applications in decision optimization, scene understanding, and advanced summarization tasks in diverse contexts.
In the first segment of the thesis, the focus is on Unmanned Aerial Vehicles' (UAVs) semantic scene understanding. The capability of instantaneously providing high-level data and visual cues positions UAVs as ideal platforms for performing complex tasks. The work combines the potential of LLMs, Visual Language Models (VLMs), and state-of-the-art detection pipelines to offer nuanced and contextually accurate scene descriptions. A well-controlled, efficient practical implementation of microdrones in challenging settings is presented, supplementing the study with proposed standardized readability metrics to gauge the quality of LLM-enhanced descriptions. This could significantly impact sectors such as film, advertising, and theme parks, enhancing user experiences manifold.
The second segment brings to light the increasingly crucial problem of decision-making under uncertainty. Using the Multi-Armed Bandit (MAB) problem as a foundation, the study explores the use of LLMs to inform and guide strategies in dynamic environments. It is postulated that the predictive power of LLMs can aid in choosing the correct balance between exploration and exploitation based on the current state of the system. Through rigorous testing, the proposed LLM-informed strategy showcases its adaptability and its competitive performance against conventional strategies.
Next, the research transitions into studying the goodness-of-fit assessments of Generative Adversarial Networks (GANs) utilizing the Signature Transform. By providing an efficient measure of similarity between image distributions, the study sheds light on the intrinsic structure of the samples generated by GANs. A comprehensive analysis using statistical measures, such as the Kruskal-Wallis test, provides a more extensive understanding of GAN convergence and goodness of fit.
In the final section, the thesis introduces a novel benchmark for automatic video summarization, emphasizing the harmonious integration of LLMs and Signature Transform. An innovative approach grounded in the harmonic components captured by the Signature Transform is put forth. The measures are extensively evaluated, proving to offer compelling accuracy that correlates well with the concept of a good summary.
This research work establishes LLMs as powerful tools in addressing complex tasks across diverse domains, redefining decision optimization, scene understanding, and summarization tasks. It not only breaks new ground in the applications of LLMs but also sets the direction for future work in this exciting and rapidly evolving field. / De Curtò I Díaz, J. (2023). Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202200 / Compendio
|
100 |
Prédiction et génération de données structurées à l'aide de réseaux de neurones et de décisions discrètes
Dutil, Francis 08 1900
No description available.
|