21. New Spatio-temporal Hawkes Process Models For Social Good. Wen-Hao Chiang (12476658), 28 April 2022
<p>As more and more datasets with self-exciting properties become available, the demand for robust models that capture contagion across events is also growing. Hawkes processes stand out for their ability to capture a wide range of contagion and self-excitation patterns, including the transmission of infectious disease, earthquake aftershock distributions, near-repeat crime patterns, and overdose clusters. The Hawkes process can model these varied applications through parametric and non-parametric kernels that capture event dependencies in space, in time, and on networks.</p>
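To make self-excitation concrete, here is a minimal sketch of the conditional intensity of a univariate Hawkes process. It assumes an exponential kernel, which is only one common parametric choice; the parameter names `mu`, `alpha`, and `beta` are illustrative and not taken from the thesis.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process, assuming an exponential kernel:

        lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i))

    mu    -- background rate of "immigrant" events
    alpha -- branching ratio: expected number of events directly triggered by one event
    beta  -- decay rate of the excitation
    """
    excitation = sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + excitation


events = [1.0, 1.5, 4.0]
# Only the two events before t = 2.0 contribute; the event at 4.0 lies in the future.
print(hawkes_intensity(2.0, events, mu=0.2, alpha=0.5, beta=1.0))
```

Each past event temporarily raises the rate of future events, and the boost fades at rate `beta`. Because the kernel integrates to `alpha`, keeping `alpha < 1` means each event triggers fewer than one offspring on average, so the process does not explode.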
<p>In this thesis, we develop new frameworks that integrate Hawkes process models with multi-armed bandit algorithms, high-dimensional marks, and high-dimensional auxiliary data to solve problems in search and rescue, infectious disease forecasting, and early detection of overdose spikes.</p>
<p>In Chapter 3, we develop a method and apply it to the crisis of increasing overdose mortality over the last decade. We first encode the molecular substructures found in a drug overdose toxicology report. We then cluster these overdose encodings into different overdose categories and model these categories with spatio-temporal multivariate Hawkes processes. Our results demonstrate that the proposed methodology can improve estimation of the magnitude of an overdose spike based on the substances found in an initial overdose.</p>
<p>In Chapter 4, we build a framework for multi-armed bandit problems arising in event detection where the underlying process is self-exciting. We derive the expected number of events for Hawkes processes given a parametric model for the intensity and then analyze the regret bound of a Hawkes process UCB-normal algorithm. By incorporating Hawkes process modeling into the construction of the upper confidence bound, our models can detect more events of interest in the multi-armed bandit setting. We apply the Hawkes bandit model to spatio-temporal data on crime events and earthquake aftershocks. We show that the model can quickly learn to detect hotspot regions when events are unobserved, while striking a balance between exploitation and exploration.</p>
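As an illustration of the kind of quantity involved in "the expected number of events for Hawkes processes given a parametric model for the intensity", the sketch below assumes an exponential kernel, for which the expected count in a future window has a closed form. It is not the thesis's derivation or its UCB-normal index; a UCB-style rule would add an exploration bonus on top of an estimate like this. The function name and the two regions are hypothetical.

```python
import math

def expected_events(lam_now, mu, alpha, beta, horizon):
    """Expected number of events in (t, t + horizon] for a Hawkes process with
    (assumed) kernel alpha * beta * exp(-beta * dt), given the current intensity lam_now.

    Taking expectations of d(lambda) = -beta * (lambda - mu) ds + alpha * beta dN(s)
    gives m'(s) = beta * mu - beta * (1 - alpha) * m(s): the mean intensity relaxes
    exponentially from lam_now toward the stationary rate mu / (1 - alpha).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) for a stationary process")
    stationary = mu / (1.0 - alpha)
    kappa = beta * (1.0 - alpha)  # relaxation rate of the mean intensity
    transient = (lam_now - stationary) * (1.0 - math.exp(-kappa * horizon)) / kappa
    return stationary * horizon + transient


# Two hypothetical regions: A is currently "hot" (intensity above its long-run rate).
regions = {
    "A": dict(lam_now=2.0, mu=0.5, alpha=0.5, beta=1.0),
    "B": dict(lam_now=1.0, mu=0.8, alpha=0.2, beta=1.0),
}
for name, params in regions.items():
    print(name, round(expected_events(horizon=1.0, **params), 3))
```

Both regions share the same long-run rate of 1 event per unit time, but A's recent activity gives it the larger short-horizon expectation (about 1.79 versus 1.0). A purely greedy detector would therefore monitor A.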
<p>In Chapter 5, we present a new spatio-temporal framework for integrating Hawkes processes with multi-armed bandit algorithms. In contrast to the methods proposed in Chapter 4, the upper confidence bound is constructed through Bayesian estimation of a spatial Hawkes process to balance the trade-off between exploiting and exploring geographic regions. The model is validated on simulated datasets and on real-world datasets such as flooding events and improvised explosive device (IED) attack records. The experimental results show that our model outperforms baseline spatial MAB algorithms in terms of rewards and ranking metrics.</p>
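The Chapter 5 upper confidence bound comes from Bayesian estimation of a spatial Hawkes process. The sketch below deliberately swaps in a much simpler model to show only how a posterior quantile can serve as an optimistic index, in the spirit of Bayes-UCB. It treats each region as an independent Poisson rate with a conjugate Gamma prior, with no self-excitation and no spatial coupling. The prior (`a0`, `b0`), the quantile level `1 - 1/t`, and the Monte Carlo quantile are all assumptions.

```python
import random

def bayes_ucb_region(counts, exposures, t, a0=1.0, b0=1.0, n_samples=4000, seed=0):
    """Return (region index, index value) maximizing an upper posterior quantile.

    Simplification: independent Poisson regions, NOT the thesis's spatial Hawkes model.
    counts[k]    -- events observed so far in region k
    exposures[k] -- total time region k has been monitored
    A Gamma(a0, b0) prior (shape, rate) on each rate gives the posterior
    Gamma(a0 + counts[k], b0 + exposures[k]). Regions with little exposure have
    wide posteriors and therefore high quantiles, which drives exploration.
    """
    rng = random.Random(seed)
    level = 1.0 - 1.0 / max(t, 2)
    best_k, best_index = None, float("-inf")
    for k, (c, e) in enumerate(zip(counts, exposures)):
        shape, rate = a0 + c, b0 + e
        # random.gammavariate is parameterized by (shape, scale), with scale = 1 / rate.
        draws = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(n_samples))
        index = draws[min(int(level * n_samples), n_samples - 1)]  # Monte Carlo quantile
        if index > best_index:
            best_k, best_index = k, index
    return best_k, best_index


# Region 1 has never been monitored, so its prior-driven quantile wins.
print(bayes_ucb_region(counts=[5, 0], exposures=[100.0, 0.0], t=100))
```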
<p>In Chapter 6, we demonstrate that the Hawkes process is a powerful tool for modeling infectious disease transmission. We develop models using Hawkes processes with spatio-temporal covariates to forecast COVID-19 transmission at the county level. Within the proposed framework, we show how to estimate the dynamic reproduction number of the virus inside an EM algorithm through a regression on Google mobility indices. We also include demographic covariates as spatial information to improve accuracy. The approach is tested on both short-term and long-term forecasting tasks. The results show that the Hawkes process outperforms several benchmark models published in a public forecast repository. The model also provides insights into the covariates and mobility patterns that most affect COVID-19 transmission in the U.S.</p>
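In a Hawkes model of an epidemic, the branching ratio (the expected number of events directly triggered by one event) plays the role of a reproduction number. The thesis estimates a time-varying reproduction number by regressing on mobility indices inside EM. The sketch below swaps that for a single constant branching ratio and an assumed known decay `beta`, so it shows only the E-step/M-step skeleton for an exponential kernel. The function and argument names are hypothetical.

```python
import math

def hawkes_em(times, T, beta, n_iter=100, alpha=0.5):
    """EM estimates of (mu, alpha) for a univariate Hawkes process observed on [0, T],
    assuming kernel alpha * beta * exp(-beta * dt) with a known decay beta.

    times must be sorted in increasing order. alpha is the branching ratio, used
    here as a constant stand-in for the thesis's dynamic reproduction number.
    """
    n = len(times)
    mu = n / (2.0 * T)  # crude starting value
    # Integral of the normalized kernel from each event to the end of the window.
    kernel_mass = sum(1.0 - math.exp(-beta * (T - tj)) for tj in times)
    for _ in range(n_iter):
        from_background, from_parents = 0.0, 0.0
        for i, ti in enumerate(times):
            # E-step: split event i between the background and the earlier events,
            # in proportion to their contributions to lambda(t_i).
            triggered = sum(alpha * beta * math.exp(-beta * (ti - tj)) for tj in times[:i])
            lam = mu + triggered
            from_background += mu / lam
            from_parents += triggered / lam
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        mu = from_background / T
        alpha = from_parents / kernel_mass
    return mu, alpha


times = [0.5, 0.7, 0.9, 3.0, 3.1, 6.0, 8.2, 8.25]
mu_hat, alpha_hat = hawkes_em(times, T=10.0, beta=2.0)
print(mu_hat, alpha_hat)
```

After every M-step, the expected background count `mu * T` plus the expected offspring count `alpha * kernel_mass` equals the number of observed events. This is a quick sanity check on an implementation.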
<p>Finally, in Chapter 7, we discuss implications of the research and future research directions.</p>
22. Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems. Adrien Couetoux, 30 September 2013
In this thesis, we studied sequential decision problems, with energy stock management as the application. Traditionally, these problems are solved by stochastic dynamic programming. However, the high dimension and the non-convexity of the problem force simplifications of the model for these methods to be applicable. We therefore studied an alternative method that requires no simplification of the model: Monte Carlo Tree Search (MCTS). We began by extending classical MCTS (which applies to finite, deterministic domains) to continuous and stochastic domains. To do so, we used the Double Progressive Widening (DPW) method, which controls the ratio between the width and the depth of the tree by means of two meta-parameters. We also proposed a heuristic named Blind Value (BV) to improve the search for new actions, using the information provided by past simulations. In addition, we extended the RAVE heuristic to continuous domains. Finally, we proposed two new methods for propagating information back up the tree, which greatly improved the convergence speed on two test cases. An important part of our work was to propose a way of combining MCTS with pre-existing fast heuristics. This idea is particularly interesting for energy management, since these problems are currently solved only approximately. We showed how to use Direct Policy Search (DPS) to search for an efficient default policy, which is then used inside MCTS. The experimental results are very encouraging. We also applied MCTS to partially observable Markov decision processes (POMDPs), using the game of Minesweeper as an example.
In this case, current algorithms are not optimal, whereas our approach is, because it transforms the POMDP into an MDP through a change of state vector. Finally, we used MCTS in a meta-bandit framework to solve investment problems. The investment choice is made by multi-armed bandit algorithms, while each arm is evaluated by MCTS. One of the important conclusions of this work is that continuous MCTS requires very few assumptions (only a generative model of the problem), converges to the optimum, and can easily improve existing suboptimal methods. / In this thesis, we study sequential decision making problems, with a focus on the unit commitment problem. This problem is traditionally solved by dynamic programming methods, yet it remains a challenge because of its high dimension and the sacrifices in model accuracy required to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search (MCTS) methods to this problem and to other single-player, stochastic, continuous sequential decision making problems. We started by extending the traditional finite-state MCTS to continuous domains with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters and determines the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree, which improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to combine MCTS with existing powerful heuristics, with the application to energy management in mind.
We did so by proposing a framework that learns a good default policy by Direct Policy Search (DPS) and includes it in MCTS. The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes (POMDPs), with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS makes almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
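Double Progressive Widening caps how many children a node may have as a function of its visit count, with separate meta-parameters at decision nodes (new actions) and chance nodes (new sampled outcomes). The sketch below assumes the common formulation "at most ceil(C * n**alpha) children after n visits"; published variants differ in rounding. It also uses a UCB1-style score for choosing among existing actions. The class and parameter names are hypothetical, and Blind Value, RAVE, and the thesis's backup rules are not shown.

```python
import math
import random

def allow_new_child(n_visits, n_children, C, alpha):
    """Progressive-widening test (assumed form): after n visits a node may hold at
    most ceil(C * n**alpha) children, so a new child is allowed only below that cap."""
    return n_children < math.ceil(C * max(n_visits, 1) ** alpha)


class DecisionNode:
    def __init__(self):
        self.visits = 0
        self.stats = {}  # action -> [visit count, total return]

    def select(self, sample_action, C=1.0, alpha=0.5, c_ucb=1.4):
        """Add a freshly sampled continuous action if widening allows, then pick
        among the existing actions with a UCB1-style score."""
        self.visits += 1
        if allow_new_child(self.visits, len(self.stats), C, alpha):
            self.stats.setdefault(sample_action(), [0, 0.0])

        def score(item):
            n, total = item[1]
            if n == 0:
                return float("inf")  # try each newly added action once
            return total / n + c_ucb * math.sqrt(math.log(self.visits) / n)

        return max(self.stats.items(), key=score)[0]

    def update(self, action, ret):
        entry = self.stats[action]
        entry[0] += 1
        entry[1] += ret


class ChanceNode:
    def __init__(self):
        self.visits = 0
        self.outcomes = []  # next states drawn from the generative model

    def next_state(self, simulate, C=1.0, alpha=0.3, rng=random):
        """Draw a new next state only if widening allows; otherwise revisit a
        stored one (uniformly here, which is a simplification)."""
        self.visits += 1
        if allow_new_child(self.visits, len(self.outcomes), C, alpha):
            self.outcomes.append(simulate())
            return self.outcomes[-1]
        return rng.choice(self.outcomes)


rng = random.Random(0)
root = DecisionNode()
for _ in range(100):
    a = root.select(sample_action=rng.random)
    root.update(a, -(a - 0.7) ** 2)  # toy one-step reward peaking at a = 0.7
print(len(root.stats))  # ceil(sqrt(100)) = 10 actions after 100 visits
```

The exponent controls the width-versus-depth trade-off. A smaller `alpha` keeps nodes narrow, so simulations go deeper, while a larger `alpha` spends more effort trying new continuous actions or outcomes.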
23. On recommendation systems in a sequential context. Frédéric Guillou, 02 December 2016
This thesis studies Recommendation Systems in a sequential setting, where user feedback on items arrives in the system one piece after another. After each piece of feedback, the system must take it into account in order to improve future recommendations. Many recommendation techniques and evaluation methodologies have been proposed in the past for recommendation problems. Despite this, sequential evaluation, although more realistic and closer to the evaluation setting of a real recommender system, has been left aside. The sequential context requires taking into account several aspects that are not visible in a fixed context. The first of these is the so-called exploration vs. exploitation dilemma: the model making the recommendations must find the right trade-off between gathering information about users' tastes through exploration steps and exploiting its current knowledge to maximize the feedback received. The importance of this first point is highlighted through an initial evaluation, and we propose an approach that is both simple and efficient, based on Matrix Factorization and a Multi-Armed Bandit algorithm, to produce appropriate recommendations. The second aspect that can arise in the sequential setting occurs when an ordered list of items is recommended instead of a single item. In this situation, the feedback given by the user has several parts: the explicit part is the rating the user gives to the chosen item, while the implicit part concerns which items in the list were clicked (or not clicked).
By integrating both parts of the feedback into a learning model, we propose an approach based on Matrix Factorization that can recommend better ordered lists of items, and we evaluate this approach in a particular sequential setting to show its effectiveness. / This thesis is dedicated to the study of Recommendation Systems in a sequential setting, where the feedback given by users on items arrives in the system one piece after another. After each piece of feedback, the system has to integrate it and try to improve future recommendations. Many techniques and evaluation methods have already been proposed to study the recommendation problem. Despite that, the sequential setting, which is more realistic and closer to the evaluation of a real Recommendation System, has surprisingly been left aside. In a sequential context, recommendation techniques need to take into consideration several aspects that are not visible in a fixed setting. The first one is the exploration-exploitation dilemma: the model making recommendations needs to find a good balance between gathering information about users' tastes or items through exploratory recommendation steps and exploiting its current knowledge of the users and items to try to maximize the feedback received. We highlight the importance of this point through a first evaluation study and propose a simple yet efficient approach to making effective recommendations, based on Matrix Factorization and Multi-Armed Bandit algorithms. The second aspect emphasized by the sequential context appears when a list of items is recommended to the user instead of a single item. In such a case, the feedback given by the user includes two parts: the explicit feedback, given as the rating, and the implicit feedback given by clicking (or not clicking) on other items of the list.
By integrating both kinds of feedback into a Matrix Factorization model, we propose an approach that can suggest better ranked lists of items, and we evaluate it in a particular setting.
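A minimal sketch of the first idea: matrix factorization updated after every single rating, combined with a bandit-style selection rule. It assumes epsilon-greedy exploration as the bandit component purely for brevity (the thesis's exact bandit algorithm may differ). The class name, hyperparameters, and simulated user are hypothetical.

```python
import random

class OnlineMF:
    """Sequential recommender sketch: SGD matrix factorization after each rating,
    plus (assumed) epsilon-greedy selection for exploration vs. exploitation."""

    def __init__(self, n_users, n_items, k=5, lr=0.05, reg=0.02, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.U = [[self.rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
        self.V = [[self.rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
        self.lr, self.reg, self.epsilon = lr, reg, epsilon

    def predict(self, u, i):
        return sum(a * b for a, b in zip(self.U[u], self.V[i]))

    def recommend(self, u, candidates):
        """Explore a random candidate with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        return max(candidates, key=lambda i: self.predict(u, i))

    def update(self, u, i, rating):
        """One SGD step on the squared error of this single rating, with an L2 penalty."""
        err = rating - self.predict(u, i)
        pu, qi = self.U[u], self.V[i]
        for f in range(len(pu)):
            p, q = pu[f], qi[f]
            pu[f] += self.lr * (err * q - self.reg * p)
            qi[f] += self.lr * (err * p - self.reg * q)


# Hypothetical simulated user 0 who rates item 2 highly and everything else low.
model = OnlineMF(n_users=1, n_items=5)
for _ in range(2000):
    item = model.recommend(0, list(range(5)))
    model.update(0, item, 5.0 if item == 2 else 1.0)
```

Without the exploration branch, the model could lock onto whichever item it happened to rate first. The occasional random recommendation is what lets it discover that item 2 is preferred.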
24. Assessing and improving recommender systems to deal with user cold-start problem. Crícia Zilda Felício Paixão, 06 March 2017
Recommender systems are part of our everyday lives. The main goal of the methods used in these systems is to predict preferences for new items based on the user's profile. Research on this topic seeks, among other things, to address the user cold-start problem, which is the challenge of recommending items to users who have few or no preference records in the system.
One way to address user cold-start is to infer users' preferences from additional information, and different types of additional information have been explored in research. Some studies use social information combined with user preferences; others rely on clicks while browsing websites, geographic location information, visual perception, contextual information, etc. The typical approach of these systems is to use additional information to build a prediction model for each user. Besides making the process more complex, this leads most recommender systems to perform poorly, in particular for full cold-start users (users with no preferences identified by the system). The work presented here, by contrast, proposes that new users will receive more accurate recommendations from prediction models that already exist in the system.
In this thesis, 4 approaches were proposed to deal with the user cold-start problem using existing models in recommender systems. The approaches addressed the following aspects:
- Inclusion of social information in traditional recommender systems: the roles of several social metrics in a pairwise preference recommender system were investigated, providing support for the definition of a general framework to include social information in traditional approaches.
- Use of visual perception similarity: using visual perception similarity, networks connecting similar users were inferred, to be used in selecting prediction models for new users.
- Analysis of the benefits of a general framework to include user network information in recommender systems: by representing different types of additional information as a user network, we investigated how user networks can be included in recommender systems so as to benefit recommendation for cold-start users.
- Analysis of the impact of prediction model selection for cold-start users: the last proposed approach considered that, without additional information, the system could recommend to new users by switching among the models already in the system and learning which one is most suitable for the recommendation.
The proposed approaches were evaluated in terms of prediction quality and ranking quality on real datasets from different domains. The results showed that the proposed approaches achieved better results than state-of-the-art methods. / Recommender systems are part of our everyday lives. The main purpose of recommendation methods is to predict preferences for new items based on users' past preferences. Research on this topic seeks, among other things, to address the user cold-start problem, which is the challenge of recommending items to users with few or no preference records.
One way to address cold-start issues is to infer the missing data from side information. Side information of different types has been explored in research. Some studies use social information combined with users' preferences; others use click behavior, location-based information, users' visual perception, contextual information, etc. The typical approach is to use side information to build one prediction model for each cold user. Due to the inherent complexity of this prediction process, the performance of most recommender systems drops considerably, in particular for full cold-start users. We instead propose that cold users are best served by models already built in the system.
In this thesis, we propose 4 approaches to deal with the user cold-start problem using existing models available for analysis in recommender systems. We cover the following aspects:
- Embedding social information into traditional recommender systems: We investigate the role of several social metrics in pairwise preference recommendations and provide the first steps towards a general framework to incorporate social information in traditional approaches.
- Improving recommendation with visual perception similarities: We extract networks connecting users with similar visual perception and use them to come up with prediction models that maximize the information gained from cold users.
- Analyzing the benefits of a general framework to incorporate networked information into recommender systems: Representing different types of side information as a user network, we investigate how to incorporate networked information into recommender systems to understand its benefits in the context of cold-user recommendation.
- Analyzing the impact of prediction model selection for cold users: The last proposal considers that, without side information, the system will recommend to cold users by switching among models already built in the system.
We evaluated the proposed approaches in terms of prediction quality and ranking quality on real-world datasets from different recommendation domains. The experiments showed that our approaches achieve better results than the comparison methods. / Thesis (Doctorate)
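The last approach switches among models already built in the system and learns which one suits a new user best. One way to frame that, assumed here purely for illustration, is as a bandit whose arms are the existing prediction models. The sketch uses textbook UCB1 with a hypothetical binary reward (for example, whether the user accepted a recommendation), and the hit rates are made up. The thesis's own selection strategy may differ.

```python
import math
import random

class UCB1:
    """UCB1 over a fixed set of arms; here each arm is an existing prediction model."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm  # play each arm once before using confidence bounds
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


rng = random.Random(42)
hit_rates = [0.1, 0.8, 0.4]  # hypothetical acceptance rates of three existing models
bandit = UCB1(len(hit_rates))
for _ in range(3000):
    m = bandit.select()
    bandit.update(m, 1.0 if rng.random() < hit_rates[m] else 0.0)
print(bandit.counts)
```

The appeal for cold users is that no new model has to be trained. Only the choice among existing models is learned from the user's first few interactions.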
25. Learning-based Attack and Defense on Recommender Systems. Agnideven Palanisamy Sundar (11190282), 06 August 2021
The internet is home to massive volumes of valuable data that are constantly being created, making it difficult for users to find information relevant to them. In recent times, online users have been relying on the recommendations made by websites to narrow down their options. Online reviews have also become an increasingly important factor in a customer's final choice. Unfortunately, attackers have found ways to manipulate both reviews and recommendations to mislead users. A recommendation system is a special type of information filtering system adopted by online vendors to provide suggestions to their customers based on their requirements. Collaborative filtering is one of the most widely used recommendation techniques; unfortunately, it is prone to shilling (profile injection) attacks. Such attacks alter the recommendation process to promote or demote a particular product. In addition, many spammers write deceptive reviews to change the credibility of a product or service. This work addresses these issues by treating the review manipulation and shilling attack scenarios independently. For shilling attacks, we build an efficient Reinforcement Learning-based shilling attack method. This method reduces the uncertainty associated with the item selection process and finds the optimal items to enhance attack reach while treating the recommender system as a black box. Such practical online attacks open new avenues for research in building more robust recommender systems. For review manipulation, we introduce a deep structural embedding approach that preserves highly nonlinear structural information and the dynamic aspects of user reviews to identify and cluster spam users. In experiments with real datasets, our method captures about 92% of all spam reviewers using an unsupervised learning approach.
26. Machine Learning Algorithms for Influence Maximization on Social Networks. Abhishek Kumar Umrawal (16787802), 08 August 2023
<p>With an increasing number of users spending time on social media platforms and engaging with family, friends, and influencers within communities of interest (such as fashion, cooking, gaming, etc.), there are significant opportunities for marketing firms to leverage word-of-mouth advertising on these platforms. In particular, marketing firms can select sets of influencers within relevant communities to sponsor, namely by providing free product samples to those influencers so that they will discuss and promote the product on their social media accounts.</p><p>The question of which set of influencers to sponsor is known as <b>influence maximization</b> (IM), formally defined as follows: "if we can try to convince a subset of individuals in a social network to adopt a new product or innovation, and the goal is to trigger a large cascade of further adoptions, which set of individuals should we target?" Under standard diffusion models, this optimization problem is known to be NP-hard. The problem has been widely studied in the literature, and several approaches for solving it have been proposed. Some approaches provide near-optimal solutions but are costly in terms of runtime. Others are faster but heuristic, i.e., they do not have approximation guarantees.</p><p>In this dissertation, we study the influence maximization problem extensively. We provide efficient algorithms for solving the original problem and its important generalizations. Furthermore, we provide theoretical guarantees and experimental evaluations to support the claims made in this dissertation.</p><p>We first study the original IM problem, referred to as the discrete influence maximization (DIM) problem, where the marketer can either provide a free sample to an influencer or not, i.e., they cannot give fractional discounts such as 10% off.
As already mentioned, the existing solution methods (for instance, the simulation-based greedy algorithm) provide near-optimal solutions but are costly in terms of runtime, while the faster approaches lack approximation guarantees. To address this trade-off between accuracy and runtime, we propose a community-aware divide-and-conquer framework that provides a time-efficient solution to the DIM problem. The proposed framework outperforms the standard methods in terms of runtime and the heuristic methods in terms of influence.</p><p>We next study a natural extension of the DIM problem, referred to as the fractional influence maximization (FIM) problem, where the marketer may offer fractional discounts to the influencers (as opposed to the DIM problem, where an influencer either receives a free sample or not). The FIM problem thus gives the marketer more flexibility in allocating the available budget among different influencers. The existing solution methods use a continuous extension of the simulation-based greedy approximation algorithm for the DIM problem. This continuous extension greedily builds the solution for the given fractional budget by taking small steps through the interior of the feasible region. In contrast, we first characterize the solution to the FIM problem in terms of the solution to the DIM problem. We then use this characterization to propose an efficient greedy approximation algorithm that only iterates through the corners of the feasible region. This leads to large runtime savings compared to the existing methods, which iterate through the interior of the feasible region.
Furthermore, we provide an approximation guarantee for the proposed greedy algorithm for the FIM problem.</p><p>Finally, we study another extension of the DIM problem, referred to as the online discrete influence maximization (ODIM) problem, where the marketer provides free samples not just once but repeatedly over a given time horizon, and the goal is to maximize the cumulative influence over time while receiving instantaneous feedback. The existing solution methods are based on semi-bandit instantaneous feedback, where some intermediate aspects of how the influence propagates in the social network are assumed to be known or observed, for instance, which specific individuals became influenced at intermediate steps of the propagation. However, for social networks with user privacy, this information is not available. Hence, we consider the ODIM problem with full-bandit feedback, where no knowledge of the underlying social network or diffusion process is assumed. We note that the ODIM problem is an instance of the stochastic combinatorial multi-armed bandit (CMAB) problem with submodular rewards. To solve the ODIM problem, we provide an efficient algorithm that outperforms the existing methods in terms of influence as well as time and space complexity.</p><p>Furthermore, we point out the connections of influence maximization with the related problem of disease outbreak prevention and the more general problem of submodular maximization. The methods proposed in this dissertation can also be used to solve those problems.</p>
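For reference, here is a minimal sketch of the baseline the dissertation improves on: the simulation-based greedy algorithm under the Independent Cascade (IC) diffusion model. It is not the proposed community-aware framework. Under IC the expected spread is monotone and submodular, so this greedy rule is within a factor of 1 - 1/e of optimal, up to Monte Carlo error. The uniform edge probability `p` and the toy graph are assumptions.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One Independent Cascade run: each newly activated node gets a single
    chance to activate each inactive neighbour, with (assumed uniform) probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def greedy_im(graph, k, p=0.1, n_sims=200, seed=0):
    """Simulation-based greedy: repeatedly add the node whose inclusion gives the
    largest Monte Carlo estimate of expected spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        def spread(extra):
            return sum(simulate_ic(graph, chosen + [extra], p, rng)
                       for _ in range(n_sims)) / n_sims
        best = max(sorted(nodes - set(chosen)), key=spread)
        chosen.append(best)
    return chosen


# Hypothetical toy graph: node 0 reaches 10 nodes, node 100 reaches 5.
graph = {0: list(range(1, 11)), 100: list(range(101, 106))}
print(greedy_im(graph, k=2, p=1.0, n_sims=3))
```

The cost is visible in the structure: every candidate node is re-evaluated with `n_sims` cascades in every round. This is the runtime bottleneck that motivates the faster frameworks described above.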
27. Reference Tracking with Adversarial Adaptive Output-Feedback Model Predictive Control. Linda Bui, January 2021
Model Predictive Control (MPC) is an optimization-based control strategy that handles system constraints explicitly, making it a popular feedback control method in real industrial processes. However, designing this control policy is expensive, since an explicit model of the process is required whenever the controller is re-tuned. Another common practical challenge is that not all states are measurable. This calls for an observer to estimate them and imposes additional challenges, such as satisfying the resulting constraints and conditions. This thesis addresses these challenges by extending the novel Adversarial Adaptive Model Predictive Control (AAMPC) algorithm with output feedback for linear plants without explicit identification. The AAMPC algorithm is an adaptive MPC framework in which results from an adversarial Multi-Armed Bandit (MAB) are applied to a basic model predictive control formulation. The algorithm developed in this project, Adversarial Adaptive Output-Feedback Model Predictive Control (AAOFMPC), is derived by extending the standard MPC formulation with output feedback, i.e., to an Output-Feedback Model Predictive Control (OFMPC) scheme, where a Kalman filter is implemented as the observer. Furthermore, the control performance of the extended algorithm is demonstrated on the problem of driving the state to a given reference, with performance evaluated in terms of regret, state estimation error, and how well the states track their given reference. Experiments are conducted on two discrete-time Linear Time-Invariant (LTI) systems, a second-order system and a third-order system, perturbed with different noise sequences. It is shown that AAOFMPC satisfies the given theoretical bounds and constraints despite larger perturbations. However, it is also shown that the algorithm is not very robust to noise, since offsets from the reference values are observed in the state trajectories.
Furthermore, several tuning parameters of AAOFMPC need further investigation for optimal performance. / Model Predictive Control (MPC) is an optimization-based control method that handles process constraints systematically, making it a popular method for feedback control in the process industry. However, this method entails high computational costs, since an explicit model is required every time the controller is adjusted online. In practice, it is also common that not all state variables are available, which requires an observer to reconstruct them. This leads to further challenges, such as satisfying additional system constraints and the conditions that follow. This project addresses these challenges by extending the new Adversarial Adaptive Model Predictive Control (AAMPC) algorithm with output feedback for linear systems without explicit model identification. The AAMPC algorithm is an adaptive control strategy in which results from an adversarial multi-armed bandit (MAB) are applied in a standard MPC formulation. This MPC formulation is extended with output feedback, i.e., Output-Feedback Model Predictive Control (OFMPC), where a Kalman filter is implemented as the observer. The result is the project's algorithm: Adversarial Adaptive Output-Feedback Model Predictive Control (AAOFMPC). Furthermore, the performance of the extended algorithm is demonstrated on the problem of driving the state variables to a given reference value, where performance is evaluated in terms of regret, estimation error, and how well the state variables follow the given reference values. Experiments are performed on two discrete-time linear time-invariant (LTI) systems, a second-order system and a third-order system, perturbed with different noise levels. The results show that the performance of AAOFMPC satisfies the given theoretical bounds despite larger disturbances.
However, the algorithm turns out not to be particularly robust to noise, since the state variables deviate from the given reference values. In addition, several parameters of the algorithm require further investigation for optimal performance.
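AAMPC applies results from adversarial multi-armed bandits. The textbook algorithm for that setting is Exp3, sketched below as background. This is not a claim about which bandit algorithm AAMPC uses or how its arms map to controller models. The fixed rewards in the usage example, which could stand for hypothetical candidate models, are made up.

```python
import math
import random

class Exp3:
    """Exp3 for adversarial multi-armed bandits with rewards in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n = n_arms
        self.gamma = gamma  # mixing weight for uniform exploration
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.n for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n), weights=probs)[0]
        return arm, probs

    def update(self, arm, reward, probs):
        # Importance weighting: reward / probs[arm] for the played arm (and 0 for the
        # others) is an unbiased estimate of every arm's reward.
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n)
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]  # rescale to avoid overflow


rewards = [0.2, 0.9, 0.5]  # hypothetical fixed rewards per arm
agent = Exp3(len(rewards))
for _ in range(2000):
    arm, probs = agent.select()
    agent.update(arm, rewards[arm], probs)
print([round(p, 3) for p in agent.probabilities()])
```

Exp3 makes no stochastic assumption about rewards; its regret guarantee holds even if an adversary chooses them. This is why adversarial-bandit results suit settings where disturbances are not modeled probabilistically.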
28. Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches. Joaquim de Curtò i Díaz, 23 January 2024
Thesis by compendium of publications / [ES] The advent of Large Language Models (LLMs) marks a transformative phase in the field of Artificial Intelligence (AI), signifying the shift toward intelligent, autonomous systems capable of complex understanding and decision-making. This thesis delves into the multifaceted capabilities of LLMs, exploring their potential applications in decision optimization, scene understanding, and advanced video summarization tasks in diverse contexts.
In the first segment of the thesis, the focus is on semantic scene understanding for Unmanned Aerial Vehicles (UAVs). Their ability to provide high-level data and visual cues instantly makes UAVs ideal platforms for performing complex tasks. The work combines the potential of LLMs, Visual Language Models (VLMs), and state-of-the-art object detection systems to deliver nuanced, contextually accurate scene descriptions. An efficient and well-controlled practical implementation using micro-drones in complex environments is presented, and the study is complemented with standardized readability metrics proposed to measure the quality of the LLM-enhanced descriptions. These advances could significantly impact sectors such as film, advertising, and theme parks, greatly improving user experiences.
The second segment sheds light on the increasingly crucial problem of decision-making under uncertainty. Using the Multi-Armed Bandit (MAB) problem as a foundation, the study explores the use of LLMs to inform and guide strategies in dynamic environments. It posits that the predictive power of LLMs can help choose the right balance between exploration and exploitation based on the current state of the system. Through rigorous testing, the proposed LLM-informed strategy demonstrates its adaptability and competitive performance against conventional strategies.
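To make "choosing the balance between exploration and exploitation based on the current state" concrete, the sketch below uses an epsilon-greedy bandit whose exploration rate is set each round by a pluggable advisor that receives a summary of the system state. No LLM is called here. `decaying_advisor` is a hypothetical stand-in for where an LLM query would go, and the thesis's actual prompting and strategy are not reproduced.

```python
import random

def run_advised_bandit(pull, n_arms, horizon, advise_epsilon, seed=0):
    """Epsilon-greedy bandit whose exploration rate comes from an external advisor.

    pull(arm) -> reward; advise_epsilon(state) -> exploration probability.
    The advisor is a hypothetical stand-in for an LLM consulted with the state summary.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(horizon):
        if 0 in counts:
            arm = counts.index(0)  # play every arm once first
        else:
            state = {"t": t, "counts": tuple(counts), "means": tuple(means)}
            eps = min(max(float(advise_epsilon(state)), 0.0), 1.0)  # clamp the advice
            if rng.random() < eps:
                arm = rng.randrange(n_arms)
            else:
                arm = max(range(n_arms), key=lambda a: means[a])
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, total


def decaying_advisor(state):
    """Hypothetical placeholder advice: explore less as evidence accumulates."""
    return 1.0 / (1.0 + state["t"]) ** 0.5


rewards = [0.1, 0.9]  # hypothetical deterministic arm rewards
print(run_advised_bandit(lambda a: rewards[a], 2, 500, decaying_advisor))
```

Keeping the advisor behind a plain function boundary, with its output clamped to [0, 1], means a malformed or out-of-range suggestion cannot break the bandit loop.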
A continuación, la investigación se centra en el estudio de las evaluaciones de bondad de ajuste de las Generative Adversarial Networks (GANs) utilizando la Signature Transform. Al proporcionar una medida eficiente de similitud entre las distribuciones de imágenes, el estudio arroja luz sobre la estructura intrínseca de las muestras generadas por los GANs. Un análisis exhaustivo utilizando medidas estadísticas como las pruebas de Kruskal-Wallis proporciona una comprensión más amplia de la convergencia de los GANs y la bondad de ajuste.
En la sección final, la tesis introduce un nuevo benchmark para la síntesis automática de vídeos, enfatizando la integración armoniosa de los LLMs y la Signature Transform. Se propone un enfoque innovador basado en los componentes armónicos capturados por la Signature Transform. Las medidas son evaluadas extensivamente, demostrando ofrecer una precisión convincente que se correlaciona bien con el concepto humano de un buen resumen.
Este trabajo de investigación establece a los LLMs como herramientas poderosas para abordar tareas complejas en diversos dominios, redefiniendo la optimización de decisiones, la comprensión de escenas y las tareas de resumen de video. No solo establece nuevos postulados en las aplicaciones de los LLMs, sino que también establece la dirección para futuros trabajos en este emocionante y rápidamente evolucionante campo. / [CA] L'adveniment dels Large Language Models (LLMs) marca una fase transformadora en el camp de la Intel·ligència Artificial (IA), significat el canvi cap a sistemes intel·ligents i autònoms capaços d'una comprensió i presa de decisions complexes. Aquesta tesi profunditza en les capacitats multifacètiques dels LLMs, explorant les seues possibles aplicacions en l'optimització de decisions, la comprensió d'escenes i tasques avançades de resum de vídeo en diversos contexts.
En el primer segment de la tesi, el focus està en la comprensió semàntica d'escenes de Vehicles Aeris No Tripulats (UAVs). La capacitat de proporcionar instantàniament dades d'alt nivell i senyals visuals situa els UAVs com a plataformes ideals per a realitzar tasques complexes. El treball combina el potencial dels LLMs, els Visual Language Models (VLMs), i els sistemes de detecció d'objectes d'última generació per a oferir descripcions d'escenes matisades i contextualment precises. Es presenta una implementació pràctica eficient i ben controlada usant microdrons en entorns complexos, complementant l'estudi amb mètriques de llegibilitat estandarditzades proposades per a mesurar la qualitat de les descripcions millorades pels LLMs. Aquests avenços podrien impactar significativament en sectors com el cinema, la publicitat i els parcs temàtics, millorant les experiències dels usuaris de manera exponencial.
El segon segment arroja llum sobre el problema cada vegada més crucial de la presa de decisions sota incertesa. Utilitzant el problema dels Multi-Armed Bandits (MAB) com a base, l'estudi explora l'ús dels LLMs per a informar i guiar estratègies en entorns dinàmics. Es postula que el poder predictiu dels LLMs pot ajudar a triar l'equilibri correcte entre exploració i explotació basat en l'estat actual del sistema. A través de proves rigoroses, l'estratègia informada pels LLMs proposada demostra la seua adaptabilitat i el seu rendiment competitiu front a les estratègies convencionals.
A continuació, la recerca es centra en l'estudi de les avaluacions de bondat d'ajust de les Generative Adversarial Networks (GANs) utilitzant la Signature Transform. En proporcionar una mesura eficient de similitud entre les distribucions d'imatges, l'estudi arroja llum sobre l'estructura intrínseca de les mostres generades pels GANs. Una anàlisi exhaustiva utilitzant mesures estadístiques com les proves de Kruskal-Wallis proporciona una comprensió més àmplia de la convergència dels GANs i la bondat d'ajust.
En la secció final, la tesi introdueix un nou benchmark per a la síntesi automàtica de vídeos, enfatitzant la integració harmònica dels LLMs i la Signature Transform. Es proposa un enfocament innovador basat en els components harmònics capturats per la Signature Transform. Les mesures són avaluades extensivament, demostrant oferir una precisió convincent que es correlaciona bé amb el concepte humà d'un bon resum.
Aquest treball de recerca estableix els LLMs com a eines poderoses per a abordar tasques complexes en diversos dominis, redefinint l'optimització de decisions, la comprensió d'escenes i les tasques de resum de vídeo. No solament estableix nous postulats en les aplicacions dels LLMs, sinó que també estableix la direcció per a futurs treballs en aquest emocionant i ràpidament evolucionant camp. / [EN] The advent of Large Language Models (LLMs) marks a transformative phase in the field of Artificial Intelligence (AI), signifying the shift towards intelligent and autonomous systems capable of complex understanding and decision-making. This thesis delves deep into the multifaceted capabilities of LLMs, exploring their potential applications in decision optimization, scene understanding, and advanced summarization tasks in diverse contexts.
The first part of the thesis addresses semantic scene understanding for Unmanned Aerial Vehicles (UAVs). Because UAVs can instantly provide high-level data and visual cues, they are ideal platforms for performing complex tasks. The work combines LLMs, Visual Language Models (VLMs), and state-of-the-art object detection pipelines to produce nuanced, contextually accurate scene descriptions. An efficient, well-controlled practical implementation using microdrones in challenging environments is presented, and the study proposes standardized readability metrics to gauge the quality of the LLM-enhanced descriptions. This work could significantly impact sectors such as film, advertising, and theme parks, with the potential to greatly improve user experiences.
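The abstract does not name the readability metrics it proposes. As an illustration only, the sketch below computes the Flesch Reading Ease score, one widely used standard readability formula, for a generated scene description. The choice of Flesch, and the vowel-group syllable heuristic in `count_syllables`, are assumptions made for this example, not the thesis's method.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable count (heuristic, an assumption of this sketch)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))  # one syllable per vowel group
    # Treat a final 'e' as silent (but keep 'le' endings, as in "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    plain = "A drone flies over the park."
    dense = ("The unmanned aerial vehicle systematically photographs "
             "heterogeneous infrastructure.")
    print(flesch_reading_ease(plain), flesch_reading_ease(dense))
```

Scoring a raw detector caption and its LLM-enhanced rewrite with the same function gives a simple, reproducible way to compare their readability; a fuller evaluation would likely combine several such scores.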
The second part addresses the increasingly important problem of decision-making under uncertainty. Using the Multi-Armed Bandit (MAB) problem as a foundation, the study explores how LLMs can inform and guide strategies in dynamic environments. The hypothesis is that the predictive power of LLMs can help strike the right balance between exploration and exploitation based on the current state of the system. In rigorous testing, the proposed LLM-informed strategy proves adaptable and performs competitively against conventional strategies.
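One way to picture an "informed" exploration/exploitation balance is an epsilon-greedy bandit whose exploration rate is chosen at every step by an external advisor. The sketch below is a minimal illustration under that assumption: the `Advisor` callback is a hypothetical hook standing in for the LLM, and the `decaying_advisor` default is a plain decay schedule, so no language model is involved here.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical hook: (step, pull counts, empirical means) -> exploration rate in [0, 1].
# In an LLM-informed strategy this decision would come from the model; here it is a function.
Advisor = Callable[[int, Sequence[int], Sequence[float]], float]


def decaying_advisor(step: int, counts: Sequence[int], means: Sequence[float]) -> float:
    """Stand-in advisor (assumption): explore heavily early on, less over time."""
    return min(1.0, 10.0 / (step + 1))


def run_bandit(probs: Sequence[float], horizon: int,
               advisor: Advisor = decaying_advisor, seed: int = 0) -> List[int]:
    """Epsilon-greedy on Bernoulli arms with epsilon set by `advisor` each step.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(horizon):
        epsilon = advisor(t, counts, means)
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            arm = untried[0]                              # pull every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts


if __name__ == "__main__":
    print(run_bandit([0.1, 0.9], horizon=2000))
```

Keeping the advisor behind a small interface makes it easy to swap the decay schedule for any other policy, including a model-driven one, while the bandit loop stays unchanged.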
Next, the research turns to goodness-of-fit assessment of Generative Adversarial Networks (GANs) using the Signature Transform. By providing an efficient measure of similarity between image distributions, the study sheds light on the intrinsic structure of GAN-generated samples. A comprehensive analysis using statistical tests, such as the Kruskal-Wallis test, provides a broader understanding of GAN convergence and goodness of fit.
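The abstract does not specify how images are turned into paths or which signature statistic is compared. The sketch below therefore assumes each sample is already a path (a sequence of points in R^d) and compares two sample sets by the Euclidean distance between their expected signatures truncated at depth 2. This is a simple illustrative similarity, not necessarily the measure used in the thesis.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def signature_level2(path: Sequence[Point]) -> List[float]:
    """Depth-2 signature of a piecewise-linear path, flattened as [S1, S2 row-major].

    Built segment by segment with Chen's identity: a linear segment with increment d
    has level-1 term d and level-2 term (d outer d) / 2.
    """
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for prev, curr in zip(path, path[1:]):
        d = [c - p for p, c in zip(prev, curr)]
        # Update level 2 first, using level 1 accumulated over the previous segments.
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        for i in range(dim):
            s1[i] += d[i]
    return s1 + [x for row in s2 for x in row]


def expected_signature(paths: Sequence[Sequence[Point]]) -> List[float]:
    """Coordinate-wise mean of the truncated signatures of a set of paths."""
    sigs = [signature_level2(p) for p in paths]
    return [sum(col) / len(sigs) for col in zip(*sigs)]


def signature_distance(paths_a, paths_b) -> float:
    """Euclidean distance between expected signatures (illustrative similarity)."""
    ea, eb = expected_signature(paths_a), expected_signature(paths_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))


if __name__ == "__main__":
    right_then_up = [[(0, 0), (1, 0), (1, 1)]]
    up_then_right = [[(0, 0), (0, 1), (1, 1)]]
    # Same endpoints, but the level-2 terms (signed area) differ.
    print(signature_distance(right_then_up, up_then_right))
```

The level-2 terms encode the signed area swept by the path, so two sample sets with identical increments but different orderings are still told apart, which a comparison of endpoints alone would miss.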
In the final section, the thesis introduces a novel benchmark for automatic video summarization, emphasizing the harmonious integration of LLMs and the Signature Transform. It puts forth an approach grounded in the harmonic components captured by the Signature Transform. The proposed measures are evaluated extensively and show compelling accuracy that correlates well with the human notion of a good summary.
This research work establishes LLMs as powerful tools for addressing complex tasks across diverse domains, redefining decision optimization, scene understanding, and summarization. It not only breaks new ground in the applications of LLMs but also sets the direction for future work in this exciting and rapidly evolving field. / De Curtò I Díaz, J. (2023). Frontiers of Large Language Models: Empowering Decision Optimization, Scene Understanding, and Summarization Through Advanced Computational Approaches [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202200 / Compendium
|
29 |
Contributions to Multi-Armed Bandits: Risk-Awareness and Sub-Sampling for Linear Contextual Bandits / Galichet, Nicolas, 28 September 2015
This thesis focuses on sequential decision making in unknown environments, and more particularly on the Multi-Armed Bandit (MAB) setting, defined by Lai and Robbins in the 1950s. During the last decade, many theoretical and algorithmic studies have addressed the exploration vs exploitation tradeoff at the core of MABs: exploitation is biased toward the best options visited so far, while exploration is biased toward rarely visited options, to enforce the discovery of the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain). The contributions presented in this dissertation tackle the exploration vs exploitation dilemma from two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance, exploratory trajectories of a robot can cause physical damage to the robot or its environment.
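The exploration vs exploitation tradeoff is classically handled by UCB1 (Auer et al. 2002), which adds a confidence radius to each arm's empirical mean and plays the arm with the highest upper bound. The minimal sketch below runs UCB1 on simulated Bernoulli arms; the reward probabilities and horizon are illustrative assumptions, and this is the standard baseline, not one of the thesis's algorithms.

```python
import math
import random
from typing import List, Sequence


def ucb1(probs: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """UCB1 on simulated Bernoulli arms; returns the pull count of each arm."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    sums = [0.0] * k
    for n in range(horizon):  # n = number of plays made so far
        if n < k:
            arm = n  # initialisation: play each arm once
        else:
            # Optimism: empirical mean plus sqrt(2 ln n / n_j) confidence radius.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(n) / counts[a]))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.8], horizon=3000))
```

The exploration bonus shrinks as an arm is sampled more, so suboptimal arms are still revisited, but only logarithmically often. Note that UCB1 optimizes the mean reward only and has no notion of risk.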
We thus define the exploration vs exploitation vs safety (EES) tradeoff and propose three new algorithms addressing the EES dilemma. First, under strong assumptions, the MIN algorithm provides guarantees of logarithmic regret, matching the state of the art, while being highly robust to hyper-parameter settings (as opposed to, e.g., UCB (Auer et al. 2002)). Second, the MARAB algorithm aims at optimizing the cumulative rewards under the Conditional Value at Risk (CVaR) criterion, which originates in economics, with excellent empirical performance compared to (Sani et al. 2012), though without any theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVaR estimation and yields both theoretical guarantees and good empirical behavior. The second contribution concerns the contextual bandit setting, where additional information is provided to support decision making, such as user details in content recommendation or patient history in the medical domain. The study focuses on how to choose between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm from its samples, and the 'optimism in the face of uncertainty' principle selects the arm with the maximal upper confidence bound. An alternative, pioneered by (Baransi et al. 2014) and called BESA, instead subsamples the larger sample set without replacement. In this framework, we designed a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The resulting CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior.
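Two building blocks mentioned above can be sketched in a few lines: an empirical CVaR estimate (the mean of the worst alpha-fraction of observed rewards, which is the risk criterion MARAB targets) and a two-arm BESA duel (subsample the larger history without replacement down to the size of the smaller one, then compare means). This is a minimal sketch under stated assumptions: it does not reproduce MARAB, MARABOUT, or CL-BESA, the CVaR estimator shown is one common empirical variant, and the tie-breaking rule in `besa_duel` (favor the less-sampled arm) is a choice made for this example.

```python
import math
import random
from typing import Sequence


def empirical_cvar(rewards: Sequence[float], alpha: float) -> float:
    """Mean of the worst ceil(alpha * n) rewards; higher means a safer arm."""
    if not rewards or not 0.0 < alpha <= 1.0:
        raise ValueError("need at least one reward and 0 < alpha <= 1")
    k = math.ceil(alpha * len(rewards))
    return sum(sorted(rewards)[:k]) / k


def besa_duel(rewards_a: Sequence[float], rewards_b: Sequence[float],
              rng: random.Random) -> int:
    """Two-arm BESA-style comparison: returns 0 to play arm a, 1 to play arm b."""
    if not rewards_a or not rewards_b:
        raise ValueError("each arm needs at least one observed reward")
    n = min(len(rewards_a), len(rewards_b))
    # Sub-sample without replacement so both arms are compared on n rewards each.
    sub_a = rng.sample(list(rewards_a), n)
    sub_b = rng.sample(list(rewards_b), n)
    mean_a, mean_b = sum(sub_a) / n, sum(sub_b) / n
    if mean_a != mean_b:
        return 0 if mean_a > mean_b else 1
    # Tie (assumption of this sketch): favor the arm with fewer samples.
    return 0 if len(rewards_a) <= len(rewards_b) else 1


if __name__ == "__main__":
    safe, risky = [5.0] * 4, [0.0, 10.0, 0.0, 10.0]
    # Equal means, but the risky arm's worst half averages 0.
    print(empirical_cvar(safe, 0.5), empirical_cvar(risky, 0.5))
    print(besa_duel([1.0] * 50, [0.0] * 3, random.Random(0)))
```

Because BESA compares equal-sized samples, it needs no explicit confidence bound: the less-sampled arm wins whenever its few observations look as good as a random subset of the other arm's history.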