51 |
Information incomplète et regret interne en prédiction de suites individuelles / Incomplete Information and Internal Regret in Prediction of Individual Sequences
Stoltz, Gilles 27 May 2005
This thesis belongs to the theory of prediction of individual sequences. This theory considers sequential learning problems that one cannot, or does not wish to, model stochastically, and it provides very robust prediction strategies. It covers problems from the machine learning community as well as from the theory of repeated games, and the latter are treated with statistical methods, including, for example, concentration-of-measure and adaptive estimation techniques. The results include, among others, strategies for minimizing external and internal regret in games with incomplete information, in particular repeated games with signals. These strategies apply to the sequential adjustment of selling prices and to the sequential allocation of bandwidth. Internal regret is then studied more specifically, first in the setting of sequential investment in the stock market, for which simulations on historical data are provided, and then for learning correlated equilibria in infinite games with convex and compact strategy sets.
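For readers unfamiliar with the two regret notions named in this abstract, the standard textbook definitions can be written as follows. The notation is an assumption made here for illustration and is not taken from the thesis: a forecaster picks a probability vector $p_t = (p_{1,t}, \dots, p_{N,t})$ over $N$ actions at round $t$, the environment picks an outcome $y_t$, and $\ell(i, y_t)$ is the loss of action $i$.

```latex
% External regret: cumulative expected loss compared with the best fixed action in hindsight.
R^{\mathrm{ext}}_T
  = \sum_{t=1}^{T} \sum_{i=1}^{N} p_{i,t}\,\ell(i, y_t)
  - \min_{1 \le j \le N} \sum_{t=1}^{T} \ell(j, y_t)

% Internal regret: largest gain from having played action j every time action i was played.
R^{\mathrm{int}}_T
  = \max_{i \neq j} \sum_{t=1}^{T} p_{i,t}\,\bigl(\ell(i, y_t) - \ell(j, y_t)\bigr)
```

A strategy is usually said to minimize a given regret when that quantity grows sublinearly in $T$.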
|
52 |
Regret minimisation and system-efficiency in route choice / Minimização de Regret e eficiência do sistema em escala de rotas
Ramos, Gabriel de Oliveira January 2018
Multiagent reinforcement learning (MARL) is a challenging task in which self-interested agents concurrently learn policies that maximise their utilities. Learning here is difficult because agents must adapt to each other, which makes their objective a moving target. As a side effect, no convergence guarantees exist for the general MARL setting. This thesis exploits a particular MARL problem, namely route choice (where selfish drivers aim at choosing routes that minimise their travel costs), to deliver convergence guarantees. We are particularly interested in guaranteeing convergence to two fundamental solution concepts: the user equilibrium (UE, when no agent benefits from unilaterally changing its route) and the system optimum (SO, when average travel time is minimum). The main goal of this thesis is to show that, in the context of route choice, MARL can be guaranteed to converge to the UE as well as to the SO under certain conditions. Firstly, we introduce a regret-minimising Q-learning algorithm, which we prove converges to the UE. Our algorithm works by estimating the regret associated with agents' actions and using that estimate as the reinforcement signal for updating the corresponding Q-values. We also establish a bound on the agents' regret. We then extend this algorithm to deal with non-local information provided by a navigation service. Using such information, agents can improve their regret estimates and thus perform better empirically. Finally, in order to mitigate the effects of selfishness, we present a generalised marginal-cost tolling scheme in which drivers are charged in proportion to the cost they impose on others. We then devise a toll-based Q-learning algorithm, which we prove converges to the SO and is fairer than existing tolling schemes.
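The gap between user equilibrium and system optimum, and the way a marginal-cost toll closes it, can be illustrated on a toy example. The sketch below is not the thesis's Q-learning algorithm: it assumes a hypothetical two-route network (route A with travel time equal to the fraction of drivers using it, route B with constant travel time 1) and finds the equilibrium split by bisection. All function names are illustrative.

```python
def route_costs(flow_a):
    """Travel times on routes A and B when a fraction flow_a of drivers uses A.

    Assumed toy network: A is congestible (cost = flow), B has constant cost 1.
    """
    return flow_a, 1.0


def marginal_cost_toll(flow_a):
    """Toll on A: flow times the derivative of A's cost (which is 1 here)."""
    return flow_a * 1.0


def equilibrium_flow(with_toll, tol=1e-9):
    """Bisection for the flow on A at which no driver gains by switching routes."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        cost_a, cost_b = route_costs(mid)
        if with_toll:
            cost_a += marginal_cost_toll(mid)
        if cost_a < cost_b:
            lo = mid  # A is still cheaper, so more drivers move to A
        else:
            hi = mid
    return (lo + hi) / 2


def average_travel_time(flow_a):
    """Average travel time over all drivers (tolls are not counted as time)."""
    cost_a, cost_b = route_costs(flow_a)
    return flow_a * cost_a + (1 - flow_a) * cost_b


ue = equilibrium_flow(with_toll=False)
tolled = equilibrium_flow(with_toll=True)
print(f"UE flow on A: {ue:.3f}, average time {average_travel_time(ue):.3f}")
print(f"Tolled flow on A: {tolled:.3f}, average time {average_travel_time(tolled):.3f}")
```

In this toy network the untolled equilibrium sends every driver to route A, while the tolled equilibrium splits drivers evenly, which is the split that minimises average travel time.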
|
54 |
Prédiction de structure tridimensionnelle de molécules d’ARN par minimisation de regret / Prediction of three-dimensional structure of RNA molecules by regret minimization
Boudard, Mélanie 29 April 2016
The functions of an RNA molecule in cellular processes are very closely related to its three-dimensional structure. It is therefore essential to predict this structure in order to understand the molecule's function. RNA folding can be seen as a two-step process: the formation of a secondary structure, followed by the formation of a three-dimensional structure. The first step results from strong interactions between nucleotides, and the second from tertiary interactions. Secondary structure prediction is well studied and has seen numerous advances over the past thirty years. Predicting the three-dimensional structure, however, is a more difficult problem because of the large number of possible conformations. To overcome this problem, we treat the folding of the RNA structure as a game. The secondary structure of the RNA is represented as a graph, which corresponds to a coarse-grained model of this structure. This model allows us to fold the RNA molecule in a discrete space. Our hypothesis is that the 3D structure can be understood as an equilibrium in the game-theoretic sense. To find this equilibrium, we use regret minimization algorithms. We also study different formalizations of the game, based on biological statistics. The objective of this work is to develop an RNA folding method that works on all types of secondary structures and yields results more accurate than current approaches. Our method, called GARN, reached the expected objectives and allowed us to examine more closely the impact of certain parameters on coarse-grained structure prediction for molecules.
|
55 |
Approchabilité, Calibration et Regret dans les Jeux à Observations Partielles / Approachability, Calibration and Regret in Games with Partial Observations
Perchet, Vianney 25 June 2010
This thesis studies statistical games with partial observations. These games do not formalize a strategic interaction between two perfectly rational players, but rather one between a player and nature (or the environment). The second player is given this name because no assumption is made about its payoffs, objectives, or rationality. The player's observations are said to be complete if he observes nature's choices, i.e. if he learns after the fact either which action nature chose at each stage or at least his own payoff. We consider the setting in which this assumption is weakened and the player has only partial observations: at each stage he receives only a random signal whose distribution depends on nature's action. The main objective of this thesis is to generalize notions widely used in games with complete observations to games with partial observations. We first construct strategies that have no internal regret, and we then characterize approachable sets.
|
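56 |
"Förlåt dem Fader, för de vet icke vad de gör" : En religionspsykologisk studie av sista uttalanden på Texas Death Row / "Forgive them, Father, for they know not what they do": A psychology-of-religion study of last statements on Texas Death Row
Lindner, Fredrik January 2014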
The United States is one of the few Western nations that administers capital punishment to its condemned criminals. Texas has executed the most inmates of any state in the U.S. This prompted questions about the psychological characteristics of death row inmates. The purpose of this essay is to widen the perspective on inmates serving capital sentences by analyzing their final statements. The essay focuses on acute anxiety of death among inmates, visible in their final statements, as a result of an accelerated process of dying. The inmates' final statements were analyzed using three of the eight phases in Erik H. Erikson's psychosocial theory of individual development, in which the final phase is centered on the conflict between integrity and despair. Despair is here identified by three factors: an inability to accept one's situation, anxiety of death, and a strong feeling of regret. Through textual analysis, certain codes affiliated with Erikson's theory were identified and studied. In addition, the inmates' trust and comfort in a transcendental power was studied as a factor mitigating anxiety among inmates. Erikson's theory is supplemented in this essay by discourse analysis, and discourse psychology in particular, to provide additional context to the inmates' situation. A sociopolitical dimension was also added to provide further analysis. The study found that inmates tend to regret the actions that led to their incarceration while also accepting their situation. This does not indicate acute anxiety of dying but rather an expression of what this paper calls the inmate discourse, in which inmates are expected by wider society to regret their actions. Inmates also tend to describe God or a transcendental authority as forgiving and loving, which this paper identifies not only as a way of finding comfort in a dire situation but also as a way of belonging to a larger American community through religious identification.
|
57 |
The Triggers of Buyers Regret of Impulsive Purchases
Esterhammer, Oliver, Huang, Jiahao January 2017
Impulsive buying behavior has attracted increasing attention from both researchers and marketers, as the negative consumption experience resulting from unplanned buying can severely harm a business in terms of brand building, reputation, and customer loss. Reviewing the previous literature, we found that there is still little research on consumer behavior after impulse purchases, namely on consumers' regret over what they have bought impulsively. The purpose of this study is to discover the triggers of buyer regret from impulse purchases, expressed in the research question "What are the triggers of buyer regret from impulse purchases?" Using a quantitative approach, we propose a conceptual model of impulse purchase regret that consists of six hypotheses. We tested the conceptual model with AMOS, an SPSS extension, using structural equation modeling. We collected our primary data (187 viable responses) via a questionnaire through convenience sampling. Testing the data with AMOS gave the following result: five hypotheses are accepted and one is rejected. This result indicates that upward counterfactual thinking (CFT) on forgone alternatives, a change in significance, and under-consideration are positively related to impulse purchase regret; external stimuli and consumer susceptibility to interpersonal influence (CSII) have an indirect influence on impulse purchase regret. Applying our theoretical background to this result, we suggest that consumers' rational buying thinking still plays an important role in the post-evaluation stage of an impulse purchase, even though the impulse purchase disrupts the rational buying process at the beginning. Lastly, we believe that several parties could benefit from our research: marketers, academia, and consumers.
|
58 |
Regret and Police Reporting Among Individuals Who Have Experienced Sexual Assault
Marchetti, Carol Anne January 2010
Thesis advisor: Ann W. Burgess / Sexual assault (SA) is the most widely underreported violent crime in the United States. Reporting is significant because it is through this process that people access resources that can mitigate psychiatric and other health consequences of SA. The purpose of this study was to describe regret among individuals who have experienced SA regarding their decision of whether or not to report the assault to the police. The Ottawa Decision Support Framework underpins this study and posits that evaluation of regret, a powerful negative emotion, influences the decision-making process. The sample included 78 individuals, aged 18-25 years, who experienced SA during the past five years. Participants completed a 34-item electronic questionnaire. A multiple regression model was generated to describe how selected independent variables explain variation in levels of regret. In the final model, the following combined independent variables accounted for 33.3% (adjusted R²) of the variation in levels of regret: weight change, the only variable associated with increased regret, was the most significant and accounted for the greatest amount of variance, followed by stranger assailant, seeking professional treatment, and reporting, which were associated with decreased regret. On average, people who chose to report their assault experienced less regret regarding their decision than people who did not report. This research fills a gap in the nursing, psychiatric, and victimology literature and improves clinical practice by describing post-decisional regret. The findings from this study provide a foundation for future research on strategies (e.g., decision-making tools) that nurses and other clinicians can use to assist people with their decision-making. Additionally, the findings can contribute to the development of a midrange nursing theory of regret. / Thesis (PhD) - Boston College, 2010. / Submitted to: Boston College. Connell School of Nursing. / Discipline: Nursing.
|
59 |
Social Withdrawal and Psychological Well-Being in Later Life: Does Marital Status Matter?
Serrao, Melanie Mei 01 April 2017
Personality researchers have described dispositional traits to typically show stability over the life course and yet one such trait, shyness, has rarely been examined in later life. Shyness as a global trait has been linked negatively to multiple psychological indices of childhood well-being, including loneliness. Despite the fact that older adults may be already at risk for experiencing heightened loneliness, regret, or decreased fulfillment, research has not assessed these experiences in relation to personality in later life. In recent years, withdrawal research has begun to move past shyness as a global trait to examine the motivations behind socially withdrawn behavior. The current study used regression analyses to examine ways that three facets of withdrawal (shyness, avoidance, and unsociability) may relate to loneliness, regret, and fulfillment in later life. Data from 309 older participants of the Huntsman Senior Games were used to explore associations. Results indicated that shyness, avoidance, and unsociability significantly predicted increased loneliness and regret, and decreased fulfillment to some extent. Further, marital status (married, divorced, widowed) moderated links between withdrawal and psychological indices of later life well-being.
|
60 |
Using counterfactual regret minimization to create a competitive multiplayer poker agent
Abou Risk, Nicholas 11 1900
Games have been used to evaluate and advance techniques in the field of Artificial Intelligence since before computers were invented. Many of these games have been deterministic perfect information games (e.g. Chess and Checkers). A deterministic game has no chance element, and in a perfect information game all information is visible to all players. However, many real-world scenarios involving competing agents can be more accurately modeled as stochastic (non-deterministic), imperfect information games, and this dissertation investigates such games. Poker is one such game played by millions of people around the world; it will be used as the testbed of the research presented in this dissertation. For a specific set of games, two-player zero-sum perfect recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably convergent to an ε-Nash equilibrium. A Nash equilibrium strategy is very useful in two-player games as it maximizes its utility against a worst-case opponent. However, once we move to multiplayer games, we lose all theoretical guarantees for CFR. Furthermore, we have no theoretical guarantees about the performance of a strategy from a multiplayer Nash equilibrium against two arbitrary opponents. Despite the lack of theoretical guarantees, my thesis is that CFR-generated agents may perform well in multiplayer games. I created several 3-player limit Texas Hold'em Poker agents, and the results of the 2009 Computer Poker Competition demonstrate that these are the strongest 3-player computer Poker agents in the world. I also contend that a good strategy can be obtained by grafting a set of two-player subgame strategies to a 3-player base strategy when one of the players is eliminated.
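CFR applies a regret-matching update at every information set of the game tree. As a rough illustration of that building block only, and not of the poker agents described above, the sketch below runs regret matching in self-play on rock-paper-scissors, a two-player zero-sum game, and returns each player's average strategy. The payoff table, the function names, and the asymmetric starting regrets are all assumptions made for this example.

```python
# Payoff to the row action against the column action (rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)  # no positive regret: uniform
    return [p / total for p in positive]


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        for a in range(3)
    ]


def self_play(iterations):
    """Run regret matching for both players; return their average strategies."""
    # Hypothetical asymmetric starting regrets so play does not begin at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(r) for r in regrets]
        for p in (0, 1):
            values = action_values(strategies[1 - p])
            expected = sum(strategies[p][a] * values[a] for a in range(3))
            for a in range(3):
                # Regret of not having played action a instead of the mixed strategy.
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(self_play(20000)):
        print(player, [round(x, 3) for x in average])
```

In a two-player zero-sum game the average strategies approach an equilibrium, which for rock-paper-scissors is the uniform mix; as the abstract notes, no such guarantee carries over to three or more players.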
|