1

The development of sex-congruent preference in infancy: a longitudinal study

Shirley, Louisa January 2000
Gender schematic processing theory suggests that children use gender knowledge about themselves and others to make 'like me' judgements about others. They then use the behaviour of 'like me' others to construct appropriate 'own-sex' schemas which guide their behaviour. The research presented here examines this central premise of gender schematic processing theory. Because gender schematic processing posits a unitary source for the development of sex-typed behaviour, i.e. the development of gender cognitions, the trajectory of development is presumed to be the same for boys and girls. This assumption is also examined in this thesis. The sex-typed preferences of sixty infants at 3, 9, and 18 months were studied using measures of duration of attention to simultaneously presented male/female pictures of peers, toys, and play activities. Self-recognition (thought to be an early manifestation of self-concept) was measured by observing mirror behaviour (the rouge test) and by monitoring the infants' preferential looking to their own image paired with that of a same-age, same-sex peer. The infants' gender labelling ability was assessed at eighteen months, and demographic information was collected at each session. The infants showed self-recognition on both measures at eighteen months, but their poor performance on the gender labelling task suggested that their formal understanding of gender identity had not yet developed. The infants as a group did not show sex-typed preferences for attending to peers or play activities, although same-sex preference was found for male infants in both areas. Despite an apparent lack of gender-related cognitions, there was a significant sex-congruent preference for toys when the group of infants was tested at eighteen months. The trajectory of development of this sex-typed behaviour differed for male and female infants, suggesting that the gender schematic processing model is not adequate in its present form to predict the ontogeny of sex-typed behaviour.
2

Comparison of deep reinforcement learning algorithms in a self-play setting

Kumar, Sunil 30 August 2021
In this exciting era of artificial intelligence and machine learning, the success of AlphaGo, AlphaZero, and MuZero has generated great interest in deep reinforcement learning, especially in self-play settings. The methods used by AlphaZero are finding their way into more application areas than before, such as clinical medicine, intelligent military command decision support systems, and recommendation systems. While specific methods of reinforcement learning with self-play have found their place in application domains, there is much to be explored among existing reinforcement learning methods not originally intended for self-play settings. This thesis focuses on evaluating the performance of existing reinforcement learning techniques in self-play settings. In this research, we trained and evaluated two deep reinforcement learning algorithms in self-play settings on game environments, namely Connect Four and chess. We demonstrate how a simple on-policy, policy-based method such as REINFORCE shows signs of learning, whereas an off-policy, value-based method such as Deep Q-Networks does not perform well in self-play settings in the selected environments. The results show that, after training, the REINFORCE agent wins 85% of games against a random baseline agent and 60% of games against a greedy baseline agent in Connect Four. The strength of the agents trained with both techniques was measured and plotted against different baseline agents. We also investigate the impact of selected significant hyper-parameters on the performance of the agents. Finally, we provide recommended values for these hyper-parameters for training deep reinforcement learning agents in similar environments.
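The core setup described above — a single policy network trained with REINFORCE purely through self-play — can be sketched compactly. The following is a minimal, hypothetical sketch rather than the thesis code: it shares one network between both players by flipping the board to the current player's perspective, and the network size, learning rate, reward scheme, and training budget are illustrative assumptions.

```python
# Minimal self-play REINFORCE sketch for Connect Four (illustrative only).
import numpy as np
import torch
import torch.nn as nn

ROWS, COLS = 6, 7

def legal_moves(board):
    """Columns that are not yet full."""
    return [c for c in range(COLS) if board[0, c] == 0]

def drop(board, col, player):
    """Return a copy of the board with `player`'s piece dropped in `col`."""
    b = board.copy()
    row = max(r for r in range(ROWS) if b[r, col] == 0)
    b[row, col] = player
    return b

def is_win(board, player):
    """Check every possible 4-in-a-row line for `player`."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                rr, cc = r + 3 * dr, c + 3 * dc
                if 0 <= rr < ROWS and 0 <= cc < COLS and all(
                        board[r + i * dr, c + i * dc] == player for i in range(4)):
                    return True
    return False

class PolicyNet(nn.Module):
    """Tiny MLP over the flattened board; sizes are illustrative guesses."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(ROWS * COLS, 128),
                                 nn.ReLU(), nn.Linear(128, COLS))
    def forward(self, x):
        return self.net(x)

def play_one_game(policy):
    """Self-play one game with a shared policy; return per-player log-probs and final rewards."""
    board = np.zeros((ROWS, COLS), dtype=np.int8)
    logps, player = {1: [], -1: []}, 1
    while True:
        moves = legal_moves(board)
        if not moves:                                   # board full: draw
            return logps, {1: 0.0, -1: 0.0}
        x = torch.tensor(board * player, dtype=torch.float32).unsqueeze(0)  # current player's view
        mask = torch.full((COLS,), float('-inf'))
        mask[moves] = 0.0                               # only legal columns receive probability
        dist = torch.distributions.Categorical(logits=policy(x).squeeze(0) + mask)
        action = dist.sample()
        logps[player].append(dist.log_prob(action))
        board = drop(board, action.item(), player)
        if is_win(board, player):
            return logps, {player: 1.0, -player: -1.0}  # terminal win/loss rewards
        player = -player

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for episode in range(200):                              # toy budget; real training runs far longer
    logps, reward = play_one_game(policy)
    # REINFORCE: push up the log-probability of the winner's moves, down the loser's.
    loss = sum(-lp * reward[p] for p in (1, -1) for lp in logps[p])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A commonly cited difficulty for value-based methods such as DQN in self-play is that the opponent keeps changing, so the learning targets are non-stationary; that general observation is consistent with the results summarised above.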
3

AI for an Imperfect-Information Wargame with Self-Play Reinforcement Learning / AI med självspelande förstärkningsinlärning för ett krigsspel med imperfekt information

Ryblad, Filip January 2021
The task of training AIs for imperfect-information games has long been difficult. Recently, however, the algorithm ReBeL, a general framework for self-play reinforcement learning, has been shown to excel at heads-up no-limit Texas hold 'em, among other imperfect-information games. In this report, the ability to adapt ReBeL to a downscaled version of the strategy wargame "Game of the Generals" is explored. It is shown that an implementation of ReBeL that uses no domain-specific knowledge is able to beat all benchmark bots, which indicates that ReBeL can be a useful framework when training AIs for imperfect-information wargames.
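The headline claim above rests on evaluation against a set of benchmark bots. As a hedged illustration of that kind of harness (not the report's actual code), the sketch below alternates seats against each baseline and tallies results; the tiny Nim stand-in game, the bot names, and the interfaces are invented purely so the example runs end to end.

```python
# Illustrative benchmark-evaluation harness; game, bots, and interfaces are assumptions.
import random
from collections import Counter

class NimGame:
    """Tiny stand-in game (take 1-3 sticks; taking the last stick wins).
    The report's game is far larger and hides information from the players."""
    def initial_state(self):
        return (15, 0)                                # (sticks remaining, player to move)
    def current_player(self, state):
        return state[1]
    def legal_moves(self, state):
        return [n for n in (1, 2, 3) if n <= state[0]]
    def apply(self, state, move):
        return (state[0] - move, 1 - state[1])
    def is_terminal(self, state):
        return state[0] == 0
    def outcome_for(self, state, player):
        # the player to move at the empty pile did NOT take the last stick, so they lost
        return -1 if state[1] == player else 1
    def observation(self, state, player):
        return state            # a real imperfect-information game would hide part of the state

class RandomBot:
    """Baseline: uniformly random legal move."""
    def act(self, obs, legal_moves):
        return random.choice(legal_moves)

class GreedyBot:
    """Toy candidate agent: always takes as many sticks as allowed."""
    def act(self, obs, legal_moves):
        return max(legal_moves)

def play_match(game, agent_a, agent_b):
    """Play one game; return +1 if agent_a wins, -1 if it loses, 0 for a draw."""
    state = game.initial_state()
    players = {0: agent_a, 1: agent_b}
    while not game.is_terminal(state):
        p = game.current_player(state)
        move = players[p].act(game.observation(state, p), game.legal_moves(state))
        state = game.apply(state, move)
    return game.outcome_for(state, player=0)

def evaluate(game, candidate, benchmarks, games_per_bot=200):
    """Alternate seats against each benchmark bot and tally results for the candidate."""
    for name, bot in benchmarks.items():
        tally = Counter()
        for i in range(games_per_bot):
            result = play_match(game, candidate, bot) if i % 2 == 0 \
                     else -play_match(game, bot, candidate)
            tally["win" if result > 0 else "loss" if result < 0 else "draw"] += 1
        print(f"vs {name}: {tally['win']}W {tally['draw']}D {tally['loss']}L")

evaluate(NimGame(), GreedyBot(), {"random": RandomBot()})
```

For the report's setting, the stand-in game would be replaced by the downscaled Game of the Generals environment and the candidate by the trained ReBeL-based agent.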
4

Dynamic opponent modelling in two-player games

Mealing, Richard Andrew January 2015
This thesis investigates decision-making in two-player imperfect-information games against opponents whose actions can affect our rewards, and whose strategies may be based on memories of interaction, may be changing, or both. The focus is on modelling these dynamic opponents and using the models to learn high-reward strategies. The main contributions of this work are:

1. An approach to learning high-reward strategies in small simultaneous-move games against these opponents. This is done by using a model of the opponent learnt from sequence prediction, with (possibly discounted) rewards learnt from reinforcement learning, to look ahead using explicit tree search. Empirical results show that this gains higher average rewards per game than state-of-the-art reinforcement learning agents in three simultaneous-move games. They also show that several sequence prediction methods model these opponents effectively, supporting the idea of borrowing such methods from areas such as data compression and string matching.

2. An online expectation-maximisation algorithm that infers an agent's hidden information based on its behaviour in imperfect-information games.

3. An approach to learning high-reward strategies in medium-size sequential-move poker games against these opponents. This is done by using a model of the opponent learnt from sequence prediction, which needs the opponent's hidden information (inferred by the online expectation-maximisation algorithm), to train a state-of-the-art no-regret learning algorithm by simulating games between the algorithm and the model. Empirical results show that this improves the no-regret learning algorithm's rewards when playing against popular and state-of-the-art algorithms in two simplified poker games.

4. A demonstration that several change detection methods can effectively model changing categorical distributions, with experimental results comparing their accuracy to empirical distributions. These results also show that their models can be used to outperform state-of-the-art reinforcement learning agents in two simultaneous-move games. This supports the idea of modelling changing opponent strategies with change detection methods.

5. Experimental results on the self-play convergence to mixed-strategy Nash equilibria of the empirical distributions of play of sequence prediction and change detection methods. The results show that they converge faster, and in more cases for change detection, than fictitious play.
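As a toy illustration of the intuition behind contributions 4 and 5 — that a model which can forget or detect change tracks a shifting opponent better than an ever-growing empirical distribution — the following hypothetical rock-paper-scissors sketch contrasts a full-history model in the spirit of fictitious play with a sliding-window model standing in, very crudely, for the change detection methods actually studied. The game, window size, switch point, and opponent behaviour are invented for illustration.

```python
# Tracking a changing categorical opponent strategy: full-history counts vs a sliding window.
import random
from collections import Counter, deque

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # what beats each move

def best_response(counts):
    """Play the move that beats the opponent's most frequently observed move."""
    predicted = max(MOVES, key=lambda m: counts.get(m, 0))
    return BEATS[predicted]

def opponent_move(t):
    """Changing opponent: mostly rock for the first half, mostly scissors afterwards."""
    biased = "rock" if t < 500 else "scissors"
    return biased if random.random() < 0.8 else random.choice(MOVES)

full_history = Counter()                 # fictitious-play-style model: never forgets
window = deque(maxlen=50)                # sliding-window model: forgets old behaviour
score = {"full": 0, "window": 0}

for t in range(1000):
    opp = opponent_move(t)
    for name, counts in (("full", full_history), ("window", Counter(window))):
        ours = best_response(counts) if counts else random.choice(MOVES)
        if BEATS[opp] == ours:           # we played the move that beats the opponent's
            score[name] += 1
        elif BEATS[ours] == opp:         # the opponent's move beats ours
            score[name] -= 1
    full_history[opp] += 1
    window.append(opp)

print(score)   # the window model usually recovers much faster after the mid-run change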
