  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Recommender System using Reinforcement Learning

January 2020
abstract: Currently, recommender systems are used extensively to match the right audience with the "right" content across various platforms. Recommendations generated by these systems aim to offer relevant items to users. Different approaches have been suggested to solve this problem, mainly by using the rating history of the user or by identifying the preferences of similar users. Most existing recommendation systems are formulated in an identical fashion: a model is trained to capture the underlying preferences of users over different kinds of items. Once deployed, the model produces personalized recommendations under the assumption that the preferences of users are perfectly reflected by the historical data. However, such user data may be limited in practice, and the characteristics of users may constantly evolve during their interaction with recommendation systems. Moreover, most of these recommender systems suffer from the cold-start problem, where insufficient data for new users or products results in reduced recommendation performance. In the current study, we have built a recommender system to recommend movies to users. A biclustering algorithm is first used to cluster users and movies simultaneously, yielding explainable recommendations; these biclusters then form a gridworld in which Q-Learning is used to learn a policy for traversing the grid. The reward function uses the Jaccard index, a measure of the users shared between two biclusters. Demographic details of new users are used to generate recommendations, which also addresses the cold-start problem. Lastly, the implemented algorithm is evaluated on a real-world dataset against widely used recommendation algorithms, including its performance in cold-start cases. / Dissertation/Thesis / Masters Thesis Computer Science 2020
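To illustrate the mechanism the abstract describes, the following minimal sketch (not taken from the thesis) treats each bicluster as a gridworld state, uses the Jaccard index of two biclusters' user sets as the reward for moving between them, and applies a standard tabular Q-Learning update. The bicluster contents and hyperparameters below are invented for the example.

import random
from collections import defaultdict

def jaccard(users_a, users_b):
    # Jaccard index of the user sets of two biclusters.
    inter = len(users_a & users_b)
    union = len(users_a | users_b)
    return inter / union if union else 0.0

# Hypothetical biclusters: state id -> set of user ids (illustrative only).
biclusters = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {4, 5}, 3: {1, 5, 6}}
states = list(biclusters)                # an action means "move to that bicluster"
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # assumed learning rate, discount, exploration

Q = defaultdict(float)                   # Q[(state, action)], initialised to zero

for episode in range(500):
    state = random.choice(states)
    for _ in range(10):
        if random.random() < epsilon:    # epsilon-greedy action selection
            action = random.choice(states)
        else:
            action = max(states, key=lambda a: Q[(state, a)])
        reward = jaccard(biclusters[state], biclusters[action])   # Jaccard-based reward
        best_next = max(Q[(action, a)] for a in states)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = action                   # the chosen bicluster becomes the next state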
2

Collision Avoidance for Virtual Crowds Using Reinforcement Learning / Kollisionsundvikande för virtuella folkmassor som använder förstärkningslärande

Dönmez, Halit Anil January 2017
Virtual crowd simulation is used in a wide variety of applications such as video games, architectural design and movies. It is important for creators to have a realistic crowd simulator that can generate crowds displaying the desired behaviours, and to provide an easy-to-use tool for crowd generation that is fast and realistic. Reinforcement Learning was proposed for training an agent to display a certain behaviour. In this thesis, a Reinforcement Learning approach was implemented and the generated virtual crowds were evaluated. Q-Learning was selected as the Reinforcement Learning method, and two different versions of it were implemented. These versions were evaluated against state-of-the-art approaches: Reciprocal Velocity Obstacles (RVO) and a copy-synthesis approach based on real data. The crowds were evaluated with a user study. Results from the user study showed that while the Reinforcement Learning method was not perceived as being as real as the real crowds, it was perceived as almost as realistic as the crowds generated with RVO. Another result was that the perception of RVO changes with the environment: when only the paths were shown, RVO was perceived as more natural than when the paths were shown in a real-world setting with pedestrians. It was concluded that using Q-Learning for generating virtual crowds is a promising method that can be improved into a substitute for existing methods, and that in certain scenarios the Q-Learning algorithm results in better collision avoidance and more realistic crowd simulation.
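As a rough illustration of how tabular Q-Learning can be applied to collision avoidance (a sketch under assumed rewards, not the thesis's implementation), the agent below moves on a bounded grid, is penalised for stepping onto a cell occupied by another pedestrian, and is rewarded for reaching a goal; the grid size, reward values and hyperparameters are all illustrative.

import numpy as np

GRID, GOAL = 10, (9, 9)                          # bounded square, assumed goal cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def reward(pos, other_agents):
    # Assumed reward shaping: penalise collisions, reward reaching the goal.
    if pos in other_agents:
        return -10.0         # collision with another pedestrian
    if pos == GOAL:
        return 10.0          # reached the target
    return -0.1              # small step cost to encourage short paths

Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, epsilon = 0.1, 0.95, 0.1           # assumed hyperparameters
rng = np.random.default_rng(0)
others = {(4, 4), (5, 5)}                        # static "pedestrians" to avoid

pos = (0, 0)
for _ in range(5000):
    x, y = pos
    a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(np.argmax(Q[x, y]))
    dx, dy = ACTIONS[a]
    nx = min(max(x + dx, 0), GRID - 1)           # stay inside the bounded square
    ny = min(max(y + dy, 0), GRID - 1)
    r = reward((nx, ny), others)
    Q[x, y, a] += alpha * (r + gamma * np.max(Q[nx, ny]) - Q[x, y, a])
    pos = (0, 0) if (nx, ny) == GOAL else (nx, ny)   # restart the episode at the goal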
3

Model-Free Reinforcement Learning for Hierarchical OO-MDPs

Goldblatt, John Dallan 23 May 2022
No description available.
4

Deep Reinforcement Learning for the Popular Game tag

Söderlund, August, von Knorring, Gustav January 2021
Reinforcement learning can be compared to how humans learn, by interaction, which is the fundamental concept of this project. This paper aims to compare three different learning methods by creating two adversarial reinforcement learning models and simulating them in the game tag. The three fundamental learning methods are ordinary Q-learning, Deep Q-learning (DQN), and Double Deep Q-learning (DDQN). The models for ordinary Q-learning are built using a table, and the models for both DQN and DDQN are constructed using the Python module TensorFlow. The environment is composed of a bounded square with two obstacles and two agents with adversarial objectives. The rewards are given primarily based on the distance between the agents. By comparing the trained models it was established that only DDQN could solve the task well and generalize, whilst both the Q-model and DQN had more serious flaws. A comparison of the DDQN model against its average reward trends established that the model still improved despite the constant average reward. In conclusion, DDQN is the appropriate choice for this adversarial problem, whilst Q-learning and DQN should be avoided. Finally, a constant average reward can be caused by both agents improving at a similar rate rather than by a stagnation in performance. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
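The key difference between DQN and DDQN that the comparison above turns on is how the temporal-difference target is computed. The sketch below (an illustration, not the authors' code) shows both variants with a small TensorFlow network: DQN selects and evaluates the next action with the target network, while DDQN selects it with the online network and evaluates it with the target network, which reduces overestimation of Q-values. The state size, network shape and discount factor are assumptions made for the example.

import numpy as np
import tensorflow as tf

num_actions, gamma = 4, 0.99                     # assumed action count and discount

def make_net():
    # Tiny Q-network mapping an 8-dimensional state to one Q-value per action.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(num_actions),
    ])

online_net, target_net = make_net(), make_net()  # target net is a periodic copy in practice

def td_targets(rewards, next_states, double=True):
    # double=False reproduces the plain DQN target; double=True the DDQN target.
    q_target = target_net(next_states).numpy()
    if double:
        best = np.argmax(online_net(next_states).numpy(), axis=1)   # select with online net
    else:
        best = np.argmax(q_target, axis=1)                          # select with target net
    return rewards + gamma * q_target[np.arange(len(rewards)), best]

# Example call with random placeholder data.
targets = td_targets(np.zeros(5), np.random.rand(5, 8).astype("float32"))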
