Global ETD Search

1	Model-based Reinforcement Learning for Protein Backbone Design / Modellbaserad förstärkningsinlärning för design av proteinbackbones Renard, Frédéric January 2024 The application of Reinforcement Learning (RL) to protein design offers a novel approach to generating protein backbones that fit within an icosahedral structure while also optimizing five critical structural scores of proteins. Central to this approach are two distinct Markov Decision Processes (MDPs), each employing its own reward mechanism: one operates on a system of thresholds, while the other uses sigmoid functions. The study compares these reward mechanisms to determine how effectively each guides RL algorithms for protein design. The findings indicate that the threshold-based reward system outperforms the sigmoid-based system, leading to more desirable protein structures according to the defined scores. The thesis also examines the performance of AlphaZero agents in this context, comparing them to baseline Monte Carlo Tree Search (MCTS) agents. Results demonstrate that AlphaZero agents consistently achieve at least double the performance of MCTS agents, showing their superior capability in the top-down protein design task. The research further evaluates AlphaZero agents optimized for side objectives, with promising outcomes. This exploration of multi-objective optimization using AlphaZero agents highlights their potential in more complex and nuanced aspects of protein engineering. This work not only underscores the effectiveness of RL in protein backbone generation but also opens up new possibilities for advanced RL applications in protein engineering, particularly in multi-faceted optimization scenarios.
/ The application of RL in the field of protein design presents a new method for generating protein backbones that fit within an icosahedral structure while optimizing five critical structural scores of proteins. Central to this method is the development of two distinct MDPs, each using its own reward mechanism: one operates on a system of thresholds, while the other uses sigmoid functions. The study makes a thorough comparison of these reward mechanisms to establish how effectively they guide RL algorithms for protein design. The results show that the threshold-based reward system outperforms the sigmoid-based system, leading to more desirable protein structures according to the defined scores. The thesis examines the performance of AlphaZero agents in this context and compares them with baseline Monte Carlo Tree Search agents. The results show that AlphaZero agents consistently achieve at least twice the performance of MCTS agents, demonstrating their superior ability in the top-down protein design task. In addition, the research is extended to evaluate AlphaZero agents optimized for side objectives, with promising results. This exploration of multi-objective optimization with AlphaZero agents highlights their potential in more complex and nuanced aspects of protein engineering. This work not only underscores the effectiveness of RL in generating protein backbones but also opens up new possibilities for developing ...
Keywords: Reinforcement Learning; Protein; AlphaZero; MCTS; Biology; Computer and Information Sciences
2	Exploring the effects of state-action space complexity on training time for AlphaZero agents / Undersökning av påverkan av spelkomplexitet på träningstiden för AlphaZero-agenter Glimmerfors, Tobias January 2022 DeepMind's development of AlphaGo took the world by storm in 2016 when it became the first computer program to defeat a world champion at the game of Go. Through further development, DeepMind showed that the underlying algorithm could be made more general and applied to a large set of problems. This thesis focuses on the AlphaZero algorithm and on the parameters that affect the rate at which an agent is able to learn through self-play. We investigated how the size of the neural network and the complexity of the environment affect the agent's learning. We used Connect4 as the environment for our agents, and by varying the width of the board we were able to simulate environments of different complexity. For each board width, we trained an AlphaZero agent and tracked the rate at which it improved. While we were unable to find a clear correlation between the complexity of the environment and the rate at which the agent improves, we found that a larger neural network improved both the final performance of the agent and the rate at which it learns. We also studied the impact that the number of Monte-Carlo tree search iterations has on an already trained AlphaZero agent. Unsurprisingly, a higher number of iterations led to improved performance. However, the difference between using only the priors of the neural network and running a series of Monte-Carlo tree search iterations is not very large. This suggests that using solely the priors can sometimes be useful if inferences need to be made quickly.
/ DeepMind's development of AlphaGo was a major breakthrough in 2016, when it became the first computer program to defeat the world champion at Go. With the development of AlphaZero, DeepMind showed that a more general algorithm could be used to solve a larger set of problems. This report focuses on the AlphaZero algorithm and on how different parameters affect training. We investigated the effect of the neural network's size and of game complexity on the agent's ability to improve. Using Connect4 as the test environment for our agents, and by changing the width of the game board, we were able to simulate games of different complexity. For each width we tested, we trained an AlphaZero agent and measured its improvement. We could not find any clear correlation between the complexity of the game and the agent's ability to learn. We did show, however, that a larger neural network leads the agent to improve more and also to learn faster. We also studied the effect of varying the number of tree searches for a fully trained agent. Our experiments indicate a correlation between the agent's playing strength and the number of tree searches, with more tree searches giving a better ability to play the game. The difference made by the number of tree searches, however, turned out not to be as large as expected. This indicates that time can be saved during the inference phase by lowering the number of tree searches, with a minimal penalty in performance.
Keywords: Deep learning; Reinforcement learning; AlphaZero; Monte-Carlo tree search; Environment complexity; Computer and Information Sciences
3	Zero-Knowledge Agent Trained for the Game of Risk Bethdavid, Simon January 2020 Recent developments in deep reinforcement learning applied to abstract strategy games such as Go, chess and Hex have sparked interest within military planning. This Master's thesis explores whether an algorithm similar to Expert Iteration and AlphaZero can be applied to wargames. The wargame studied is Risk, a turn-based multiplayer game played on a simplified political map of the world. The algorithm consists of an expert, in the form of a Monte Carlo tree search algorithm, and an apprentice, implemented as a neural network. The neural network is trained by imitation learning to mimic expert decisions generated from self-play reinforcement learning. The apprentice is then used as a heuristic in subsequent tree searches. The results demonstrated that a Monte Carlo tree search algorithm could, to some degree, be employed on a strategy game such as Risk, dominating a randomly playing agent. The neural network, fed with a state representation in the form of a vector, had difficulty learning the expert decisions and could not beat a randomly playing agent. This halted the expert/apprentice learning process. However, possible solutions are provided as future work.
Keywords: Deep Reinforcement Learning; Zero-Knowledge Agent; AlphaZero; Expert Iteration; Risk; Engineering and Technology
4	Using search based methods for beamforming Bergman Karlsson, Adam January 2024 Multi-User Multiple-Input and Multiple-Output (MU-MIMO) systems have been identified as the key technology for accommodating the growing global demand for wireless communication. In such systems, a transmitting base station serves several users simultaneously, increasing the network capacity. However, sharing the same time-frequency physical resources can cause interference among the simultaneously scheduled users if it is not moderated properly. One way to mitigate this interference is to direct radio power through the radio channel in specific directions, a method called beamforming. Following the successful application of the AlphaZero algorithm to another radio resource management technique, scheduling, this thesis explores the potential of using a similar search-based method for the beamforming problem, with the ultimate objective of making scheduling and beamforming decisions jointly. However, because AlphaZero only supports discrete action spaces and the action space of the beamforming problem is continuous, the algorithm must be modified. The proposed course of action is to extend AlphaZero into Sampled AlphaZero, using sample-based policy improvement to create an algorithm that is both more scalable for large discrete action spaces and able to handle high-dimensional continuous action spaces. To evaluate the performance of the models, test environments were simulated and solved using increasingly large so-called codebooks, which contain predefined beamforming solutions. The Sampled AlphaZero model demonstrated promising performance even for very large codebook sizes, indicating its suitability for addressing the beamforming problem in a non-codebook-based context.
Furthermore, this thesis explores how states in the search can be represented and preprocessed so that the neural network learns efficiently, demonstrating clear benefits of singular value decomposition-based state preprocessing over feeding raw states to the neural network.
Keywords: Beamforming; Artificial Intelligence; AlphaZero; Radio Resource Management; Monte Carlo Tree Search; Information Systems
