

Sultanov, Hakim 01 January 2013 (has links)
Today, software has become deeply woven into the fabric of our lives. The quality of the software we depend on needs to be ensured at every phase of the Software Development Life Cycle (SDLC). An analyst uses the requirements engineering process to gather and analyze system requirements in the early stages of the SDLC. An undetected problem at the beginning of the project can carry all the way through to the deployed product. The Requirements Traceability Matrix (RTM) serves as a tool to demonstrate how requirements are addressed by the design and implementation elements throughout the entire software development lifecycle. Creating an RTM by hand is an arduous and error-prone task. As the size of the requirements and design document collection grows, it becomes more challenging to ensure proper coverage of the requirements by the design elements, i.e., to ensure that every requirement is addressed by at least one design element. The techniques used by existing requirements tracing tools take into account only the content of the documents to establish possible links. We expect that if we take into account the relative order of the text around the common terms within the inspected documents, we may discover candidate links with higher accuracy. The aim of this research is to demonstrate how we can apply machine learning algorithms to software requirements engineering problems. This work addresses the problem of requirements tracing by viewing it in light of the Ant Colony Optimization (ACO) algorithm and a reinforcement learning (RL) algorithm. By treating the documents as the starting points (nests) and ending points (sugar piles) of a path and the terms used in the documents as connecting nodes, a possible link can be established and strengthened by attracting more agents (ants) onto a path between the two documents by using pheromone deposits.
The results of the work show that ACO and RL can successfully establish links between two sets of documents.
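
The nest-to-sugar-pile idea described in this abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the function name `trace_links`, the parameters `n_ants`, `evaporation` and `deposit`, and the whitespace tokenisation are all assumptions made for illustration, and the sketch ignores the word-order information the abstract mentions. Each ant leaves a requirement, steps onto one of its terms, and deposits pheromone on the link to a design element that shares that term.

```python
import random


def trace_links(requirements, designs, n_ants=200, evaporation=0.05,
                deposit=1.0, seed=0):
    """Return pheromone levels for candidate (requirement, design) links.

    Hypothetical helper: names and parameters are illustrative only.
    """
    rng = random.Random(seed)
    req_terms = {r: text.lower().split() for r, text in requirements.items()}
    des_terms = {d: set(text.lower().split()) for d, text in designs.items()}
    req_ids = sorted(req_terms)
    pheromone = {}

    for _ in range(n_ants):
        # Pheromone evaporates a little before every ant walks.
        for link in pheromone:
            pheromone[link] *= 1.0 - evaporation

        # The ant leaves a random nest and steps onto one of its terms.
        req = rng.choice(req_ids)
        term = rng.choice(req_terms[req])

        # Sugar piles (design elements) reachable through that term.
        targets = sorted(d for d, terms in des_terms.items() if term in terms)
        if not targets:
            continue  # dead end: the term appears in no design element

        # Stronger trails attract the ant with higher probability.
        weights = [1.0 + pheromone.get((req, d), 0.0) for d in targets]
        design = rng.choices(targets, weights=weights)[0]
        pheromone[(req, design)] = pheromone.get((req, design), 0.0) + deposit

    return pheromone
```

Links whose pheromone level ends up above some chosen threshold would then be reported as candidate trace links; the threshold itself is another tuning choice this sketch leaves open.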

Q-Learning: Ett sätt att lära agenter att spela fotboll / Q-Learning: A way to teach agents to play football

Ekelund, Kalle January 2013 (has links)
Artificial intelligence in games often relies on rule-based techniques for its behaviour. This has made artificial agents predictable, which is especially evident in sports games. This work evaluated whether the learning technique Q-learning is better at playing football than a rule-based technique, the state machine. To evaluate this, a simplified football simulation was created in which each of the two teams used one of the techniques. The two teams then played 100 matches against each other to see which team/technique is best. Statistics from the matches were used as the results of the study. The results show that Q-learning is the better technique, as it wins the most matches and creates the most chances during the matches. The subsequent discussion concerns how useful Q-learning is in a game context.

Models and metaphors in neuroscience : the role of dopamine in reinforcement learning as a case study

Kyle, Robert January 2012 (has links)
Neuroscience makes use of many metaphors in its attempt to explain the relationship between our brain and our behaviour. In this thesis I contrast the most commonly used metaphor, that of computation driven by neuron action potentials, with an alternative view which seeks to understand the brain in terms of an agent learning from the reward signalled by neuromodulators. To explore this reinforcement learning model I construct computational models to assess one of its key claims: that the neurotransmitter dopamine signals unexpected reward, and that this signal is used by the brain to learn control of our movements and drive goal-directed behaviour. In this thesis I develop a selection of computational models that are motivated by either theoretical concepts or experimental data relating to the effects of dopamine. The first model implements a published dopamine-modulated spike timing-dependent plasticity mechanism but is unable to correctly solve the distal reward problem. I analyse why this model fails and suggest solutions. The second model, more closely linked to the empirical data, attempts to investigate the relative contributions of firing rate and synaptic conductances to synaptic plasticity. I use experimental data to estimate how model neurons will be affected by dopamine modulation, and use the resulting computational model to predict the effect of dopamine on synaptic plasticity. The results suggest that dopamine modulation of synaptic conductances is more significant than modulation of excitability. The third model demonstrates how simple assumptions about the anatomy of the basal ganglia and the electrophysiological effects of dopamine modulation can lead to reinforcement-learning-like behaviour. The model makes the novel prediction that working memory is an emergent feature of a reinforcement learning process.
In the course of producing these models I find that both theoretically and empirically based models suffer from methodological problems that make it difficult to adequately support such fundamental claims as the reinforcement learning hypothesis. The conclusion that I draw from the modelling work is that it is neither possible, nor desirable to falsify the theoretical models used in neuroscience. Instead I argue that models and metaphors can be valued by how useful they are, independently of their truth. As a result I suggest that we ought to encourage a plurality of models and metaphors in neuroscience. In Chapter 7 I attempt to put this into practice by reviewing the other transmitter systems that modulate dopamine release, and use this as a basis for exploring the context of dopamine modulation and reward-driven behaviour. I draw on evidence to suggest that dopamine modulation can be seen as part of an extended stress response, and that the function of dopamine is to encourage the individual to engage in behaviours that take it away from homeostasis. I also propose that the function of dopamine can be interpreted in terms of behaviourally defining self and non-self, much in the same way as inflammation and antibody responses are said to do in immunology.

A Coordinated Reinforcement Learning Framework for Multi-Agent Virtual Environments

Sause, William 01 January 2013 (has links)
The growing popularity of online virtual communities such as Second Life and ActiveWorlds demands the presence of intelligent agents to assist users in their daily online activities (e.g., exploring, shopping, and socializing). As these virtual environments become more crowded, multiple agents are needed to support the increasing number of users. Multi-agent environments, however, can suffer from the problem of resource competition among agents. It is therefore necessary that agents within multi-agent environments include a coordination mechanism to prevent unrealistic behaviors. Moreover, it is essential that these agents exhibit some form of intelligence, or the ability to learn, to support realism as well as to eliminate the need for developers to write separate scripts for each task the agents are required to perform. This research presents a coordinated reinforcement learning framework which can be used to develop task-oriented intelligent agents in multi-agent virtual environments. The framework contains a combination of a "next available agent" coordination model and a reinforcement learning model consisting of existing temporal difference reinforcement learning algorithms. Furthermore, the framework supports evaluations of reinforcement learning algorithms to determine which methods are best suited for task-oriented intelligent agents in dynamic, multi-agent virtual environments. To assess the effectiveness of the temporal difference reinforcement algorithms used in this study (Q-learning and Sarsa), experiments were conducted that measured an agent's ability to learn three tasks commonly performed by workers in a café environment. These tasks were basic sandwich making (BSM), complex sandwich making (CSM), and dynamic sandwich making (DSM). The BSM task consisted of four steps. The CSM and DSM tasks contained an additional fifth step. The agent learned the BSM and CSM tasks from scratch while the DSM task was learned after the agent became skillful in BSM. 
The measurements used to evaluate the efficiency of the Q-learning and Sarsa algorithms were the percentage of successful and optimally successful episodes performed by the agent and the average number of time steps taken by the agent to complete a successful episode. The experiments were run using both a fixed (FEP) and a variable (VEP) ε-greedy probability rate. Results showed that the Sarsa reinforcement learning algorithm, on average, outperformed the Q-learning algorithm in almost all experiments; the exception was the percentage of successfully completed episodes using FEP for CSM and DSM, where Sarsa performed almost as well as Q-learning. Overall, experiments utilizing VEP resulted in higher percentages of successes and optimal successes, and showed convergence to the optimal policy when measuring the average number of time steps per successful episode.
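
The two temporal difference algorithms compared in this abstract differ only in how they bootstrap from the next state. The sketch below shows the standard tabular update rules together with ε-greedy action selection. The function names, the `alpha` and `gamma` defaults, and in particular the decay schedule in `variable_epsilon` are assumptions for illustration; the abstract does not say how the variable rate was changed.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def variable_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """One possible variable schedule (assumed): geometric decay with a floor."""
    return max(end, start * decay ** episode)


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

Here `Q` is a list of per-state lists of action values. Because Sarsa uses the action the ε-greedy policy really takes next, its value estimates account for exploratory moves, which is one commonly cited reason the two methods can rank differently under fixed and decaying exploration rates.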

Dynamic movement primitives and reinforcement learning for adapting a learned skill

Lundell, Jens January 2016 (has links)
Traditionally, robots have been preprogrammed to execute specific tasks. This approach works well in industrial settings where robots have to execute highly accurate movements, such as when welding. However, preprogramming a robot is also expensive, error-prone and time-consuming because every feature of the task has to be considered. In some cases, where a robot has to execute complex tasks such as playing the ball-in-a-cup game, preprogramming it might even be impossible due to unknown features of the task. With all this in mind, this thesis examines the possibility of combining a modern learning framework, known as Learning from Demonstrations (LfD), with subsequent Reinforcement Learning (RL): the robot is first taught how to play the ball-in-a-cup game by demonstrating the movement to it, and then improves this skill by itself. The skill the robot has to learn is demonstrated with kinesthetic teaching, modelled as a dynamic movement primitive, and subsequently improved with the RL algorithm Policy Learning by Weighted Exploration with the Returns. Experiments performed on the industrial robot KUKA LWR4+ showed that robots are capable of successfully learning a complex skill such as playing the ball-in-a-cup game.

A Forex Trading System Using Evolutionary Reinforcement Learning

Song, Yupu 01 May 2017 (has links)
Building automated trading systems has long been one of the most cutting-edge and exciting fields in the financial industry. In this research project, we built a trading system based on machine learning methods. We used the Recurrent Reinforcement Learning (RRL) algorithm as our fundamental algorithm, and by introducing Genetic Algorithms (GA) in the optimization procedure, we tackled the problems of picking good initial values of parameters and dynamically updating the learning speed in the original RRL algorithm. We call this optimization algorithm the Evolutionary Recurrent Reinforcement Learning algorithm (ERRL), or the GA-RRL algorithm. ERRL allows us to find many locally optimal solutions more easily and quickly than the original RRL algorithm. Finally, we implemented the GA-RRL system on EUR/USD at a 5-minute level, and the backtest performance showed that our GA-RRL system has potentially promising profitability. In future research we plan to introduce a risk control mechanism, implement the system on different markets and assets, and perform backtests at a higher frequency.

From Model-Based to Data-Driven Discrete-Time Iterative Learning Control

Song, Bing January 2019 (has links)
This dissertation presents a series of new results on iterative learning control (ILC) that progress from model-based ILC algorithms to data-driven ILC algorithms. ILC is a type of trial-and-error algorithm that learns by repetition in practice to follow a pre-defined finite-time maneuver with high tracking accuracy. Mathematically, ILC constructs a contraction mapping between the tracking errors of successive iterations, and aims to converge to a tracking accuracy approaching the reproducibility level of the hardware. It produces feedforward commands based on measurements from previous iterations to eliminate tracking errors arising from the bandwidth limitations of feedback controllers, transient responses, model inaccuracies, unknown repeating disturbances, etc. Generally, ILC uses an a priori model to form the contraction mapping that guarantees monotonic decay of the tracking error. However, unmodeled high-frequency dynamics may destabilize the control system. The existing infinite impulse response filtering techniques used to stop the learning at such frequencies have initial condition issues that can cause an otherwise stable ILC law to become unstable. A circulant form of zero-phase filtering for finite-time trajectories is proposed here to avoid such issues. This work addresses the problem of a possible lack of stability robustness when ILC uses an imperfect a priori model. Besides the computation of feedforward commands, measurements from previous iterations can also be used to update the dynamic model. In other words, as the learning progresses, an iterative data-driven model development is made. This leads to adaptive ILC methods. An indirect adaptive linear ILC method to speed up the desired maneuver is presented here. The updates of the system model are realized by embedding an observer in ILC to estimate the system Markov parameters.
This method can be used to increase productivity or to produce high tracking accuracy when the desired trajectory is too fast for feedback control to be effective. When it comes to nonlinear ILC, data is used to update a progression of models along a homotopy, i.e., the ILC method presented in this thesis uses data to repeatedly create bilinear models in a homotopy approaching the desired trajectory. The improvement here makes use of Carleman bilinearized models to capture more nonlinear dynamics, with the potential for faster convergence when compared to existing methods based on linearized models. The last work presented here uses model-free reinforcement learning (RL) to eliminate the need for an a priori model. It is analogous to direct adaptive control, using data to directly produce the gains in the ILC law without use of a model. An off-policy RL method is first developed by extending a model-free model predictive control method and then applied in the trial domain for ILC. Adjustments of the ILC learning law and the RL recursion equation for state-value function updates allow the collection of enough data while improving the tracking accuracy without major safety concerns. This algorithm can be seen as a first step toward bridging ILC and RL with the aim of addressing nonlinear systems.
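
The contraction mapping between successive tracking errors mentioned in this abstract can be seen in a toy simulation. The sketch below is not one of the dissertation's methods: it assumes a simple proportional learning law, u_{j+1} = u_j + gain * e_j, applied to a linear system described by its Markov parameters, and the names `lifted_response` and `run_ilc` are hypothetical. For this law the error obeys e_{j+1} = (I - gain * P) e_j, where P is the lower-triangular Toeplitz matrix of Markov parameters, so the error decays monotonically whenever the norm of I - gain * P is below one.

```python
def lifted_response(markov, u):
    """Finite-time output y[k] = sum over i <= k of markov[i] * u[k - i]."""
    return [sum(markov[i] * u[k - i] for i in range(k + 1))
            for k in range(len(u))]


def run_ilc(markov, desired, gain, iterations):
    """Repeat the maneuver, correcting the command after every trial.

    Returns the maximum absolute tracking error recorded on each trial.
    """
    u = [0.0] * len(desired)
    history = []
    for _ in range(iterations):
        y = lifted_response(markov, u)
        e = [d - yk for d, yk in zip(desired, y)]
        history.append(max(abs(ek) for ek in e))
        # Assumed learning law: add a scaled copy of the last trial's error.
        u = [uk + gain * ek for uk, ek in zip(u, e)]
    return history


# Example: Markov parameters 1, 0.5, 0.25, ... and a five-step trajectory.
errors = run_ilc([1.0, 0.5, 0.25, 0.125, 0.0625],
                 [1.0, 2.0, 3.0, 2.0, 1.0], gain=0.5, iterations=20)
```

In this example the maximum row sum of |I - 0.5 P| is 0.96875, so the maximum absolute error shrinks on every trial. A real design would also have to handle the unmodeled high-frequency dynamics the abstract warns about, which this sketch does not address.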

Aplicação da rede GTSOM para navegação de robôs móveis utilizando aprendizado por reforço / Using the GTSOM network for mobile robot navigation with reinforcement learning

Menegaz, Mauricio January 2009 (has links)
This work describes an architecture for an autonomous robotic agent that is capable of creating a state representation of its environment and learning how to execute simple tasks using this representation. The GTSOM neural network (BASTOS, 2007) was chosen as the method for state clustering. It is used to transform the multidimensional and continuous sensor signal into a discrete representation, allowing the use of conventional reinforcement learning techniques. Some modifications to the network algorithm were necessary so that it could be used in this context. This network is used together with a grid map that allows the model to associate the sensor readings with the places where they occurred. While the GTSOM network is the main component of the state clustering system, the Q-Learning reinforcement learning method was chosen for task execution. Using the compact state representation created by the self-organizing network, the agent learns which actions to execute at each state in order to achieve its objectives. The model was tested in an experiment that consists of finding an object in a maze. The results show that it can segment the state space adequately and is capable of learning the task. The agent learns to avoid collisions and remembers the location of the target, reaching it regardless of its initial position. Furthermore, it expands its representation whenever it faces an unknown situation, while gradually removing from memory the states associated with experiences that are not repeated.

Dynamic generalisation of continuous action spaces in reinforcement learning : a neurally inspired approach

Smith, Andrew James January 2002 (has links)
This thesis is about the dynamic generalisation of continuous action spaces in reinforcement learning problems. The standard Reinforcement Learning (RL) account provides a principled and comprehensive means of optimising a scalar reward signal in a Markov Decision Process. However, the theory itself does not directly address the imperative issue of generalisation which naturally arises as a consequence of large or continuous state and action spaces. A current thrust of research is aimed at fusing the generalisation capabilities of supervised (and unsupervised) learning techniques with the RL theory. An example par excellence is Tesauro’s TD-Gammon. Although much effort has gone into researching ways to represent and generalise over the input space, much less attention has been paid to the action space. This thesis first considers the motivation for learning real-valued actions, and then proposes a set of key properties desirable in any candidate algorithm addressing generalisation of both input and action spaces. These properties include: Provision of adaptive and online generalisation, adherence to the standard theory with a central focus on estimating expected reward, provision for real-valued states and actions, and full support for a real-valued discounted reward signal. Of particular interest are issues pertaining to robustness in non-stationary environments, scalability, and efficiency for real-time learning in applications such as robotics. Since exploring the action space is discovered to be a potentially costly process, the system should also be flexible enough to enable maximum reuse of learned actions. A new approach is proposed which succeeds for the first time in addressing all of the key issues identified. The algorithm, which is based on the ubiquitous self-organising map, is analysed and compared with other techniques including those based on the backpropagation algorithm. 
The investigation uncovers some important implications of the differences between these two particular approaches with respect to RL. In particular, the distributed representation of the multi-layer perceptron is judged to be something of a double-edged sword offering more sophisticated and more scalable generalising power, but potentially causing problems in dynamic or non-equiprobable environments, and tasks involving a highly varying input-output mapping. The thesis concludes that the self-organising map can be used in conjunction with current RL theory to provide real-time dynamic representation and generalisation of continuous action spaces. The proposed model is shown to be reliable in non-stationary, unpredictable and noisy environments and judged to be unique in addressing and satisfying a number of desirable properties identified as important to a large class of RL problems.

Using Dialogue Acts in dialogue strategy learning : optimising repair strategies

Frampton, Matthew January 2008 (has links)
A Spoken Dialogue System's (SDS's) dialogue strategy specifies which action it will take depending on its representation of the current dialogue context. Designing it by hand involves anticipating how users will interact with the system, and/or repeated testing and refining, and so can be a difficult, time-consuming task. Since SDSs inevitably make understanding errors, a particularly important issue is how to design "repair strategies", the parts of the dialogue strategy which attempt to get the dialogue "back on track" following these errors. To try to produce better dialogue strategies with less time and effort, previous researchers have modelled a dialogue strategy as a sequential decision problem called a Markov Decision Process (MDP), and then applied Reinforcement Learning (RL) algorithms to example training dialogues to generate dialogue strategies automatically. More recent research has used training dialogues conducted with simulated rather than real users and learned which action to take in all dialogue contexts (a "full" as opposed to a "partial" dialogue strategy). Simulated users allow more training dialogues to be generated, and the exploration of new dialogue contexts not present in an original dataset. As yet, however, limited insight has been provided as to which dialogue contextual features are important to include in the MDP and why. Indeed, a full dialogue strategy has not been learned from training dialogues with a realistic probabilistic user simulation derived from real user data, and then shown to work well with real users. This thesis investigates the value of adding new linguistically motivated contextual features to the MDP when using RL to learn full dialogue strategies for SDSs. These new features are recent Dialogue Acts (DAs). DAs indicate the role or intention of an utterance in a dialogue, e.g. "provide-information", an utterance being a complete unit of a speaker's speech, often bounded by silence.
An accurate probabilistic user simulation learned from real user data is used for generating training dialogues, and the recent DAs are shown to improve performance in testing in simulation and with real users. With real users, performance is also better than that of competing learned and hand-crafted strategies. Analysis of the strategies and further simulation experiments show how the DAs improve performance through better repair strategies. The main findings are expected to apply to SDSs in general; indeed, our strategies are learned and tested on real users in different domains (flight booking versus tourist information). Comparisons are also made to recent research which focuses on handling understanding errors in SDSs, but which does not use RL or user simulations.
