The field of Learning Automata (LA) has been studied extensively for more than four decades. However, almost all of the papers have concentrated on LA working in Environments that have a finite number of actions. This is a well-established model of computation, and expedient, epsilon-optimal and absolutely expedient machines have been designed for stationary and non-stationary Environments. Only a few papers deal with Environments possessing an infinite number of actions, and these assume a well-defined and rather simple uni-modal functional form, such as the Gaussian function, for the Environment's infinite reward probabilities.
This thesis pioneers the concept and presents a series of continuous action LA (CALA) algorithms that do not require the function of the Environment's infinite reward probabilities to obey a well-established uni-modal functional form. Instead, this function can be multi-modal, among other forms, as long as it satisfies some weak constraints. Moreover, as our discussion evolves, the constraints are further relaxed. In all these cases, we demonstrate that the underlying machines converge in an epsilon-optimal manner to the optimal action of an infinite action set. Based on the proposed CALA algorithms, we report a global maximum search algorithm that can find the maximum points of a real-valued function by sampling function values that may be contaminated by noise.
This thesis also investigates the performance limit of the action-taking scheme used by all currently available CALA algorithms, namely sampling actions based on probability density functions. In more detail, given a reward function, we define an index of the function that is the least upper bound of the performance a CALA algorithm can possibly achieve. We also report a CALA algorithm that meets this upper bound in an epsilon-optimal manner.
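The action-taking scheme described above can be illustrated with a minimal sketch. It is not one of the thesis's algorithms: the reward probability function, the two sampling densities, and every name in the code are assumptions chosen only to show how an automaton that samples actions from a probability density interacts with a noisy Environment, and why the average reward depends on how well that density concentrates near the optimal action.

```python
import math
import random


def reward_probability(x):
    """Hypothetical multi-modal reward probability on [0, 1] (assumption).

    It has a local peak near 0.2 and a global peak near 0.7.
    """
    return (0.2
            + 0.6 * math.exp(-((x - 0.7) / 0.05) ** 2)
            + 0.3 * math.exp(-((x - 0.2) / 0.05) ** 2))


def environment_response(x, rng):
    """Noisy binary feedback: 1 (reward) with probability reward_probability(x)."""
    return 1 if rng.random() < reward_probability(x) else 0


def sample_uniform(rng):
    """Action density 1: uniform over the whole action set [0, 1]."""
    return rng.random()


def make_clipped_gaussian(mean, std):
    """Action density 2: a Gaussian clipped to [0, 1] (illustrative choice)."""
    def sample(rng):
        return min(1.0, max(0.0, rng.gauss(mean, std)))
    return sample


def average_reward(sample_action, rng, n_rounds=20000):
    """Monte Carlo estimate of the expected reward under a fixed action density."""
    total = 0
    for _ in range(n_rounds):
        total += environment_response(sample_action(rng), rng)
    return total / n_rounds


rng = random.Random(0)
uniform_avg = average_reward(sample_uniform, rng)
peaked_avg = average_reward(make_clipped_gaussian(0.7, 0.01), rng)
print("uniform density:", round(uniform_avg, 3))
print("density peaked near 0.7:", round(peaked_avg, 3))
```

Under these assumptions the density concentrated near the global peak earns a noticeably higher average reward than the uniform one, but it still falls short of the peak value because some probability mass lies away from the optimal action.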
By investigating the problem from a different perspective, we argue that the algorithms proposed are closely related to the family of “Stochastic Point Location” problems involving either discretized steps or d-ary parallel machines. The thesis includes detailed proofs of the assertions and highlights the niche contributions within the broader theory of learning. To the best of our knowledge, there are no comparable results reported in the literature.
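For readers unfamiliar with Stochastic Point Location, the following is a minimal sketch of the discretized-step formulation as it is commonly described, not the thesis's own algorithm. A learner moves on a grid over [0, 1] and an Environment tells it which direction the unknown point lies in, but the hint is correct only with some probability greater than 0.5. The function names, the grid resolution, the correctness probability, and the target value are all assumptions made for illustration.

```python
import random


def spl_environment(position, target, p_correct, rng):
    """Return +1 (move right) or -1 (move left); correct with probability p_correct."""
    correct = 1 if position < target else -1
    return correct if rng.random() < p_correct else -correct


def discretized_spl(target, resolution=100, p_correct=0.8, n_steps=5000, seed=0):
    """Random walk on the grid {0, 1/resolution, ..., 1} driven by noisy hints.

    Returns the mean of the last 1000 visited positions as the estimate.
    """
    rng = random.Random(seed)
    index = resolution // 2  # start in the middle of the interval
    visited = []
    for _ in range(n_steps):
        position = index / resolution
        direction = spl_environment(position, target, p_correct, rng)
        # Take one grid step in the suggested direction, staying inside [0, 1].
        index = min(resolution, max(0, index + direction))
        visited.append(index / resolution)
    tail = visited[-1000:]
    return sum(tail) / len(tail)


estimate = discretized_spl(target=0.8131)
print("estimated point:", round(estimate, 3))
```

Because the hints are correct more often than not, the walk drifts toward the unknown point and then hovers around it, so the tail average lands close to the target at the chosen grid resolution.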
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/39097 |
Date | 25 April 2019 |
Creators | Lu, Haoye |
Contributors | Nayak, Amiya |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |