Hengst, Bernhard, Computer Science & Engineering, Faculty of Engineering, UNSW
This thesis addresses the open problem of automatically discovering hierarchical structure in reinforcement learning. Current algorithms for reinforcement learning fail to scale as problems become more complex. Many complex environments empirically exhibit hierarchy and can be modeled as interrelated subsystems, each in turn with hierarchic structure. Subsystems are often repetitive in time and space, meaning that they reoccur as components of different tasks or occur multiple times in different circumstances in the environment. A learning agent may sometimes scale to larger problems if it successfully exploits this repetition. Evidence suggests that a bottom-up approach that repeatedly finds building blocks at one level of abstraction and uses them as background knowledge at the next level of abstraction makes learning in many complex environments tractable. An algorithm, called HEXQ, is described that automatically decomposes and solves a multi-dimensional Markov decision problem (MDP) by constructing a multi-level hierarchy of interlinked subtasks without being given the model beforehand. The effectiveness and efficiency of the HEXQ decomposition depend largely on the choice of representation in terms of the variables, their temporal relationship and whether the problem exhibits a type of constrained stochasticity. The algorithm is first developed for stochastic shortest path problems and then extended to infinite horizon problems. The operation of the algorithm is demonstrated using a number of examples including a taxi domain, various navigation tasks, the Towers of Hanoi and a larger sporting problem. The main contributions of the thesis are the automation of (1) decomposition, (2) sub-goal identification, and (3) discovery of hierarchical structure for MDPs with states described by a number of variables or features.
It points the way to further scaling opportunities that encompass approximations, partial observability, selective perception, relational representations and planning. The longer-term research aim is to train rather than program intelligent agents.
Megan Adele Le Clus
In the last few decades, the workplace has been increasingly recognised as a legitimate environment for learning new skills and knowledge, which in turn enables workers to participate more effectively in ever-changing work environments. Within the workplace there is the potential for continuous learning to occur not only through formal learning initiatives that are associated with training, but also through informal learning opportunities that are embedded within everyday work activities. Somewhat surprisingly, however, there have been relatively few empirical investigations into the actual processes of informal learning in the workplace. This may in part be due to the particular methodological challenges of examining forms of learning that are not structured or organised but incidental to daily work activities. There remains, therefore, a clear need to better understand how learning occurs informally in the workplace, and most importantly, to gain insight into workers' own accounts of informal learning experiences. This thesis addresses this issue by examining workers' personal experiences of informal learning, and how these contributed to better participation in their regular workplace activities. Four bodies of literature were reviewed as directly relevant to this research: adult learning, organisational learning, informal learning, and a sociocultural perspective on learning. Together, they provide complementary perspectives on the development of learning in the workplace. A conceptual framework, grounded in the sociocultural perspective, was developed to address the issue of how informal learning leads to better participation in the workplace, and reciprocally, how better participation leads to continuous informal learning. Consistent with the sociocultural perspective, the workplace was conceptualised as a complex social system in which co-workers, who constitute that social system, are assumed to co-regulate each other's learning opportunities.
Social interactions, therefore, are considered as creating a context in which informal learning is afforded or constrained. Understanding what role workplace culture and socialisation play in affording or constraining informal learning opportunities is therefore crucial. This is because the relationships between co-workers are assumed to influence how both new and established co-workers participate in and experience the socialisation process and how they see their respective roles. The framework developed for the study generated two main research questions: "How do co-workers learn informally in the workplace?" and "How does the workplace, as a social system, afford or constrain informal learning in the workplace?" The methodology chosen for this empirical study was consistent with key concepts from the sociocultural perspective, namely that individuals and their social context must be studied concurrently, as learning is assumed to be part of a social practice where activities are structured by social, cultural and situational factors. Accordingly, qualitative research methods were employed to gain knowledge and understanding of informal learning in the workplace from the perspective of co-workers. Co-workers' reflections on their informal learning experiences and participation in the workplace are presented in narrative form and their accounts interpreted from the sociocultural theoretical perspective. The narrative format presents the data in a way that immerses the reader in the phenomenon, with enough concrete details that the reader can identify with the subjective experiences of informal learning of each participant. The study highlighted how the nature of some relationships between new and established co-workers afforded opportunities for informal learning, while other relationships constrained such opportunities. These afforded or constrained opportunities were by nature spontaneous, planned, intentional or unintentional.
The study also revealed that personal and organisational factors co-contributed to creating these social affordances or constraints. Common across groups was the importance given to the quality of relationships between co-workers. The way new and established co-workers participated and interacted in the workplace was found to represent important sociocultural processes that impacted on the effectiveness of informal learning. Overall, this study draws attention to the complexity of participation and interaction in the workplace. A major implication is that opportunities for informal learning are potentially afforded or constrained by the social context. The study also highlighted conceptual and methodological issues in identifying and interpreting how co-workers learn informally in the workplace. Future research should establish how opportunities for effective informal learning might be fostered further through the design of more enabling workplace practices. The significance of perceived and expected roles between new and established co-workers also deserves further empirical attention, at the level of everyday informal practices but also at the level of organisational processes and structures that provide the broader context.
Joshi, Saket Subhash
03 October 2003
Sequential supervised learning problems involve assigning a class label to each item in a sequence. Examples include part-of-speech tagging and text-to-speech mapping. A very general-purpose strategy for solving such problems is to construct a recurrent sliding window (RSW) classifier, which maps some window of the input sequence plus some number of previously predicted items into a prediction for the next item in the sequence. This paper describes a general-purpose implementation of RSW classifiers and discusses the highly practical issue of how to choose the size of the input window and the number of previous predictions to incorporate. Experiments on two real-world domains show that the optimal choices vary from one learning algorithm to another. They also depend on the evaluation criterion (number of correctly predicted items versus number of correctly predicted whole sequences). We conclude that window sizes must be chosen by cross-validation. The results have implications for the choice of window sizes for other models including hidden Markov models and conditional random fields. / Graduation date: 2004
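The recurrent sliding window idea in the abstract above can be illustrated with a minimal Python sketch. It is not the implementation described in the paper. The function names, the padding symbol, the choice of a window centred on the current item, and the `classify` callable are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

PAD = "_"  # hypothetical padding symbol for positions outside the sequence


def rsw_features(inputs: Sequence[str], prev_preds: Sequence[str],
                 t: int, in_half_width: int, n_prev: int) -> Tuple[str, ...]:
    """Features for position t: an input window centred on t (an assumption)
    followed by the n_prev most recent predictions."""
    window = [inputs[i] if 0 <= i < len(inputs) else PAD
              for i in range(t - in_half_width, t + in_half_width + 1)]
    history = [prev_preds[i] if i >= 0 else PAD
               for i in range(t - n_prev, t)]
    return tuple(window + history)


def rsw_predict(inputs: Sequence[str],
                classify: Callable[[Tuple[str, ...]], str],
                in_half_width: int, n_prev: int) -> List[str]:
    """Label the sequence left to right, feeding earlier predictions back in."""
    preds: List[str] = []
    for t in range(len(inputs)):
        feats = rsw_features(inputs, preds, t, in_half_width, n_prev)
        preds.append(classify(feats))
    return preds
```

Here `in_half_width` and `n_prev` stand in for the two sizes the abstract says should be chosen by cross-validation. Any classifier that accepts a fixed-length feature tuple could be passed as `classify`.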
09 May 1996
Reinforcement Learning (RL) is the study of learning agents that improve their performance from rewards and punishments. Most reinforcement learning methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this thesis, we introduce a model-based average reward reinforcement learning method called "H-learning" and show that it performs better than other average reward and discounted RL methods in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning which automatically explores the unexplored parts of the state space, while always choosing an apparently best action with respect to the current value function. We show that this "Auto-exploratory H-learning" performs much better than the original H-learning under many previously studied exploration strategies. To scale H-learning to large state spaces, we extend it to learn action models and reward functions in the form of Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are very effective in significantly reducing the space requirement of H-learning, and in making it converge much faster in the AGV scheduling task. Further, Auto-exploratory H-learning synergistically combines with Bayesian network model learning and value function approximation by local linear regression, yielding a highly effective average reward RL algorithm. We believe that the algorithms presented here have the potential to scale to large applications in the context of average reward optimization. / Graduation date: 1996
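The difference between the two optimization criteria named in the abstract above can be shown with a small Python sketch. It is not H-learning and does not come from the thesis. It only evaluates a finite reward sequence under each criterion, and the function names, reward sequences and discount factor are assumptions chosen for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over the sequence, accumulated from the end."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def average_reward(rewards: Sequence[float]) -> float:
    """Mean reward per time step over the sequence."""
    return sum(rewards) / len(rewards)


# Two hypothetical reward streams: an early small payoff and a late large one.
early = [2.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 4.0]
```

With a discount factor of 0.5 the early stream has the higher discounted return, while the late stream has the higher average reward per step, so the two criteria can rank the same behaviours differently.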
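Pädagogische Anforderungen an das Lernhandeln im E-Learning: Dimensionen von Selbstlernkompetenz [Pedagogical requirements for learning activity in e-learning: dimensions of self-directed learning competence] / Heidenreich, Susanne. January 2009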
Also submitted as: dissertation, Technische Universität Dresden, 2009.
Perceived attributes to the development of a positive self-concept from the experiences of adolescents with learning disabilities / Bernacchio, Charles P., January 2003
Thesis (Doctor of Education) in Individualized Ph. D. Program--University of Maine, 2003. / Includes vita. Includes bibliographical references (leaves 216-221).
Barton, John Wesley
Thesis (Ph. D.)--University of Texas at Austin, 1999. / Vita. Includes bibliographical references (leaves 145-168). Available also in a digital version from Dissertation Abstracts.
Thesis (MA)--University of Montana, 2009. / Contents viewed on December 11, 2009. Title from author supplied metadata. Includes bibliographical references.
Retherford, April L.
Thesis (Ed. D.)--Oregon State University, 2001. / Typescript (photocopy). Includes bibliographical references (leaves 212-228). Also available online.
Jong, Nicholas K.
18 December 2012
Reinforcement Learning (RL) offers a promising approach towards achieving the dream of autonomous agents that can behave intelligently in the real world. Instead of requiring humans to determine the correct behaviors or sufficient knowledge in advance, RL algorithms allow an agent to acquire the necessary knowledge through direct experience with its environment. Early algorithms guaranteed convergence to optimal behaviors in limited domains, giving hope that simple, universal mechanisms would allow learning agents to succeed at solving a wide variety of complex problems. In practice, the field of RL has struggled to apply these techniques successfully to the full breadth and depth of real-world domains. This thesis extends the reach of RL techniques by demonstrating the synergies among certain key developments in the literature. The first of these developments is model-based exploration, which facilitates theoretical convergence guarantees in finite problems by explicitly reasoning about an agent's certainty in its understanding of its environment. A second branch of research studies function approximation, which generalizes RL to infinite problems by artificially limiting the degrees of freedom in an agent's representation of its environment. The final major advance that this thesis incorporates is hierarchical decomposition, which seeks to improve the efficiency of learning by endowing an agent's knowledge and behavior with the gross structure of its environment. Each of these ideas has intuitive appeal and sustains substantial independent research efforts, but this thesis defines the first RL agent that combines all their benefits in the general case. In showing how to combine these techniques effectively, this thesis investigates the twin issues of generalization and exploration, which lie at the heart of efficient learning. 
This thesis thus lays the groundwork for the next generation of RL algorithms, which will allow scientific agents to know when it suffices to estimate a plan from current data and when to accept the potential cost of running an experiment to gather new data.