321

Moment redistribution in reinforced concrete beams and one-way slabs using 500 MPa steel.

Islam, Mohammad M. January 2002 (has links)
In the Australian Standard, AS 3600-2001, the neutral axis parameter Ku is used as a convenient, but approximate, parameter to design for moment redistribution in building frames. The research work reported herein was conducted to obtain complete information regarding moment redistribution of beams and one-way slabs using 500 MPa steel reinforcement. A computer-based iterative numerical method was developed to analyse reinforced two-span continuous concrete beams and one-way slabs. The method takes into account the material and geometrical non-linearities in the calculations. The deflected shape of the beam and one-way slab was calculated by dividing the span length into a number of rigid segments. The program also calculates the failure load and extent of moment redistribution. The analytical method was verified against the test results reported in the literature. The analytical results for load-deflection graphs and moment redistribution showed good agreement with the test results. A parametric study was conducted using the analytical method. The results of this study showed that moment redistribution depends not only on the neutral axis parameter (Ku) but also on the ratio of the negative to positive neutral axis parameters (Ku-/Ku+), the ultimate steel strain (εsu) and the concrete compressive strength (fc).
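For readers unfamiliar with the notation in this abstract, the quantities involved are conventionally defined as follows (a generic sketch of standard definitions, not equations taken from the thesis itself):

    % Standard definitions assumed here (not from the thesis):
    % d_n = depth of the neutral axis, d = effective depth of the section,
    % M_elastic = bending moment from elastic analysis,
    % M_redistributed = design moment after redistribution.
    \[
      k_u = \frac{d_n}{d}, \qquad
      \beta_{\mathrm{redistribution}} =
        \frac{M_{\mathrm{elastic}} - M_{\mathrm{redistributed}}}{M_{\mathrm{elastic}}} \times 100\%
    \]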
322

An architecture for situated learning agents

Mitchell, Matthew Winston, 1968- January 2003 (has links)
Abstract not available
323

Psychological Time: The effect of task complexity upon the human estimation of duration.

Webber, Simon January 2007 (has links)
This thesis was designed to investigate the effect of task complexity upon how humans estimate duration. Previous task complexity research suggests that duration is overestimated with simple tasks and underestimated with complex tasks. One hundred and forty-two first and second year university students participated. Twelve experiments were conducted, which required participants to complete computer-generated jigsaw puzzles and periodically estimate how long they thought they had been doing the puzzle. In Experiment 1, participants were required to complete a jigsaw puzzle before making an estimate. In the remaining eleven experiments, estimates were made throughout the session whilst participants worked on the jigsaw puzzle. In the first four experiments, a task was complex if there were more puzzle pieces and simpler if there were fewer puzzle pieces. There were no significant results obtained from the first four experiments. Given the lack of effect from the first four experiments, the next two experiments partially replicated two task complexity studies to determine how task complexity can be used as an explanation for why estimations of duration differ. Again, there were no significant results obtained from these two experiments. The next four experiments tested whether people's estimates of duration were affected by the rate of reinforcement they received (i.e., successfully moving a puzzle piece to a new location per unit time). In the first two of these experiments (7 and 8), there was no effect of the manipulation, which consisted of decreasing the distance a puzzle piece could be moved on the screen relative to the distance the computer mouse was moved, and of fixing the speed at which a puzzle piece could be moved. In Experiments 9 and 10, more discriminative stimuli were used to indicate to participants that a change in the reinforcement rate was occurring. There was a significant result in one condition of Experiment 9, but this effect was not replicated in Experiment 10. In Experiment 11, the reinforcement rate was reduced to zero and there was a significant effect on participants' estimates of duration. However, these results suggested a confound between whether the reinforcement rate or the lack of access to the jigsaw puzzle was affecting estimates of duration. In Experiment 12, access to the jigsaw puzzle was limited whilst simultaneously controlling the reinforcement rate, and the results showed that not having access to the jigsaw puzzle affected how participants estimated duration. These findings suggest that information can act as reinforcement, enabling a person to engage in private behaviour. When there is no access to reinforcement, time 'drags' for humans.
324

Reinforcement learning by incremental patching

Kim, Min Sub, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
This thesis investigates how an autonomous reinforcement learning agent can improve on an approximate solution by augmenting it with a small patch, which overrides the approximate solution at certain states of the problem. In reinforcement learning, many approximate solutions are smaller and easier to produce than "flat" solutions that maintain distinct parameters for each fully enumerated state, but the best solution within the constraints of the approximation may fall well short of global optimality. This thesis proposes that the remaining gap to global optimality can be efficiently minimised by learning a small patch over the approximate solution. In order to improve the agent's behaviour, algorithms are presented for learning the overriding patch. The patch is grown around particular regions of the problem where the approximate solution is found to be deficient. Two heuristic strategies are proposed for concentrating resources to those areas where inaccuracies in the approximate solution are most costly, drawing a compromise between solution quality and storage requirements. Patching also handles problems with continuous state variables, by two alternative methods: Kuhn triangulation over a fixed discretisation and nearest neighbour interpolation with a variable discretisation. As well as improving the agent's behaviour, patching is also applied to the agent's model of the environment. Inaccuracies in the agent's model of the world are detected by statistical testing, using a selective sampling strategy to limit storage requirements for collecting data. The patching algorithms are demonstrated in several problem domains, illustrating the effectiveness of patching under a wide range of conditions. A scenario drawn from a real-time strategy game demonstrates the ability of patching to handle large complex tasks. These contributions combine to form a general framework for patching over approximate solutions in reinforcement learning. Complex problems cannot be solved by brute force alone, and some form of approximation is necessary to handle large problems. However, this does not mean that the limitations of approximate solutions must be accepted without question. Patching demonstrates one way in which an agent can leverage approximation techniques without losing the ability to handle fine yet important details.
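The core idea of overriding an approximate solution at selected states can be illustrated with a minimal sketch (hypothetical Python with invented names such as PatchedQ; this is not the thesis's actual code):

    # Hypothetical sketch of "patching": a small table of per-state overrides layered
    # on top of a cheap approximate action-value estimate. Illustration only.
    from typing import Callable, Dict, Hashable, Tuple

    State = Hashable
    Action = int

    class PatchedQ:
        def __init__(self, approx_q: Callable[[State, Action], float]):
            self.approx_q = approx_q  # approximate solution (e.g. function approximation)
            self.patch: Dict[Tuple[State, Action], float] = {}  # overrides grown where needed

        def value(self, s: State, a: Action) -> float:
            # The patch takes precedence wherever it has been grown.
            return self.patch.get((s, a), self.approx_q(s, a))

        def update_patch(self, s: State, a: Action, target: float, alpha: float = 0.1) -> None:
            # Refine the patch only at states where the approximation is found deficient.
            current = self.value(s, a)
            self.patch[(s, a)] = current + alpha * (target - current)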
325

Damping Characteristics of Reinforced and Prestressed Normal- and High-Strength Concrete Beams

Salzmann, Angela, n/a January 2003 (has links)
In the last few decades there has been a significant increase in the design strength and performance of different building materials. In particular, new methods, materials and admixtures for the production of concrete have allowed for strengths as high as 100 MPa to be readily available. In addition, the standard manufactured yield strength of reinforcing steel in Australia has increased from 400 MPa to 500 MPa. A perceived design advantage of higher-strength materials is that structural elements can have longer spans and be more slender than previously possible. An emerging problem with slender concrete members is that they can be more vulnerable to loading induced vibration. The damping capacity is an inherent fundamental quantity of all structural concrete members that affects their vibrational response. It is defined as the rate at which a structural member can dissipate the vibrational energy imparted to it. Generally damping capacity measurements, to indicate the integrity of structural members, are taken once the structure is in service. This type of non-destructive testing has been the subject of much research. The published non-destructive testing research on damping capacity is conflicting and a unified method to describe the effect of damage on damping capacity has not yet been proposed. Significantly, there is not one method in the published literature or national design codes, including the Australian Standard AS 3600-2001, available to predict the damping capacity of concrete beam members at the design stage. Further, little research has implemented full-scale testing with a view to developing damping capacity design equations, which is the primary focus of this thesis. To examine the full-range damping behaviour of concrete beams, two categories of testing were proposed. The categories are the 'untested' and 'tested' beam states. These beam states have not been separately investigated in previous work and are considered a major shortcoming of previous research on the damping behaviour of concrete beams. An extensive experimental programme was undertaken to obtain residual deflection and damping capacity data for thirty-one reinforced and ten prestressed concrete beams. The concrete beams had compressive strengths ranging between 23.1 MPa and 90.7 MPa, reinforcement with yield strengths of 400 MPa or 500 MPa, and tensile reinforcement ratios between 0.76% and 2.90%. The full- and half-scale beams tested had lengths of 6.0 m and 2.4 m, respectively. The testing regime consisted of a series of on-off load increments, increasing until failure, designed to induce residual deflections with increasing amounts of internal damage at which damping capacity (logarithmic decrement) was measured. The inconsistencies that were found between the experimental damping capacity of the beams and previous research prompted an initial investigation into the data obtained. It was found that the discrepancies were due to the various interpretations of the method used to extract damping capacity from the free-vibration decay curve. Therefore, a logarithmic decrement calculation method was proposed to ensure consistency and accuracy of the extracted damping capacity data to be used in the subsequent analytical research phase. The experimental test data confirmed that the 'untested' damping capacity of reinforced concrete beams is dependent upon the beam reinforcement ratio and distribution. This quantity was termed the total longitudinal reinforcement distribution. 
For the prestressed concrete beams, the 'untested' damping capacity was shown to be proportional to the product of the prestressing force and prestressing eccentricity. Separate 'untested' damping capacity equations for reinforced and prestressed concrete beams were developed to reflect these quantities. To account for the variation in damping capacity due to damage in 'tested' beams, a residual deflection mechanism was utilised. The proposed residual deflection mechanism estimates the magnitude of permanent deformation in the beam and attempts to overcome traditional difficulties in calculating the damping capacity during low loading levels. Residual deflection equations, based on the instantaneous deflection data for the current experimental programme, were proposed for both the reinforced and prestressed concrete beams, which in turn were utilised with the proposed 'untested' damping equation to calculate the total damping capacity. The proposed 'untested' damping, residual deflection and total damping capacity equations were compared to published test data and an additional series of test beams. These verification investigations have shown that the proposed equations are reliable and applicable for a range of beam designs, test setups, constituent materials and loading regimes.
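For context, the logarithmic decrement mentioned above is conventionally extracted from a free-vibration decay curve as follows (the textbook definition, not the calculation method proposed in the thesis):

    % x_i and x_{i+n} are peak amplitudes n cycles apart on the decay curve;
    % \delta is the logarithmic decrement, \zeta the equivalent viscous damping ratio.
    \[
      \delta = \frac{1}{n}\ln\!\left(\frac{x_i}{x_{i+n}}\right), \qquad
      \zeta = \frac{\delta}{\sqrt{4\pi^2 + \delta^2}}
    \]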
326

Modelling motivation for experience-based attention focus in reinforcement learning

Merrick, Kathryn January 2007 (has links)
Doctor of Philosophy / Computational models of motivation are software reasoning processes designed to direct, activate or organise the behaviour of artificial agents. Models of motivation inspired by psychological motivation theories permit the design of agents with a key reasoning characteristic of natural systems: experience-based attention focus. The ability to focus attention is critical for agent behaviour in complex or dynamic environments where only a small amount of the available information is relevant at any particular time. Furthermore, experience-based attention focus enables adaptive behaviour that focuses on different tasks at different times in response to an agent's experiences in its environment. This thesis is concerned with the synthesis of motivation and reinforcement learning in artificial agents. This extends reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Reinforcement learning algorithms are computational approaches to learning characterised by the use of reward or punishment to direct learning. The focus of much existing reinforcement learning research has been on the design of the learning component. In contrast, the focus of this thesis is on the design of computational models of motivation as approaches to the reinforcement component that generates reward or punishment. The primary aim of this thesis is to develop computational models of motivation that extend reinforcement learning with three key aspects of attention focus: rhythmic behavioural cycles, adaptive behaviour and multi-task learning in complex, dynamic environments. This is achieved by representing such environments using context-free grammars, modelling maintenance tasks as observations of these environments and modelling achievement tasks as events in these environments. Motivation is modelled by processes for task selection, the computation of experience-based reward signals for different tasks and arbitration between reward signals to produce a motivation signal. Two specific models of motivation based on the experience-oriented psychological concepts of interest and competence are designed within this framework. The first models motivation as a function of environmental experiences, while the second models motivation as an introspective process. This thesis synthesises motivation and reinforcement learning as motivated reinforcement learning agents. Three models of motivated reinforcement learning are presented to explore the combination of motivation with three existing reinforcement learning components. The first model combines motivation with flat reinforcement learning for highly adaptive learning of behaviours for performing multiple tasks. The second model facilitates the recall of learned behaviours by combining motivation with multi-option reinforcement learning. In the third model, motivation is combined with a hierarchical reinforcement learning component to allow both the recall of learned behaviours and the reuse of these behaviours as abstract actions for future learning. Because motivated reinforcement learning agents have capabilities beyond those of existing reinforcement learning approaches, new techniques are required to measure their performance. The secondary aim of this thesis is to develop metrics for measuring the performance of different computational models of motivation with respect to the adaptive, multi-task learning they motivate.
This is achieved by analysing the behaviour of motivated reinforcement learning agents incorporating different motivation functions with different learning components. Two new metrics are introduced that evaluate the behaviour learned by motivated reinforcement learning agents in terms of the variety of tasks learned and the complexity of those tasks. Persistent, multi-player computer game worlds are used as the primary example of complex, dynamic environments in this thesis. Motivated reinforcement learning agents are applied to control the non-player characters in games. Simulated game environments are used for evaluating and comparing motivated reinforcement learning agents using different motivation and learning components. The performance and scalability of these agents are analysed in a series of empirical studies in dynamic environments and environments of progressively increasing complexity. Game environments simulating two types of complexity increase are studied: environments with increasing numbers of potential learning tasks and environments with learning tasks that require behavioural cycles comprising more actions. A number of key conclusions can be drawn from the empirical studies, concerning both different computational models of motivation and their combination with different reinforcement learning components. Experimental results confirm that rhythmic behavioural cycles, adaptive behaviour and multi-task learning can be achieved using computational models of motivation as an experience-based reward signal for reinforcement learning. In dynamic environments, motivated reinforcement learning agents incorporating introspective competence motivation adapt more rapidly to change than agents motivated by interest alone. Agents incorporating competence motivation also scale to environments of greater complexity than agents motivated by interest alone. Motivated reinforcement learning agents combining motivation with flat reinforcement learning are the most adaptive in dynamic environments and exhibit scalable behavioural variety and complexity as the number of potential learning tasks is increased. However, when tasks require behavioural cycles comprising more actions, motivated reinforcement learning agents using a multi-option learning component exhibit greater scalability. Motivated multi-option reinforcement learning also provides a more scalable approach to recall than motivated hierarchical reinforcement learning. In summary, this thesis makes contributions in two key areas. Computational models of motivation and motivated reinforcement learning extend reinforcement learning to adaptive, multi-task learning in complex, dynamic environments. Motivated reinforcement learning agents allow the design of non-player characters for computer games that can progressively adapt their behaviour in response to changes in their environment.
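A rough sketch of the general arrangement described above, in which a computational model of motivation supplies the reward signal to an otherwise standard reinforcement learner (hypothetical Python with invented names; this is not the thesis's interest or competence model, and flat tabular Q-learning stands in for the learning component):

    # Hypothetical sketch: a motivation function turns observations into a reward
    # signal, which then drives an ordinary tabular Q-learning update.
    import random
    from collections import defaultdict
    from typing import Callable, Hashable

    State = Hashable
    Action = int

    class MotivatedQLearner:
        def __init__(self, n_actions: int, motivation: Callable[[State], float],
                     alpha: float = 0.1, gamma: float = 0.9, epsilon: float = 0.1):
            self.q = defaultdict(float)     # Q-values indexed by (state, action)
            self.actions = range(n_actions)
            self.motivation = motivation    # experience-based reward signal
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, s: State) -> Action:
            if random.random() < self.epsilon:
                return random.choice(list(self.actions))
            return max(self.actions, key=lambda a: self.q[(s, a)])

        def learn(self, s: State, a: Action, s_next: State) -> None:
            r = self.motivation(s_next)     # reward comes from motivation, not a fixed task
            best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
            self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])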
327

Hierarchical average reward reinforcement learning

Seri, Sandeep 15 March 2002 (has links)
Reinforcement Learning (RL) is the study of agents that learn optimal behavior by interacting with and receiving rewards and punishments from an unknown environment. RL agents typically do this by learning value functions that assign a value to each state (situation) or to each state-action pair. Recently, there has been a growing interest in using hierarchical methods to cope with the complexity that arises due to the huge number of states found in most interesting real-world problems. Hierarchical methods seek to reduce this complexity by the use of temporal and state abstraction. Like most RL methods, most hierarchical RL methods optimize the discounted total reward that the agent receives. However, in many domains, the proper criterion to optimize is the average reward per time step. In this thesis, we adapt the concepts of hierarchical and recursive optimality, which are used to describe the kind of optimality achieved by hierarchical methods, to the average reward setting and show that they coincide under a condition called Result Distribution Invariance. We present two new model-based hierarchical RL methods, HH-learning and HAH-learning, that are intended to optimize the average reward. HH-learning is a hierarchical extension of the model-based, average-reward RL method, H-learning. Like H-learning, HH-learning requires exploration in order to learn correct domain models and optimal value functions. HH-learning can be used with any exploration strategy, whereas HAH-learning uses the principle of "optimism under uncertainty", which gives it a built-in "auto-exploratory" feature. We also give the hierarchical and auto-exploratory hierarchical versions of R-learning, a model-free average reward method, and a hierarchical version of ARTDP, a model-based discounted total reward method. We compare the performance of the "flat" and hierarchical methods in the task of scheduling an Automated Guided Vehicle (AGV) in a variety of settings. The results show that hierarchical methods can take advantage of temporal and state abstraction and converge in fewer steps than the flat methods. The exception is the hierarchical version of ARTDP. We give an explanation for this anomaly. Auto-exploratory hierarchical methods are faster than the hierarchical methods with ε-greedy exploration. Finally, hierarchical model-based methods are faster than hierarchical model-free methods. / Graduation date: 2003
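As background for the average-reward setting discussed above, the model-free R-learning update (in its standard published form, not the hierarchical variants developed in the thesis) adjusts both an action-value table and an estimate ρ of the average reward per step:

    % After taking action a in state s and observing reward r and next state s':
    \[
      R(s,a) \leftarrow R(s,a) + \beta\,\bigl[r - \rho + \max_{a'} R(s',a') - R(s,a)\bigr]
    \]
    % and, when a greedy action was taken,
    \[
      \rho \leftarrow \rho + \alpha\,\bigl[r + \max_{a'} R(s',a') - \max_{a} R(s,a) - \rho\bigr]
    \]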
328

Learning with Deictic Representation

Finney, Sarah, Gardiol, Natalia H., Kaelbling, Leslie Pack, Oates, Tim 10 April 2002 (has links)
Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Deictic representations are believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.
329

A Reinforcement-Learning Approach to Power Management

Steinbach, Carl 01 May 2002 (has links)
We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.
330

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Jaakkola, Tommi, Jordan, Michael I., Singh, Satinder P. 01 August 1993 (has links)
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
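For reference, the two updates whose convergence the paper addresses have the familiar one-step forms below (standard statements with step size α and discount factor γ; TD(λ) is shown in its λ = 0 special case, and these are not reproduced from the paper itself):

    % One-step temporal-difference prediction (TD(0)) and Q-learning updates:
    \[
      V(s) \leftarrow V(s) + \alpha\,\bigl[r + \gamma V(s') - V(s)\bigr]
    \]
    \[
      Q(s,a) \leftarrow Q(s,a) + \alpha\,\bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr]
    \]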
