  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
551

Utilizing negative policy information to accelerate reinforcement learning

Irani, Arya John 08 June 2015
A pilot study by Subramanian et al. on Markov decision problem task decomposition by humans revealed that participants break down tasks into both short-term subgoals with a defined end-condition (such as "go to food") and long-term considerations and invariants with no end-condition (such as "avoid predators"). In the context of Markov decision problems, behaviors having clear start and end conditions are well-modeled by an abstraction known as options, but no abstraction exists in the literature for continuous constraints imposed on the agent's behavior. We propose two representations to fill this gap: the state constraint (a set or predicate identifying states that the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken). State-action constraints can be directly utilized by an agent, which must choose an action in each state, while state constraints require an approximation of the MDP’s state transition function to be used; however, it is important to support both representations, as certain constraints may be more easily expressed in terms of one as compared to the other, and users may conceive of rules in either form. Using domains inspired by classic video games, this dissertation demonstrates the thesis that explicitly modeling this negative policy information improves reinforcement learning performance by decreasing the amount of training needed to achieve a given level of performance. In particular, we show that even the use of negative policy information captured from individuals with no background in artificial intelligence yields improved performance. We also demonstrate that options and constraints together form a powerful combination: an option and constraint can be taken together to construct a constrained option, which terminates in any situation where the original option would violate a constraint.
In this way, a naive option defined to perform well in a best-case scenario may still accelerate learning in domains where the best-case scenario is not guaranteed.
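The two constraint representations and the constrained option described above can be illustrated with a small sketch. This is not the dissertation's implementation: the function names, the one-dimensional corridor domain, and the deterministic `corridor_model` standing in for the approximate transition function are all hypothetical choices made for illustration.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Set, Tuple

State = Hashable
Action = Hashable


def filter_by_state_action_constraint(
    state: State,
    actions: Iterable[Action],
    forbidden_pairs: Set[Tuple[State, Action]],
) -> List[Action]:
    """Keep only actions whose (state, action) pair is not forbidden."""
    return [a for a in actions if (state, a) not in forbidden_pairs]


def filter_by_state_constraint(
    state: State,
    actions: Iterable[Action],
    is_forbidden: Callable[[State], bool],
    predict_next: Callable[[State, Action], Iterable[State]],
) -> List[Action]:
    """Keep only actions whose predicted successors avoid forbidden states.

    predict_next is an (assumed) approximate transition model; a state
    constraint cannot be applied without one.
    """
    return [
        a for a in actions
        if not any(is_forbidden(s) for s in predict_next(state, a))
    ]


def constrained_option_action(
    state: State,
    option_policy: Callable[[State], Action],
    option_done: Callable[[State], bool],
    violates: Callable[[State, Action], bool],
) -> Optional[Action]:
    """Return the option's action, or None if the constrained option terminates.

    The constrained option ends when the original option ends, or when the
    action the original option would take violates a constraint.
    """
    if option_done(state):
        return None
    action = option_policy(state)
    return None if violates(state, action) else action


# Hypothetical toy domain: a corridor of cells 0..4 with a predator in cell 3.
def corridor_model(state: int, action: int) -> List[int]:
    """Deterministic stand-in for an approximate transition function."""
    return [min(4, max(0, state + action))]


def is_predator_cell(state: int) -> bool:
    return state == 3


def go_right(state: int) -> int:
    """A naive option policy: always step right."""
    return 1


def at_goal(state: int) -> bool:
    return state == 4


def violates_constraint(state: int, action: int) -> bool:
    return any(is_predator_cell(s) for s in corridor_model(state, action))
```

In this toy domain the naive "go right" option runs normally from cell 0 but terminates at cell 2, where its next step would enter the predator's cell, leaving the learner to choose another action. A real agent would also need a rule for the case where every available action is filtered out, which this sketch does not address.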
552

Punching shear behaviour of FRP-reinforced concrete interior slab-column connections

Sayed, Ahmed 26 August 2015
Flat slab-column connections are common elements in reinforced concrete (RC) structures such as parking garages. In cold weather regions, these structures are exposed to de-icing salts and aggressive environments. Using fiber reinforced polymer (FRP) bars instead of steel in such structures will overcome the corrosion problems associated with steel reinforcement. However, the available literature contains few studies evaluating the behaviour of FRP-RC interior slab-column connections, and those connections were tested mainly under concentric loads, a condition that seldom occurs in a real building. The main objectives of this research are to address this gap by investigating the behaviour of full-scale glass (G) FRP-RC interior slab-column connections subjected to eccentric load and to provide design recommendations for this type of connection. This study consisted of two phases, experimental and analytical. The experimental phase included the construction and testing of ten full-scale interior slab-column connections. The parameters investigated in the experimental phase were flexural reinforcement ratio, concrete compressive strength, type of the reinforcement, moment-to-shear ratio and the spacing between the shear stud reinforcement. Test results revealed that increasing the GFRP reinforcement ratio or the concrete strength increased the connection capacity. Moreover, compared to the control steel-RC specimen, the GFRP-RC connection with similar reinforcement rigidity showed comparable capacity and deflection at failure. Also, increasing the moment-to-shear ratio resulted in a reduction in the vertical load capacity, while using the shear stud reinforcement enhanced the strength by up to 23%. In the analytical phase, a 3-D finite element model (FEM) was constructed using specialized software. The constructed FEM was able to predict the experimental results with reasonable accuracy.
The verified FEM was then used to conduct a parametric study to evaluate the effects of perimeter-to-depth ratio, column aspect ratio, slab thickness and a wide range of flexural reinforcement ratios. The numerical results showed that increasing the reinforcement ratio increased the connection capacity. In addition, increasing the perimeter-to-depth ratio and slab thickness reduced the punching shear stresses at failure, while the effect of the column rectangularity diminished for a ratio greater than three. Moreover, the results showed good agreement with the experimental results from the literature. / October 2015
553

Revisiting user simulation in dialogue systems : do we still need them ? : will imitation play the role of simulation ?

Chandramohan, Senthilkumar 25 September 2012
Recent advancements in the area of spoken language processing and the wide acceptance of portable devices have attracted significant interest in spoken dialogue systems. These conversational systems are man-machine interfaces which use natural language (speech) as the medium of interaction. In order to conduct dialogues, computers must have the ability to decide when and what information has to be exchanged with the users. The dialogue management module is responsible for making these decisions so that the intended task (such as ticket booking or appointment scheduling) can be achieved. Thus, learning a good strategy for dialogue management is a critical task. In recent years, reinforcement learning-based dialogue management optimization has evolved to be the state of the art. A majority of the algorithms used for this purpose need vast amounts of training data. However, data generation in the dialogue domain is an expensive and time-consuming process. In order to cope with this, and also to evaluate the learnt dialogue strategies, user modelling in dialogue systems was introduced. These models simulate real users in order to generate synthetic data. Being computational models, they introduce some degree of modelling error. In spite of this, system designers are forced to employ user models due to the data requirements of conventional reinforcement learning algorithms. By contrast, more sample-efficient reinforcement learning algorithms can learn optimal dialogue strategies from a limited amount of training data when compared to the conventional algorithms. As a consequence, user models are no longer required for the purpose of optimization, yet they continue to provide a fast and easy means of quantifying the quality of dialogue strategies. Since existing methods for user modelling are less realistic than real user behavior, the focus is shifted towards user modelling by means of inverse reinforcement learning.
Using experimental results, the proposed method's ability to learn computational models with real-user-like qualities is showcased as part of this work.
554

NCR vs DRO: Evaluation of Effectiveness, Teacher Preference, and Fidelity of Implementation

Lansdale, Jackie Courntey 01 January 2012
Previous research has demonstrated that non-contingent reinforcement (NCR) and differential reinforcement of other behaviors (DRO) are effective procedures for reducing problem behavior of children both in and out of the classroom. However, few studies have assessed which procedure is most socially acceptable among teachers. In addition, studies have not recorded data on fidelity of implementation among teachers. A non-concurrent multiple baseline across teachers design was used to (a) demonstrate the effect of NCR and DRO on the problem behaviors of school-aged children with no identified developmental disability, and (b) assess implementation fidelity of each procedure by the teacher. This study further assessed which procedure teachers preferred by adding questionnaires and a choice phase in which teachers ultimately chose which procedure to implement. Results showed that both procedures significantly reduced problem behavior across all participants, with the DRO procedure having the greatest effect. The procedure that was preferred most by teachers varied across participants. One of the three participants preferred the NCR procedure, one preferred the DRO procedure, and the last participant gave mixed results between the procedure she said she preferred in the surveys and the procedure she chose to implement in the final choice phase.
555

Value methods for efficiently solving stochastic games of complete and incomplete information

Mac Dermed, Liam Charles 13 January 2014
Multi-agent reinforcement learning (MARL) poses the same planning problem as traditional reinforcement learning (RL): What actions over time should an agent take in order to maximize its rewards? MARL tackles a challenging set of problems that can be better understood by modeling them as having a relatively simple environment but with complex dynamics attributed to the presence of other agents who are also attempting to maximize their rewards. A great wealth of research has developed around specific subsets of this problem, most notably when the rewards for each agent are either the same or directly opposite each other. However, there has been relatively little progress made for the general problem. This thesis addresses this gap. Our goal is to tackle the most general, least restrictive class of MARL problems. These are general-sum, non-deterministic, infinite horizon, multi-agent sequential decision problems of complete and incomplete information. Towards this goal, we engage in two complementary endeavors: the creation of tractable models and the construction of efficient algorithms to solve these models. We tackle three well-known models: stochastic games, decentralized partially observable Markov decision problems, and partially observable stochastic games. We also present a new fourth model, Markov games of incomplete information, to help solve the partially observable models. For stochastic games and decentralized partially observable Markov decision problems, we develop novel and efficient value iteration algorithms to solve for game theoretic solutions. We empirically evaluate these algorithms on a range of problems, including well-known benchmarks, and show that our value iteration algorithms perform better than current policy iteration algorithms. Finally, we argue that our approach is easily extendable to new models and solution concepts, thus providing a foundation for a new class of multi-agent value iteration algorithms.
556

Mechanics of Stimulus & Response Generalization in Signal Detection & Psychophysics: Adaptation of Static Theory to Dynamic Performance.

Hutsell, Blake Allen 01 December 2009
The area of perceptual decision-making research seeks to understand how our perception of the world affects our judgment. Laboratory investigations of perceptual decision-making concentrate on observers' ability to discriminate among stimuli and their biases towards reporting one stimulus more frequently than others. Choice theories assume that these performance measures are determined by generalization of reinforcement along both stimulus and response dimensions. Historically the majority of research has addressed situations in which the difference among stimuli and resulting consequences of a perceptual decision are static. Consequently, little is known about the dynamics of stimulus and response generalization. The present research investigated the dynamics of discrimination accuracy and response bias by frequently varying differences among stimuli and the outcomes for correct decisions. In Experiment 1, four rats responded in a two-stimulus, two-response detection procedure employing temporal stimuli (short vs. long houselight presentations). Sample stimulus difference was varied over two levels across experimental conditions. A rapid acquisition procedure was employed in which relative reinforcer frequency varied daily. Shifts in response bias were well described by a behavioral model of detection (Davison & Nevin, 1999). Within sessions, bias adjusted rapidly to current reinforcer ratios when the sample stimulus difference was large, but not when the difference was small. In Experiment 2, three rats responded in a five-stimulus, two-response detection procedure employing temporal stimuli. Relative reinforcer frequency was again varied daily. Control by current session reinforcer ratios increased rapidly within sessions in a nearly monotonic fashion. Furthermore, response bias following each sample stimulus was observed within the first few trials of an experimental session. 
The speed of changes in response bias, especially following an unreinforced probe stimulus, provides strong support for an effective reinforcement process and suggests that this process may operate at a trial-by-trial level. In Experiment 3, three rats responded in a six-stimulus, two-response classification procedure. A repeated-acquisition procedure was employed in which the relationship between classes of short and long sample stimuli and their respective correct comparison locations reversed every 15 sessions. After several reversals, the probabilities of reinforcement for correct classification were also manipulated. In the majority of conditions across subjects, response bias reached half-asymptotic levels more rapidly than did discrimination accuracy. These findings provide some support for a backward chaining account of the acquisition of signal detection performance. An attention-augmented behavioral detection model accurately described the acquisition data; however, parameter estimates expressing the probability of attending to sample and comparison stimuli differed widely among subjects. The results of these experiments support the adaptation of dynamic research methodologies to the study of learning in perceptual decision-making tasks. Furthermore, discrimination performance and response bias adapt rapidly to frequent changes in reinforcement contingencies. Quantitative models formulated to describe static performance in detection procedures can be extended to predict dynamic performance. Some theoretical assumptions of these models were supported and others were violated. Overall, this research supports a renewed emphasis on learning in signal detection procedures and suggests that stable behavioral endpoints are at least as much a function of contingency variables as they are of sensory variables.
557

Ligações e armaduras de lajes em vigas mistas de aço e de concreto. / Connections and slab reinforcement of concrete-steel composite beams.

Marisa Aparecida Leonel da Silva Fuzihara 24 November 2006
The use of composite steel-concrete beams is increasing in Brazil and around the world because they take advantage of the best properties of each material. Steel responds very well to both tension and compression, and concrete to compression. Composite beams basically comprise the steel profile, the concrete slab, the connectors and the reinforcement. Some phenomena at the interface of these materials deserve attention, such as the degree of interaction, shear at the contact surface and uplift (vertical separation). The procedures normally used in the design of conventional reinforced concrete and steel structures answer many similar questions in composite structures but, in general, they do not address the most relevant question, which is the connection between steel and concrete. In the neighborhood of the shear connectors, the slab of a composite steel-concrete beam is subjected to a combination of longitudinal shear and transverse bending moment, so the interface is the region that requires careful analysis. These aspects are the main subjects of the research. In addition, the design procedures adopted by the Brazilian standard (NBR 8800-86), the American standard (AISC-2005) and the European standard (EUROCODE 4) are discussed, in particular for the connection regions between the materials by means of connectors on steel profiles under concrete slabs, for crack control in sections subjected to hogging (negative) moments, and for transverse reinforcement.
558

Stack the Deck: A Self-Monitoring Intervention for Adolescents with Autism for Balancing Participation Levels in Groups

Lees, Lauren Elizabeth 17 June 2020
Autism spectrum disorder (ASD) affects the lives of 1 in 54 children in the United States. By definition, these children often have social communication deficits as well as restrictive and repetitive behaviors that are socially isolating. Inclusion of participants with disabilities such as ASD in classroom or group settings with peers is a high-priority goal for building skills that lead to independent living and higher quality of life for all. Balancing an individual’s class or group participation is not always easy with different levels of social skills, however. In a classroom, this can translate to difficulty in knowing how to participate in a way that is equal to that of their peers; oftentimes children with ASD do not realize that others also need a turn to speak or that other children are not as interested in their restricted interests as they are. We used differential reinforcement and self-monitoring within an existing token system to reduce excess participation in group settings for some individuals, with the goal of better balancing opportunities for all group members to participate. Called "Stack the Deck," this simple intervention allowed for more uninterrupted instruction time with fewer talk outs and meltdowns from adolescents with ASD. Our intervention occurred in a clinical setting, a once-weekly social skills group utilizing the PEERS Social Skills manualized intervention for adolescents with ASD. Groups ran for 12–14 weeks in duration and taught skills such as how to make friends, how to enter and exit conversations, as well as how to host "get-togethers." Our sample size was 33, with 26 males and 7 females. These participants met criteria for autism spectrum disorder and/or had significant social impairment, and had age-appropriate verbal and cognitive abilities by parent report (later measured within the study).
Across our A-B intervention, we saw changes over time in participation rates for over-responders (participants who attempted to respond far above the group average during baseline) and under-responders (participants who attempted to respond at rates far below the group average during baseline), with no changes (the desired result) for individuals who were already participating at an appropriate rate. Over-responders showed the most significant changes. A secondary finding was a reduction in talk-outs overall within the groups. These results suggest that a fairly simple group behavioral intervention was able to produce a group environment more conducive to direct instruction, with direct application to inclusive classrooms as well as clinical environments. Further research can determine whether the effects within individuals seen in one setting carry over to others.
559

Řešení vybraných detailů betonových konstrukcí s využitím FRP výztuže / Design of selected details of concrete structures with embedded FRP reinforcement

Lagiň, Juraj January 2020 (has links)
The diploma thesis is divided into two parts. The primary part is theoretical and forms part of the project „FV10588 – New generation of spatial prefab made from high-strength concrete with increased mechanical resistance and endurance“, carried out in cooperation with the Institute of Concrete and Masonry Structures of the Faculty of Civil Engineering at VUT. This part deals with frame corners reinforced with steel and with composite reinforcement, which are compared through experiments and several calculation procedures. The secondary part of the thesis focuses on the structural design of an underground cooling reservoir that is loaded by temperature. The structure is reinforced in two ways, with steel and with composite reinforcement, and the effectiveness of the two is compared.
560

Towards a Deep Reinforcement Learning based approach for real-time decision making and resource allocation for Prognostics and Health Management applications

Ludeke, Ricardo Pedro João January 2020 (has links)
Industrial operational environments are stochastic and can have complex system dynamics which introduce multiple levels of uncertainty. This uncertainty leads to sub-optimal decision making and resource allocation. Digitalisation and automation of production equipment and the maintenance environment enable predictive maintenance, meaning that equipment can be stopped for maintenance at the optimal time. Resource constraints in maintenance capacity could however result in further undesired downtime if maintenance cannot be performed when scheduled. In this dissertation the applicability of using a Multi-Agent Deep Reinforcement Learning based approach for decision making is investigated to determine the optimal maintenance scheduling policy in a fleet of assets where there are maintenance resource constraints. By considering the underlying system dynamics of maintenance capacity, as well as the health state of individual assets, a near-optimal decision making policy is found that increases equipment availability while also maximising maintenance capacity. The implemented solution is compared to a run-to-failure corrective maintenance strategy, a constant interval preventive maintenance strategy and a condition based predictive maintenance strategy. The proposed approach outperformed traditional maintenance strategies across several asset and operational maintenance performance metrics. It is concluded that Deep Reinforcement Learning based decision making for asset health management and resource allocation is more effective than human based decision making. / Dissertation (MEng (Mechanical Engineering))--University of Pretoria, 2020. / Mechanical and Aeronautical Engineering / MEng (Mechanical Engineering) / Unrestricted
