  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Learning in a state of confusion : employing active perception and reinforcement learning in partially observable worlds

Crook, Paul A. January 2007
In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately, these algorithms, which attempt to simultaneously learn a Markov model of the world and how to act, have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states; instead they aim to learn successful deterministic reactive policies such that agents' actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process and in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously; and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments. 
We show that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration. This is the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies for POMDPs by presenting a new reinforcement learning algorithm, the Consistent Exploration Q(λ) algorithm (CEQ(λ)). We demonstrate on a significant number of problems that CEQ(λ) is more reliable at learning satisficing solutions than SARSA(λ), the algorithm currently regarded as the best for learning deterministic reactive policies.
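
As an illustration of the baseline named in this abstract, the following is a minimal sketch of one tabular SARSA(λ) update with accumulating eligibility traces, where the agent's "state" is simply its current observation (a reactive policy). It is not the thesis's CEQ(λ) algorithm; the function name, the observation and action labels, the hyperparameter values and the dictionary-based tables are all assumptions chosen for illustration.

```python
from collections import defaultdict


def sarsa_lambda_update(q, traces, obs, action, reward, next_obs, next_action,
                        alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one SARSA(lambda) step with accumulating eligibility traces.

    q and traces map (observation, action) pairs to floats. All names and
    default hyperparameters here are illustrative assumptions.
    """
    # Temporal-difference error for the observed transition.
    td_error = reward + gamma * q[(next_obs, next_action)] - q[(obs, action)]
    # Mark the pair just visited as eligible for credit.
    traces[(obs, action)] += 1.0
    # Update every eligible pair, then decay its trace.
    for key in list(traces):
        q[key] += alpha * td_error * traces[key]
        traces[key] *= gamma * lam
    return td_error


q = defaultdict(float)
traces = defaultdict(float)
delta = sarsa_lambda_update(q, traces, "wall_ahead", "turn_left", 1.0,
                            "corridor", "forward")
```

Because the tables are keyed on observations rather than hidden states, two different underlying states that look the same to the sensors share one entry, which is the source of the difficulties the abstract describes.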
12

Understanding effective teaching : perceptions from students, staff and executive managers in a post-1992 university

Clarke, Karen January 2015
This study proposes a model for effective teaching based on the development of an affiliative culture for both students and staff. Characteristics such as respect, helpfulness, and approachability are combined with specific teaching skills that are perceived by staff and students to be effective both for displaying these traits and also to enhance teaching. Although the literature shows that qualitative attributes are not new, from the data gathered, it appears that they are not always recognised by staff as significant for students’ learning. The literature also indicates that there is a disjuncture in the perceptions of effective teaching from executive management, staff and students. The context of the research is a post-1992 university, and current trends indicate there has been a shift in higher education towards a more bureaucratic approach to accountability in terms of student numbers and financial aspects that has resulted in larger teaching groups, especially for post-1992 universities. Additionally, the student funding system has changed so that students are now responsible for paying all their tuition fees, albeit via student loans. The literature proposes that this means that students may consider themselves as customers, which indicates a different kind of relationship with a university. The research literature posits that these reforms have led to increased pressures on teaching staff so that they do not have time to develop a climate of affiliation which encompasses openness, trust and a sense of belonging for the students which, in turn, leads to creating a positive learning environment and student success. The literature review considers the perceptions of effective teaching from academic teaching staff, students and executive managers. This research uses a case study approach with the research design set within an interpretivist paradigm whereby the opinions and perceptions of the respondents are explored. 
Data were gathered through multiple data collection tools: internal student surveys, a student focus group interview, filmed teaching observations, stimulated recall discussions with staff and conversational interviews with executive managers. In addition, secondary data from the narrative comments in the National Student Survey (NSS) (2012) were used to complement the internal survey. The research questions focus on the perceptions and strategies that are viewed as part of effective teaching from the three groups of participants. From the findings, I have developed a model to promote effective teaching which proposes an alignment of affiliation with specific teaching skills that encourages participation from both staff and students so that learning is jointly constructed. The model presents a way that combines the personal qualities and values gathered from the data with students’ preferred teaching strategies, which are perceived to enable effective teaching to take place. The inter-relationship between specific teaching skills and personal characteristics, identified in the model, is unique because it is the only approach that combines teaching methods with a values base that encourages a culture of affiliation for both staff and students.
13

Finite Memory Policies for Partially Observable Markov Decision Processes

Lusena, Christopher 01 January 2001
This dissertation makes contributions to areas of research on planning with POMDPs: complexity theoretic results and heuristic techniques. The most important contributions are probably the complexity of approximating the optimal history-dependent finite-horizon policy for a POMDP, and the idea of heuristic search over the space of FFTs.
14

Managing populations in the face of uncertainty: adaptive management, partial observability and the dynamic value of information.

Moore, Alana L. January 2008
The work presented in this thesis falls naturally into two parts. The first part (Chapter 2) is concerned with the benefit of perturbing a population into an immediately undesirable state in order to improve estimates of a static probability, which may improve long-term management. We consider finding the optimal harvest policy for a theoretical harvested population when a key parameter is unknown. We employ an adaptive management framework to study when it is worth sacrificing short-term rewards in order to increase long-term profits. / Active adaptive management has been increasingly advocated in natural resource management and conservation biology as a methodology for resolving key uncertainties about population dynamics and responses to management. However, when comparing management policies it is traditional to weigh future rewards geometrically (at a constant discount rate), which results in far-distant rewards making a negligible contribution to the total benefit. Under such a discounting scheme, active adaptive management is rarely of much benefit, especially if learning is slow. In Chapter 2, we consider two proposed alternative forms of discounting for evaluating optimal policies for long-term decisions which have a social component. / We demonstrate that discount functions which weigh future rewards more heavily result in more conservative harvesting strategies, but do not necessarily encourage active learning. Furthermore, the optimal management strategy is not equivalent to employing geometric discounting at a lower rate. If alternative discount functions are made mandatory in calculating optimal management policies for environmental management, then this will affect the structure of optimal management regimes and change when and how much we are willing to invest in learning. / The second part of this thesis is concerned with how to account for partial observability when calculating optimal management policies. 
We consider the problem of controlling an invasive pest species when only partial observations are available at each time step. In the model considered, the monitoring data available are binomial observations of a probability which is an index of the population size. We are again concerned with estimating a probability; however, in this model the probability is changing over time. / Before including partial observability explicitly, we consider a model in which perfect observations of the population are available at each time step (Chapter 3). It is intuitive that monitoring will be beneficial only if the management decision depends on the outcome. Hence, a necessary condition for monitoring to be worthwhile is that control policies which are specified in terms of the system state outperform simpler time-based control policies. Consequently, in addition to providing a benchmark against which we can compare the optimal management policy in the case of partial observations, analysing the perfect observation case also provides insight into when monitoring is likely to be most valuable. / In Chapters 4 and 5 we include partial observability by modelling the control problem as a partially observable Markov decision process (POMDP). We outline several tests which stem from a property of conservation of expected utility under monitoring, which aid in validating the model. We discuss the optimal management policy prescribed by the POMDP for a range of model scenarios, and use simulation to compare the POMDP management policy to several alternative policies, including controlling with perfect observations and no observations. / In Chapter 6 we propose an alternative model, developed in the spirit of a POMDP, that does not strictly satisfy the definition of a POMDP. We find that although the second model has some conceptually appealing attributes, it makes an undesirable implicit assumption about the underlying population dynamics.
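
To make the remark about geometric discounting concrete, here is a small sketch comparing the weight given to a reward t steps ahead under geometric discounting with the weight under a hyperbolic form. The hyperbolic function is only an assumed example of a scheme that weighs the future more heavily; the abstract does not name the two alternative discount functions the thesis studies, and the rate used below is arbitrary.

```python
def geometric_weight(t, rate=0.05):
    """Weight on a reward t steps ahead at a constant discount rate."""
    return 1.0 / (1.0 + rate) ** t


def hyperbolic_weight(t, rate=0.05):
    """Assumed alternative: hyperbolic discounting, which decays more slowly."""
    return 1.0 / (1.0 + rate * t)


# Print the two weights side by side for a few horizons.
for t in (1, 10, 50, 100):
    print(t, round(geometric_weight(t), 4), round(hyperbolic_weight(t), 4))
```

Under the geometric form the weight shrinks by the same factor every step, so rewards a century away contribute almost nothing, which is why investing in learning rarely pays off under that scheme.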
15

Innovative Simulation and Tree Models and Reinforcement Learning Methods with Applications in Cybersecurity

Liu, Enhao January 2021
No description available.
16

Low-Observable Object Detection and Tracking Using Advanced Image Processing Techniques

Li, Meng 21 August 2014
No description available.
17

Optimal Control Problems In Communication Networks With Information Delays And Quality Of Service Constraints

Kuri, Joy 02 1900
In this thesis, we consider optimal control problems arising in high-speed integrated communication networks with Quality of Service (QOS) constraints. Integrated networks are expected to carry a large variety of traffic sources with widely varying traffic characteristics and performance requirements. Broadly, the traffic sources fall into two categories: (a) real-time sources with specified performance criteria, like small end to end delay and loss probability (sources of this type are referred to as Type 1 sources below), and (b) sources that do not have stringent performance criteria and do not demand performance guarantees from the network - the so-called Best Effort Type sources (these are referred to as Type 2 sources below). From the network's point of view, Type 2 sources are much more "controllable" than Type 1 sources, in the sense that the Type 2 sources can be dynamically slowed down, stopped or speeded up depending on traffic congestion in the network, while for Type 1 sources, the only control action available in case of congestion is packet dropping. Carrying sources of both types in the same network concurrently while meeting the performance objectives of Type 1 sources is a challenge and raises the question of equitable sharing of resources. The objective is to carry as much Type 2 traffic as possible without sacrificing the performance requirements of Type 1 traffic. We consider simple models that capture this situation. Consider a network node through which two connections pass, one each of Types 1 and 2. One would like to maximize the throughput of the Type 2 connection while ensuring that the Type 1 connection's performance objectives are met. This can be set up as a constrained optimization problem that, however, is very hard to solve. We introduce a parameter b that represents the "cost" of buffer occupancy by Type 2 traffic. 
Since buffer space is limited and shared, a queued Type 2 packet means that a buffer position is not available for storing a Type 1 packet; to discourage the Type 2 connection from hogging the buffer, the cost parameter b is introduced, while a reward for each Type 2 packet coming into the buffer encourages the Type 2 connection to transmit at a high rate. Using standard on-off models for the Type 1 sources, we show how values can be assigned to the parameter b; the value depends on the characteristics of the Type 1 connection passing through the node, i.e., whether it is a Variable Bit Rate (VBR) video connection or a Constant Bit Rate (CBR) connection etc. Our approach gives concrete networking significance to the parameter b, which has long been considered as an abstract parameter in reward-penalty formulations of flow control problems (for example, [Stidham '85]). Having seen how to assign values to b, we focus on the Type 2 connection next. Since Type 2 connections do not have strict performance requirements, it is possible to defer transmitting a Type 2 packet, if the conditions downstream so warrant. This leads to the question: what is the "best" transmission policy for Type 2 packets? Decisions to transmit or not must be based on congestion conditions downstream; however, the network state that is available at any instant gives information that is old, since feedback latency is an inherent feature of high speed networks. Thus the problem is to identify the best transmission policy under delayed feedback information. We study this problem in the framework of Markov Decision Theory. With appropriate assumptions on the arrivals, service times and scheduling discipline at a network node, we formulate our problem as a Partially Observable Controlled Markov Chain (PO-CMC). 
We then give an equivalent formulation of the problem in terms of a Completely Observable Controlled Markov Chain (CO-CMC) that is easier to deal with. Using Dynamic Programming and Value Iteration, we identify structural properties of an optimal transmission policy when the delay in obtaining feedback information is one time slot. For both discounted and average cost criteria, we show that the optimal policy has a two-threshold structure, with the threshold on the observed queue length depending on whether a Type 2 packet was transmitted in the last slot or not. For an observation delay k > 2, the Value Iteration technique does not yield results. We use the structure of the problem to provide computable upper and lower bounds to the optimal value function. A study of these bounds yields information about the structure of the optimal policy for this problem. We show that for appropriate values of the parameters of the problem, depending on the number of transmissions in the last k steps, there is an "upper cut off" number which is a value such that if the observed queue length is greater than or equal to this number, the optimal action is to not transmit. Since the number of transmissions in the last k steps is between 0 and k, both inclusive, we have a stack of (k + 1) upper cut off values. We conjecture that these (k + 1) values are thresholds and the optimal policy for this problem has a (k + 1)-threshold structure. So far it has been assumed that the parameters of the problem are known at the transmission control point. In reality, they are usually not known and change over time. Thus, one needs an adaptive transmission policy that keeps track of and adjusts to changing network conditions. We show that the information structure in our problem admits a simple adaptive policy that performs reasonably well in a quasi-static traffic environment. Up to this point, the models we have studied correspond to a single hop in a virtual connection. 
We consider the multiple hop problem next. A basic matter of interest here is whether one should have end to end or hop by hop controls. We develop a sample path approach to answer this question. It turns out that depending on the relative values of the b parameter in the transmitting node and its downstream neighbour, sometimes end to end controls are preferable while at other times hop by hop controls are preferable. Finally, we consider a routing problem in a high speed network where feedback information is delayed, as usual. As before, we formulate the problem in the framework of Markov Decision Theory and apply Value Iteration to deduce structural properties of an optimal control policy. We show that for both discounted and average cost criteria, the optimal policy for an observation delay of one slot is Join the Shortest Expected Queue (JSEQ) - a natural and intuitively satisfactory extension of the well-known Join the Shortest Queue (JSQ) policy that is optimal when there is no feedback delay (see, for example, [Weber 78]). However, for an observation delay of more than one slot, we show that the JSEQ policy is not optimal. Determining the structure of the optimal policy for a delay k>2 appears to be very difficult using the Value Iteration approach; we explore some likely policies by simulation.
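
The two-threshold structure described for the one-slot feedback delay can be sketched as follows. The threshold values and the function name are hypothetical, and the choice of a lower threshold after a transmission is an illustrative assumption rather than a result taken from the thesis; only the rule "do not transmit once the observed queue length reaches the cut-off" follows the abstract's description.

```python
def should_transmit(observed_queue_len, transmitted_last_slot,
                    threshold_after_idle=8, threshold_after_send=6):
    """Decide whether to send a Type 2 packet under one-slot-old feedback.

    The two thresholds are made-up values. The lower threshold after a send
    reflects the assumption that a packet already in flight is not yet
    visible in the delayed queue-length observation.
    """
    threshold = threshold_after_send if transmitted_last_slot else threshold_after_idle
    # Do not transmit once the observed queue reaches the applicable cut-off.
    return observed_queue_len < threshold


# Same observed queue length, different decision depending on the last slot.
decisions = [should_transmit(7, sent) for sent in (False, True)]
```

For a delay of k slots, the conjectured (k + 1)-threshold structure would replace the two values with a table indexed by the number of transmissions in the last k slots.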
18

Semi-Markov Processes In Dynamic Games And Finance

Goswami, Anindya 02 1900
Two different sets of problems are addressed in this thesis. The first concerns partially observed semi-Markov games (POSMG) and the second concerns a semi-Markov modulated financial market model. In this thesis we study a partially observable semi-Markov game in the infinite time horizon. The study of a partially observable game (POG) involves three major steps: (i) construct an equivalent completely observable game (COG), (ii) establish the equivalence between POG and COG by showing that if COG admits an equilibrium, POG does so, (iii) study the equilibrium of COG and find the corresponding equilibrium of the original partially observable problem. In the infinite time horizon game problem there are two different payoff criteria: the discounted payoff criterion and the average payoff criterion. At first a partially observable semi-Markov decision process on a general state space with the discounted cost criterion is studied. An optimal policy is shown to exist by considering Shapley’s equation for the corresponding completely observable model. Next the discounted payoff problem is studied for the two-person zero-sum case. A saddle point equilibrium is shown to exist for this case. Then the variable sum game is investigated. For this case the Nash equilibrium strategy is obtained in the Markov class under suitable assumptions. Next the POSMG problem on a countable state space is addressed for the average payoff criterion. It is well known that under this criterion the game problem does not have a solution in general. To ensure a solution one needs some kind of ergodicity of the transition kernel. We find an appropriate ergodicity of the partially observed model which in turn induces a geometric ergodicity in the equivalent model. Using this we establish a solution of the corresponding average payoff optimality equation (APOE). Thus the value and a saddle point equilibrium are obtained for the original partially observable model. 
A value iteration scheme is also developed to find the average value of the game. Next we study the financial market model whose key parameters are modulated by semi-Markov processes. Two different problems are addressed under this market assumption. In the first one we show that this market is incomplete. In such an incomplete market we find the locally risk minimizing prices of exotic options in the Föllmer-Schweizer framework. In this model the stock prices are no longer Markov. Generally the stock price process is modeled as a Markov process because otherwise one may not get a PDE representation of the price of a contingent claim. To overcome this difficulty we find an appropriate Markov process which includes the stock price as a component and then find its infinitesimal generator. Using the Feynman-Kac formula we obtain a system of non-local partial differential equations satisfied by the option price functions in the mild sense. Next this system is shown to have a classical solution for given initial or boundary conditions. Then this solution is used to obtain a Föllmer-Schweizer decomposition of the option price. Thus we obtain the locally risk minimizing prices of different options. Furthermore we obtain an integral equation satisfied by the unique solution of this system. This enables us to compute the price of a contingent claim and find the risk minimizing hedging strategy numerically. Further we develop an efficient and stable numerical method to compute the prices. Besides this work on derivative pricing, the portfolio optimization problem in a semi-Markov modulated market is also studied in the thesis. We find the optimal portfolio selections by optimizing expected utility of terminal wealth. We also obtain the optimal portfolio selections under the risk sensitive criterion for both finite and infinite time horizons.
19

Gauge fixed gluonic observables and neutral kaon mixing on the lattice

Hudspith, Renwick January 2013
This thesis presents gauge fixed gluonic observable and neutral Kaon mixing matrix element measurements using nf=2+1 Domain Wall Fermion (DWF) configurations. These were generated with the Iwasaki gauge action by the RBC and UKQCD collaborations. Results from the first measurement of the QCD strong coupling with these ensembles using the triple gluon vertex are shown. We find that while a very accurate measurement of the coupling is possible using this technique, the systematic error from the perturbative matching at current lattice scales is large. We also discuss the utilisation of this method as a probe for possible Technicolor theories. The calculation of the QCD strong coupling constant from the triple gluon vertex required an implementation of a fast code to fix lattice gauge configurations. I provide details on my implementation of a parallel and optimised Fourier-accelerated algorithm for both Landau and Coulomb gauge fixing. I include the first calculation of the highly accurate W0-scale using these ensembles, allowing for percent-level scale setting. I show results from a wide variety of smearing methods and present the first gluonic measurement of different smearing radii. This thesis also details the first nf=2+1 measurement of the BSM neutral Kaon mixing renormalised matrix elements from lattice simulations with almost exact chiral symmetry in the valence sector and the sea.
20

Localização multirrobo cooperativa com planejamento / Planning for multi-robot localization

Pinheiro, Paulo Gurgel, 1983- 11 September 2018
Advisor: Jacques Wainer / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Previous issue date: 2009 / Abstract: In a cooperative multi-robot localization problem, a group of robots is in a certain environment in which the exact location of each robot is unknown. In this scenario, there is only a probability distribution indicating the chance of a robot being in a particular state. The robots must therefore move through the environment and generate new observations, which are shared to calculate new estimates. In recent years, many studies have focused on probabilistic techniques, communication models and detection models to solve the localization problem. However, the movement of the robots is generally defined by random actions, and random actions generate observations that can be useless for improving the estimate. This work describes a proposal for multi-robot localization with support for action planning. The objective is to describe a model in which the actions performed by the robots are defined by policies. By choosing the best action to perform, a robot gets more useful information from its internal and external sensors and estimates its pose more quickly. The proposed model, called the Model of Planned Localization (MPL), uses POMDPs to model the localization problems, together with specific policy-generation algorithms. Markov localization was used as the probabilistic localization technique, and versions of detection models and an information propagation model were implemented. A simulator for multi-robot localization problems was developed, in which experiments compared the proposed model with a model that does not use action planning. The results show that the proposed model is able to estimate the robots' poses in fewer steps, being significantly more efficient than the model without planning.
/ Master's / Artificial Intelligence / Master in Computer Science
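
Markov localization, which this record names as its probabilistic localization technique, maintains a probability distribution (belief) over a robot's possible states and revises it after each action and observation. The following is a minimal single-robot sketch on a circular one-dimensional grid; the map, the sensor accuracy and the exact motion model are assumptions for illustration and are not taken from the dissertation.

```python
def predict(belief, step):
    """Shift the belief by `step` cells on a circular grid (exact motion assumed)."""
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]


def correct(belief, world, reading, p_hit=0.9):
    """Bayes update: reweight each cell by the likelihood of the sensor reading."""
    weighted = [b * (p_hit if cell == reading else 1.0 - p_hit)
                for b, cell in zip(belief, world)]
    total = sum(weighted)
    return [w / total for w in weighted]


world = ["door", "wall", "wall", "door", "wall"]  # hypothetical map
belief = [1.0 / len(world)] * len(world)           # unknown start: uniform belief

belief = correct(belief, world, "door")  # robot senses a door
belief = predict(belief, 1)              # robot moves one cell to the right
belief = correct(belief, world, "wall")  # robot senses a wall
```

In this toy run the belief ends up split evenly between two cells, so the next action matters: a planner of the kind the dissertation proposes would prefer a move whose expected observation separates the remaining candidates, whereas a random move may not.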
