About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Soluções eficientes para processos de decisão markovianos baseadas em alcançabilidade e bissimulações estocásticas / Efficient solutions to Markov decision processes based on reachability and stochastic bisimulations

Felipe Martins dos Santos, 09 December 2013
Planning in artificial intelligence is the task of finding actions that reach a given goal. In planning under uncertainty, actions can have probabilistic effects. These problems are modeled as Markov Decision Processes (MDPs), which enable the computation of optimal solutions by considering the expected value of each action in each state. However, solving large probabilistic planning problems, i.e., those with many states and actions, remains a major challenge. Large MDPs can be reduced by computing stochastic bisimulations, i.e., equivalence relations over the set of states of the original MDP. From a stochastic bisimulation, which can be exact or approximate, one can obtain a reduced abstract model that may be easier to solve than the original MDP. For some domains, however, computing the stochastic bisimulation over the whole state space is infeasible. The algorithms proposed in this work extend the algorithms used to compute stochastic bisimulations for MDPs so that the bisimulation is computed only over the set of states reachable from a given initial state, which can be much smaller than the complete state set. Experimental results show that large probabilistic planning problems can be solved with better performance than known stochastic bisimulation techniques.
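The core idea above, computing the bisimulation only over states reachable from the initial state, can be sketched in a few lines. The following Python sketch assumes a hypothetical `mdp` object exposing `actions(s)`, `transitions(s, a)` (a list of `(next_state, probability)` pairs) and `reward(s)`; it is an illustration of the technique, not the thesis's implementation.

```python
from collections import deque

def reachable_states(mdp, s0):
    """Collect only the states reachable from s0; partition refinement is
    then run on this (often much smaller) subset, not the full space."""
    seen, frontier = {s0}, deque([s0])
    while frontier:
        s = frontier.popleft()
        for a in mdp.actions(s):
            for s2, p in mdp.transitions(s, a):
                if p > 0 and s2 not in seen:
                    seen.add(s2)
                    frontier.append(s2)
    return seen

def bisimulation_partition(mdp, states):
    """Naive partition refinement: split blocks until every pair of states
    in a block has identical rewards and identical block-transition
    probabilities under every action (an exact stochastic bisimulation)."""
    def signature(s, blocks):
        sig = [mdp.reward(s)]
        for a in mdp.actions(s):
            mass = {}
            for s2, p in mdp.transitions(s, a):
                b = next(i for i, blk in enumerate(blocks) if s2 in blk)
                mass[b] = mass.get(b, 0.0) + p
            sig.append((a, tuple(sorted(mass.items()))))
        return tuple(sig)

    blocks = [set(states)]
    changed = True
    while changed:
        changed = False
        new_blocks = []
        for blk in blocks:
            groups = {}
            for s in blk:
                groups.setdefault(signature(s, blocks), set()).add(s)
            new_blocks.extend(groups.values())
            changed |= len(groups) > 1
        blocks = new_blocks
    return blocks
```

Calling `bisimulation_partition(mdp, reachable_states(mdp, s0))` yields the reduced model's abstract states for the reachable fragment only.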
82

Projeto habilis : a logica fuzzy contribuindo com o autoconhecimento e a escolha da profissão / Habilis project : fuzzy logic contributing to self-knowledge and career choice

Baumgartner, Ronaldo, 14 August 2018
Advisor: Geraldo Lucio Diniz / Dissertation (professional master's degree), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: This work proposes, through a fuzzy model, the construction of an inference system for a self-knowledge tool (the Habilis method) that guides people in decisions about career choice, university courses, professional roles, etc., by analysing their dreams, fears, and sensory, cognitive, and emotional abilities. The analysis uses a Mamdani-type fuzzy inference method, built on fuzzy logic and fuzzy set theory. The method takes the analysed person's dreams, fears, and abilities as input and outputs indices that rank their prospects of success in professions, courses, professional roles, etc. As an example, the Mamdani fuzzy inference method is presented for professional roles. / Master's in Mathematics (Fuzzy Logic)
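As a rough illustration of the Mamdani-type inference the abstract describes, the following Python sketch runs two invented rules over made-up "fear" and "ability" inputs and defuzzifies with the centroid; the membership functions, rules, and scales are all assumptions, not the Habilis method's actual rule base.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 101)  # output universe: a success index from 0 to 10

def tri(u, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    return np.clip(np.minimum((u - a) / (b - a), (c - u) / (c - b)), 0.0, 1.0)

def low(v):  return float(tri(np.array([v]), -1.0, 0.0, 5.0)[0])
def high(v): return float(tri(np.array([v]), 5.0, 10.0, 11.0)[0])

def habilis_success(ability, fear):
    """Mamdani inference with two toy rules:
       R1: IF ability is HIGH AND fear is LOW  THEN success is HIGH
       R2: IF ability is LOW  OR  fear is HIGH THEN success is LOW
    AND = min, OR = max; consequents are clipped at the rule firing
    strength, aggregated with max, and defuzzified by the centroid."""
    w1 = min(high(ability), low(fear))
    w2 = max(low(ability), high(fear))
    out = np.maximum(np.minimum(tri(x, 5, 10, 11), w1),
                     np.minimum(tri(x, -1, 0, 5), w2))
    return float((x * out).sum() / out.sum()) if out.sum() > 0 else 5.0

print(habilis_success(ability=8.5, fear=2.0))  # leans toward a high index
```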
83

E-livsmedel : Barriärer och främjande faktorer / Barriers and promoting factors in e-grocery

Ceder Molander, Josefin, Julkunen, Hanna, January 2017
(The thesis itself is written in Swedish; the English abstract follows.) The possibility to buy groceries online has existed in Sweden since 1996, but the service only began to grow strongly a few years ago. Over the past two years, e-grocery has become one of the fastest-growing online markets, and it is still expanding, drawing more and more customers to this alternative way of shopping for food. It is a relatively new market for both consumers and companies, and it still has obstacles to overcome, for instance delivery times and online payment. Consumers are in a phase of change and are beginning to see the advantages of buying groceries online. This research identifies the barriers to, and the promoting factors of, online grocery shopping in order to examine the situation today. To find these barriers and advantages, a case study with ICA City in Borås was chosen as the approach. With the help of ICA City's digital customers, a qualitative study with focus groups was performed, followed by a quantitative data collection using questionnaires. The purpose of the qualitative study was to identify the barriers and promoting factors; the quantitative study measured their importance. These barriers and promoting factors were then mapped onto the purchase decision process to identify when and how consumers are affected. The study concludes that old barriers to e-grocery are disappearing while new ones emerge. E-grocery needs further development, especially regarding the interface, which must be more user-friendly as well as inspirational. The online assortment is currently inadequate, which affects many consumers' shopping experience. The trading strategy must be adapted to the digital customer, which leads to the conclusion that a better and stronger website is needed to attract more, and more loyal, customers. The biggest advantages of e-grocery found in this research were convenience and the time consumers save by not having to go to a physical store to shop for food. Many participants also saw the benefit of e-grocery's availability: they could buy food whenever and wherever they wished.
84

Programação dinâmica em tempo real para processos de decisão markovianos com probabilidades imprecisas / Real-time dynamic programming for Markov Decision Processes with Imprecise Probabilities

Daniel Baptista Dias, 28 November 2014
In sequential decision-making problems modeled as Markov Decision Processes (MDPs), an exact measure of the state transition probabilities may not be available. To address this, Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) were introduced. While MDP-IPs are a robust framework for real-world planning applications, their solutions are very time-consuming in practice. Previous work proposed efficient synchronous dynamic programming algorithms for MDP-IPs with factored representations of the probabilistic transition and reward functions, called factored MDP-IPs. However, when the initial state of a Stochastic Shortest Path MDP (SSP MDP) is given, those solutions do not exploit this information. This work introduces the Stochastic Shortest Path MDP-IP (SSP MDP-IP), in both enumerative and factored form. An asynchronous dynamic programming algorithm for enumerative SSP MDP-IPs with interval-based imprecision was proposed by Buffet and Aberdeen (2005). In general, however, a problem is given in factored form, i.e., in terms of state variables; in that case, even assuming interval-based imprecision over the variables, that solution no longer applies, since the joint transition probabilities become multilinear. We show that factored SSP MDP-IPs are more expressive than enumerative ones, and that the seemingly innocuous change from the enumerative SSP MDP-IP to the general factored case turns the objective of the Bellman backup from a linear into a nonlinear function. We also propose enumerative algorithms, called RTDP-IP (Real-time Dynamic Programming with Imprecise Transition Probabilities), LRTDP-IP (Labeled Real-time Dynamic Programming with Imprecise Transition Probabilities), SSiPP-IP (Short-Sighted Probabilistic Planner with Imprecise Transition Probabilities) and LSSiPP-IP (Labeled Short-Sighted Probabilistic Planner with Imprecise Transition Probabilities), as well as factored algorithms called factRTDP-IP (factored RTDP-IP) and factLRTDP-IP (factored LRTDP-IP). These algorithms are evaluated against the synchronous dynamic programming algorithms in terms of solution convergence time and scalability.
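A minimal sketch of the RTDP-IP idea, assuming interval-based imprecision: trial-based asynchronous backups from the initial state, with nature adversarially allocating probability mass within the intervals (the greedy allocation below is the standard inner maximization for interval ambiguity sets). The `mdp` interface (`interval_transitions`, `cost`, `is_goal`) is hypothetical, and sampling successors from the interval midpoints is just one simple choice.

```python
import random

def worst_case_value(outcomes, V, lo, hi):
    """Given bounds lo[s'] <= P(s') <= hi[s'] with sum(lo) <= 1 <= sum(hi),
    push as much mass as possible onto the costliest successors: the inner
    'max over nature' of the robust backup under interval imprecision."""
    order = sorted(outcomes, key=lambda s2: V[s2], reverse=True)
    probs = {s2: lo[s2] for s2 in outcomes}
    slack = 1.0 - sum(probs.values())
    for s2 in order:
        add = min(hi[s2] - lo[s2], slack)
        probs[s2] += add
        slack -= add
    return sum(p * V[s2] for s2, p in probs.items())

def rtdp_ip(mdp, s0, V, trials=1000, max_depth=200):
    """RTDP-IP sketch: asynchronous backups along trajectories sampled
    under the greedy policy, starting from the given initial state s0."""
    for _ in range(trials):
        s = s0
        for _ in range(max_depth):
            if mdp.is_goal(s):
                break
            backups = {}
            for a in mdp.actions(s):
                outs, lo, hi = mdp.interval_transitions(s, a)
                backups[a] = mdp.cost(s, a) + worst_case_value(outs, V, lo, hi)
            a_greedy = min(backups, key=backups.get)
            V[s] = backups[a_greedy]                      # asynchronous backup
            outs, lo, hi = mdp.interval_transitions(s, a_greedy)
            mid = {s2: (lo[s2] + hi[s2]) / 2 for s2 in outs}
            z = sum(mid.values())
            s = random.choices(outs, [mid[s2] / z for s2 in outs])[0]
    return V
```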
85

Processos de decisão Markovianos fatorados com probabilidades imprecisas / Factored Markov decision processes with Imprecise Transition Probabilities

Karina Valdivia Delgado, 19 January 2010
When modeling real-world decision-theoretic planning problems with the framework of Markov Decision Processes (MDPs), it is often impossible to obtain a completely accurate estimate of the transition probabilities. Uncertainty arises naturally in the specification of a domain, for example from eliciting MDP transition models from an expert or from observed data, or from non-stationary transition distributions caused by insufficient state knowledge. To obtain the most robust policy under transition uncertainty, Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) have been introduced. Unfortunately, while various solutions exist for MDP-IPs, they often require external calls to optimization routines and can thus be extremely time-consuming in practice. To address this deficiency, this thesis introduces the factored MDP-IP and proposes efficient mathematical programming and dynamic programming methods that exploit its structure. First, efficient approximate solutions for factored MDP-IPs are derived via mathematical programming, extending earlier linear programming approaches for factored MDPs and resulting in a multilinear formulation for robust maximin linear-value approximations. By exploiting the factored structure, this approach reduces solution time by orders of magnitude over standard exact non-factored approaches. Second, noting that the key computational bottleneck in the dynamic programming solution of factored MDP-IPs is the need to repeatedly solve nonlinear constrained optimization problems, the thesis shows how to represent the value function compactly using a new data structure called Parameterized Algebraic Decision Diagrams, and how to apply approximation techniques to drastically reduce the computational overhead of the nonlinear solver while producing bounded, approximately optimal solutions. The results show up to two orders of magnitude speedup over traditional flat dynamic programming approaches and up to an order of magnitude speedup over the extension of factored MDP approximate value iteration techniques to MDP-IPs, while producing the lowest error among all approximation algorithms evaluated.
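To see why the factored case forces a nonlinear solver, consider a toy two-variable example: when the joint transition probability factors as a product of per-variable probabilities with imprecise parameters, the inner objective of the Bellman backup becomes multilinear in those parameters. A sketch, with invented values and interval bounds:

```python
from scipy.optimize import minimize

# Value of each joint next-state (x1', x2') in an assumed 2-variable problem.
V = {(0, 0): 0.0, (0, 1): 4.0, (1, 0): 6.0, (1, 1): 10.0}

def expected_value(params):
    """The joint transition factors as P(x1'=1) = p1 and P(x2'=1) = p2,
    so the expectation is *multilinear* in (p1, p2), not linear as in
    the enumerated (flat) case."""
    p1, p2 = params
    return sum(V[(a, b)]
               * (p1 if a else 1 - p1)
               * (p2 if b else 1 - p2)
               for (a, b) in V)

# Nature adversarially picks credal-set parameters that minimize the
# expected value (a robust, pessimistic backup) within interval bounds.
res = minimize(expected_value, x0=[0.4, 0.7],
               bounds=[(0.3, 0.5), (0.6, 0.9)])
print(res.x, res.fun)   # worst-case parameters and robust expectation
```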
86

The reinforcement learning method : A feasible and sustainable control strategy for efficient occupant-centred building operation in smart cities

May, Ross, January 2019
Over half of the world's population lives in urban areas, a trend expected only to grow. This increasing urbanisation presents challenges in the management of urban infrastructure systems. As an essential infrastructure of any city, the energy system is one of the biggest of these challenges. As cities expand in population and economic activity, global energy consumption increases, and with it greenhouse gas (GHG) emissions. Renewable energy and energy efficiency have been shown to be key strategies for attaining the 2030 Agenda's sustainable development goal on energy (SDG 7). As the largest contributor to climate change, the building sector is responsible for more than half of global final energy consumption and GHG emissions, and because people spend most of their time indoors, maintaining a comfortable indoor environment further increases the demand for energy. However, the emergence of the smart city and the internet of things (IoT) offers the opportunity for smart management of buildings. Focusing on the second of these strategies, energy efficiency, intelligent building control offers significant potential for saving energy while respecting occupant comfort (OC). Most intelligent control strategies, however, rely on complex mathematical models that require a great deal of expertise to construct, costing time and money; if these models are inaccurate, energy is wasted and occupant comfort decreases. Moreover, any change in the physical environment, such as a retrofit, renders the models obsolete, and they must be re-identified to match the new state of the environment. This model-based approach seems unsustainable, so a model-free alternative is proposed: the reinforcement learning (RL) method. RL offers an elegant way to manage the tradeoff between energy efficiency and OC within the smart city and, more importantly, to help achieve SDG 7. To address the feasibility of RL as a sustainable control strategy for efficient occupant-centred building operation, a comprehensive review of RL for controlling OC in buildings is presented, together with a case study implementing RL to improve OC via a window system. The outcomes of both suggest that RL is a feasible solution, although more work is required on open issues such as the cooperative multi-agent RL (MARL) needed for multi-occupant, multi-zone buildings.
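As a concrete (and heavily simplified) illustration of the case-study setting, the sketch below trains a tabular Q-learning agent to operate a window, trading occupant comfort against energy use. The state discretization, dynamics, and reward weights are all invented for illustration, not taken from the thesis.

```python
import random

# Hypothetical discretization: state = (temperature band 0..4, window 0/1);
# actions open or close the window. Reward weights are made up here.
TEMPS, ACTIONS = range(5), ("open", "close")
Q = {((t, w), a): 0.0 for t in TEMPS for w in (0, 1) for a in ACTIONS}

def reward(temp_band, window):
    comfort = -abs(temp_band - 2)      # band 2 is the assumed comfort setpoint
    energy = -0.5 * (window == 0)      # closed window: HVAC compensates more
    return comfort + energy

def step(state, action):
    """Toy dynamics: an open window drifts the band down (cooler outdoors
    assumed); a closed one drifts it up (heat gain). Purely illustrative."""
    t, _ = state
    w = 1 if action == "open" else 0
    drift = random.choice([-1, 0]) if w else random.choice([0, 1])
    return (max(0, min(4, t + drift)), w)

alpha, gamma, eps = 0.1, 0.9, 0.1
state = (2, 0)
for _ in range(20000):
    a = random.choice(ACTIONS) if random.random() < eps else \
        max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt = step(state, a)
    r = reward(*nxt)
    Q[(state, a)] += alpha * (r + gamma * max(Q[(nxt, b)] for b in ACTIONS)
                              - Q[(state, a)])
    state = nxt
```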
87

Simulation Based Algorithms For Markov Decision Process And Stochastic Optimization

Abdulla, Mohammed Shahid, 05 1900
In Chapter 2, we propose several two-timescale simulation-based actor-critic algorithms for solving infinite-horizon Markov Decision Processes (MDPs) with finite state space under the average cost criterion. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated and averaged for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, a memory-efficient implementation using a feature-vector representation of the state space and TD(0) learning along the faster timescale is discussed. A three-timescale simulation-based algorithm for infinite-horizon discounted-cost MDPs via the Value Iteration approach is also proposed, in which an approximation of the Dynamic Programming operator T is applied to the value function iterates. A sketch of convergence explaining the dynamics of the algorithm via associated ODEs is presented. Numerical experiments on rate-based flow control at a bottleneck node, using a continuous-time queueing model, are presented for the proposed algorithms. Next, in Chapter 3, we develop three simulation-based algorithms for finite-horizon MDPs (FH-MDPs). The first is developed for finite state and compact action spaces, while the other two are for finite state and finite action spaces. Convergence analysis is briefly sketched. We then concentrate on methods to mitigate the curse of dimensionality, which affects FH-MDPs severely since there is one probability transition matrix per stage. Two parametrized actor-critic algorithms for FH-MDPs with compact action sets are proposed, the 'critic' in both algorithms learning the policy gradient. We show convergence w.p. 1 to a set satisfying the necessary condition for constrained optima. Further, a third algorithm for stochastic control of stopping-time processes is presented. Numerical experiments with the proposed finite-horizon algorithms are shown for a flow-control problem in communication networks. Towards stochastic optimization, in Chapter 4 we propose five algorithms that are variants of SPSA. The original one-measurement SPSA uses an estimate of the gradient of the objective function L containing an additional bias term not present in two-measurement SPSA. We propose a one-measurement algorithm that eliminates this bias and has asymptotic convergence properties that allow easier comparison with two-measurement SPSA. Under certain conditions, the algorithm outperforms both forms of SPSA, with the only overhead being the storage of a single measurement. We also propose a similar algorithm that uses perturbations obtained from normalized Hadamard matrices. Convergence w.p. 1 of both algorithms is established. We extend measurement reuse to design three second-order SPSA algorithms, sketch their convergence analysis, and present simulation results on an illustrative minimization problem. We then propose several stochastic approximation implementations of related algorithms for flow control in communication networks, beginning with a discrete-time implementation of Kelly's primal flow-control algorithm. Convergence with probability 1 is shown, even in the presence of communication delays and stochastic effects in link congestion indications. Two relevant enhancements are then pursued: a) an implementation of the primal algorithm using second-order information, and b) an implementation where edge routers rectify misbehaving flows. Discrete-time implementations of Kelly's dual algorithm and primal-dual algorithm are also proposed. Simulation results are presented a) verifying the proposed algorithms and b) comparing their stability properties with an algorithm from the literature.
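For readers unfamiliar with SPSA, the following Python sketch contrasts the classic two-measurement gradient estimate with the cheaper one-measurement form the chapter builds on (the one-measurement estimate carries the extra bias term that the proposed variants aim to remove). The objective and gain schedules are illustrative, not the thesis's test problems.

```python
import numpy as np

rng = np.random.default_rng(0)

def L(theta):
    """Illustrative objective (not from the thesis): a noisy quadratic."""
    return float(np.sum(theta ** 2) + rng.normal(scale=0.01))

def spsa_grad_two(theta, c):
    """Classic two-measurement SPSA: one simultaneous +/- perturbation of
    ALL coordinates gives a full gradient estimate from 2 evaluations."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Bernoulli +/-1
    return (L(theta + c * delta) - L(theta - c * delta)) / (2 * c * delta)

def spsa_grad_one(theta, c):
    """One-measurement SPSA: a single evaluation at the perturbed point;
    cheaper per step, but with an extra bias term."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    return L(theta + c * delta) / (c * delta)

theta = np.ones(10)
for k in range(1, 2001):
    a_k, c_k = 0.1 / k ** 0.602, 0.1 / k ** 0.101   # standard gain schedules
    theta -= a_k * spsa_grad_two(theta, c_k)        # try spsa_grad_one too
print(np.linalg.norm(theta))   # should approach 0 on this toy problem
```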
88

Measuring and Influencing Sequential Joint Agent Behaviours

Raffensperger, Peter Abraham, January 2013
Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial due to the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others. We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to identify the presence of turn-taking in simulation results and we calculate the quantity of turn-taking that could be observed between independent random agents. We demonstrate our turn-taking metric by reinterpreting previous work on turn-taking in emergent communication and by analysing a recorded human conversation. Given a metric, we can explore the space of reward functions and identify those reward functions that result in global success in groups of learning agents. We describe 'medium access games' as a model for human and machine communication and we present simulation results for an extensive range of reward functions for pairs of Q-learning agents. We use the Nash equilibria of medium access games to develop predictors for determining which reward functions result in turn-taking. Having demonstrated the predictive power of Nash equilibria for turn-taking in medium access games, we focus on synthesis of reward functions for stochastic games that result in arbitrary desirable Nash equilibria. Our method constructs a reward function such that a particular joint behaviour is the unique Nash equilibrium of a stochastic game, provided that such a reward function exists. This method builds on techniques for designing rewards for Markov decision processes and for normal form games. We explain our reward design methods in detail and formally prove that they are correct.
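A toy version of a turn-taking measure over binary medium-usage sequences may help fix ideas; this is an illustrative metric in the spirit of the thesis, not Raffensperger's exact definition.

```python
def turn_taking(usage, window):
    """Score binary medium-usage sequences (one 0/1 row per agent): within
    each window, a score of 1 means the agents shared the medium equally
    with no collisions; simultaneous use or hogging lowers the score."""
    n_agents, T = len(usage), len(usage[0])
    scores = []
    for start in range(0, T - window + 1, window):
        cols = [[row[t] for row in usage] for t in range(start, start + window)]
        if any(sum(c) > 1 for c in cols):      # collision: simultaneous use
            scores.append(0.0)
            continue
        shares = [sum(row[start:start + window]) for row in usage]
        fair = window / n_agents
        scores.append(1.0 - sum(abs(s - fair) for s in shares) / window)
    return sum(scores) / len(scores) if scores else 0.0

# Perfect alternation between two agents scores 1; hogging scores 0.
alternating = [[1, 0] * 5, [0, 1] * 5]
hogging     = [[1] * 10, [0] * 10]
print(turn_taking(alternating, 2), turn_taking(hogging, 2))
```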
89

Reinforcement Learning for Parameter Control of Image-Based Applications

Taylor, Graham, January 2004
The significant amount of data contained in digital images presents barriers to methods of learning from the information they hold. Noise and the subjectivity of image evaluation further complicate such automated processes. In this thesis, we examine a particular area in which these difficulties arise: controlling the parameters of a multi-step algorithm that processes visual information. A framework for approaching this parameter-selection problem using reinforcement learning agents is the main contribution of this research. We focus on the generation of the state and action spaces, as well as task-dependent reward. We first discuss the automatic determination of fuzzy membership functions as a specific case of the above problem. The entropy of a fuzzy event is used as a reinforcement signal. Membership functions representing brightness have been automatically generated for several images, and the results show that the reinforcement learning approach is superior to an existing simulated-annealing-based approach. The framework has also been evaluated by optimizing ten parameters of the text-detection-for-semantic-indexing algorithm proposed by Wolf et al. Image features are defined and extracted to construct the state space. Generalization to reduce the state space is performed with the fuzzy ARTMAP neural network, offering much faster learning than the previous tabular implementation, despite a much larger state and action space. Difficulties in using a continuous action space are overcome by employing the DIRECT method for derivative-free global optimization. The chosen parameters are evaluated using recall and precision metrics, and are shown to be superior to the previously recommended parameters. We further discuss the interplay between intermediate and terminal reinforcement.
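As a sketch of the first application (tuning a brightness membership function with fuzzy-event entropy as the reinforcement signal), the following collapses the RL formulation to a simple epsilon-greedy bandit over a discretized parameter; that simplification, the image data, the membership shape, and the scales are all assumptions for illustration, not the thesis's formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
pixels = rng.normal(120, 40, size=5000).clip(0, 255)   # stand-in "image"

def fuzzy_entropy(mu):
    """De Luca-Termini entropy of a fuzzy set; low when memberships are
    decisive (near 0 or 1). Used here, negated, as the reinforcement;
    the thesis's actual reward shaping differs."""
    mu = mu.clip(1e-9, 1 - 1e-9)
    return float(-(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)).mean())

def brightness_membership(c):
    """S-shaped 'bright' membership with crossover point c (the parameter
    the agent tunes); a logistic stands in for the usual S-function."""
    return 1.0 / (1.0 + np.exp(-(pixels - c) / 10.0))

# Epsilon-greedy action-value learning over a discretized parameter space.
candidates = np.arange(60, 200, 10)
q, n = np.zeros(len(candidates)), np.zeros(len(candidates))
for t in range(500):
    i = rng.integers(len(candidates)) if rng.random() < 0.1 else int(q.argmax())
    r = -fuzzy_entropy(brightness_membership(candidates[i]))   # reward
    n[i] += 1
    q[i] += (r - q[i]) / n[i]                                  # running mean
print("chosen crossover:", candidates[int(q.argmax())])
```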
90

Hierarchical reinforcement learning for spoken dialogue systems

Cuayáhuitl, Heriberto, January 2009
This thesis focuses on the problem of scalable optimization of dialogue behaviour in speech-based conversational systems using reinforcement learning. Most previous investigations in dialogue strategy learning have proposed flat reinforcement learning methods, which are more suitable for small-scale spoken dialogue systems. This research formulates the problem in terms of Semi-Markov Decision Processes (SMDPs), and proposes two hierarchical reinforcement learning methods to optimize sub-dialogues rather than full dialogues. The first method uses a hierarchy of SMDPs, where every SMDP ignores irrelevant state variables and actions in order to optimize a sub-dialogue. The second method extends the first one by constraining every SMDP in the hierarchy with prior expert knowledge. The latter method proposes a learning algorithm called 'HAM+HSMQ-Learning', which combines two existing algorithms in the literature of hierarchical reinforcement learning. Whilst the first method generates fully-learnt behaviour, the second one generates semi-learnt behaviour. In addition, this research proposes a heuristic dialogue simulation environment for automatic dialogue strategy learning. Experiments were performed on simulated and real environments based on a travel planning spoken dialogue system. Experimental results provided evidence to support the following claims: First, both methods scale well at the cost of near-optimal solutions, resulting in slightly longer dialogues than the optimal solutions. Second, dialogue strategies learnt with coherent user behaviour and conservative recognition error rates can outperform a reasonable hand-coded strategy. Third, semi-learnt dialogue behaviours are a better alternative (because of their higher overall performance) than hand-coded or fully-learnt dialogue behaviours. Last, hierarchical reinforcement learning dialogue agents are feasible and promising for the (semi) automatic design of adaptive behaviours in larger-scale spoken dialogue systems. This research makes the following contributions to spoken dialogue systems which learn their dialogue behaviour. First, the Semi-Markov Decision Process (SMDP) model was proposed to learn spoken dialogue strategies in a scalable way. Second, the concept of 'partially specified dialogue strategies' was proposed for integrating simultaneously hand-coded and learnt spoken dialogue behaviours into a single learning framework. Third, an evaluation with real users of hierarchical reinforcement learning dialogue agents was essential to validate their effectiveness in a realistic environment.
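A compact sketch of the hierarchy-of-SMDPs idea: each subtask optimizes its sub-dialogue over only the state variables relevant to it, and composite actions are discounted by the number of primitive steps they took (SMDP Q-learning). All names and interfaces below are illustrative, not the thesis's system.

```python
import random

class Subtask:
    """One SMDP in the hierarchy: it optimizes a sub-dialogue over only the
    state variables relevant to it, ignoring the rest (the first method
    described above)."""
    def __init__(self, actions, relevant):
        self.actions, self.relevant = actions, relevant
        self.Q = {}

    def abstract(self, state):
        # Project the full dialogue state onto this subtask's variables.
        return tuple(sorted((k, v) for k, v in state.items()
                            if k in self.relevant))

    def choose(self, state, eps=0.1):
        s = self.abstract(state)
        if random.random() < eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q.get((s, a), 0.0))

    def update(self, state, action, reward, next_state, tau,
               alpha=0.1, gamma=0.99):
        """SMDP Q-learning: discount by gamma**tau, where tau is the number
        of primitive steps the (possibly composite) action took."""
        s, s2 = self.abstract(state), self.abstract(next_state)
        best = max((self.Q.get((s2, a), 0.0) for a in self.actions),
                   default=0.0)
        q = self.Q.get((s, action), 0.0)
        self.Q[(s, action)] = q + alpha * (reward + gamma ** tau * best - q)

# A root task whose "actions" are child sub-dialogues, e.g. collecting the
# departure city vs. the travel date (slot names are made up).
get_city = Subtask(actions=["ask_city", "confirm_city"],
                   relevant={"city_filled", "city_confidence"})
get_date = Subtask(actions=["ask_date", "confirm_date"],
                   relevant={"date_filled", "date_confidence"})
root = Subtask(actions=["get_city", "get_date"],
               relevant={"city_filled", "date_filled"})
```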
