101

Alcançabilidade e controlabilidade médias para sistemas lineares com saltos markovianos a tempo contínuo / Average reachability and average controllability for continuous-time Markov jump linear systems

Alfredo Rafael Roa Narvaez 06 March 2015
In this work we study reachability and controllability notions for continuous-time linear systems with exogenous inputs and jump parameters driven by a quite general Markov chain. We define natural average reachability and average controllability concepts by requiring that the expected values of the corresponding Gramians are positive definite. Aiming at testable conditions for each concept, we introduce sets of reachability and controllability matrices for this class of systems and employ certain invariance properties to show that the system is average reachable (and, analogously, average controllable) if and only if the respective matrices have full rank. We use average reachability to show that the second-moment matrix of the state is positive definite with a uniform margin. One consequence of this result for the linear state-estimation problem is that the expectation of the estimation-error covariance matrix is positive definite, in the sense that there is a minimum level of noise in the estimates. Moreover, for linear Markovian filters we study the boundedness of the expected error covariance matrix to show that the filter is stable in an appropriate sense, a property that is desirable in real applications. Regarding average controllability, we show that it is a necessary and sufficient condition for the existence of a control process that drives the continuous component of the state to the origin in finite time with positive probability.
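To make the Gramian-based test above concrete, the following sketch (not taken from the thesis; the modes, jump rates and horizon are invented) estimates the expected finite-horizon reachability Gramian of a toy continuous-time Markov jump linear system by Monte Carlo and checks whether its smallest eigenvalue is positive, as a numerical stand-in for the average-reachability condition.

```python
# Illustrative sketch (not the thesis' algorithm): Monte Carlo estimate of the
# expected finite-horizon reachability Gramian of a small continuous-time Markov
# jump linear system dx/dt = A_{theta(t)} x + B_{theta(t)} u, followed by a
# positive-definiteness check. All matrices, jump rates and the horizon are invented.
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical modes and a two-state Markov chain with generator Q.
A = [np.array([[0.0, 1.0], [-1.0, -0.5]]),
     np.array([[0.0, 1.0], [-2.0, 0.2]])]
B = [np.array([[0.0], [1.0]]),
     np.array([[0.0], [0.5]])]
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])                 # transition rate matrix

T, n_paths, steps = 5.0, 200, 500
dt = T / steps
n = 2

gram_sum = np.zeros((n, n))
for _ in range(n_paths):
    mode = rng.integers(2)                  # initial mode drawn uniformly (a modelling choice)
    W = np.zeros((n, n))
    for _ in range(steps):
        Ai, Bi = A[mode], B[mode]
        # Euler step of dW/dt = A W + W A' + B B'; W(T) is the reachability
        # Gramian over [0, T] along this sampled mode path.
        W = W + (Ai @ W + W @ Ai.T + Bi @ Bi.T) * dt
        # Mode jump over [t, t+dt) with probability -Q[mode, mode] * dt.
        if rng.random() < -Q[mode, mode] * dt:
            mode = 1 - mode
    gram_sum += W

expected_gramian = gram_sum / n_paths
print("smallest eigenvalue of the estimated E[W(T)]:",
      np.linalg.eigvalsh(expected_gramian).min())
```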
102

Filtragem de Kalman aplicada à computação digital com abordagem de espaço de estado variante no tempo / Kalman filtering applied to a digital computing process with a time-varying state space approach

Battaglin, Paulo David, 1951- 26 August 2018
Advisor: Gilmar Barreto / Doctoral thesis (2014) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: This work shows the application of the Kalman filter to a discrete computational process, represented by a mathematical model: a system of linear, multivariable, discrete, stochastic and time-varying equations. The contributions of this research are the construction of an appropriate mathematical model of instantaneous observability to represent systems that vary quickly in time; the construction of the theoretical foundations of the Kalman filter to be applied to linear, multivariable, discrete, stochastic and time-varying systems; and the construction of this filter in this context and its application to a discrete computational process. We propose a method to determine the instantaneous observability matrix, the internal-state estimate, the covariance matrix of the internal-state estimation error and the latency of a discrete computational process, when the measurements at the computer output are known. We show that when the instantaneous observability property of the system holds, the latency of a computing process can be estimated. This is an advantage over usual observability methods, which are based on static scenarios. A potential application of these results is the prediction of bottlenecks in time-varying processes running on digital computers.
In a broader perspective, the instantaneous observability method can be applied to the identification of pathologies, weather forecasting, navigation and tracking on the ground, in the water and in the air, stock-market prediction and many other areas. / Doctorate in Electrical Engineering, Automation
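As a rough illustration of the machinery this abstract describes, the sketch below (not the thesis' code; all matrices are invented) implements one predict/update step of the standard discrete-time Kalman filter together with a finite-window observability-matrix rank check for a time-varying pair (A_k, C_k).

```python
# Illustrative sketch only: a standard discrete-time Kalman filter step and a
# finite-window observability rank check for a time-varying model
# x_{k+1} = A_k x_k + w_k,  y_k = C_k x_k + v_k.  All matrices are invented.
import numpy as np

def kalman_step(x, P, y, A, C, Qw, Rv):
    """One predict/update cycle of the Kalman filter."""
    # Predict
    x_pred = A @ x
    P_pred = A @ P @ A.T + Qw
    # Update
    S = C @ P_pred @ C.T + Rv                      # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)            # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

def observability_matrix(A_seq, C_seq):
    """Stack C_k, C_{k+1} A_k, C_{k+2} A_{k+1} A_k, ... over a finite window."""
    blocks, Phi = [], np.eye(A_seq[0].shape[0])
    for A, C in zip(A_seq, C_seq):
        blocks.append(C @ Phi)
        Phi = A @ Phi
    return np.vstack(blocks)

# Tiny demo with made-up time-varying matrices.
A_seq = [np.array([[1.0, 0.1 * k], [0.0, 0.9]]) for k in range(3)]
C_seq = [np.array([[1.0, 0.0]])] * 3
O = observability_matrix(A_seq, C_seq)
print("observability rank over the window:", np.linalg.matrix_rank(O))

x, P = np.zeros(2), np.eye(2)
y = np.array([0.3])
x, P = kalman_step(x, P, y, A_seq[0], C_seq[0],
                   Qw=0.01 * np.eye(2), Rv=np.array([[0.1]]))
print("updated state estimate:", x)
```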
103

Identificação e controle estocásticos descentralizados de sistemas interconectados multivariáveis no espaço de estado / Stochastic identification and decentralized control of multivariable interconnected systems in the state space

Torrico Caceres, Angel Fernando 26 July 2005
Advisor: Celso Pascoli Bottura / Doctoral thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: In this thesis a decentralized methodology for linear state-space identification of discrete-time, serially interconnected multivariable stochastic systems is proposed. The global system identification is achieved by means of the individual identification of its subsystems through state-space methods for the identification of multivariable systems and time series, among the ones discussed here: Multivariable Output-Error State Space Identification (MOESP), Numerical Algorithms for Subspace State Space System Identification (N4SID), Constrained Least-Squares State Space Identification (CLS-SSI) and MOESP-AOKI. Based on the obtained subsystem models, a methodology for optimal decentralized control that exploits the Lower Block Triangular structure of the system matrices is used. The combined decentralized stochastic identification and control methodology structured in this study is applied to an interconnected river water quality system, which motivated this work. / Doctorate in Electrical Engineering, Automation
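The subspace methods named above (MOESP, N4SID and variants) all revolve around rank/SVD computations on block Hankel matrices built from measured data. The sketch below is a generic illustration of that core step on simulated output data; it is not any of the thesis' algorithms, and the system, block sizes and order-selection reasoning are invented.

```python
# Minimal illustration of the block-Hankel / SVD step common to subspace
# identification methods such as MOESP and N4SID. Not the thesis' algorithm;
# the data, block sizes and order-selection rule are invented.
import numpy as np

def block_hankel(y, rows):
    """Stack a 1-D output sequence into a Hankel matrix with `rows` block rows."""
    cols = len(y) - rows + 1
    return np.array([y[i:i + cols] for i in range(rows)])

# Simulate output data from a known 2nd-order system so there is something to identify.
rng = np.random.default_rng(0)
A = np.diag([0.9, 0.6])
C = np.array([[1.0, 1.0]])
x = np.array([1.0, 1.0])
y = []
for _ in range(200):
    y.append((C @ x).item() + 0.001 * rng.standard_normal())   # small measurement noise
    x = A @ x
y = np.array(y)

H = block_hankel(y, rows=10)
singular_values = np.linalg.svd(H, compute_uv=False)
print("leading singular values:", np.round(singular_values[:5], 4))
# The sharp drop after the second singular value reflects the true order (2)
# of the simulated system; subspace methods use this kind of gap to pick the order.
```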
104

[pt] EXPLORANDO O CALOR NA TERMODINÂMICA ESTOCÁSTICA / [en] EXPLORING THE HEAT IN STOCHASTIC THERMODYNAMICS

PEDRO VENTURA PARAGUASSU 04 September 2023
[en] In Stochastic Thermodynamics, heat is a random variable that fluctuates statistically and therefore needs to be investigated using statistical methods. To understand this quantity, we investigated it for various systems: overdamped, underdamped, nonlinear, isothermal, and non-isothermal. The results obtained here can be divided into two contributions: the characterization of the heat distributions and moments for these different systems, and the correction of the heat formula for overdamped systems, where we discovered the need to include the kinetic energy that was previously ignored in the literature. This thesis focuses on understanding heat, a quantity that is fundamental in stochastic thermodynamics.
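As a toy illustration of why heat is a fluctuating quantity, the sketch below (not taken from the thesis; all parameter values are invented) simulates overdamped Langevin trajectories in a static harmonic trap and accumulates the stochastic heat with a midpoint (Stratonovich/Sekimoto-style) rule, then reports its mean and spread over many trajectories.

```python
# Toy illustration (not from the thesis): stochastic heat along overdamped
# Langevin trajectories in a static harmonic trap V(x) = k x^2 / 2.
# With a static potential and no external work, the midpoint heat sum below
# telescopes to V(x_T) - V(x_0), i.e. heat absorbed equals the change in
# potential energy. All parameter values are made up.
import numpy as np

rng = np.random.default_rng(2)
k, gamma, kBT = 1.0, 1.0, 0.5            # stiffness, friction, bath temperature
dt, steps, n_traj = 1e-3, 2000, 2000
x0 = 2.0                                  # start every trajectory out of equilibrium

x = np.full(n_traj, x0)
q = np.zeros(n_traj)                      # heat absorbed from the bath, per trajectory
for _ in range(steps):
    noise = np.sqrt(2.0 * kBT * dt / gamma) * rng.standard_normal(n_traj)
    x_new = x + (-k * x / gamma) * dt + noise
    # Sekimoto heat increment, midpoint (Stratonovich) evaluation of V'(x) dx
    q += k * 0.5 * (x + x_new) * (x_new - x)
    x = x_new

print("mean heat absorbed:", q.mean())    # typically negative here: energy flows to the bath
print("std of heat (it fluctuates):", q.std())
```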
105

Switched Markov Jump Linear Systems: Analysis and Control Synthesis

Lutz, Collin C. 14 November 2014
Markov jump linear systems find application in many areas including economics, fault-tolerant control, and networked control. Despite significant attention paid to Markov jump linear systems in the literature, few authors have investigated Markov jump linear systems with time-inhomogeneous Markov chains (Markov chains with time-varying transition probabilities), and even fewer authors have considered time-inhomogeneous Markov chains with a priori unknown transition probabilities. This dissertation provides a formal stability and disturbance attenuation analysis for a Markov jump linear system where the underlying Markov chain is characterized by an a priori unknown sequence of transition probability matrices that assumes one of finitely-many values at each time instant. Necessary and sufficient conditions for uniform stochastic stability and uniform stochastic disturbance attenuation are reported. In both cases, conditions are expressed as a set of finite-dimensional linear matrix inequalities (LMIs) that can be solved efficiently. These finite-dimensional LMI analysis results lead to nonconservative LMI formulations for optimal controller synthesis with respect to disturbance attenuation. As a special case, the analysis also applies to a Markov jump linear system with known transition probabilities that vary in a finite set. / Ph. D.
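For context, the sketch below checks the classical coupled-Lyapunov LMI condition for mean-square stability of a discrete-time Markov jump linear system with a known, time-homogeneous transition matrix, the textbook special case that the dissertation's switched/inhomogeneous analysis generalizes; it is not the dissertation's own condition. It assumes cvxpy with an SDP-capable solver is available, and all numerical data is invented.

```python
# Hedged sketch: the classical coupled-Lyapunov LMI test for mean-square
# stability of a discrete-time Markov jump linear system x_{k+1} = A_{theta_k} x_k
# with a known, time-homogeneous transition matrix P -- a special case of the
# switched/inhomogeneous setting studied in the dissertation, not its own
# condition. Requires cvxpy and an SDP-capable solver; all data is invented.
import numpy as np
import cvxpy as cp

A = [np.array([[0.5, 0.3], [0.0, 0.6]]),
     np.array([[1.1, 0.0], [0.2, 0.4]])]        # mode dynamics
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])                       # mode transition probabilities
n, N = 2, len(A)
eps = 1e-6

X = [cp.Variable((n, n), symmetric=True) for _ in range(N)]
constraints = []
for i in range(N):
    EX = sum(P[i, j] * X[j] for j in range(N))   # expected "next-step" matrix
    constraints.append(X[i] >> eps * np.eye(n))
    constraints.append(A[i].T @ EX @ A[i] - X[i] << -eps * np.eye(n))

prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
# Feasibility of these coupled LMIs is equivalent to mean-square stability
# in the time-homogeneous case.
print("solver status:", prob.status)
```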
106

Advances in the stochastic and deterministic analysis of multistable biochemical networks

Petrides, Andreas January 2018
This dissertation is concerned with the potential multistability of protein concentrations in the cell that can arise in biochemical networks. That is, situations where one protein, or a family of proteins, may sit at one of two or more different steady-state concentrations in otherwise identical cells, despite them being in the same environment. Models of multisite protein phosphorylation have shown that this mechanism is able to exhibit unlimited multistability. Nevertheless, these models have not considered enzyme docking, the binding of the enzymes to one or more substrate docking sites, which are separate from the motif that is chemically modified. Enzyme docking is, however, increasingly being recognised as a method to achieve specificity in protein phosphorylation and dephosphorylation cycles. Most models in the literature for these systems are deterministic, i.e. based on Ordinary Differential Equations, despite the fact that these are accurate only in the limit of large molecule numbers. For small molecule numbers, a discrete probabilistic, stochastic, approach is more suitable. However, when compared to the tools available in the deterministic framework, the tools available for stochastic analysis offer inadequate visualisation and intuition. We first try to bridge that gap by developing three tools: a) a discrete 'nullclines' construct applicable to stochastic systems, an analogue of the ODE nullclines; b) a stochastic tool based on a Weakly Chained Diagonally Dominant M-matrix formulation of the Chemical Master Equation; and c) an algorithm that is able to construct non-reversible Markov chains with desired stationary probability distributions. We subsequently prove that, for multisite protein phosphorylation and similar models, in the deterministic domain, enzyme docking and the consequent substrate enzyme-sequestration must inevitably limit the extent of multistability, ultimately to one steady state. In contrast, bimodality can be obtained in the stochastic domain even in situations where bistability is not possible for large molecule numbers. We finally extend our results to the case of an autophosphorylating kinase, as is the case for example with Ca2+/calmodulin-dependent protein kinase II (CaMKII), a key enzyme in synaptic plasticity.
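As a generic illustration of the stochastic (Chemical Master Equation) viewpoint, the sketch below runs a Gillespie stochastic simulation of the classic Schlögl model, a standard toy whose low-copy-number stationary distribution is bimodal; it is not one of the dissertation's phosphorylation networks, whose point is subtler (bimodality arising even without deterministic bistability), and the parameters are the commonly used textbook values.

```python
# Hedged illustration: a Gillespie stochastic-simulation run of the classic
# Schlögl model, used only as a generic example of a chemical system whose
# low-copy-number stationary distribution can be bimodal. It is NOT one of the
# phosphorylation networks analysed in the dissertation.
import numpy as np

rng = np.random.default_rng(3)

def propensities(x):
    # A + 2X -> 3X, 3X -> A + 2X, B -> X, X -> B  (A, B held constant, folded into rates)
    return np.array([
        0.03 * x * (x - 1) / 2.0,              # autocatalytic birth
        1e-4 * x * (x - 1) * (x - 2) / 6.0,    # autocatalytic death
        200.0,                                  # constant influx
        3.5 * x,                                # linear degradation
    ])

changes = np.array([+1, -1, +1, -1])

x, t, t_end = 250, 0.0, 20.0
samples = []
while t < t_end:
    a = propensities(x)
    a0 = a.sum()
    t += rng.exponential(1.0 / a0)             # time to next reaction
    reaction = rng.choice(4, p=a / a0)         # which reaction fires
    x += changes[reaction]
    samples.append(x)                          # event-based samples (crude proxy for
                                               # the time-weighted stationary histogram)

hist, edges = np.histogram(samples, bins=30)
print("copy-number histogram (counts per bin):")
print(hist)
```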
107

O conceito de estabilizabilidade fraca para sistemas lineares com saltos Markovianos / The weak stabilizability concept for linear systems with Markov jump

Manfrim, Amanda Liz Pacífico 08 March 2006
This work introduces weak controllability and weak stabilizability concepts for discrete-time Markov jump linear systems. We first construct a collection of matrices C that resembles the controllability matrices of deterministic linear systems. This collection allows us to define a weak controllability concept by requiring that the matrices have full rank, as well as to introduce a weak stabilizability concept that is dual to the weak detectability concept found in the literature on Markov jump systems. An important feature of the introduced concept is that it generalizes the earlier concept of mean-square stabilizability. The role that weak stabilizability plays in the filtering problem is investigated via case studies. These case studies are developed in the context of Kalman filtering with observation of the Markov parameter; they suggest that weak stabilizability together with mean-square detectability ensures that the state estimator is mean-square stable.
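To illustrate the filtering setting mentioned above, the sketch below (not from the thesis; matrices and transition probabilities are invented) runs the Kalman error-covariance (Riccati) recursion of a mode-matched filter along sampled Markov mode paths and averages its trace over realisations, giving a crude, simulation-based look at mean-square boundedness of the estimator.

```python
# Illustrative sketch (not from the thesis): Kalman error-covariance (Riccati)
# recursion along a sampled Markov mode path, for a discrete-time Markov jump
# linear system whose mode theta_k is observed by the filter. Averaging over
# many chain realisations gives a rough view of mean-square boundedness.
# All matrices and probabilities are invented.
import numpy as np

rng = np.random.default_rng(4)

A = [np.array([[0.9, 0.2], [0.0, 0.7]]), np.array([[1.05, 0.0], [0.1, 0.5]])]
C = [np.array([[1.0, 0.0]]),             np.array([[0.0, 1.0]])]
Qw = [0.01 * np.eye(2)] * 2              # process noise covariances
Rv = [np.array([[0.1]])] * 2             # measurement noise covariances
P_chain = np.array([[0.95, 0.05], [0.20, 0.80]])

def riccati_step(Pk, i):
    """One covariance prediction/update of the mode-matched Kalman filter."""
    Ppred = A[i] @ Pk @ A[i].T + Qw[i]
    S = C[i] @ Ppred @ C[i].T + Rv[i]
    K = Ppred @ C[i].T @ np.linalg.inv(S)
    return (np.eye(2) - K @ C[i]) @ Ppred

n_runs, horizon = 300, 200
avg_trace = np.zeros(horizon)
for _ in range(n_runs):
    mode, Pk = 0, np.eye(2)
    for k in range(horizon):
        Pk = riccati_step(Pk, mode)
        avg_trace[k] += np.trace(Pk) / n_runs
        mode = rng.choice(2, p=P_chain[mode])

print("average trace of the error covariance at k = 0, 50, 199:",
      avg_trace[0], avg_trace[50], avg_trace[-1])
```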
109

Dynamic Resampling for Preference-based Evolutionary Multi-objective Optimization of Stochastic Systems : Improving the efficiency of time-constrained optimization

Siegmund, Florian January 2016
In preference-based Evolutionary Multi-objective Optimization (EMO), the decision maker is looking for a diverse but locally focused non-dominated front in a preferred area of the objective space, as close as possible to the true Pareto-front. Since solutions found outside the area of interest are considered less important or even irrelevant, the optimization can focus its efforts on the preferred area and find the solutions that the decision maker is looking for more quickly, i.e., with fewer simulation runs. This is particularly important if the available time for optimization is limited, as is the case in many real-world applications. Although previous studies using this kind of preference-guided search, for example with the R-NSGA-II algorithm, have shown positive results, only very few of them considered the stochastic outputs of simulated systems. In the literature, this phenomenon of stochastic evaluation functions is sometimes called noisy optimization. If an EMO algorithm is run without any countermeasure to noisy evaluation functions, its performance will deteriorate compared to the case where the true mean objective values are known. While static resampling of solutions, which reduces the uncertainty of all evaluated design solutions, can in general allow EMO algorithms to avoid this problem, it significantly increases the required simulation time/budget, as many samples are wasted on inferior candidate solutions. In comparison, a Dynamic Resampling (DR) strategy allows the exploration and exploitation trade-off to be optimized, since the required accuracy of the objective values varies between solutions. In a dense, converged population, it is important to know accurate objective values, whereas noisy objective values are less harmful when an algorithm is exploring the objective space, especially early in the optimization process. Therefore, a well-designed Dynamic Resampling strategy which resamples each solution carefully, according to its resampling need, can help an EMO algorithm achieve better results than a static resampling allocation. While there are abundant studies in Simulation-based Optimization that consider Dynamic Resampling, the survey done in this study found no related work that considers how combinations of Dynamic Resampling and preference-based guided search can further enhance the performance of EMO algorithms, especially when the problems under study involve computationally expensive evaluations, like production systems simulation. The aim of this thesis is therefore to study, design and compare new combinations of preference-based EMO algorithms with various DR strategies, in order to improve the solution quality found by simulation-based multi-objective optimization with stochastic outputs under a limited function evaluation or simulation budget. Specifically, based on the advantages and flexibility offered by interactive, reference point-based approaches, the performance enhancements of R-NSGA-II have been studied when it is augmented with various DR strategies of increasing statistical sophistication, as well as with several adaptive features in terms of optimization parameters. The research results clearly show that optimization results can be improved if a hybrid DR strategy is used and adaptive algorithm parameters are chosen according to the noise level and problem complexity.
In the case of a limited simulation budget, the results support the conclusion that decision-maker preferences and DR should be used together to achieve the best results in simulation-based multi-objective optimization.
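As a minimal illustration of the idea that the number of replications should adapt to the noise and the required accuracy, the sketch below implements a generic standard-error-based resampling rule; it is not one of the thesis' specific DR strategies, and the noisy objective and thresholds are invented.

```python
# Generic sketch of a dynamic resampling rule: keep re-evaluating a noisy
# objective until the standard error of the sample mean drops below a target,
# subject to a per-solution cap. A simplified, standard-error-based allocation
# for illustration, not one of the thesis' DR strategies; the noisy objective
# and all thresholds are invented.
import numpy as np

rng = np.random.default_rng(5)

def noisy_objective(x):
    """Stand-in for a stochastic simulation output."""
    return (x - 1.0) ** 2 + rng.normal(scale=0.3)

def dynamic_resample(x, se_target=0.05, min_samples=3, max_samples=50):
    """Sample until the standard error of the mean is below se_target."""
    samples = [noisy_objective(x) for _ in range(min_samples)]
    while len(samples) < max_samples:
        se = np.std(samples, ddof=1) / np.sqrt(len(samples))
        if se < se_target:
            break
        samples.append(noisy_objective(x))
    return float(np.mean(samples)), len(samples)

for x in [0.0, 1.0, 2.5]:
    mean, used = dynamic_resample(x)
    print(f"x = {x:4.1f}  estimated objective = {mean:6.3f}  samples used = {used}")
```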
110

Learning in Partially Observable Markov Decision Processes

Sachan, Mohit 21 August 2013
Indiana University-Purdue University Indianapolis (IUPUI) / Learning in Partially Observable Markov Decision Processes (POMDPs) is motivated by the essential need to address a number of realistic problems. A number of methods exist for learning in POMDPs, but learning with a limited amount of information about the model remains a highly anticipated feature. Learning with minimal information is desirable in complex systems, as methods requiring complete information among decision makers are impractical due to the increase in problem dimensionality. In this thesis we address the problem of decentralized control of POMDPs with unknown transition probabilities and rewards. We suggest learning in a POMDP using a tree-based approach. States of the POMDP are guessed using this tree. Each node in the tree contains an automaton and acts as a decentralized decision maker for the POMDP. The start state of the POMDP is known as the landmark state. Each automaton in the tree uses a simple learning scheme to update its action choice and requires minimal information. The principal result derived is that, without knowledge of the transition probabilities and rewards, the automata tree of decision makers converges to a set of actions that maximizes the long-term expected reward per unit time obtained by the system. The analysis is based on learning in sequential stochastic games and properties of ergodic Markov chains. Simulation results are presented to compare the long-term rewards of the system under different decision control algorithms.
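To illustrate the kind of minimal-information update each decision maker could use, the sketch below implements a generic linear reward-inaction learning automaton in a toy two-action environment; it is not the thesis' tree-based algorithm, and the reward probabilities and learning rate are invented.

```python
# Generic sketch of a linear reward-inaction (L_RI) learning automaton in a toy
# two-action stochastic environment -- the kind of minimal-information update a
# node of the decision-maker tree could use. Not the thesis' tree-based
# algorithm; the reward probabilities and learning rate are invented.
import numpy as np

rng = np.random.default_rng(6)

reward_prob = [0.3, 0.8]        # unknown to the automaton
p = np.array([0.5, 0.5])        # action probabilities
lam = 0.05                      # learning rate

for _ in range(2000):
    a = rng.choice(2, p=p)                  # pick an action
    rewarded = rng.random() < reward_prob[a]
    if rewarded:                            # reward-inaction: update only on reward
        p = (1 - lam) * p
        p[a] += lam
        p /= p.sum()                        # guard against floating-point drift
    # on no reward: leave probabilities unchanged

print("learned action probabilities:", np.round(p, 3))   # should come to favour action 1
```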
