111 |
Age of Information: Fundamentals, Distributions, and Applications
Abd-Elmagid, Mohamed Abd-Elaziz 11 July 2023
A typical model for real-time status update systems consists of a transmitter node that generates real-time status updates about some physical process(es) of interest and sends them through a communication network to a destination node. Such a model can be used to analyze the performance of a plethora of emerging Internet of Things (IoT)-enabled real-time applications including healthcare, factory automation, autonomous vehicles, and smart homes, to name a few. The performance of these applications highly depends upon the freshness of the information status at the destination node about its monitored physical process(es). Because of that, the main design objective of such real-time status update systems is to ensure timely delivery of status updates from the transmitter node to the destination node. To measure the freshness of information at the destination node, the Age of Information (AoI) has been introduced as a performance metric that accounts for the generation time of each status update (which was ignored by conventional performance metrics, specifically throughput and delay). Since then, there have been two main research directions in the AoI research area. The first direction aimed to analyze/characterize AoI in different queueing-theoretic models/disciplines, and the second direction was focused on the optimization of AoI in different communication systems that deal with time-sensitive information. However, the prior queueing-theoretic analyses of AoI have mostly been limited to the characterization of the average AoI and the prior studies developing AoI/age-aware scheduling/transmission policies have mostly ignored the energy constraints at the transmitter node(s). 
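The AoI metric described above can be made concrete with a small calculation. The following is a minimal sketch, not taken from the dissertation: it assumes the common definition in which the age at the destination grows at unit rate and, when an update is delivered, drops to the time elapsed since that update was generated. The function name `average_aoi` and the `(generation_time, delivery_time)` input format are illustrative assumptions.

```python
def average_aoi(updates, horizon, initial_age=0.0):
    """Time-average age of information over [0, horizon].

    updates: iterable of (generation_time, delivery_time) pairs with
             generation_time <= delivery_time (assumed input format).
    """
    area = 0.0          # integral of the age process so far
    age = initial_age   # age at time t_prev
    t_prev = 0.0
    for gen, dlv in sorted(updates, key=lambda u: u[1]):
        if dlv > horizon:
            break
        dt = dlv - t_prev
        # Between deliveries the age grows linearly with slope 1.
        area += age * dt + 0.5 * dt * dt
        age += dt
        t_prev = dlv
        # A delivery can only make the information fresher, never staler.
        age = min(age, dlv - gen)
    dt = horizon - t_prev
    area += age * dt + 0.5 * dt * dt
    return area / horizon


# Two updates, each delivered one time unit after generation.
print(average_aoi([(0.0, 1.0), (2.0, 3.0)], horizon=4.0))  # 1.5
```

Note that throughput and delay alone would not distinguish this schedule from one with the same delays but sparser generation times, whereas the average age would increase.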
Motivated by these limitations, this dissertation develops new queueing-theoretic methods that allow the characterization of the distribution of AoI in several classes of status updating systems as well as novel AoI-aware scheduling policies accounting for the energy constraints at the transmitter nodes (for several settings of communication networks) in the process of decision-making using tools from optimization theory and reinforcement learning.
The first part of this dissertation develops a general stochastic hybrid system (SHS)-based framework for characterizing the distribution of AoI in several classes of real-time status updating systems. First, we study a general setting of status updating systems, where a set of source nodes provides status updates about some physical process(es) to a set of monitors. For this setting, the continuous state of the system is formed by the AoI/age processes at different monitors, the discrete state of the system is modeled using a finite-state continuous-time Markov chain, and the coupled evolution of the continuous and discrete states of the system is described by a piecewise linear SHS with linear reset maps. Using the notion of tensors, we derive a system of linear equations for the characterization of the joint moment generating function (MGF) of an arbitrary set of age processes in the network. Afterwards, we study a general setting of gossip networks in which a source node forwards its measurements (in the form of status updates) about some observed physical process to a set of monitoring nodes according to independent Poisson processes. Furthermore, each monitoring node sends status updates about its information status (about the process observed by the source) to the other monitoring nodes according to independent Poisson processes. For this setup, we develop SHS-based methods that allow the characterization of higher-order marginal/joint moments of the age processes in the network. Finally, our SHS-based framework is applied to derive the stationary marginal and joint MGFs for several queueing disciplines and gossip network topologies, from which we derive closed-form expressions for marginal/joint higher-order statistics of age processes, such as the variance of each age process and the correlation coefficients between all possible pairwise combinations of age processes.
In the second part of this dissertation, our analysis is focused on understanding the distributional properties of AoI in status updating systems powered by energy harvesting (EH). In particular, we consider a multi-source status updating system in which an EH-powered transmitter node has multiple sources generating status updates about several physical processes. The status updates are then sent to a destination node where the freshness of each status update is measured in terms of AoI. The status updates of each source and harvested energy packets are assumed to arrive at the transmitter according to independent Poisson processes, and the service time of each status update is assumed to be exponentially distributed. For this setup, we derive closed-form expressions of MGF of AoI under several queueing disciplines at the transmitter, including non-preemptive and source-agnostic/source-aware preemptive in service strategies. The generality of our analysis is demonstrated by recovering several existing results as special cases. A key insight from our characterization of the distributional properties of AoI is that it is crucial to incorporate the higher moments of AoI in the implementation/optimization of status updating systems rather than just relying on its average (as has been mostly done in the existing literature on AoI).
In the third and final part of this dissertation, we employ AoI as a performance metric for several settings of communication networks, and develop novel AoI-aware scheduling policies using tools from optimization theory and reinforcement learning. First, we investigate the role of an unmanned aerial vehicle (UAV) as a mobile relay to minimize the average peak AoI for a source-destination pair. For this setup, we formulate an optimization problem to jointly optimize the UAV's flight trajectory as well as energy and service time allocations for packet transmissions. This optimization problem is subject to the UAV's mobility constraints and the total available energy constraints at the source node and UAV. In order to solve this non-convex problem, we propose an efficient iterative algorithm and establish its convergence analytically. A key insight obtained from our results is that the optimal design of the UAV's flight trajectory achieves significant performance gains especially when the available energy at the source node and UAV is limited and/or when the size of the update packet is large. Afterwards, we study a generic system setup for an IoT network in which radio frequency (RF)-powered IoT devices are sensing different physical processes and need to transmit their sensed data to a destination node. For this generic system setup, we develop a novel reinforcement learning-based framework that characterizes the optimal sampling policy for IoT devices with the objective of minimizing the long-term weighted sum of average AoI values in the network. Our analytical results characterize the structural properties of the age-optimal policy, and demonstrate that it has a threshold-based structure with respect to the AoI values for different processes. They further demonstrate that the structures of the age-optimal and throughput-optimal policies are different. 
Finally, we analytically characterize the structural properties of the AoI-optimal joint sampling and updating policy for wireless powered communication networks while accounting for the costs of generating status updates in the process of decision-making. Our results demonstrate that the AoI-optimal joint sampling and updating policy has a threshold-based structure with respect to different system state variables. / Doctor of Philosophy / A typical model for real-time status update systems consists of a transmitter node that generates real-time status updates about some physical process(es) of interest and sends them through a communication network to a destination node. Such a model can be used to analyze the performance of a plethora of emerging Internet of Things (IoT)-enabled real-time applications including healthcare, factory automation, autonomous vehicles, and smart homes, to name a few. The performance of these applications highly depends upon the freshness of the information status at the destination node about its monitored physical process(es). Because of that, the main design objective of such real-time status update systems is to ensure timely delivery of status updates from the transmitter node to the destination node. To measure the freshness of information at the destination node, the Age of Information (AoI) has been introduced as a performance metric that accounts for the generation time of each status update (which was ignored by conventional performance metrics, specifically throughput and delay). Since then, there have been two main research directions in the AoI research area. The first direction aimed to analyze/characterize AoI in different queueing-theoretic models/disciplines, and the second direction was focused on the optimization of AoI in different communication systems that deal with time-sensitive information. 
However, the prior queueing-theoretic analyses of AoI have mostly been limited to the characterization of the average AoI and the prior studies developing AoI/age-aware scheduling/transmission policies have mostly ignored the energy constraints at the transmitter node(s). Motivated by these limitations, this dissertation first develops new queueing-theoretic methods that allow the characterization of the distribution of AoI in several classes of status updating systems. Afterwards, using tools from optimization theory and reinforcement learning, novel AoI-aware scheduling policies are developed while accounting for the energy constraints at the transmitter nodes for several settings of communication networks, including unmanned aerial vehicles (UAVs)-assisted and radio frequency (RF)-powered communication networks, in the process of decision-making.
In the first part of this dissertation, a general stochastic hybrid system (SHS)-based framework is first developed for characterizing the distribution of AoI in several classes of real-time status updating systems. Afterwards, this framework is applied to derive the stationary marginal and joint moment generating functions (MGFs) for several queueing disciplines and gossip network topologies, from which we derive closed-form expressions for marginal/joint higher-order statistics of age processes, such as the variance of each age process and the correlation coefficients between all possible pairwise combinations of age processes.
In the second part of this dissertation, our analysis is focused on understanding the distributional properties of AoI in status updating systems powered by energy harvesting (EH). In particular, we consider a multi-source status updating system in which an EH-powered transmitter node has multiple sources generating status updates about several physical processes. The status updates are then sent to a destination node where the freshness of each status update is measured in terms of AoI. For this setup, we derive closed-form expressions of MGF of AoI under several queueing disciplines at the transmitter. The generality of our analysis is demonstrated by recovering several existing results as special cases. A key insight from our characterization of the distributional properties of AoI is that it is crucial to incorporate the higher moments of AoI in the implementation/optimization of status updating systems rather than just relying on its average (as has been mostly done in the existing literature on AoI).
In the third and final part of this dissertation, we employ AoI as a performance metric for several settings of communication networks, and develop novel AoI-aware scheduling policies using tools from optimization theory and reinforcement learning. First, we investigate the role of a UAV as a mobile relay to minimize the average peak AoI for a source-destination pair. For this setup, we formulate an optimization problem to jointly optimize the UAV's flight trajectory as well as energy and service time allocations for packet transmissions. This optimization problem is subject to the UAV's mobility constraints and the total available energy constraints at the source node and UAV. A key insight obtained from our results is that the optimal design of the UAV's flight trajectory achieves significant performance gains especially when the available energy at the source node and UAV is limited and/or when the size of the update packet is large. Afterwards, we study a generic system setup for an IoT network in which RF-powered IoT devices are sensing different physical processes and need to transmit their sensed data to a destination node. For this generic system setup, we develop a novel reinforcement learning-based framework that characterizes the optimal sampling policy for IoT devices with the objective of minimizing the long-term weighted sum of average AoI values in the network. Our analytical results characterize the structural properties of the age-optimal policy, and demonstrate that it has a threshold-based structure with respect to the AoI values for different processes. They further demonstrate that the structures of the age-optimal and throughput-optimal policies are different. Finally, we analytically characterize the structural properties of the AoI-optimal joint sampling and updating policy for wireless powered communication networks while accounting for the costs of generating status updates in the process of decision-making. 
Our results demonstrate that the AoI-optimal joint sampling and updating policy has a threshold-based structure with respect to different system state variables.
|
112 |
Single and Multi-player Stochastic Dynamic Optimization
Saha, Subhamay January 2013
In this thesis we investigate single and multi-player stochastic dynamic optimization problems. We consider both discrete and continuous time processes. In the multi-player setup we investigate zero-sum games with both complete and partial information. We study partially observable stochastic games with the average cost criterion, where the state process is a discrete time controlled Markov chain. The idea involved in studying this problem is to replace the original unobservable state variable with a suitable completely observable state variable. We establish the existence of the value of the game and also obtain optimal strategies for both players. We also study a continuous time zero-sum stochastic game with complete observation. In this case the state is a pure jump Markov process. We investigate the finite horizon total cost criterion. We characterise the value function via appropriate Isaacs equations. This also yields optimal Markov strategies for both players.
In the single player setup we investigate risk-sensitive control of continuous time Markov chains. We consider both finite and infinite horizon problems. For the finite horizon total cost problem and the infinite horizon discounted cost problem we characterise the value function as the unique solution of appropriate Hamilton-Jacobi-Bellman equations. We also derive optimal Markov controls in both cases. For the infinite horizon average cost case we show the existence of an optimal stationary control. We also give a value iteration scheme for computing the optimal control in the case of finite state and action spaces.
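For orientation only, the general shape of a value iteration scheme on finite state and action spaces can be sketched as follows. This is the textbook risk-neutral, discrete-time, discounted-cost version, swapped in for illustration; it is not the thesis's risk-sensitive average-cost scheme for continuous time Markov chains. The names `value_iteration`, `P`, `c`, and `gamma` are assumptions introduced here.

```python
def value_iteration(P, c, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Discounted-cost value iteration on a finite MDP (illustrative).

    P[s][a] is a list of (probability, next_state) pairs and
    c[s][a] is the one-step cost of action a in state s.
    """
    n = len(P)
    V = [0.0] * n

    def q(s, a, values):
        # One-step cost plus discounted expected cost-to-go.
        return c[s][a] + gamma * sum(p * values[s2] for p, s2 in P[s][a])

    for _ in range(max_iter):
        new_V = [min(q(s, a, V) for a in range(len(P[s]))) for s in range(n)]
        converged = max(abs(x - y) for x, y in zip(new_V, V)) < tol
        V = new_V
        if converged:
            break

    # Greedy (cost-minimising) stationary policy for the final values.
    policy = [min(range(len(P[s])), key=lambda a, s=s: q(s, a, V))
              for s in range(n)]
    return V, policy


# Toy example: in state 0, either stay (cost 1) or move to the
# absorbing, cost-free state 1 (cost 1.5).
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
c = [[1.0, 1.5], [0.0]]
V, policy = value_iteration(P, c, gamma=0.5)
```

In this toy example, staying forever costs 1 / (1 - 0.5) = 2, so moving to state 1 at cost 1.5 is optimal in state 0.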
Further we introduce a new class of stochastic processes which we call stochastic processes with "age-dependent transition rates". We give a rigorous construction of the process. We prove that under certain assumptions the process is Feller. We also compute the limiting probabilities for our process. We then study the controlled version of the above process. In this case we take the risk-neutral cost criterion. We solve the infinite horizon discounted cost problem and the average cost problem for this process. The crucial step in analysing these problems is to prove that the original control problem is equivalent to an appropriate semi-Markov decision problem. Then the value functions and optimal controls are characterised using this equivalence and the theory of semi-Markov decision processes (SMDP). The analysis of finite horizon problems differs from that of infinite horizon problems because in this case the idea of converting into an equivalent SMDP does not seem to work. So we deal with the finite horizon total cost problem by showing that our problem is equivalent to another appropriately defined discrete time Markov decision problem. This allows us to characterise the value function and to find an optimal Markov control.
|
113 |
Ant Colony Optimization and its Application to Adaptive Routing in Telecommunication Networks
Di Caro, Gianni 10 November 2004
In ant societies, and, more generally, in insect societies, the activities of the individuals, as well as of the society as a whole, are not regulated by any explicit form of centralized control. On the other hand, adaptive and robust behaviors transcending the behavioral repertoire of the single individual can be easily observed at the society level. These complex global behaviors are the result of self-organizing dynamics driven by local interactions and communications among a number of relatively simple individuals.
The simultaneous presence of these and other fascinating and unique characteristics has made ant societies an attractive and inspiring model for building new algorithms and new multi-agent systems. In the last decade, ant societies have been taken as a reference for an ever growing body of scientific work, mostly in the fields of robotics, operations research, and telecommunications.
Among the different works inspired by ant colonies, the Ant Colony Optimization metaheuristic (ACO) is probably the most successful and popular one. The ACO metaheuristic is a multi-agent framework for combinatorial optimization whose main components are: a set of ant-like agents, the use of memory and of stochastic decisions, and strategies of collective and distributed learning.
It has its roots in the experimental observation of a specific foraging behavior of some ant colonies that, under appropriate conditions, are able to select the shortest path among a few possible paths connecting their nest to a food site. The pheromone, a volatile chemical substance laid on the ground by the ants while walking, which in turn affects their moving decisions according to its local intensity, is the mediator of this behavior.
All the elements playing an essential role in the ant colony foraging behavior were understood, thoroughly reverse-engineered and put to work to solve problems of combinatorial optimization by Marco Dorigo and his co-workers at the beginning of the 1990s.
From that moment on there has been a flourishing of new combinatorial optimization algorithms designed after the first algorithms of Dorigo et al., and of related scientific events.
In 1999 the ACO metaheuristic was defined by Dorigo, Di Caro and Gambardella with the purpose of providing a common framework for describing and analyzing all these algorithms inspired by the same ant colony behavior and by the same common process of reverse-engineering of this behavior. Therefore, the ACO metaheuristic was defined a posteriori, as the result of a synthesis effort carried out on the study of the characteristics of all these ant-inspired algorithms and on the abstraction of their common traits.
The synthesis of ACO was also motivated by the usually good performance shown by the algorithms (e.g., for several important combinatorial problems like quadratic assignment, vehicle routing and job shop scheduling, ACO implementations have outperformed state-of-the-art algorithms).
The definition and study of the ACO metaheuristic is one of the two fundamental goals of the thesis. The other one, closely related to the first, consists of the design, implementation, and testing of ACO instances for problems of adaptive routing in telecommunication networks.
This thesis is an in-depth journey through the ACO metaheuristic, during which we have (re)defined ACO and tried to get a clear understanding of its potentialities, limits, and relationships with other frameworks and with its biological background. The thesis takes into account all the developments that have followed the original 1999 definition, and provides a formal and comprehensive systematization of the subject, as well as an up-to-date and quite comprehensive review of current applications. We have also identified dynamic problems in telecommunication networks as the most appropriate domain of application for the ACO ideas. According to this understanding, in the most applied part of the thesis we have focused on problems of adaptive routing in networks and we have developed and tested four new algorithms.
Adopting an original point of view with respect to the way ACO was first defined (but maintaining full conceptual and terminological consistency), ACO is here defined and mainly discussed in terms of sequential decision processes and Monte Carlo sampling and learning.
More precisely, ACO is characterized as a policy search strategy aimed at learning the distributed parameters (called pheromone variables in accordance with the biological metaphor) of the stochastic decision policy which is used by so-called ant agents to generate solutions. Each ant represents in practice an independent sequential decision process aimed at constructing a possibly feasible solution for the optimization problem at hand by using only information local to the decision step.
Ants are repeatedly and concurrently generated in order to sample the solution set according to the current policy. The outcomes of the generated solutions are used to partially evaluate the current policy, spot the most promising search areas, and update the policy parameters in order to possibly focus the search in those promising areas while keeping a satisfactory level of overall exploration.
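The sample-evaluate-update loop described in the two paragraphs above can be illustrated with a minimal sketch for a shortest-path problem. This is a generic, simplified ACO-style loop written for illustration, not one of the thesis's algorithms; the function name `aco_shortest_path`, the evaporation rate `rho`, the pheromone-over-cost weighting, and the `1 / cost` deposit rule are assumptions chosen for brevity.

```python
import random


def aco_shortest_path(graph, source, target,
                      n_iters=30, n_ants=10, rho=0.3, seed=0):
    """graph[u][v] is the cost of edge u -> v; every node must be a key."""
    rng = random.Random(seed)
    # Pheromone variables: the parameters of the stochastic decision policy.
    tau = {(u, v): 1.0 for u in graph for v in graph[u]}
    best_path, best_cost = None, float("inf")

    for _ in range(n_iters):
        solutions = []
        for _ in range(n_ants):
            # Each ant is an independent sequential decision process
            # that uses only information local to the current node.
            path, node, visited = [source], source, {source}
            while node != target:
                options = [v for v in graph[node] if v not in visited]
                if not options:
                    path = None  # dead end: discard this ant
                    break
                weights = [tau[(node, v)] / graph[node][v] for v in options]
                node = rng.choices(options, weights=weights)[0]
                path.append(node)
                visited.add(node)
            if path is not None:
                cost = sum(graph[a][b] for a, b in zip(path, path[1:]))
                solutions.append((path, cost))
                if cost < best_cost:
                    best_path, best_cost = path, cost

        # Evaporation keeps some exploration; deposits focus the search
        # on the edges used by good sampled solutions.
        for edge in tau:
            tau[edge] *= 1.0 - rho
        for path, cost in solutions:
            for a, b in zip(path, path[1:]):
                tau[(a, b)] += 1.0 / cost

    return best_path, best_cost


graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5},
         "C": {"D": 1}, "D": {}}
path, cost = aco_shortest_path(graph, "A", "D")
```

Because the search is stochastic, the returned path is the best one sampled, which is not guaranteed to be optimal; on a graph this small it will usually be the route A, B, C, D with cost 3.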
This way of looking at ACO has helped to reveal the close relationships between ACO and other well-known frameworks, like dynamic programming, Markov and non-Markov decision processes, and reinforcement learning. In turn, this has favored reasoning on the general properties of ACO in terms of the amount of complete state information which is used by ACO's ants to take optimized decisions and to encode in pheromone variables the memory of both the decisions that belonged to the sampled solutions and their quality.
The biological context that inspired ACO is fully acknowledged in the thesis. We discuss extensively the shortest path behaviors of ant colonies and the identification and analysis of the few nonlinear dynamics that are at the very core of self-organized behaviors in both the ants and other societal organizations. We discuss these dynamics in the general framework of stigmergic modeling, based on asynchronous environment-mediated communication protocols, and (pheromone) variables priming coordinated responses of a number of "cheap" and concurrent agents.
The second half of the thesis is devoted to the study of the application of ACO to problems of online routing in telecommunication networks. This class of problems has been identified in the thesis as the most appropriate for the application of the multi-agent, distributed, and adaptive nature of the ACO architecture.
Four novel ACO algorithms for problems of adaptive routing in telecommunication networks are thoroughly described. The four algorithms cover a wide spectrum of possible types of network: two of them deliver best-effort traffic in wired IP networks, one is intended for quality-of-service (QoS) traffic in ATM networks, and the fourth is for best-effort traffic in mobile ad hoc networks.
The two algorithms for wired IP networks have been extensively tested by simulation studies and compared to state-of-the-art algorithms for a wide set of reference scenarios. The algorithm for mobile ad hoc networks is still under development, but quite extensive results and comparisons with a popular state-of-the-art algorithm are reported. No results are reported for the algorithm for QoS, which has not been fully tested. The observed experimental performance is excellent, especially for the case of wired IP networks: our algorithms always perform comparably to or much better than the state-of-the-art competitors.
In the thesis we try to understand the rationale behind the brilliant performance obtained and the good level of popularity reached by our algorithms. More generally, we discuss the reasons for the general efficacy of the ACO approach for network routing problems compared to the characteristics of more classical approaches. Moving further, we also informally define Ant Colony Routing (ACR), a multi-agent framework explicitly integrating learning components into the ACO design in order to define a general and in a sense futuristic architecture for autonomic network control.
Most of the material of the thesis comes from a re-elaboration of material co-authored and published in a number of books, journal papers, conference proceedings, and technical reports. The detailed list of references is provided in the Introduction.
|
114 |
Vers le vol à voile longue distance pour drones autonomes / Towards Vision-Based Autonomous Cross-Country Soaring for UAVs
Stolle, Martin Tobias 03 April 2017
Small fixed-wing Unmanned Aerial Vehicles (UAVs) provide utility to research, military, and industrial sectors at comparably reasonable cost, but still suffer from both limited operational ranges and payload capacities. Thermal soaring flight for UAVs offers significant potential to reduce energy consumption. However, without remote sensing of updrafts, a glider UAV can only benefit from an updraft when encountering it by chance. In this thesis, a new framework for autonomous cross-country soaring is elaborated, enabling a glider UAV to visually localize sub-cumulus thermal updrafts and to efficiently gain energy from them.
Relying on the Unscented Kalman Filter, a monocular vision-based method is established for remotely estimating sub-cumulus updraft parameters. Its capability of providing convergent and consistent state estimates is assessed using Monte Carlo simulations. Model uncertainties, image processing noise, and poor observer trajectories can degrade the estimated updraft parameters. Therefore, a second focus of this thesis is the design of a robust probabilistic path planner for map-based autonomous cross-country soaring. The proposed path planner balances flight time against outlanding risk by taking the estimation uncertainties into account in the decision-making process. The suggested updraft estimation and path planning algorithms are jointly assessed in a 6-degrees-of-freedom simulator, highlighting significant performance improvements with respect to state-of-the-art approaches in autonomous cross-country soaring; it is also shown that the path planner is implementable on a low-cost computer platform.
|
115 |
Extração de preferências por meio de avaliações de comportamentos observados. / Preference elicitation using evaluation over observed behaviours.
Silva, Valdinei Freire da 07 April 2009
Recently, a variety of tasks have been delegated to computer systems, particularly when the computer system is more reliable or when the task is unsuitable or not recommended for a human being. The use of preference elicitation in computational systems helps to improve such delegation, enabling lay people to easily program a computer system with their own preferences. A person's preferences are elicited through their answers to specific questions that the computer system formulates by itself. The person acts as a user of the computer system, whereas the computer system can be seen as an agent that acts in place of the person. The structure and context of the questions have been identified as sources of variance in the user's answers, and such variance can jeopardize the feasibility of preference elicitation. One way to avoid such variance is to ask the user to choose between two behaviours that they have observed. Evaluating observed behaviours relative to one another makes questions simpler and more transparent for the user, decreasing the variance effect, but such evaluations may not be easy for the agent to interpret. If the agent's and the user's perceptions diverge, the agent may not be able to learn the user's preferences. Evaluations are generated from the user's perception, but all an agent can do is relate such evaluations to its own perception. Another issue is that questions, which are presented to the user through demonstrated behaviours, are now constrained by the environment dynamics: a behaviour cannot be chosen arbitrarily, but must be feasible, and a policy must be executed in the environment in order to demonstrate it. Whereas the first issue influences the inference of how the user evaluates behaviours, the second influences how fast and accurate the learning process can be.
This thesis poses the problem of Preference Elicitation under Evaluations over Observed Behaviours using the Markov Decision Process framework, and develops theoretical properties in that framework that make the problem computationally feasible. The problem of different perceptions is analysed and restricted solutions are developed. The problem of demonstrating a behaviour is considered under formulations of questions based on stationary policies and on non-stationary policies. Both types of questions were implemented and tested to solve preference elicitation in a scenario under restricted conditions.
|
116 |
Informação utilizada nos processos decisórios de gestores universitários: estudo de caso na PUC-Campinas, SP
Teixeira, Darlene 29 August 2005
Pontifícia Universidade Católica de Campinas / It is known that the effective use of information is closely tied to the quality of its organization, retrieval, and dissemination. Information is an organizational resource, needed both to identify problems and to solve them, and as such it must be treated as having specified, measurable characteristics, such as collection method, use, and standard life cycle, with different attributes at each stage. It can also be transformed into products that help organizations reach their goals. Thus, the convergence of information technology, communication, and information science affects the creation, management, and use of information within organizational processes. The main objective of this research was to identify how university managers use information in decision-making. To that end, a case study was carried out at PUC-Campinas, drawing on questionnaires answered by the Directors of the University's Academic Centres, among other data collection techniques. The results may help university managers analyse information for decision-making and offer alternatives for managing it through IT tools that may already exist within the organization itself. Among these results, although competitive intelligence tools and scenario analysis are not yet used effectively by all of the managers, university managers could readily use them once they become aware of and familiar with these techniques, especially for continuous monitoring, so as to reduce the likelihood of being surprised by internal and external changes.
|
117 |
Apprentissage Intelligent des Robots Mobiles dans la Navigation Autonome / Intelligent Mobile Robot Learning in Autonomous Navigation
Xia, Chen 24 November 2015
Modern robots are designed to assist or replace human beings in complicated planning and control operations, and the ability to navigate autonomously in a dynamic environment is an essential requirement for mobile robots. To alleviate the tedious task of manually programming a robot, this dissertation contributes to the design of intelligent robot control that endows mobile robots with the ability to learn during autonomous navigation tasks. First, we consider robot learning from expert demonstrations. A neural network framework is proposed as the inference mechanism for learning a policy offline from a dataset extracted from experts. Second, we consider robot self-learning without expert demonstrations. We apply reinforcement learning so that the robot acquires and optimizes a control strategy while interacting with an unknown environment. A neural network is also incorporated to allow fast generalization, which helps the learning converge in far fewer episodes than traditional methods. Finally, we study robot learning of the potential rewards underlying the states from optimal or suboptimal expert demonstrations. We propose an algorithm based on inverse reinforcement learning: a nonlinear policy representation is designed, and the max-margin method is applied to refine the rewards and generate an optimal control policy. The three proposed methods have been implemented successfully on autonomous navigation tasks for mobile robots in unknown and dynamic environments.
|
118 |
Extração de preferências por meio de avaliações de comportamentos observados. / Preference elicitation using evaluation over observed behaviours.
Valdinei Freire da Silva 07 April 2009
Recently, a variety of tasks have been delegated to computer systems, mainly when the computer system is more reliable or when the task is unsuitable or not recommended for a human being. Preference elicitation helps with such delegation by enabling even lay people to program a computer system easily with their own preferences. A person's preferences are elicited through his or her answers to specific questions that the computer system formulates itself. The person acts as a user of the computer system, while the computer system is seen as an agent that acts in place of the person. The structure and context of the questions have been identified as sources of variance in the user's answers, and such variance can jeopardize the feasibility of preference elicitation. One way to avoid this variance is to ask the user to choose between two behaviours that the user has observed. Relative evaluation of observed behaviours makes questions simpler and more transparent for the user, which reduces the variance, but such evaluations may not be easy for the agent to interpret. If the agent's and the user's perceptions diverge, the agent may be unable to learn the user's preferences: evaluations are generated from the user's perceptions, but all an agent can do is relate them to its own perceptions. Another issue is that questions, which are posed to the user through demonstrated behaviours, are now constrained by the environment dynamics, so a behaviour cannot be chosen arbitrarily; it must be feasible, and a policy must be executed in the environment to demonstrate it. Whereas the first issue affects the inference of how the user evaluates behaviours, the second affects how fast and how accurate the learning process can be.
This thesis proposes the problem of Preference Elicitation under Evaluations over Observed Behaviours using the Markov Decision Process framework, and develops theoretical properties within that framework that make the problem computationally feasible. The problem of different perceptions is analysed and restricted solutions are developed. The problem of demonstrating behaviours is analysed by formulating questions based on stationary policies and on policy replanning (non-stationary policies); algorithms for both kinds of question were implemented and tested on preference elicitation in a scenario under restricted conditions.
|
119 |
A Markovian state-space framework for integrating flexibility into space system design decisions
Lafleur, Jarret Marshall 16 December 2011
The past decades have seen the state of the art in aerospace system design progress from a scope of simple optimization to one including robustness, with the objective of permitting a single system to perform well even in off-nominal future environments. Integrating flexibility, or the capability to easily modify a system after it has been fielded in response to changing environments, into system design represents a further step forward. One challenge is that the decision-maker must consider not only the present system design decision, but also sequential future design and operation decisions. Despite extensive interest in the topic, the state of the art in designing flexibility into aerospace systems, and particularly space systems, tends to be limited to analyses that are qualitative, deterministic, single-objective, and/or limited to considering a single future time period.
To address these gaps, this thesis develops a stochastic, multi-objective, and multi-period framework for integrating flexibility into space system design decisions. Central to the framework are five steps. First, system configuration options are identified and costs of switching from one configuration to another are compiled into a cost transition matrix. Second, probabilities that demand on the system will transition from one mission to another are compiled into a mission demand Markov chain. Third, one performance matrix for each design objective is populated to describe how well the identified system configurations perform in each of the identified mission demand environments. The fourth step employs multi-period decision analysis techniques, including Markov decision processes (MDPs) from the field of operations research, to find efficient paths and policies a decision-maker may follow. The final step examines the implications of these paths and policies for the primary goal of informing initial system selection.
Overall, this thesis unifies state-centric concepts of flexibility from economics and engineering literature with sequential decision-making techniques from operations research. The end objective of this thesis' framework and its supporting analytic and computational tools is to enable selection of the next-generation space systems today, tailored to decision-maker budget and performance preferences, that will be best able to adapt and perform in a future of changing environments and requirements. Following extensive theoretical development, the framework and its steps are applied to space system planning problems of (1) DARPA-motivated multiple- or distributed-payload satellite selection and (2) NASA human space exploration architecture selection.
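The first four steps described in this abstract can be illustrated with a small sketch. It is not the thesis's implementation: the two configurations, two missions, all numbers, the names `switch_cost`, `demand_chain`, `performance` and `value_iteration`, and the collapse of the multi-objective problem into a single net-benefit score are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical toy data (not from the thesis): 2 configurations, 2 missions.
switch_cost = np.array([[0.0, 4.0],
                        [3.0, 0.0]])   # step 1: cost of switching config i -> j
demand_chain = np.array([[0.8, 0.2],
                         [0.3, 0.7]])  # step 2: P(next mission m' | mission m)
performance = np.array([[10.0, 2.0],
                        [4.0, 9.0]])   # step 3: performance[config, mission]


def value_iteration(switch_cost, demand_chain, performance,
                    discount=0.9, tol=1e-9):
    """Step 4 (assumed form): solve an MDP whose state is (config, mission).

    In state (i, m) the decision-maker picks the next configuration j,
    pays switch_cost[i, j], earns performance[j, m], and the mission then
    moves according to demand_chain.
    """
    n_cfg, n_msn = performance.shape
    value = np.zeros((n_cfg, n_msn))
    while True:
        # future[j, m] = expected next-period value of holding config j
        # when the current mission is m.
        future = value @ demand_chain.T
        # q[i, m, j] = net benefit of moving from config i to j under mission m.
        q = (-switch_cost[:, None, :]
             + performance.T[None, :, :]
             + discount * future.T[None, :, :])
        new_value = q.max(axis=2)
        if np.abs(new_value - value).max() < tol:
            return new_value, q.argmax(axis=2)
        value = new_value


value, policy = value_iteration(switch_cost, demand_chain, performance)
# policy[i, m] is the configuration to adopt from config i under mission m.
print(policy)
```

The resulting policy table corresponds to the abstract's fifth step in spirit: comparing `value[i, m]` across rows `i` suggests which initial configuration is worth fielding for a given starting mission. A multi-objective treatment, as the abstract describes, would keep one performance matrix per objective rather than a single scalar score.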
|
120 |
Semi-Markov Processes In Dynamic Games And Finance
Goswami, Anindya 02 1900
Two different sets of problems are addressed in this thesis. The first concerns partially observed semi-Markov games (POSMG) and the second concerns a semi-Markov modulated financial market model.
In this thesis we study a partially observable semi-Markov game over an infinite time horizon. The study of a partially observable game (POG) involves three major steps: (i) construct an equivalent completely observable game (COG), (ii) establish the equivalence between the POG and the COG by showing that if the COG admits an equilibrium, so does the POG, (iii) study the equilibrium of the COG and find the corresponding equilibrium of the original partially observable problem.
For infinite-horizon game problems there are two different payoff criteria: the discounted payoff criterion and the average payoff criterion. First, a partially observable semi-Markov decision process on a general state space with the discounted cost criterion is studied. An optimal policy is shown to exist by considering a Shapley’s equation for the corresponding completely observable model. Next, the discounted payoff problem is studied for the two-person zero-sum case, and a saddle point equilibrium is shown to exist. Then the variable-sum game is investigated; for this case a Nash equilibrium strategy is obtained in the Markov class under suitable assumptions. Next, the POSMG problem on a countable state space is addressed for the average payoff criterion. It is well known that under this criterion the game problem does not have a solution in general. To ensure a solution one needs some kind of ergodicity of the transition kernel. We find an appropriate ergodicity condition for the partially observed model, which in turn induces geometric ergodicity in the equivalent model. Using this we establish a solution of the corresponding average payoff optimality equation (APOE). Thus the value and a saddle point equilibrium are obtained for the original partially observable model. A value iteration scheme is also developed to find the average value of the game.
Next we study a financial market model whose key parameters are modulated by semi-Markov processes. Two different problems are addressed under this market assumption. In the first we show that this market is incomplete. In such an incomplete market we find the locally risk-minimizing prices of exotic options in the Föllmer-Schweizer framework. In this model the stock prices are no longer Markov. The stock price process is generally modeled as a Markov process because otherwise one may not get a PDE representation of the price of a contingent claim. To overcome this difficulty we find an appropriate Markov process that includes the stock price as a component and then find its infinitesimal generator. Using the Feynman-Kac formula we obtain a system of non-local partial differential equations satisfied by the option price functions in the mild sense. Next, this system is shown to have a classical solution for given initial or boundary conditions.
This solution is then used to obtain a Föllmer-Schweizer decomposition of the option price. Thus we obtain the locally risk-minimizing prices of different options. Furthermore, we obtain an integral equation satisfied by the unique solution of this system. This enables us to compute the price of a contingent claim and find the risk-minimizing hedging strategy numerically. We also develop an efficient and stable numerical method to compute the prices.
Besides this work on derivative pricing, the portfolio optimization problem in a semi-Markov modulated market is also studied in the thesis. We find the optimal portfolio selections by optimizing the expected utility of terminal wealth. We also obtain the optimal portfolio selections under a risk-sensitive criterion for both finite and infinite time horizons.
|