111

Optimisation des systèmes de véhicules en libre service par la tarification / Vehicle Sharing System Pricing Optimization

Waserhole, Ariel 18 November 2013 (has links)
We study one-way Vehicle Sharing Systems (VSS), in which users pick up a vehicle in one place and return it in another. Advertising promotes an image of flexibility and price accessibility, but in reality customers may find no vehicle available at the origin station (which may be regarded as an infinite price), or worse, no parking spot at the destination. Since the first Bike Sharing Systems (BSS), the availability of vehicles and parking spots has proven to be a crucial problem. We define system performance as the number of trips sold (to be maximized). BSS performance is currently improved by relocating vehicles with trucks, as practiced for Vélib' in Paris, to keep stations from running empty or full under tide-like demand patterns. Our position, however, is to rule out such physical relocation because of the cost, traffic, and pollution it generates (especially for car-sharing systems such as Autolib' in Paris), and to focus instead on systems that self-regulate through pricing incentives. The question we investigate in this thesis is the following: can management through incentive pricing significantly improve the performance of vehicle sharing systems?
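To make the self-regulation idea concrete, the following toy simulation (not from the thesis; the station names, the 70/30 demand split, and the price-elasticity model are all invented assumptions) shows how a discount on the unpopular direction can increase the number of trips sold in an asymmetric two-station system:

```python
import random

# Toy two-station, one-way vehicle-sharing simulation. Everything here is
# an illustrative assumption, not the thesis's model.

random.seed(0)

CAPACITY = 8           # parking spots per station
FLEET = 10             # vehicles, initially split evenly
STEPS = 10_000         # simulated trip requests

def simulate(demand_shift: float) -> int:
    """Trips sold over STEPS requests. A discount on the unpopular
    B -> A direction is modeled as shifting the demand split by
    `demand_shift` toward B -> A (a crude price-elasticity assumption)."""
    vehicles = {"A": FLEET // 2, "B": FLEET - FLEET // 2}
    p_a_to_b = 0.7 - demand_shift
    sold = 0
    for _ in range(STEPS):
        origin, dest = ("A", "B") if random.random() < p_a_to_b else ("B", "A")
        # A trip is sold only if a vehicle is available at the origin
        # and a parking spot is free at the destination.
        if vehicles[origin] > 0 and vehicles[dest] < CAPACITY:
            vehicles[origin] -= 1
            vehicles[dest] += 1
            sold += 1
    return sold

print("no incentive  :", simulate(0.0))   # 70% of demand goes A -> B: fleet piles up at B
print("with incentive:", simulate(0.15))  # discount rebalances demand toward 55/45
```

Under the asymmetric demand, vehicles pile up at station B and requests start failing; shifting even a modest share of demand to the discounted direction keeps both stations usable and raises sales.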
112

Algoritmos para o módulo de controle de taxa de codificação de vídeos multivistas do padrão H.264/MVC / Algorithms for encoding rate control module for multiview videos of h.264/mvc standard

Vizzotto, Bruno Boessio January 2012 (has links)
This master's thesis presents a novel Hierarchical Rate Control (HRC) scheme for the Multiview Video Coding (MVC) standard, the multiview extension of H.264, targeting better use of the available channel bandwidth while delivering the compressed video at the highest possible quality. The HRC is designed to jointly address rate control at both the frame level and the Basic Unit (BU) level. The scheme exploits the correlation between the bitrate distributions of neighboring frames to efficiently predict future bitrate behavior, employing a Model Predictive Control (MPC) that defines an appropriate control action through adaptation of the Quantization Parameter (QP). To provide fine-grained tuning, the QP is further adapted within each frame by a Markov Decision Process (MDP) implemented at the BU level, which can take a map of Regions of Interest (RoI) into account. A feedback loop coupling the two levels guarantees system consistency, and Reinforcement Learning is used to update the MPC and MDP parameters. Experimental results show the superiority of the proposed scheme over state-of-the-art solutions, both in bitrate allocation accuracy and in rate-distortion optimization, delivering smoother video quality at both the frame and BU levels.
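As a rough illustration of the two-level idea (a sketch only: the thesis uses model predictive control at the frame level and an MDP at the BU level, whereas this toy substitutes simple proportional rules, and all bit budgets and step sizes are invented):

```python
# Minimal two-level rate-control sketch; illustrative assumptions throughout.

def frame_qp(prev_qp: int, used_bits: int, target_bits: int) -> int:
    """Frame-level control: raise QP when the previous frame overshot its
    bit budget, lower it when it undershot (H.264 QP range is 0..51)."""
    error = (used_bits - target_bits) / target_bits
    step = max(-3, min(3, round(10 * error)))  # crude stand-in for the MPC action
    return max(0, min(51, prev_qp + step))

def bu_qp(base_qp: int, roi_weight: float) -> int:
    """BU-level refinement: spend more bits (lower QP) on regions of
    interest, fewer on the background. `roi_weight` is in [0, 1]."""
    offset = round(4 * (0.5 - roi_weight))     # +-2 QP around the frame QP
    return max(0, min(51, base_qp + offset))

qp = 30
for used in (130_000, 105_000, 90_000):        # fake per-frame bit counts
    qp = frame_qp(qp, used, target_bits=100_000)
    print(f"frame QP {qp} | RoI BU QP {bu_qp(qp, 0.9)}"
          f" | background BU QP {bu_qp(qp, 0.1)}")
```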
113

Processos de decisão Markovianos com probabilidades imprecisas e representações relacionais: algoritmos e fundamentos. / Markov decision processes with imprecise probabilities and relational representations: foundations and algorithms.

Ricardo Shirota Filho 03 May 2012 (has links)
This work is devoted to the theoretical and algorithmic development of Markov Decision Processes with Imprecise Probabilities (MDPIPs) and relational representations. In the literature, this combination has been important within artificial intelligence planning, where relational representations allow compact descriptions and imprecise probabilities yield a more general form of uncertainty. There are three main contributions. First, we discuss the foundations of sequential decision making with imprecise probabilities, pointing out key questions that remain open; these results directly affect (though are not restricted to) the model of interest in this work, Markov Decision Processes with Imprecise Probabilities. Second, we propose three algorithms for Markov Decision Processes with Imprecise Probabilities based on mathematical programming (optimization). Third, we develop ideas proposed by Trevizan, Cozman and de Barros (2008) on the use of variants of the Real-Time Dynamic Programming algorithm to solve probabilistic planning problems described in extended versions of the Probabilistic Planning Domain Definition Language (PPDDL).
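For reference, a common way to write the (maximin) Bellman operator for an MDP with imprecise probabilities replaces the single transition distribution with a credal set K(s, a); whether the thesis uses exactly this formulation is an assumption:

```latex
% Nature adversarially picks a transition distribution P from the credal set.
V(s) \;=\; \max_{a \in A(s)} \; \min_{P \in K(s,a)}
\Big[\, r(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V(s') \,\Big]
```

The inner minimization over the credal set is what the mathematical-programming formulations mentioned above must solve at each backup.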
114

Planejamento probabilístico usando programação dinâmica assíncrona e fatorada / Probabilistic planning using asynchronous and factored dynamic programming.

Mijail Gamarra Holguin 03 April 2013 (has links)
Markov Decision Processes (MDPs) model sequential decision-making problems in which an agent's actions have probabilistic effects on the successor states (which can be defined by state transition matrices). Real-Time Dynamic Programming (RTDP) is a technique for solving MDPs when information about the initial state is available. Traditional approaches perform better on problems with sparse state transition matrices because they can converge efficiently to the optimal policy without visiting all states. This advantage can be lost, however, in problems with dense transition matrices, in which many states can be reached in one step (for example, control problems with exogenous events). One approach to overcoming this limitation is to exploit regularities in the domain dynamics through a factored representation, i.e., a representation based on state variables. In this master's thesis, we propose a new algorithm called FactRTDP (Factored RTDP), and its approximate version aFactRTDP (Approximate Factored RTDP), which are the first efficient factored versions of the classical RTDP algorithm. We also propose two further extensions, FactLRTDP and aFactLRTDP, that label states whose value function has converged to the optimal. Experimental results show that these new algorithms converge faster in domains with dense transition matrices, and that they behave well online in domains with dense transition matrices and few dependencies among state variables.
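For orientation, here is a minimal flat (enumerated-state) RTDP sketch; the thesis's FactRTDP instead operates on a factored representation over state variables, which this sketch does not show. The callables `actions`, `successors`, and `cost` are assumed to be supplied by the problem:

```python
import random

# Minimal flat RTDP sketch for a cost-minimizing shortest-path MDP.
# `successors(s, a)` returns (next_state, probability) pairs.

def rtdp(s0, goal, actions, successors, cost, trials=1000, gamma=1.0):
    V = {}  # value table, lazily initialized to the admissible heuristic 0

    def q(s, a):
        return cost(s, a) + gamma * sum(p * V.get(t, 0.0)
                                        for t, p in successors(s, a))

    for _ in range(trials):
        s, steps = s0, 0
        while s != goal and steps < 10_000:   # step cap keeps the sketch safe
            steps += 1
            a = min(actions(s), key=lambda act: q(s, act))  # greedy action
            V[s] = q(s, a)                                  # Bellman backup
            # Sample the successor along the greedy action: only states
            # reachable from s0 are ever backed up (the "real-time" part).
            r, acc = random.random(), 0.0
            for t, p in successors(s, a):
                acc += p
                if r <= acc:
                    s = t
                    break
    return V
```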
116

Estimating Likelihood of Having a BRCA Gene Mutation Based on Family History of Cancers and Recommending Optimized Cancer Preventive Actions

Abdollahian, Mehrnaz 12 November 2015 (has links)
BRCA1 and BRCA2 gene mutations drastically increase a woman's chances of developing breast and ovarian cancers, by up to 20-fold. A genetic blood test is used to detect BRCA mutations. Though these mutations occur in only about one of every 400 people in the general population (excluding those of Ashkenazi Jewish ethnicity), they are present in most cases of hereditary breast and ovarian cancer. Hence, it is common practice for physicians to order genetic testing for patients who fit the rules recommended by the National Comprehensive Cancer Network. However, data from the Myriad Laboratory, the only provider of the test until 2013, show that over 70 percent of those tested are negative for BRCA mutations [1]. As there are significant costs and psychological trauma associated with the test, there is a need for more comprehensive rules for determining who should be tested. Once a BRCA mutation is identified via testing, the next challenge for both mutation carriers and their physicians is to select the most appropriate types and timing of intervention actions. Organizations such as the American Cancer Society suggest drastic interventions such as prophylactic surgeries and intense breast screening. These actions vary significantly in cost and in their ability to prevent cancer incidence, can have major side effects potentially resulting in loss of reproductive ability or death, and their effectiveness is age dependent. In this dissertation, both an analytical and an optimization framework are presented. The analytical framework applies supervised machine learning models to extended family history of cancers and to personal and medical information from a recent nationwide survey of women who were referred for genetic testing for a BRCA mutation; it provides potential mutation carriers and their physicians with an estimate of the likelihood of having the mutation. The optimization framework uses a Markov decision process (MDP) model to find cost-optimal and/or quality-adjusted life year (QALY)-optimal intervention strategies for those who test positive for a BRCA mutation. The framework takes a dynamic approach to the problem, and the decisions are made more robust by accounting for variation in the estimates of the transition probabilities through a robust version of the MDP model. This research delivers an innovative decision support tool that enables physicians and genetic counselors to predict the population at high risk of breast and ovarian cancers more accurately. For those identified as carrying a BRCA mutation, the tool offers effective intervention strategies that either minimize cost or maximize QALYs to prevent the incidence of cancer.
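To illustrate the shape of such an intervention-timing MDP, here is a tiny backward-induction sketch; every number, state, and action below is an invented placeholder, not the calibrated model from the dissertation:

```python
# Toy finite-horizon QALY-maximizing dynamic program over decision ages.
# All rewards and probabilities are placeholders for illustration only.

AGES = range(30, 80, 5)            # decision epochs (every 5 years)
ACTIONS = ["wait", "screen", "surgery"]

def qaly(age, action):             # immediate quality-adjusted reward
    return {"wait": 5.0, "screen": 4.9, "surgery": 4.5}[action]

def p_healthy(age, action):        # probability of staying cancer-free
    base = 0.95 - 0.002 * (age - 30)
    bonus = {"wait": 0.0, "screen": 0.01, "surgery": 0.04}[action]
    return min(1.0, base + bonus)

# Backward induction: V[t] = max_a [ qaly(a) + p_healthy(a) * V[t+1] ]
V, policy = 0.0, {}
for age in reversed(AGES):
    best = max(ACTIONS, key=lambda a: qaly(age, a) + p_healthy(age, a) * V)
    V = qaly(age, best) + p_healthy(age, best) * V
    policy[age] = best

print(policy)  # recommended action at each decision age
```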
117

Privacy-by-Design for Cyber-Physical Systems

Li, Zuxing January 2017 (has links)
It is envisioned that future cyber-physical systems will provide a more convenient living and working environment. However, such systems inevitably need to collect and process privacy-sensitive information, which means the benefits come with potential privacy leakage risks. Nowadays, this privacy issue receives more attention, also as a legal requirement of the EU General Data Protection Regulation. In this thesis, privacy-by-design approaches are studied in which privacy enhancement is achieved by taking privacy into account in the physical-layer design. The work focuses in particular on two cyber-physical systems: sensor networks and smart grids. Physical-layer performance and privacy leakage risk are assessed by hypothesis testing measures. First, a sensor network in the presence of an informed eavesdropper is considered. Extending traditional hypothesis testing problems, novel privacy-preserving distributed hypothesis testing problems are formulated, and the optimality of the deterministic likelihood-based test is discussed. It is shown that the optimality of the deterministic likelihood-based test does not always hold for an intercepted remote decision maker, and that an optimal randomized decision strategy is completely characterized by the privacy-preserving condition. These characteristics help to simplify the person-by-person optimization algorithms used to design optimal privacy-preserving hypothesis testing networks. Smart meter privacy has become a significant issue in the development of smart grid technology. An innovative scheme is to exploit renewable energy supplies or an energy storage at the consumer side to make meter readings deviate from actual energy demands and thereby enhance privacy. Based on the proposed asymptotic hypothesis testing measures of privacy leakage, it is shown that the optimal privacy-preserving performance can be characterized by a Kullback-Leibler divergence rate or a Chernoff information rate in the presence of renewable energy supplies. When an energy storage is used, its finite capacity introduces memory into the smart meter system; it is shown that the design of an optimal energy management policy can be cast into a belief-state Markov decision process framework.
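For context, the two asymptotic exponents mentioned above are standard quantities for distinguishing two distributions P0 and P1 in hypothesis testing (the single-letter forms below follow the usual information-theoretic conventions; the thesis's exact rate definitions may differ):

```latex
% Kullback-Leibler divergence (type-II error exponent, Stein's lemma) and
% Chernoff information (exponent of the overall Bayesian error probability).
D(P_0 \,\|\, P_1) = \sum_{x} P_0(x) \log \frac{P_0(x)}{P_1(x)}
\qquad
C(P_0, P_1) = -\min_{0 \le \lambda \le 1}
  \log \sum_{x} P_0(x)^{\lambda}\, P_1(x)^{1-\lambda}
```

Intuitively, a smaller divergence (or Chernoff information) between the distributions induced by the true and manipulated meter readings means the eavesdropper's test errors decay more slowly, i.e., better privacy.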
118

Aprendizado por reforço em lote: um estudo de caso para o problema de tomada de decisão em processos de venda / Batch reinforcement learning: a case study for the problem of decision making in sales processes

Dênis Antonio Lacerda 12 December 2013 (has links)
Probabilistic planning studies sequential decision-making problems of an agent whose actions have probabilistic effects, modeled as a Markov Decision Process (MDP). Given the probabilistic state transition function and the reward values of the actions, it is possible to determine an action policy (i.e., a mapping from environment states to agent actions) that maximizes the expected accumulated reward (or minimizes the expected accumulated cost) of executing a sequence of actions. When the MDP model is not completely known, the best policy must be learned through the agent's interaction with the real environment; this process is called reinforcement learning. However, in applications where experimenting in the real environment is not allowed (for example, sales operations), reinforcement learning can be performed on a sample of past experiences, a process called batch reinforcement learning. In this work, we study batch reinforcement learning techniques that learn from a history of past interactions stored in a process database, and we propose some improvements to existing algorithms. As a case study, we apply these techniques to learning policies for the sales process of large-format printers, with the goal of building an action recommendation system for novice sellers.
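A minimal sketch of the batch idea: repeated Q-value sweeps over a fixed log of transitions, with no new interaction with the environment. The tiny sales-process log below is an invented placeholder, and the thesis's specific algorithmic improvements are not reproduced:

```python
from collections import defaultdict

# Logged transitions: (state, action, reward, next_state, next_actions).
batch = [
    ("lead", "demo", 0.0, "interested", ["quote", "wait"]),
    ("interested", "quote", 0.0, "negotiating", ["discount", "hold_price"]),
    ("negotiating", "discount", 1.0, "sold", []),    # terminal: sale closed
    ("negotiating", "hold_price", 0.0, "lost", []),  # terminal: sale lost
]

GAMMA, ALPHA, SWEEPS = 0.95, 0.5, 100
Q = defaultdict(float)

for _ in range(SWEEPS):  # sweep the fixed batch until values stabilize
    for s, a, r, s2, next_actions in batch:
        target = r + GAMMA * max((Q[(s2, a2)] for a2 in next_actions),
                                 default=0.0)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Recommend the best logged action in each non-terminal state.
for s in ["lead", "interested", "negotiating"]:
    acts = {a: v for (s_, a), v in Q.items() if s_ == s}
    print(s, "->", max(acts, key=acts.get))
```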
119

Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest Planning

Wen, Yangyang January 2021 (has links)
Raster images represent treatment options for how a forest will be cut; the economic benefit of cutting the forest is realized after a treatment is selected and executed. Existing raster images contain many small clusters, which is the principal cause of overhead. If we can fully explore the relationships among the raster images and combine the old data sets with an optimization algorithm to generate a new raster image, the result can surpass the existing raster images and create higher economic benefits.

The question of this project is whether we can build a dynamic model that treats the pixel being updated as an agent selecting options for an empty raster image in response to neighborhood environmental and landscape parameters. The project explores whether it is realistic to use deep reinforcement learning to generate new and superior raster images, and aims to assess the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options.

The problem was modeled as a Markov decision process in which the pixel to be updated acts as an agent on the empty raster image, determining the choice of treatment option for the current empty pixel. A Deep Q-learning neural network was used to compute the Q-values, and a temporal-difference reinforcement learning algorithm was applied to predict future rewards and update the model parameters.

After the modeling was completed, a model-usefulness experiment was set up to test the usefulness of the model, followed by a parameter-correlation experiment to test the correlation between the parameters and the model's benefit. Finally, the trained model was used to generate a larger raster image to test its effectiveness.
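A minimal sketch of the temporal-difference update at the core of such a model, with a linear function approximator standing in for the deep Q-network (all features, actions, and rewards below are invented placeholders, not the thesis's model):

```python
import numpy as np

rng = np.random.default_rng(0)
N_TREATMENTS = 4        # candidate treatment options per pixel
N_FEATURES = 8          # neighborhood / landscape features of a pixel
GAMMA, ALPHA = 0.9, 0.01

W = rng.normal(scale=0.1, size=(N_TREATMENTS, N_FEATURES))  # Q(s, .) = W @ s

def q_values(features):
    return W @ features

def td_update(features, action, reward, next_features):
    """One TD(0) step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + GAMMA * np.max(q_values(next_features))
    td_error = target - q_values(features)[action]
    W[action] += ALPHA * td_error * features  # gradient of the linear Q

# One simulated step: the agent picks a treatment for the current pixel.
s = rng.random(N_FEATURES)          # features of the current empty pixel
a = int(np.argmax(q_values(s)))     # greedy action (no exploration here)
r = rng.random()                    # placeholder economic benefit
s_next = rng.random(N_FEATURES)     # features of the next empty pixel
td_update(s, a, r, s_next)
```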
120

On Non-Classical Stochastic Shortest Path Problems

Piribauer, Jakob 13 October 2021 (has links)
The stochastic shortest path problem lies at the heart of many questions in the formal verification of probabilistic systems. It asks to find a scheduler resolving the non-deterministic choices in a weighted Markov decision process (MDP) that minimizes or maximizes the expected accumulated weight before a goal state is reached. In the classical setting, it is required that the scheduler ensures that a goal state is reached almost surely. For the analysis of systems without guarantees on the occurrence of an event of interest (reaching a goal state), however, schedulers that miss the goal with positive probability are of interest as well. We study two non-classical variants of the stochastic shortest path problem that drop the restriction that the goal has to be reached almost surely. These variants ask for the optimal partial expectation, obtained by assigning weight 0 to paths not reaching the goal, and the optimal conditional expectation under the condition that the goal is reached, respectively. Both variants have only been studied in structures with non-negative weights. We prove that the decision versions of these non-classical stochastic shortest path problems in MDPs with arbitrary integer weights are at least as hard as the Positivity problem for linear recurrence sequences. This Positivity problem is an outstanding open number-theoretic problem, closely related to the famous Skolem problem. A decidability result for the Positivity problem would imply a major breakthrough in analytic number theory. The proof technique we develop can be applied to a series of further problems. In this way, we obtain Positivity-hardness results for problems addressing the termination of one-counter MDPs, the satisfaction of energy objectives, the satisfaction of cost constraints and the computation of quantiles, the conditional value-at-risk (an important risk measure) for accumulated weights, and the model-checking problem of frequency-LTL. Despite these Positivity-hardness results, we show that the optimal values for the non-classical stochastic shortest path problems can be achieved by weight-based deterministic schedulers and that the optimal values can be approximated in exponential time. In MDPs with non-negative weights, it is known that optimal partial and conditional expectations can be computed in exponential time. These results rely on the existence of a saturation point, a bound on the accumulated weight above which optimal schedulers can behave memorylessly. We improve the result for partial expectations by showing that the least possible saturation point can be computed efficiently. Further, we show that a simple saturation point also allows us to compute the optimal conditional value-at-risk for the accumulated weight in MDPs with non-negative weights. Moreover, we introduce the notions of long-run probability and long-run expectation addressing the long-run behavior of a system. These notions quantify the long-run average probability that a path property is satisfied on a suffix of a run and the long-run average expected amount of weight accumulated before the next visit to a target state, respectively. We establish considerable similarities of the corresponding optimization problems with non-classical stochastic shortest path problems. On the one hand, we show that the threshold problem for optimal long-run probabilities of regular co-safety properties is Positivity-hard via the Positivity-hardness of non-classical stochastic shortest path problems. On the other hand, we show that optimal long-run expectations in MDPs with arbitrary integer weights and long-run probabilities of constrained reachability properties (a U b) can be computed in exponential time using the existence of a saturation point.
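In the notation commonly used for weighted MDPs (an assumption; the dissertation's notation may differ), with a scheduler sigma, accumulated weight W, and the event that the goal is eventually reached, the two non-classical objectives read:

```latex
% Partial expectation: paths that miss the goal contribute weight 0.
\mathrm{PE}^{\sigma} = \mathbb{E}^{\sigma}\big[ W \cdot \mathbf{1}_{\lozenge goal} \big]
\qquad
% Conditional expectation: average weight given that the goal is reached.
\mathrm{CE}^{\sigma} = \mathbb{E}^{\sigma}\big[ W \mid \lozenge goal \big]
```

The optimization problems then ask for the supremum (or infimum) of these quantities over all schedulers; the classical requirement that the goal be reached almost surely is dropped in both variants.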
