1. Modeling exotic options with maturity extensions by stochastic dynamic programming

Tapeinos, Socratis (January 2009)
The exotic options examined in this thesis combine non-standard characteristics found in shout, multi-callable, path-dependent and Bermudan options. These options are called reset options. A reset option allows the holder to reset, one or more times, certain terms of the contract based on pre-specified rules during the life of the option. Overall, this thesis tackles the modeling challenges that arise from the exotic properties of the reset option embedded in segregated funds. A review of the relevant literature identified a lack of published work advanced enough to deal with the complexities of the reset option; hence there is a clear and urgent need for more sophisticated approaches to modeling it.

The reset option on the maturity guarantee of segregated funds is formulated as a non-stationary finite-horizon Markov Decision Process, with returns from the underlying asset modeled by a discrete-time approximation of the lognormal model. An optimal exercise boundary of the reset option is derived: a threshold value such that when the underlying asset price exceeds it, resetting the maturity guarantee is optimal for the policyholder, and otherwise rolling the guarantee over is optimal. Notably, the model depicts the optimal exercise boundary not just of the first but of all segregated fund contracts that can be issued throughout the policyholder's planning horizon. The main finding is that the threshold value in the optimal exercise boundary increases as the segregated fund contract approaches its maturity; in the last period before maturity, however, the threshold decreases, because an unexercised reset option expires worthless.

The model is then extended to reflect the characteristics of the range of products traded in the market. First, the issuer of the segregated fund contract is allowed to charge the policyholder a management fee; the policyholder then requires a higher return before optimally resetting his maturity guarantee, while the total value of the segregated fund is diminished. Second, the maturity guarantee becomes a function of the number of times the reset option has been exercised, with the same qualitative effect. Third, the policyholder is allowed to reset the maturity guarantee at any point in time within each year of the planning horizon, but only once per year; this increases the total value of the segregated fund, since the policyholder can lock in higher market gains at the additional reset decision points. Finally, in response to the well-documented deficiencies of the lognormal model in capturing the jumps experienced by stock markets, the model is extended to incorporate such jumps; the policyholder again requires a higher return before resetting, while the total value of the segregated fund is diminished by the adverse effect of negative jumps on the value of the underlying asset.
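A minimal sketch of the backward-induction idea behind such an optimal exercise boundary is given below. It is a simplified toy, not the thesis's model: the maturity is fixed rather than extendable, the lognormal dynamics are discretized on a crude grid, and the proportional reset fee that makes the decision nontrivial is an invented parameter.

```python
import numpy as np

# Toy backward induction for a reset option on a maturity guarantee:
# at each annual epoch the policyholder may reset the guarantee G to the
# current asset price S (for an assumed proportional fee); at maturity he
# receives max(S, G). All parameter values are illustrative assumptions.
T = 10                                   # annual decision epochs
mu, sigma, dt = 0.06, 0.20, 1.0          # assumed lognormal dynamics
fee = 0.02                               # invented reset charge
grid = np.exp(np.linspace(np.log(0.2), np.log(5.0), 121))   # price grid
n = len(grid)

def transition_matrix(grid, mu, sigma, dt):
    """P[i, k] ~ Prob(next price = grid[k] | current price grid[i])."""
    P = np.zeros((n, n))
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * np.sqrt(dt)
    for i, s in enumerate(grid):
        z = (np.log(grid) - np.log(s) - drift) / vol
        w = np.exp(-0.5 * z ** 2)        # unnormalized lognormal weights
        P[i] = w / w.sum()
    return P

P = transition_matrix(grid, mu, sigma, dt)
V = np.maximum.outer(grid, grid)         # terminal value max(S, G)
j1 = int(np.searchsorted(grid, 1.0))     # reference guarantee G = 1

for t in reversed(range(T)):
    cont = P @ V                             # rollover: guarantee unchanged
    reset_val = (1 - fee) * np.diag(cont)    # reset: guarantee becomes S
    V = np.maximum(cont, reset_val[:, None])
    hit = np.where(reset_val > cont[:, j1])[0]
    thr = grid[hit[0]] if len(hit) else float("inf")
    print(f"epoch {t}: reset beats rollover once S exceeds ~{thr:.2f} (G = 1)")
```

Running it prints, per epoch, the price above which resetting beats rolling over, which is the kind of exercise boundary the thesis characterizes in a far richer model with maturity extensions, fees, and jumps.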
2. Inventory and Pricing Management of Perishable Products with Fixed and Random Shelf Life

Moshtagh, Mohammad (January 2024)
In this dissertation, we study inventory and revenue management problems for perishable products with customer choice considerations. The dissertation is composed of six chapters. Chapter 1 provides an overview and the motivation of the problems studied.

In Chapter 2, we propose a joint inventory and pricing problem for a perishable product with two freshness levels: after a stochastic time, a fresh item turns into a non-fresh item, which expires after another random duration. Under an (r, Q) ordering policy and a markdown pricing strategy for non-fresh items, we formulate a model that maximizes the long-run average profit rate and reduce it to a mixed-integer bilinear program (MIBLP), which state-of-the-art commercial solvers can handle efficiently. We also investigate the value of the markdown strategy by establishing bounds on it under limiting regimes of some parameters, such as large market demand, and we analyze an Economic Order Quantity (EOQ)-type heuristic whose optimality gap we bound asymptotically. Our results reveal that although the clearance strategy is always beneficial for the retailer, it may hurt customers who are willing to buy fresh products.

In Chapter 3, we extend this model to a dynamic setting with multiple freshness levels. Due to the complexity of the problem, we study structural properties of the value function and characterize the structure of the optimal policies using the concept of anti-multimodularity; this structural analysis enables us to devise three novel and efficient heuristic policies. We further extend the model by considering a donation policy and a replenishment system. Our results imply that freshness-dependent pricing and dynamic pricing are substitute strategies, while freshness-dependent pricing and donation are complementary strategies for matching supply with demand. Also, high variability in product quality benefits the firm under dynamic pricing, but it may result in significant losses under a static pricing strategy.

In Chapter 4, we study a joint inventory-pricing model for perishable items with fixed shelf lives to examine, both theoretically and numerically, the effectiveness of single-stage, multiple-stage, and dynamic markdown policies. We show that the value of multiple-stage markdown policies over single-stage ones vanishes asymptotically as the shelf life, market demand, or customers' maximum willingness-to-pay increases.

In Chapter 5, with a focus on blood products, we optimize the blood supply chain structure along with its operations. Specifically, we study collection, production, replenishment, issuing, inventory, wastage, and substitution decisions under three supply chain channel structures, namely decentralized, centralized, and coordinated. We propose a bi-level optimization program to model the decentralized system and use the Karush-Kuhn-Tucker (KKT) optimality conditions to solve it. Although centralized systems outperform decentralized ones, they are challenging to implement, so we design a novel coordination mechanism to motivate hospitals to operate as in a centralized system. We also extend the model to the case with demand uncertainty and compare different issuing and replenishment policies. Analysis of a realistic case study indicates that integration can significantly improve system performance.

Finally, Chapter 6 concludes the dissertation and proposes future research directions.
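As a concrete illustration of the Chapter 2 setting, the sketch below simulates the long-run average profit rate of an (r, Q) policy with markdown pricing for two freshness levels. It is only a toy under stated assumptions: geometric (discrete) lifetimes stand in for the random freshness durations, a fixed 60% of unserved fresh-demand customers accept a marked-down item, and all prices, costs, and rates are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not the dissertation's calibration)
p_full, p_mark = 10.0, 6.0      # fresh price, markdown price
cost, hold = 4.0, 0.1           # unit purchase cost, holding cost per period
lam = 5.0                       # mean Poisson demand per period
p_age, p_exp = 0.10, 0.25       # geometric aging / expiry probabilities
r, Q, lead = 10, 30, 2          # (r, Q) policy and replenishment lead time

def average_profit(r, Q, periods=100_000):
    fresh, nonfresh = Q, 0
    pipeline = []                # arrival periods of outstanding orders
    profit = 0.0
    for t in range(periods):
        # receive outstanding orders
        fresh += Q * sum(1 for a in pipeline if a == t)
        pipeline = [a for a in pipeline if a > t]
        # demand: customers buy fresh first; some accept marked-down items
        d = rng.poisson(lam)
        sold_f = min(fresh, d)
        fresh -= sold_f
        willing = rng.binomial(d - sold_f, 0.6)   # 60% accept non-fresh
        sold_m = min(nonfresh, willing)
        nonfresh -= sold_m
        profit += p_full * sold_f + p_mark * sold_m
        # aging and expiry (geometric lifetimes mimic random durations)
        aged = rng.binomial(fresh, p_age)
        fresh -= aged
        nonfresh += aged
        nonfresh -= rng.binomial(nonfresh, p_exp)
        profit -= hold * (fresh + nonfresh)
        # (r, Q) reorder based on the inventory position
        if fresh + nonfresh + Q * len(pipeline) <= r:
            profit -= cost * Q
            pipeline.append(t + lead)
    return profit / periods

print(f"estimated long-run average profit rate: {average_profit(r, Q):.2f}")
```

Sweeping r and Q over a grid of such simulations is a crude stand-in for the MIBLP optimization described above.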
3. Teaching tutor: an evaluation-agent module for assessing students' learning behavior in engineering courses

Santos, Valdomiro dos (15 June 2016)
Student behavior and academic performance in engineering programs is a fertile, interesting, and growing field of research. This work presents the results of a stochastic analysis of student progress in 15 undergraduate programs offered by the Escola Politécnica of the Universidade de São Paulo (EPUSP). To perform this analysis, an evaluation agent was developed applying the Markov Decision Process (MDP) framework. This agent extracts partial observations of the current states of students' grades in the courses taken and enables the identification of appropriate actions to autonomously model a student's future behavior. The algorithm estimates the effort that represents the student's cognitive state from a set of state/action pairs, computed from the grades obtained over the period from 2000 to 2010. The period in which a student obtained a passing grade makes a temporal study of this event possible, which allows the use of data-clustering methods, such as hidden Markov models, to evaluate the behavior of students' grades throughout the engineering programs. The study groups students' grades into three levels in order to classify the behavior of those grades.
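Since the record highlights hidden Markov models over three grade levels, here is a minimal sketch of scoring a grade sequence with the forward algorithm; the three latent states and all matrices are invented for illustration and are not the tutor module's calibrated values.

```python
import numpy as np

# Hypothetical 3-state HMM over grade levels: observations are coded
# 0 = low, 1 = medium, 2 = high grade; hidden states model student effort.
# All matrices below are assumed for illustration.
pi = np.array([0.5, 0.3, 0.2])           # initial state distribution
A = np.array([[0.7, 0.2, 0.1],           # state transitions
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
B = np.array([[0.6, 0.3, 0.1],           # emissions: P(grade level | state)
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

def forward_loglik(obs):
    """Log-likelihood of a grade-level sequence (scaled forward algorithm)."""
    alpha = pi * B[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        logp += np.log(alpha.sum())
        alpha /= alpha.sum()
    return logp

# e.g. a student whose grades climb from low to high over six semesters
print(forward_loglik([0, 0, 1, 1, 2, 2]))
```

Comparing such likelihoods across candidate models, or decoding the latent states with the Viterbi algorithm, is one way a grade sequence can be classified into behavioral groups.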
4. Adaptive traffic signal control at intersections: dynamic traffic system modeling and solution approaches

Yin, Biao (11 December 2015)
Adaptive traffic signal control is a decision-making optimization problem that must be addressed continually to relieve congestion at urban intersections, and intelligent algorithms are widely used to improve control performance measures such as traffic delay. This thesis studies the problem with a microscopic, discrete-time dynamic model and investigates solution algorithms both for an isolated intersection and for distributed network control.

The first part focuses on dynamic modeling of adaptive signal control and network loading. The proposed adaptive phase sequence (APS) mode is highlighted as one of the signal-phase control mechanisms. Signal control at intersections is formulated as a Markov Decision Process (MDP); in particular, the concept of a tunable system state is proposed for traffic network coordination, and a new vehicle-following model supports the network-loading environment.

Based on this model, optimal and near-optimal control methods are studied in turn. Two exact dynamic programming (DP) algorithms are investigated, and the results show the limitations of the DP solution in complex cases with large state spaces. Because of DP's computational burden and the lack of exact model information (notably, exact vehicle-arrival information), an approximate dynamic programming (ADP) approach is adopted. Finally, an online near-optimal algorithm using ADP with RLS-TD(λ) is selected. In simulation experiments, especially with the APS mode integrated, the proposed algorithm shows clear advantages in both performance measures and computational efficiency.
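To make the RLS-TD(λ) component concrete, the sketch below runs generic RLS-TD(λ) policy evaluation on a toy random Markov chain with tabular features; the chain and the negative-queue-length reward are illustrative stand-ins for the thesis's traffic states, not its actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# RLS-TD(lambda) policy evaluation on a toy Markov chain; all numbers
# are illustrative assumptions.
n_states, gamma, lam = 5, 0.95, 0.7
P = rng.dirichlet(np.ones(n_states), size=n_states)  # random transitions
rwd = -np.arange(n_states, dtype=float)              # e.g. -queue length
phi = np.eye(n_states)                               # tabular features

theta = np.zeros(n_states)          # value-function weights
Pmat = np.eye(n_states) * 100.0     # RLS covariance (large initial value)
z = np.zeros(n_states)              # eligibility trace
s = 0
for t in range(20_000):
    s2 = rng.choice(n_states, p=P[s])
    f, f2 = phi[s], phi[s2]
    z = gamma * lam * z + f
    d = f - gamma * f2                               # TD feature difference
    delta = rwd[s] + gamma * theta @ f2 - theta @ f  # TD error
    K = Pmat @ z / (1.0 + d @ Pmat @ z)              # recursive LS gain
    theta += K * delta
    Pmat -= np.outer(K, d @ Pmat)
    s = s2

# sanity check against the closed form V = (I - gamma P)^{-1} r
print(theta)
print(np.linalg.solve(np.eye(n_states) - gamma * P, rwd))
```

With tabular features the learned weights should match the closed-form values; with compact features for real traffic states, the same recursion fits an approximate value function online.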
5. Adaptive management of mixed-species hardwood forests under risk and uncertainty

Vamsi K. Vipparla (28 July 2020)
Forest management involves numerous stochastic elements. To sustainably manage forest resources, it is crucial to acknowledge these sources as uncertainty or risk and to incorporate them into adaptive decision-making. Here, I developed several stochastic programming models, in the form of passive or active adaptive management, for natural mixed-species hardwood forests in Indiana, and demonstrated how to use these tools to deal with time-invariant and time-variant natural disturbances in optimal harvest planning.

Markov decision process (MDP) models were first constructed from stochastic simulations of an empirical growth model for the forest type of interest, then optimized to seek optimal or near-optimal harvesting decisions under risk and uncertainty in natural disturbances. A classic expected-criterion, infinite-horizon MDP model was first used as a passive adaptive management tool to determine the optimal action for a given forest state when the forest transition probabilities remain constant over time. Next, a two-stage non-stationary MDP model combined with a rolling-horizon heuristic allowed information updates and corresponding adjustments of decisions; it was used to determine active adaptive harvesting decisions over a three-decade planning horizon during which natural disturbance probabilities may be altered by climate change.

The empirical results support some useful quantitative management recommendations and shed light on how decision-making affects the forests and timber yield when stochastic elements of forest management change. In general, an increased likelihood of natural disturbance damage leads to more aggressive decisions when timber production is the management objective. When windthrow posed no threat to mixed hardwood forests, the average optimal sawtimber yield was estimated at 1,376 ft³/ac, with a residual basal area of 88 ft²/ac. Assuming a 10 percent per-decade probability of windthrow that reduces stand basal area considerably, the optimal sawtimber yield per decade would decline by 17%, but the residual basal area would be lowered by only 5%. Assuming the frequency of windthrow increases by 5% every decade under climate change, the average sawtimber yield would be reduced by 31%, with an average residual basal area around 76 ft²/ac. For validation purposes, I compared the three-decade total sawtimber yield from the heuristic approach to that of a three-decade MDP model making ex post decisions; the heuristic proved satisfactory, falling only about 18% short of the actual optimum.

These findings highlight the need for landowners, both private and public, to monitor forests frequently and use flexible planning approaches to anticipate climate change impacts. They also suggest that climate change may considerably lower sawtimber yield, causing a concerning decline in Indiana's timber supply. Recommended future improvements include addressing changing stumpage market conditions and developing a more flexible rolling-horizon heuristic approach.
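A minimal sketch of the passive, expected-criterion infinite-horizon MDP idea: value iteration over a few basal-area classes with a constant windthrow probability. The states, actions, rewards, and transition rules are invented for illustration and are not the thesis's calibrated growth model.

```python
import numpy as np

# Toy infinite-horizon MDP for harvest planning: states are stand basal-area
# classes, actions are {wait, harvest}; windthrow knocks a stand back to the
# lowest class. All numbers are illustrative assumptions.
n = 4                               # basal-area classes, low -> high
p_wind = 0.10                       # per-decade windthrow probability
gamma = 0.9                         # per-decade discount factor
harvest_reward = np.array([0.0, 20.0, 45.0, 80.0])  # sawtimber value by class

# "wait": the stand grows one class unless windthrow resets it
P_wait = np.zeros((n, n))
for s in range(n):
    P_wait[s, min(s + 1, n - 1)] += 1 - p_wind
    P_wait[s, 0] += p_wind
# "harvest": the stand restarts from the lowest class
P_harv = np.zeros((n, n))
P_harv[:, 0] = 1.0

V = np.zeros(n)
for _ in range(500):                # value iteration
    Q_wait = gamma * P_wait @ V
    Q_harv = harvest_reward + gamma * P_harv @ V
    V = np.maximum(Q_wait, Q_harv)

policy = np.where(Q_harv >= Q_wait, "harvest", "wait")
print(dict(zip(range(n), policy)))
```

Raising p_wind in this toy shifts the harvest decision to lower classes, the same qualitative "more aggressive decisions under higher disturbance risk" pattern reported above.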
6. Approximate Dynamic Programming and Reinforcement Learning - Algorithms, Analysis and an Application

Lakshminarayanan, Chandrashekar (January 2015)
Problems involving optimal sequential decision making in uncertain dynamic systems arise in domains such as engineering, science and economics. Such problems can often be cast in the framework of a Markov Decision Process (MDP). Solving an MDP requires computing the optimal value function and the optimal policy, and the ideas of dynamic programming (DP) and the Bellman equation (BE) are at the heart of solution methods. The three important exact DP methods are value iteration, policy iteration and linear programming. These compute the optimal value function and policy exactly, but they are inadequate in practice because the state space is often large, so one must resort to approximate methods that compute sub-optimal policies. Further, in certain cases the system is observed only through noisy samples, and we need algorithms that learn from these samples. This thesis studies theoretical questions pertaining to approximate and learning algorithms, and also presents an application of MDPs in the domain of crowd sourcing.

Approximate dynamic programming (ADP) methods handle large state spaces by computing an approximate value function and/or a sub-optimal policy; here we are concerned with conditions that yield provably good policies. Motivated by the limitations of the projected Bellman equation (PBE) in conventional linear algebra, we study the PBE in the (min, +) linear algebra. It is well known that deterministic optimal control problems with a cost/reward criterion are (min, +)/(max, +) linear, and ADP methods have been developed for such systems in the literature; however, it is straightforward to show that infinite-horizon discounted reward/cost MDPs are neither (min, +) nor (max, +) linear. We develop novel ADP schemes, namely Approximate Q Iteration (AQI) and Variational Approximate Q Iteration (VAQI), in which the approximate solution is a (min, +) linear combination of a set of basis functions whose span constitutes a subsemimodule. We show that the new ADP methods are convergent and present a bound on the performance of the sub-optimal policy.

The approximate linear program (ALP) makes use of linear function approximation (LFA) and offers theoretical performance guarantees. Nevertheless, the ALP is difficult to solve due to its large number of constraints, and in practice a reduced linear program (RLP), with a tractable number of constraints sampled from those of the ALP, is solved instead. Though the RLP is known to perform well in experiments, theoretical guarantees were previously available only for a specific RLP obtained under idealized assumptions. We generalize the RLP to a generalized reduced linear program (GRLP), whose tractably many constraints are positive linear combinations of the ALP's original constraints. The main contribution here is a novel theoretical framework for obtaining error bounds for any given GRLP.

Reinforcement learning (RL) algorithms can be viewed as sample-trajectory-based solution methods for MDPs. Typically, RL algorithms that make use of stochastic approximation (SA) are iterative schemes taking small steps towards the desired value at each iteration. Actor-critic algorithms form an important sub-class, wherein the critic performs policy evaluation and the actor performs policy improvement. The actor and critic iterations have different step-size schedules; in particular, the step-sizes used by the actor updates generally have to be much smaller than those used by the critic updates. SA schemes that use different step-size schedules for different sets of iterates are known as multi-timescale stochastic approximation schemes. One of the most important conditions required to ensure the convergence of a multi-timescale SA scheme is that the iterates be stable, i.e., uniformly bounded almost surely; however, conditions implying this stability have not been well established. In this thesis, we provide verifiable conditions that imply the stability of two-timescale stochastic approximation schemes, and demonstrate as an example that the stability of a widely used actor-critic RL algorithm follows from our analysis.

Crowd sourcing is a mode of organizing work by splitting it into smaller chunks of tasks and outsourcing them to a large, distributed group of people through an open call. It has recently become a major pool for human intelligence tasks (HITs) such as image labeling, form digitization, natural language processing, machine translation evaluation and user surveys, and large organizations/requesters are increasingly interested in crowd sourcing the HITs generated by their internal requirements. Task starvation leads to huge variation in the completion times of tasks posted to the crowd. This is an issue for frequent requesters who desire predictability in completion times, specified as a percentage of tasks completed within a stipulated amount of time. An important task attribute that affects completion time is the task's price; however, a pricing policy that does not take the dynamics of the crowd into account may fail to achieve the desired predictability. Here, we make use of the MDP framework to compute a pricing policy that achieves predictable completion times in simulations as well as real-world experiments.
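A minimal sketch of the two-timescale idea on a toy bandit-style actor-critic: the critic's step-size decays more slowly (the faster timescale) than the actor's, so the critic tracks a quasi-static policy, and a projection keeps the iterates bounded, echoing the stability requirement discussed above. The problem, step-size schedules, and projection bounds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-timescale actor-critic sketch on a 2-armed bandit (all numbers assumed)
true_mean = np.array([1.0, 2.0])    # unknown mean reward of each arm
q = np.zeros(2)                     # critic: action-value estimates
theta = np.zeros(2)                 # actor: softmax preferences

for n in range(1, 200_000):
    a_n = 1.0 / n ** 0.6            # critic step-size (faster timescale)
    b_n = 1.0 / n                   # actor step-size (slower timescale)
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    arm = rng.choice(2, p=probs)
    r = true_mean[arm] + rng.normal()
    q[arm] += a_n * (r - q[arm])             # critic: policy evaluation
    adv = q[arm] - probs @ q                 # advantage estimate
    grad = -probs.copy()
    grad[arm] += 1.0                         # d log pi(arm) / d theta
    theta += b_n * adv * grad                # actor: policy improvement
    theta = np.clip(theta, -10, 10)          # projection keeps iterates stable

print("value estimates:", q, "policy:", probs)
```

Since b_n / a_n -> 0, the actor sees an almost-converged critic; the explicit clipping is a blunt stand-in for the kind of stability guarantee the thesis establishes by verifiable conditions rather than by projection.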
7. Network Utility Maximization Based on Information Freshness

Cho-Hsin Tsai (20 April 2022)
It is predicted that there would be 41.6 billion IoT devices by 2025, which has kindled new interest in the timing coordination between sensors and controllers, i.e., how to use waiting time to improve performance. Sun et al. showed that a controller can strictly improve data freshness, the so-called Age-of-Information (AoI), via careful scheduling designs. The optimal waiting policy for the sensor side was later characterized in the context of remote estimation. The first part of this work develops the jointly optimal sensor/controller waiting policy, generalizing both of these results: not only do we consider joint sensor/controller designs, but we also assume random delay in both the forward and feedback directions.

The second part revisits and significantly strengthens the seminal results of Sun et al. on three fronts: (i) when designing optimal offline schemes with full knowledge of the delay distributions, a new fixed-point-based method with a quadratic convergence rate is proposed; (ii) when distributional knowledge is unavailable, two new low-complexity online algorithms are proposed, which provably attain the optimal average AoI penalty; and (iii) the online schemes admit a modular architecture that allows the designer to upgrade components to handle additional practical challenges. Two such upgrades are proposed: (iii.1) the AoI penalty function incurred at the destination is unknown to the source node and must also be estimated on the fly, and (iii.2) the unknown delay distribution is Markovian instead of i.i.d.

With the exponential growth of interconnected IoT devices and the increasing risk of excessive resource consumption in mind, the third part derives an optimal joint cost-and-AoI minimization solution for multiple coexisting source-destination (S-D) pairs. The results admit a new AoI-market-price-based interpretation and are applicable to (i) general heterogeneous AoI penalty functions and Markov delay distributions for each S-D pair, and (ii) a general network cost function of the aggregate throughput of all S-D pairs.

In each part of this work, extensive simulation demonstrates the superior performance of the proposed schemes, and the discussion of analytical as well as numerical results sheds some light on designing practical network utility maximization protocols.
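A toy illustration of the fixed-point flavor of the offline design: iterate the map from a waiting threshold to the average AoI it induces until the two coincide. The cycle decomposition, the threshold form w = max(0, beta - y), and the exponential delays are simplified assumptions; the thesis's actual method and its quadratic-convergence analysis are more general.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy fixed-point search for a waiting threshold that minimizes average AoI.
# Model (assumed): after a delivery whose delay was y, the source waits
# w = max(0, beta - y), then samples again; the new update takes an
# independent delay Y'. Over that cycle of length T = w + Y', the age
# climbs linearly from y to y + T.
Y = rng.exponential(1.0, size=200_000)    # i.i.d. delivery delays (assumed)
Yn = rng.exponential(1.0, size=200_000)   # next-update delays

def avg_aoi(beta):
    w = np.maximum(0.0, beta - Y)
    T = w + Yn                            # cycle lengths
    area = Y * T + 0.5 * T ** 2           # age integral over each cycle
    return area.sum() / T.sum()           # renewal-reward average AoI

beta = 1.0
for k in range(30):                       # iterate threshold -> induced AoI
    beta_next = avg_aoi(beta)
    if abs(beta_next - beta) < 1e-9:
        break
    beta = beta_next
print(f"threshold = induced average AoI = {beta:.4f} after {k + 1} iterations")
```

The fixed point, where the threshold equals the average AoI it induces, mirrors the threshold-equals-optimal-objective structure that makes fixed-point methods natural for this problem.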
8. Enabling ride-sharing in on-demand air service operations through reinforcement learning

Apoorv Maheshwari (22 November 2021)
The convergence of various technological and operational advancements has reinstated interest in On-Demand Air Service (ODAS) as a viable mode of transportation. ODAS enables an end-user to be transported in an aircraft between their desired origin and destination at their preferred time without advance notice. Industry, academia, and government organizations are collaborating to create technology solutions suited for large-scale implementation of this mode of transportation, and market studies suggest that reducing vehicle operating cost per passenger is one of the biggest enablers of this market. To enable ODAS, an operator controls a fleet of aircraft deployed across a set of nodes (e.g., airports, vertiports) to satisfy end-user transportation requests.

There is a gap in the literature for a tractable, online methodology that can enable ride-sharing in on-demand operations while maintaining a publicly acceptable level of service (e.g., low waiting times). The need for an approach that not only supports a dynamic-stochastic formulation but can also handle uncertainty with unknowable properties drives me towards the field of Reinforcement Learning (RL). In this work, a novel two-layer hierarchical RL framework is proposed that can both distribute a fleet of aircraft across a nodal network and perform real-time scheduling for an ODAS operator. The top layer of the framework, the Fleet Distributor, is modeled as a Partially Observable Markov Decision Process, whereas the lower layer, the Trip Request Manager, is modeled as a Semi-Markov Decision Process. The framework is demonstrated and assessed through various studies for a hypothetical ODAS operator in the Chicago region. This approach provides a new way of solving fleet distribution and scheduling problems in aviation, bridges the gap between state-of-the-art RL advancements and node-based transportation network problems, and offers a non-proprietary way to reasonably model ODAS operations that researchers and policy makers can leverage.
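A minimal sketch of the lower layer's semi-Markov decision logic: tabular SMDP Q-learning in which the discount factor depends on the random sojourn time between trip requests. The state coding (seats filled), actions (dispatch vs. wait), dynamics, and rewards are invented for illustration, not the framework's actual design.

```python
import numpy as np

rng = np.random.default_rng(4)

# SMDP Q-learning sketch for a toy "trip request manager": states code how
# many seats are filled on an idle aircraft (0-3); actions are
# {0: dispatch now, 1: wait to pool another request}. Dynamics, rewards,
# and exponential sojourn times are illustrative assumptions.
n_states, n_actions, beta = 4, 2, 0.1   # beta: continuous-time discount rate
Q = np.zeros((n_states, n_actions))

def step(s, a):
    """Return (reward, next_state, sojourn_time) under the toy dynamics."""
    tau = rng.exponential(1.0)          # time until the next decision epoch
    if a == 0:                          # dispatch: revenue grows with pooling
        return 10.0 * (s + 1) - 8.0, 0, tau
    # wait: another request may arrive; waiting costs passenger goodwill
    s2 = min(s + 1, n_states - 1) if rng.random() < 0.6 else s
    return -1.0, s2, tau

s = 0
for n in range(300_000):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    r, s2, tau = step(s, a)
    g = np.exp(-beta * tau)             # SMDP discount over the sojourn time
    Q[s, a] += 0.05 * (r + g * Q[s2].max() - Q[s, a])
    s = s2

print("dispatch vs wait values by seats filled:\n", Q.round(2))
```

The sojourn-time-dependent discount is what distinguishes the SMDP update from ordinary Q-learning and lets the learned policy trade pooling revenue against real waiting time.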
