  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

On the optimal stopping time of learning

Fedyszak-Koszela, Anna January 2008
<p>The goal of this thesis is to study the economics of computational learning. Attention is also paid to applications of computational learning models, especially Valiant's so-called 'probably approximately correct' (PAC) learning model, in econometric situations.</p><p>Specifically, an economically reasonable stopping time model of learning is the subject of two attached papers. In the first paper, Paper A, the economics of PAC learning are considered. It is shown how a general form of the optimal stopping time bounds can be obtained using the PAC convergence rates for a 'pessimistic-rational' learner in the most standard binary case of the passive supervised PAC model with finite Vapnik-Chervonenkis (VC) dimension.</p><p>The second paper, Paper B, makes precise and improves the ideas introduced in Paper A and tests them in a specific and mathematically simple case. Using the maxmin procedure of Gilboa and Schmeidler, the bounds for the stopping time are expressed in terms of the largest expected error of recall, and thus, effectively, in terms of the least expected reward. The problem of locating a real number θ by testing whether x<sub>i</sub> ≤ θ, with x<sub>i</sub> drawn from an unknown distribution, is discussed. Examples of stopping time bounds calculated for a range of term rates, sample costs and rewards/penalties from a recall are included. Standard econometric situations where such bounds could be of interest, such as product promotion, market research, credit risk assessment, and bargaining and tenders, are pointed out.</p><p>These two papers are the essence of this thesis and, together with an introduction to the subject of learning, make up the whole.</p> / <p>The aim of this thesis is to study the optimization of learning when learning is costly. In particular, I study Valiant's so-called PAC (Probably Approximately Correct) learning model, which is widely used in computer science.
In two papers I examine how long, from an economic point of view, the learning period should continue.</p><p>In the first paper we show how a general form of bounds on the optimal learning period can be obtained using the PAC convergence rate for a 'pessimistically rational' learner (in the most common binary case of the passive PAC learning model with finite VC dimension).</p><p>In the second paper we deepen and improve the ideas from the first paper and test them in a specific, mathematically simple situation. Using the maxmin procedure of Gilboa and Schmeidler, we express the bounds on the optimal learning period as a function of the largest expected error and hence of the least expected reward. We discuss the problem of finding a real number θ by testing whether x<sub>i</sub> ≤ θ, where x<sub>i</sub> is drawn from an unknown distribution. We also give examples of bounds on the learning period, calculated for a range of discount rates, sampling costs and rewards/penalties for recall, as well as some common econometric situations in which such bounds are of interest, such as product marketing, market analysis, credit risk assessment and tender negotiation.</p><p>The thesis consists mainly of these two papers together with a short introduction to learning models in economics, mathematics and computer science.</p>
2

On the optimal stopping time of learning

Fedyszak-Koszela, Anna January 2008
The goal of this thesis is to study the economics of computational learning. Attention is also paid to applications of computational learning models, especially Valiant's so-called 'probably approximately correct' (PAC) learning model, in econometric situations. Specifically, an economically reasonable stopping time model of learning is the subject of two attached papers. In the first paper, Paper A, the economics of PAC learning are considered. It is shown how a general form of the optimal stopping time bounds can be obtained using the PAC convergence rates for a 'pessimistic-rational' learner in the most standard binary case of the passive supervised PAC model with finite Vapnik-Chervonenkis (VC) dimension. The second paper, Paper B, makes precise and improves the ideas introduced in Paper A and tests them in a specific and mathematically simple case. Using the maxmin procedure of Gilboa and Schmeidler, the bounds for the stopping time are expressed in terms of the largest expected error of recall, and thus, effectively, in terms of the least expected reward. The problem of locating a real number θ by testing whether xi ≤ θ, with xi drawn from an unknown distribution, is discussed. Examples of stopping time bounds calculated for a range of term rates, sample costs and rewards/penalties from a recall are included. Standard econometric situations where such bounds could be of interest, such as product promotion, market research, credit risk assessment, and bargaining and tenders, are pointed out. These two papers are the essence of this thesis and, together with an introduction to the subject of learning, make up the whole. / The aim of this thesis is to study the optimization of learning when learning is costly. In particular, I study Valiant's so-called PAC (Probably Approximately Correct) learning model, which is widely used in computer science. In two papers I examine how long, from an economic point of view, the learning period should continue.
In the first paper we show how a general form of bounds on the optimal learning period can be obtained using the PAC convergence rate for a 'pessimistically rational' learner (in the most common binary case of the passive PAC learning model with finite VC dimension). In the second paper we deepen and improve the ideas from the first paper and test them in a specific, mathematically simple situation. Using the maxmin procedure of Gilboa and Schmeidler, we express the bounds on the optimal learning period as a function of the largest expected error and hence of the least expected reward. We discuss the problem of finding a real number θ by testing whether xi ≤ θ, where xi is drawn from an unknown distribution. We also give examples of bounds on the learning period, calculated for a range of discount rates, sampling costs and rewards/penalties for recall, as well as some common econometric situations in which such bounds are of interest, such as product marketing, market analysis, credit risk assessment and tender negotiation. The thesis consists mainly of these two papers together with a short introduction to learning models in economics, mathematics and computer science.
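
The threshold problem mentioned in the abstract above (locating θ from tests of whether xi ≤ θ) can be illustrated with a short Python sketch. It is not the thesis's stopping time model. It assumes, purely for illustration, that xi is uniform on [0, 1] (the abstract leaves the distribution unknown), and it uses the textbook PAC bound for the learner that returns the largest sample found to lie at or below θ: the error exceeds ε with probability at most (1 − ε)^n. The function names are hypothetical.

```python
import math
import random

def sufficient_sample_size(eps, delta):
    # (1 - eps)**n <= exp(-eps * n) <= delta once n >= ln(1/delta) / eps.
    return math.ceil(math.log(1.0 / delta) / eps)

def learn_threshold(theta, n, rng):
    # Assumption: x_i ~ Uniform(0, 1); the test "x_i <= theta" is the only feedback.
    estimate = 0.0
    for _ in range(n):
        x = rng.random()
        if x <= theta and x > estimate:
            estimate = x  # keep the largest sample known to lie at or below theta
    return estimate

rng = random.Random(0)
n = sufficient_sample_size(eps=0.1, delta=0.05)
theta_hat = learn_threshold(theta=0.6, n=n, rng=rng)
error = 0.6 - theta_hat  # misclassified probability mass under Uniform(0, 1)
print(n, round(error, 4))
```

A stopping rule in the spirit of the abstract would weigh such an error bound against the cost of each additional sample; that trade-off is not modelled here.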
3

Dynamic Programming Approach to Price American Options

Yeh, Yun-Hsuan 06 July 2012
We propose a dynamic programming (DP) approach for pricing American options over a finite time horizon. We model the stock price as a geometric Brownian motion (GBM) and hold the interest rate and volatility fixed. A procedure based on dynamic programming combined with piecewise linear interpolation is developed to value the options. We also incorporate the free boundary problem into our model. Numerical experiments illustrate the relation between option value and volatility.
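
The ingredients named in this abstract (backward dynamic programming, GBM price dynamics, piecewise linear interpolation) can be combined in a minimal Python sketch for an American put. This is an illustration under stated assumptions, not the author's procedure: the log-spaced grid, its width, the three-point Gauss-Hermite rule used for the one-step expectation, and the function names are all choices made here, and the free boundary is not computed explicitly.

```python
import bisect
import math

def interp(x, xs, ys):
    # Piecewise-linear interpolation, held flat outside the grid.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def american_put_dp(s0, strike, rate, sigma, maturity, n_steps=50, n_grid=401):
    dt = maturity / n_steps
    disc = math.exp(-rate * dt)
    # Log-spaced price grid centred on s0; the width is an arbitrary choice.
    half = 6.0 * sigma * math.sqrt(maturity)
    grid = [s0 * math.exp(-half + 2.0 * half * k / (n_grid - 1)) for k in range(n_grid)]
    payoff = [max(strike - s, 0.0) for s in grid]
    # Three-point Gauss-Hermite rule (nodes, weights) for a standard normal shock.
    shocks = [(-math.sqrt(3.0), 1.0 / 6.0), (0.0, 2.0 / 3.0), (math.sqrt(3.0), 1.0 / 6.0)]
    drift = (rate - 0.5 * sigma ** 2) * dt
    growth = [(math.exp(drift + sigma * math.sqrt(dt) * z), w) for z, w in shocks]
    value = payoff[:]  # option value at maturity
    for _ in range(n_steps):
        # Bellman step: exercise now, or hold for one more period.
        cont = [disc * sum(w * interp(s * g, grid, value) for g, w in growth) for s in grid]
        value = [max(p, c) for p, c in zip(payoff, cont)]
    return interp(s0, grid, value)

price = american_put_dp(36.0, 40.0, 0.06, 0.2, 1.0)
price_high_vol = american_put_dp(36.0, 40.0, 0.06, 0.4, 1.0)
print(price, price_high_vol)
```

Running the function at two volatilities, as in the last lines, is one simple way to look at the relation between option value and volatility that the abstract mentions.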
4

Stopping Times Related to Trading Strategies

Abramov, Vilen 25 April 2008
No description available.
5

Discrétisation de processus à des temps d’arrêt et Quantification d'incertitude pour des algorithmes stochastiques / Discretization of processes at stopping times and Uncertainty quantification of stochastic approximation limits

Stazhynski, Uladzislau 12 December 2018
This thesis contains two parts that study two different subjects. Chapters 1-4 are devoted to problems of discretization of processes at stopping times. In Chapter 1 we study the optimal discretization error for stochastic integrals with respect to a continuous multidimensional Brownian semimartingale. In this setting we establish a pathwise lower bound for the renormalized quadratic variation of the error. We provide a sequence of stopping times that gives an asymptotically optimal discretization. This sequence is defined as the exit times of the semimartingale from random ellipsoids. Compared with previous results, we allow a fairly large class of semimartingales. We prove that the lower bound is sharp. In Chapter 2 we study the model-adaptive version of the optimal discretization of stochastic integrals. In Chapter 1 the construction of the optimal strategy uses knowledge of the diffusion coefficient of the semimartingale under consideration. In this work we establish an asymptotically optimal discretization strategy that is adaptive to the model and uses no information about the model. We prove optimality for a fairly general class of discretization grids, based on kernel techniques for adaptive estimation. In Chapter 3 we study the convergence in distribution of renormalized discretization errors of Itô processes for a concrete and fairly general class of discretization grids given by stopping times. Previous works on the subject consider only the one-dimensional case. Moreover, they concentrate on particular cases of grids, or prove results under abstract assumptions. In our work we give the limit distribution explicitly in a clear and simple form; the results are proved in the multidimensional case for both the process and the discretization error.
In Chapter 4 we study the problem of parametric estimation for diffusion processes based on observations at stopping times. Previous works on the subject consider only observation times that are deterministic, strongly predictable, or random and independent of the process. Under weak assumptions we construct a consistent sequence of estimators for a large class of observation grids given by stopping times. We carry out an asymptotic analysis of the estimation error. In addition, in the case of a one-dimensional parameter, for any sequence of estimators satisfying an unbiased CLT, we prove a uniform lower bound for the asymptotic variance; we show that this bound is sharp. Chapters 5-6 are devoted to the problem of uncertainty quantification for stochastic approximation limits. In Chapter 5 we analyze uncertainty quantification for stochastic approximation (SA) limits. In our setting the limit is defined as a zero of a function given by an expectation. This expectation is taken with respect to a random variable whose model is assumed to depend on an uncertain parameter. We consider the SA limit as a function of this parameter. We introduce an algorithm called USA (Uncertainty for SA). It is a procedure in increasing dimension for computing the basis coefficients of a chaos expansion of this function in a basis of a well-chosen Hilbert space. The convergence of USA in this Hilbert space is proved. In Chapter 6 we analyze the L2 convergence rate of the USA algorithm developed in Chapter 5. The analysis is non-trivial because of the infinite dimension of the procedure. The rate obtained depends on the model and on the parameters used in the USA algorithm. Knowing it makes it possible to optimize the growth rate of the dimension in USA.
/ This thesis consists of two parts which study two separate subjects. Chapters 1-4 are devoted to the problem of discretizing processes at stopping times. In Chapter 1 we study the optimal discretization error of stochastic integrals driven by a multidimensional continuous Brownian semimartingale. In this setting we establish a pathwise lower bound for the renormalized quadratic variation of the error and we provide a sequence of discretization stopping times which is asymptotically optimal. The latter is defined as hitting times of random ellipsoids by the semimartingale at hand. In comparison with previously available results, we allow quite a large class of semimartingales and we prove that the asymptotic lower bound is attainable. In Chapter 2 we study the model-adaptive optimal discretization error of stochastic integrals. In Chapter 1 the construction of the optimal strategy involved knowledge of the diffusion coefficient of the semimartingale under study. In this work we provide a model-adaptive asymptotically optimal discretization strategy that does not require any prior knowledge about the model. In Chapter 3 we study the convergence in distribution of renormalized discretization errors of Itô processes for a concrete general class of random discretization grids given by stopping times. Previous works on the subject only treat the case of dimension 1. Moreover, they either focus on particular cases of grids, or provide results under quite abstract assumptions with an implicitly specified limit distribution. In contrast, we provide the limit distribution explicitly, in a tractable form in terms of the underlying model. The results hold both for multidimensional processes and general multidimensional error terms. In Chapter 4 we study the problem of parametric inference for diffusions based on observations at random stopping times. We work in the asymptotic framework of high frequency data over a fixed horizon.
Previous works on the subject consider only observation times that are deterministic, strongly predictable, or random and independent of the process, and do not cover our setting. Under mild assumptions we construct a consistent sequence of estimators for a large class of stopping time observation grids. Further, we carry out the asymptotic analysis of the estimation error and establish a Central Limit Theorem (CLT) with a mixed Gaussian limit. In addition, in the case of a 1-dimensional parameter, for any sequence of estimators verifying CLT conditions without bias, we prove a uniform a.s. lower bound on the asymptotic variance, and show that this bound is sharp. In Chapters 5-6 we study the problem of uncertainty quantification for stochastic approximation limits. In Chapter 5 we analyze the uncertainty quantification for the limit of a Stochastic Approximation (SA) algorithm. In our setup, this limit is defined as the zero of a function given by an expectation. The expectation is taken w.r.t. a random variable for which the model is assumed to depend on an uncertain parameter. We consider the SA limit as a function of this parameter. We introduce the so-called Uncertainty for SA (USA) algorithm, an SA algorithm in increasing dimension for computing the basis coefficients of a chaos expansion of this function on an orthogonal basis of a suitable Hilbert space. The almost-sure and Lp convergences of USA, in the Hilbert space, are established under mild, tractable conditions. In Chapter 6 we analyse the L2-convergence rate of the USA algorithm designed in Chapter 5. The analysis is non-trivial due to the infinite dimensionality of the procedure. Moreover, our setting is not covered by previous works on infinite dimensional SA. The obtained rate depends non-trivially on the model and the design parameters of the algorithm. Knowing it enables optimization of the dimension growth speed in the USA algorithm, which is the key factor in its efficient performance.
6

Information on a default time : Brownian bridges on a stochastic intervals and enlargement of filtrations / Information sur le temps de défaut : ponts browniens sur des intervalles stochastiques et grossissement de filtrations

Bedini, Matteo 12 October 2012
In this thesis the information process concerning a default time τ in a credit risk model is described by a Brownian bridge on the stochastic interval [0, τ]. Such a bridge process is argued to be better suited to modelling than the classical model based on the indicator function I[0,τ]. After a study of the associated Bayes formulas, this approach to modelling information about the default time is linked with other information on the financial market. This is done using the theory of enlargement of filtrations, where the filtration generated by the information process is enlarged by the reference filtration describing other information not directly related to the default. Particular attention is devoted to the classification of the default time with respect to the minimal filtration and also to the enlarged filtration. Sufficient conditions under which τ is totally inaccessible are discussed, and an example is given in which τ avoids stopping times, is totally inaccessible with respect to the minimal filtration and is predictable with respect to the enlarged filtration. Finally, financial contracts such as corporate bonds and credit default swaps are studied in the setting described above. / In this PhD thesis the information process concerning a default time τ in a credit risk model is described by a Brownian bridge over the random time interval [0, τ]. Such a bridge process is argued to be a more suitable model than the classical one based on the indicator function I[0,τ]. After the study of related Bayes formulas, this approach to modelling information concerning the default time is related to other financial information. This is done with the help of the theory of enlargement of filtrations, where the filtration generated by the information process is enlarged with a reference filtration modelling other information not directly associated with the default. Particular attention is paid to the classification of the default time with respect to the minimal filtration and also with respect to the enlarged filtration. Sufficient conditions under which τ is totally inaccessible are discussed, and an example is given of a τ that avoids the stopping times of the reference filtration, is totally inaccessible with respect to its own filtration and is predictable with respect to the enlarged filtration. Finally, common financial contracts such as defaultable bonds and credit default swaps are considered in the setting described above.
7

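新建房屋最適銷售時機--融資決策與實質選擇權的配合 / Optimal Timing for Selling Newly Built Housing: Coordinating Financing Decisions and Real Options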

李克誠, Li, Philip K.C. Unknown Date
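In the past, pre-sale was the dominant way of selling housing in Taiwan's real estate development market. It was a special system shaped by the political and economic conditions of the time, and its main purpose was to obtain enough working capital from the market. Taiwan's real estate market has gradually changed in recent years, however, and a considerable number of projects now sell completed units. The main rationale is that when the market is rising, delaying the sale lets a project earn a higher return. Moreover, as access to real estate project financing has eased and funding is no longer a binding constraint, pre-sale may no longer be the only sales model, and may no longer be the optimal one. Yet developers still follow the old way of thinking and use the financing ratio (whether or not they have the money) as the basis for deciding when to sell. This study examines the choice of optimal sale timing, whether financing decisions affect that choice, and how the optimal sale timing and the option value change under different market conditions. This study uses a real options model to examine the optimal sale timing of newly built housing; it applies models derived by earlier scholars and does not derive a new one. We first build a model of project revenue in the market and a real options decision model to simulate developers' operating conditions. Random numbers are fed into simulation models of housing prices and financing interest rates to simulate prices and rates in the real estate market, and the simulated results are fed into the models to simulate project revenue under different market conditions. By comparing the project revenue simulated under different decision values, we examine the choice of optimal sale timing for newly built housing and the value of the option. Each variable in the model is then varied in isolation (changing only that variable while holding everything else constant) in a sensitivity analysis, to examine how the variables affect the choice of optimal sale timing and the real option value, and thereby to understand the possible effects of exogenous market variables on both, together with the implications that deserve attention.
Chapter 1. Introduction: Section 1, Research motivation and objectives; Section 2, Research scope and limitations; Section 3, Research framework.
Chapter 2. Industry analysis and case interviews: Section 1, Sale timing; Section 2, Real estate finance; Section 3, Implications of the literature review and case studies for this study.
Chapter 3. Literature review: Section 1, Optimal sale timing models; Section 2, Implications of the literature review and case studies for this study.
Chapter 4. Model construction and model design: Section 1, Optimal sale timing; Section 2, Research design.
Chapter 5. Analysis of empirical results: Section 1, Financing decisions and optimal sale timing; Section 2, Sensitivity analysis of real option value; Section 3, Sensitivity analysis of the choice of optimal sale timing.
Chapter 6. Conclusions and recommendations: Section 1, Implications of the results; Section 2, Recommendations.
References: Chinese-language sources; English-language sources.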
8

Detection of the Change Point and Optimal Stopping Time by Using Control Charts on Energy Derivatives

AL, Cihan, Koroglu, Kubra January 2011
No description available.
9

Safe Stopping Distances and Times in Industrial Robotics

Smith, Hudson Cahill 20 December 2023
This study presents a procedure for estimating the stopping behavior of industrial robots with a trained neural network. This trained network is presented as a single channel in a redundant architecture for safety control applications, where its potential for future integration with an analytical model of robot stopping is discussed. Basic physical relations for simplified articulated manipulators are derived, which motivate a choice of quantities to predict robot stopping behavior and inform the training and testing of a network for prediction of stopping distances and times. Robot stopping behavior is considered in the context of the relevant standards ISO 10218-1, ISO/TS 15066 and ISO 13849-1, which inform the definitions for safety related stopping distances and times used in this study. Prior work on the estimation of robot stopping behavior is discussed alongside applications of machine learning to the broader field of industrial robotics, and particularly to the cases of prediction of forward and inverse kinematics with trained networks. A state-driven data collection program is developed to perform repeated stopping experiments for a controlled stop on path within a specified sampling domain. This program is used to collect data for a simulated and a real robot system. Special attention is given to the identification of meaningful stopping times, which includes the separation of stopping into pre-deceleration and post-deceleration phases. A definition is provided for stopping of a robot in a safety context, based on the observation that residual motion over short distances (less than 1 mm) and at very low velocities (less than 1 mm/s) is not relevant to robot safety. A network architecture and hyperparameters are developed for the prediction of stopping distances and times for the first three joints of the manipulator without the inclusion of payloads.
The result is a dual-network structure, where stopping distance predictions from the distance prediction network serve as inputs to the stopping time prediction network. The networks are validated on their capacity to interpolate and extrapolate predictions of robot stopping behavior in the presence of initial conditions not included in the training and testing data. A method is devised for the calculation of prediction errors for training, testing and validation data. This method is applied both to interpolation and extrapolation to new initial velocity and positional conditions of the manipulator. In prediction of stopping distances and times, the network is highly successful at interpolation, resulting in comparable or nominally higher errors for the validation data set when compared to the errors for training and testing data. In extrapolation to new initial velocity and positional conditions, notably higher errors in the validation data predictions are observed for the networks considered. Future work in the areas of predictions of stopping behavior with payloads and tooling, further applications to collaborative robotics, analytical models of stopping behavior, inclusion of additional stopping functions, use of explainable AI methods and physics-informed networks is discussed. / Master of Science / As the uses for industrial robots continue to grow and expand, so does the need for robust safety measures to avoid, control, or limit the risks posed to human operators and collaborators. This is exemplified by Isaac Asimov's famous first law of robotics - "A robot may not injure a human being, or, through inaction, allow a human being to come to harm." As applications for industrial robots continue to expand, it is beneficial for robots and human operators to collaborate in work environments without fences.
In order to ethically implement such increasingly complex and collaborative industrial robotic systems, the ability to limit robot motion with safety functions in a predictable and reliable way (as outlined by international standards) is paramount. In the event of either a technical failure (due to malfunction of sensors or mechanical hardware) or a change in environmental conditions, it is important to be able to stop an industrial robot from any position in a safe and controlled manner. This requires real-time knowledge of the stopping distance and time for the manipulator. To understand stopping distances and times reliably, multiple independent methods can be used and compared to predict stopping behavior. The use of machine learning methods is of particular interest in this context due to their speed of processing and their potential to be based on real recorded data. In this study, we attempt to evaluate the efficacy of machine learning algorithms in predicting stopping behavior and assess their potential for implementation alongside analytical models. A reliable, multi-method approach for estimating stopping distances and times could also enable further methods for safety in collaborative robotics such as Speed and Separation Monitoring (SSM), which monitors both human and robot positions to ensure that a safe stop is always possible. A program for testing and recording the stopping distances and times for the robot is developed. As stopping behavior varies based on the positions and speeds of the robot at the time of stopping, a variety of these conditions are tested with the robot stopping program. This data is then used to train an artificial neural network, a machine learning method that mimics the structure of human and animal brains to learn relationships between data inputs and outputs. This network is used to predict both the stopping distance and time of the robot.
The network is shown to produce reasonable predictions, especially for positions and speeds that are intermediate to those used to train the network. Future improvements are suggested, along with a method for using stopping distance and time quantities in robot safety applications.
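
The safety-oriented notion of "stopped" described in this abstract (residual motion under 1 mm and speeds under 1 mm/s are ignored) can be turned into a small Python sketch for reading a stopping time and distance off a recorded trajectory. The exact rule used in the thesis is not given in the abstract, so everything here is an assumption: the function name, the sample layout (index 0 is the stop command), and the choice to measure residual motion as net displacement to the final sample.

```python
def stopping_time_and_distance(times, positions, speeds, v_tol=1.0, d_tol=1.0):
    """times in s, path positions in mm, speeds in mm/s; index 0 is the stop command."""
    last = len(times) - 1
    stop = None
    # Walk backwards: the robot counts as stopped from the earliest sample after
    # which speed stays below v_tol and residual motion stays below d_tol.
    for i in range(last, -1, -1):
        residual = abs(positions[last] - positions[i])
        if speeds[i] >= v_tol or residual >= d_tol:
            break
        stop = i
    if stop is None:
        return None  # the recording ends before the robot has stopped
    return times[stop] - times[0], abs(positions[stop] - positions[0])

times = [0.0, 0.1, 0.2, 0.3, 0.4]
positions = [0.0, 40.0, 55.0, 55.4, 55.5]
speeds = [500.0, 300.0, 50.0, 0.8, 0.2]
result = stopping_time_and_distance(times, positions, speeds)
print(result)
```

Pairs produced this way, one per experiment, are the kind of targets a stopping distance and stopping time predictor could be trained on.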
10

Quantitative Methods of Statistical Arbitrage

Boming Ning (18414465) 22 April 2024
<p dir="ltr">Statistical arbitrage is a prevalent trading strategy that takes advantage of the mean-reverting property of spreads constructed from pairs or portfolios of assets. Using statistical models and algorithms, statistical arbitrage exploits pricing inefficiencies between securities or within asset portfolios.</p><p dir="ltr">In Chapter 2, we propose a framework for constructing diversified portfolios with multiple pairs trading strategies. In our approach, several pairs of co-moving assets are traded simultaneously, and capital is dynamically allocated among different pairs based on the statistical characteristics of the historical spreads. This allows us to further consider various portfolio designs and rebalancing strategies. Experiments on empirical data suggest significant benefits from diversification within our proposed framework.</p><p dir="ltr">In Chapter 3, we explore an optimal timing strategy for trading price spreads that exhibit mean-reverting characteristics. A sequential optimal stopping framework is formulated to analyze the optimal timings for both entering and subsequently liquidating positions, while accounting for transaction costs. We then leverage a refined signature optimal stopping method to solve this sequential optimal stopping problem, thereby identifying the entry and exit timings that maximize gains. Our framework makes no predefined assumptions about the dynamics of the underlying mean-reverting spreads, which makes it adaptable to diverse scenarios. Numerical results demonstrate its superior performance compared with conventional mean reversion trading rules.</p><p dir="ltr">In Chapter 4, we introduce a model-free, reinforcement-learning-based framework for statistical arbitrage.
For the construction of mean reversion spreads, we establish an empirical reversion time metric and optimize asset coefficients by minimizing this empirical mean reversion time. In the trading phase, we employ a reinforcement learning framework to identify the optimal mean reversion strategy. Unlike traditional mean reversion strategies, which focus primarily on price deviations from a long-term mean, our methodology constructs the state space to capture recent trends in price movements. Additionally, the reward function is tailored to reflect the specific characteristics of mean reversion trading.</p>
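
For orientation, the kind of conventional mean reversion rule that the abstract uses as a point of comparison can be sketched in Python as a z-score threshold rule on a spread. This is a generic textbook baseline, not the signature optimal stopping or reinforcement learning methods of the thesis. The thresholds and the function name are hypothetical, and the mean and standard deviation are computed over the whole sample for brevity, which a real backtest would avoid because it looks ahead.

```python
def zscore_positions(spread, entry_z=1.0, exit_z=0.0):
    # Full-sample mean and standard deviation (a simplification; see lead-in).
    mean = sum(spread) / len(spread)
    std = (sum((s - mean) ** 2 for s in spread) / len(spread)) ** 0.5
    position, positions = 0, []
    for s in spread:
        z = (s - mean) / std
        if position == 0:
            if z > entry_z:
                position = -1  # spread is rich: short it
            elif z < -entry_z:
                position = 1   # spread is cheap: go long
        elif position == 1 and z >= -exit_z:
            position = 0       # long position has reverted: close
        elif position == -1 and z <= exit_z:
            position = 0       # short position has reverted: close
        positions.append(position)
    return positions

positions = zscore_positions([0.0, 0.0, 2.0, 1.0, 0.0, -2.0, -1.0, 0.0])
print(positions)
```

The entry and exit thresholds are fixed in advance here; the optimal timing framework described in the abstract chooses those times instead.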
