Global ETD Search

131	Reinforcement Learning for Multi-Agent Strategy Synthesis Using Higher-Order Knowledge Forsell, Gustav, Gergi, Shamoun January 2023 Imagine for a moment that we are living in a distant future where autonomous robots patrol the streets as police officers. Two such robots are chasing a robber through the city streets. Fearing that the thief might listen in on any transmission, both robots remain radio silent and are thus limited to a strictly visual pursuit. Since the robots cannot see the robber the entire time, they have to deduce the robber's potential location. What would the best strategy be for these robots to achieve their objective? This bachelor's thesis investigated the above example by creating strategies through reinforcement learning. The thesis also investigated the performance of the players when they have different abilities of deduction. This was tested by creating a suitable game and a corresponding reinforcement learning algorithm and running simulations for different degrees of knowledge. The study showed that reinforcement learning is a viable method for strategy construction, reaching nearly guaranteed victory when the agent knows everything about the environment and a slightly lower win ratio when uncertainty is introduced. The implementation yielded only a small gain in win ratio when the agents could deduce even more about each other. / Bachelor's degree project in electrical engineering 2023, KTH, Stockholm Higher Order Knowledge Imperfect Information Reinforcement Learning Deep Q-networks Knowledge Representation Pursuit Evasion Games Electrical engineering and electronics
132	Multi-Agent Games of Imperfect Information: Algorithms for Strategy Synthesis Åkerblom Jonsson, Viktor, Berisha, David January 2021 The aim of this project was to improve upon a tool for strategy synthesis for multi-agent games of imperfect information against nature. Another objective was to compare the improved tool with the original tool and with the Strategic Model Checker (SMC). For the strategy synthesis, an existing construction for expanding the games, the Multi-Agent Knowledge-Based Subset Construction, was used. The construction creates a new knowledge-based game in which strategies can be tested. Strategies were synthesized for the individual agents, and joint profiles of the individual strategies were then tested to see whether they were winning. Four different algorithms for traversing the game graphs were tested against the other existing tools. The new and improved tool was faster at synthesizing a strategy than both the old tool and the SMC for almost all games tested. For the games where the new tool was outperformed, the results indicate that this was due to a combination of chance and how the games are perceived by the tools. No algorithm or tool proved to be the best performing for all games. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm Strategy Synthesis Strategic Model Checker Multi-Agent Games Imperfect Information Electrical engineering and electronics
133	Deep Recurrent Q Networks for Dynamic Spectrum Access in Dynamic Heterogeneous Environments with Partial Observations Xu, Yue 23 September 2022 Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. The combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to these simpler approaches. Unlike traditional reinforcement learning methods, DRL does not require the explicit estimation of transition probability matrices or prohibitively large matrix computations. Further, since many learning approaches cannot solve the resulting online Partially-Observable Markov Decision Process (POMDP), Deep Recurrent Q-Networks (DRQN) have been proposed to determine the optimal channel access policy via online learning. The fundamental goal of this dissertation is to develop DRL-based solutions to address this POMDP-DSA problem. We mainly consider three aspects in this work: (1) optimal transmission strategies, (2) combined intelligent sensing and transmission strategies, and (3) learning efficiency or online convergence speed. Four key challenges in this problem are: (1) the proposed DRQN-based node does not know the other nodes' behavior patterns a priori and must predict the future channel state based on previous observations; (2) the impact on primary user throughput during learning and even after learning must be limited; (3) resources can be wasted on sensing/observation; and (4) convergence speed must be improved without impacting performance. 
We demonstrate in this dissertation that the proposed DRQN can: (1) learn the optimal transmission strategy in a variety of environments under partial observations; (2) learn a sensing strategy that provides near-optimal throughput in different environments while dramatically reducing the needed sensing resources; (3) remain robust to imperfect observations; (4) accommodate dynamic environments, multi-channel transmission and the presence of multiple agents; and (5) learn in an accelerated fashion using one of three different approaches. / Doctor of Philosophy / With the development of wireless communication, such as 5G, global mobile data traffic has experienced tremendous growth, which makes spectrum resources even more critical for future networks. However, spectrum is an expensive and scarce resource. Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. The combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to these simpler approaches. Compared with traditional reinforcement learning methods, DRL does not require explicit estimation of transition probability matrices or extensive matrix computations. Furthermore, since many learning methods cannot solve the resulting online partially observable Markov decision process (POMDP), a deep recurrent Q-network (DRQN) is proposed to determine the optimal channel access policy through online learning. The basic goal of this dissertation is to develop a DRL-based solution to this POMDP-DSA problem. It focuses on improving performance in three directions: (1) finding the optimal (or sub-optimal) channel access strategy under a fixed partial-observation pattern; (2) building on (1), dynamically finding a more efficient sensing/observation policy and the corresponding channel access strategy; and (3) using different machine learning algorithms or structures to improve learning efficiency without sacrificing performance, so that users do not wait too long for the expected performance. Through research in these three directions, we have found an efficient and versatile solution, namely DRQN-based technology. Dynamic Spectrum Access Partial Knowledge Deep Recurrent Neural Networks Parallel Learning Transfer Learning Meta-Learning Sensing Prediction Imperfect System Feedback Multi-Rate and Multi-Agent Dynamic environments Cache
134	MIMO block-fading channels with mismatched CSI Asyhari, A.Taufiq, Guillen i Fabregas, A. 23 August 2014 We study transmission over multiple-input multiple-output (MIMO) block-fading channels with imperfect channel state information (CSI) at both the transmitter and receiver. Specifically, based on mismatched decoding theory for a fixed channel realization, we investigate the largest achievable rates with independent and identically distributed inputs and a nearest neighbor decoder. We then study the corresponding information outage probability in the high signal-to-noise ratio (SNR) regime and analyze the interplay between estimation error variances at the transmitter and at the receiver to determine the optimal outage exponent, defined as the high-SNR slope of the outage probability plotted against the SNR on a log-log scale. We demonstrate that, despite operating with imperfect CSI, power adaptation can offer substantial gains in terms of outage exponent. / A. T. Asyhari was supported in part by the Yousef Jameel Scholarship, University of Cambridge, Cambridge, U.K., and the National Science Council of Taiwan under grant NSC 102-2218-E-009-001. A. Guillén i Fàbregas was supported in part by the European Research Council under ERC grant agreement 259663 and the Spanish Ministry of Economy and Competitiveness under grant TEC2012-38800-C03-03. Block fading Diversity Generalized mutual information Imperfect channel state information MIMO Mismatched decoding Multiple antenna Nearest neighbour decoding Outage probability Outage exponent Power adaptation
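The verbal definition of the outage exponent in entry 134 can be written as a limit. This is a sketch in notation assumed here for illustration, not taken from the paper: $P_{\mathrm{out}}(\mathrm{SNR})$ stands for the outage probability at a given SNR and $d$ for the outage exponent.

```latex
% Outage exponent: the (negated) high-SNR slope of the outage
% probability against the SNR on a log-log scale.
% P_out and d are assumed symbols, not the paper's own notation.
d \;=\; -\lim_{\mathrm{SNR}\to\infty}
        \frac{\log P_{\mathrm{out}}(\mathrm{SNR})}{\log \mathrm{SNR}}
```

Read this way, a larger $d$ means the outage probability decays faster as the SNR grows, which is the sense in which power adaptation is said to offer gains.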
135	The nature and extent of intra-industry trade in South Africa Parr, Richard Geoffrey 06 1900 Intra-industry trade occurs when goods from the same industry category are both exported and imported. Types of intra-industry trade are identified, and theoretical models of intra-industry trade under conditions of imperfect competition are examined. The results of thirty-seven empirical studies on the determinants of intra-industry trade are analysed. Methods of measuring intra-industry trade and marginal intra-industry trade are discussed, and various measurement problems are dealt with. The extent of intra-industry trade in South Africa in 1992 and 1997 is measured, using the Grubel-Lloyd and Michaely indices. The Brülhart indices are applied to measure marginal intra-industry trade. South Africa has a relatively low and stable level of intra-industry trade in manufactured goods: the Grubel-Lloyd index for 1997 is calculated to be 37 per cent. / Economics and Management Sciences / M.Com. (Economics) Intra-industry trade marginal intra-industry trade South Africa product differentiation Scale economies Grubel-Lloyd index Brulhart index Heckscher-Ohlin theorem Imperfect competition Data aggregation 382.0968 Commerce South Africa -- Commerce
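For readers unfamiliar with the Grubel-Lloyd index mentioned in entry 135, the standard textbook form can be sketched as follows. The function names and the sample figures are hypothetical, and the sketch does not reproduce the thesis's data, aggregation level, or any adjustment it may apply.

```python
from typing import Iterable, Tuple


def grubel_lloyd(exports: float, imports: float) -> float:
    """Standard Grubel-Lloyd index for one industry.

    Returns 1.0 when exports equal imports (all trade is intra-industry)
    and 0.0 when trade flows in one direction only.
    """
    total = exports + imports
    if total == 0:
        raise ValueError("index is undefined when there is no trade")
    return 1.0 - abs(exports - imports) / total


def aggregate_grubel_lloyd(flows: Iterable[Tuple[float, float]]) -> float:
    """Trade-weighted aggregate over (exports, imports) pairs per industry."""
    pairs = list(flows)
    total = sum(x + m for x, m in pairs)
    if total == 0:
        raise ValueError("index is undefined when there is no trade")
    imbalance = sum(abs(x - m) for x, m in pairs)
    return 1.0 - imbalance / total


if __name__ == "__main__":
    # Hypothetical trade values, for illustration only.
    print(grubel_lloyd(80.0, 20.0))
    print(aggregate_grubel_lloyd([(50.0, 50.0), (100.0, 0.0)]))
```

An index reported as a percentage, such as the 37 per cent quoted above, corresponds to a value of 0.37 on this scale.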
136	貨幣政策與信用管道：資本不完全移動之動態分析 / Monetary Policy and the Credit Channel: A Dynamic Open Economy Model with Imperfect Capital Mobility 王書盛, Wang, Shu-Sheng Unknown Date This study investigates monetary effects under floating exchange rates and imperfect capital mobility by extending the model of Bernanke and Blinder (1988) to a small open economy. It is shown that, with the credit channel of monetary transmission explicitly considered, the effect of monetary policy on output may be augmented or lessened depending on whether the exchange rate depreciates or appreciates. In addition, the exchange rate puzzle found in empirical studies can be explained in the theoretical model. The dynamic adjustment patterns of output and the exchange rate after an increase in the money supply are further examined. In the case of relatively high capital mobility, when real output gradually adjusts toward a higher level, the exchange rate may overshoot, undershoot, or even counter-shoot during the dynamic adjustment process. This provides another explanation for the volatility of exchange rates under floating rates. Therefore, as financial markets become more internationalized, the conduct of monetary policy becomes more complicated in an open economy. credit channel exchange rate puzzle imperfect capital mobility dynamic adjustment
137	Modélisation statistique d'événements récurrents. Exploration empirique des estimateurs, prise en compte d'une covariable temporelle et application aux défaillances des réseaux d'eau / Statistical modeling of recurrent events. Empirical assessment of estimators' properties, accounting for time-dependent covariate and application to failures of water networks Babykina, Evgénia 08 December 2010 In the context of stochastic modeling of recurrent events, a particular statistical model is explored. This model is based on counting process theory and is built to analyze failures in water distribution networks. In this domain, data are available on a large number of systems observed during a certain time period. Since the systems are installed at different dates, their age is used as the time scale in modeling. The model accounts for incomplete event history, aging of systems, the negative impact of previous failures on the state of systems, and covariates. The model is situated among other approaches to the analysis of recurrent events used in biostatistics and in reliability. The model parameters are estimated by the Maximum Likelihood (ML) method. A method to integrate a time-dependent covariate into the model is developed. The time-dependent covariate is assumed to be external to the failure process and piecewise constant. Heuristic methods are proposed to account for the influence of this covariate when it is not observed. Methods for simulating artificial data and for estimation in the presence of the time-dependent covariate are proposed. A Monte Carlo study is carried out to empirically assess the ML estimator's properties (normality, bias, variance). The study focuses on the doubly asymptotic nature of the data: asymptotic in the number of systems n and in the duration of observation T. The n-asymptotic behavior of the ML estimator, assessed empirically, agrees with the classical theoretical results. The T-asymptotic behavior appears to be non-classical. The analysis also shows that the two asymptotic directions, n and T, can be combined into a single direction: the number of observed events. This holds for the classical model parameters (the coefficients associated with fixed covariates and the parameter characterizing aging of systems). A single asymptotic direction does not, however, appear to hold for the coefficient of the time-dependent covariate or for the parameter characterizing the negative impact of previous failures on the future behavior of a system. The developed methodology is applied to the analysis of failures of water networks. The influence of climatic variations on failure intensity is assessed through a time-dependent covariate. The results show an overall improvement in predictions of the future behavior of the process when the time-dependent covariate is included in the model. Counting process Maximum likelihood Recurrent events Time-dependent covariate Monte Carlo simulations Imperfect repair Asymptotic properties Failures in water networks
138	Modélisation micromécanique des roches poreuses. Application aux calcaires oolitiques / Micromechanical modelling of porous rocks. Application to oolitic limestone Nguyen, Ngoc Bien 03 December 2010 This work is devoted to modelling the linear and non-linear poroelastic behaviour of porous materials and geomaterials (notably oolitic limestone and iron ore) using a multiscale approach. Based on microstructural observations of these materials, a conceptual model is proposed. The porous rocks studied consist of an assemblage of grains (oolites), present at a high volume fraction, cemented by a matrix. The porosity, assumed to be connected, is split between the oolites and the matrix. A two-step homogenization method is developed in the framework of the Composite Sphere Assemblage (CSA) model. The effect of interfacial bonding on the poroelastic properties of the composite spheres is investigated by determining the exact solution of the model for perfect and/or imperfect interface conditions. The micromechanical model is first applied to estimate the effective linear poroelastic properties of the rocks studied. Their non-linear behaviour is then studied by assigning elasto-plastic behaviour to the matrix and developing a non-linear behaviour for the oolite-matrix interfaces. The comparison between the model results and macroscopic experimental results underlines the crucial role of the interfacial transition zone. Heterogeneous media Geomaterials Micromechanical approach Composite sphere model Imperfect interface Linear poro-elasticity Elasto-plasticity Linear homogenization Non-linear homogenization
139	資本不完全移動性與最適非線型所得稅:小型開放經濟的內生成長模型 / World capital mobility, optimal non-linear income taxation and endogenous growth in a small open economy 王琇華, Wang, Hsiu-Hua Unknown Date Based on the models of Barro (1990), Turnovsky (1997) and Lai and Liao (2012), this thesis constructs an endogenous growth model of a small open economy. To highlight the role of capital mobility, the analysis covers both a perfect world capital market and an imperfect world capital market, in which the economy faces an upward-sloping supply curve of debt. The government's infrastructure expenditure is financed by non-linear income taxation, and the thesis examines how the fiscal authority should design its non-linear tax structure to maximize social welfare. Several main findings emerge. First, with a perfect world capital market, a suitable package of two instruments (the income tax scalar and a regressive rate structure) can fully remedy the inefficiencies arising from the production externality of infrastructure and from distortionary taxation, so that Pareto optimality is restored. Second, with an imperfect world capital market, there are three distortions in the economy: the production externality, the capital externality, and the financial externality associated with the upward-sloping supply of debt. The two policy instruments, the tax scalar and tax progressivity/regressivity, eliminate the distortions arising from the production externality and the capital externality. One distortion remains, namely the financial externality: the representative agent treats the foreign interest rate as given, whereas in aggregate the interest rate varies with the scale of borrowing. According to the calibration results, the representative agent borrows more than is socially optimal, so the optimal tax policy cannot attain the first-best optimum. Endogenous growth Nonlinear income taxation Government expenditure Imperfect capital markets
140	Užití aoristu, imperfekta a perfekta v česky psané próze poloviny 14. století / Use of aorist, imperfect and perfect in Czech prose in the middle of the 14th century Zdeňková, Jana January 2011 This thesis deals with the use of the aorist and imperfect tenses and the periphrastic preterite in four Old Czech translated prose texts dating to the beginning of the second half of the 14th century. The texts analyzed are part of the first edition of the Old Czech translation of the Bible (part of the Dresden Bible, supplemented with the text of Proroci rožmberští (Prophets of Rožmberk)), the Legend of St. Wenceslaus from the Old Czech Passional, the Apocalypse of Paul, and two chapters of the text O svatém Jeronýmovi knihy troje. We examine the percentage occurrence of the verb forms under investigation and the aspectual characteristics of the verbs from which these forms are built. Their relationship to the Latin source text is also examined. The results are presented in tables and graphs. The thesis also includes an electronic database of the investigated verb forms.
