  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Prospect Theory Multi-Agent Based Simulations for Non-Rational Route Choice Decision Making Modelling / Prospect Theorie basierte Multi-Agenten Simulationen für nicht-rationalle Route Entscheidung Modellierung

Kuhn Andriotti, Gustavo January 2009
This work combines Multi-Agent Simulations (MASim) with non-rational behaviour. The non-rational behaviour is based here on Prospect Theory [KT79] (PT), which is compared to the rational behaviour of Expected Utility Theory [vNM07] (EUT). The PT model was used to design a modified Q-Learning [Wat89, WD92] algorithm, and the PT-based Q-Learning was then integrated into a proposed agent architecture. Because much attention is given to a limited interpretation of Simon's definition of bounded rationality, this interpretation is broadened here. Both theories, the rational and the non-rational, are compared and the discordance in their results is discussed. The main contribution of this work is to show that an alternative to the EUT is available that is more suitable for modelling human decision-makers. The evidence shows that rationality is not appropriate for modelling people; therefore, instead of fine-tuning the existing model, the use of another one is proposed and evaluated. To this end, the route choice problem was adopted for the experiments: three traffic scenarios are simulated and their results analysed to evaluate the proposed model.
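The abstract does not spell out the modified algorithm. As a hedged illustration, the sketch below shows one way a Prospect Theory value function could replace the raw reward inside a standard Q-Learning update; the parameter values (alpha, beta, lambda) are the commonly cited Tversky/Kahneman estimates, and the two-route toy environment is invented, not taken from the thesis.

```python
import numpy as np

def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect Theory value function [KT79]: concave for gains,
    convex and steeper for losses (loss aversion lam > 1)."""
    return x**alpha if x >= 0 else -lam * (-x)**beta

def pt_q_update(Q, s, a, reward, s_next, lr=0.1, gamma=0.95):
    """One Q-learning step where the raw reward (e.g. negative travel
    time relative to a reference point) is passed through pt_value."""
    td_target = pt_value(reward) + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (td_target - Q[s, a])
    return Q

# Toy route choice: one state, two routes, rewards are (negative) delays.
Q = np.zeros((1, 2))
rng = np.random.default_rng(0)
for _ in range(500):
    a = int(rng.random() < 0.5)                   # explore both routes
    delay = rng.normal(-5 if a == 0 else -4, 3)   # noisy negative reward
    Q = pt_q_update(Q, 0, a, delay, 0)
print(Q)
```

Because losses are weighted more heavily than gains, the learned values diverge from the plain expected utilities, illustrating the kind of discordance between EUT-rational and PT-based valuations the abstract refers to.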
2

Bestärkendes Lernen zur Steuerung und Regelung nichtlinearer dynamischer Systeme / Reinforcement Learning for the Control of Nonlinear Dynamical Systems

Pritzkoleit, Max 21 January 2020
This thesis investigates reinforcement learning in the context of the control of nonlinear dynamical systems. First, the principles of stochastic optimal control and machine learning relevant to this work are explained; reinforcement learning is then placed in the context of data-based control, and three methods of deep reinforcement learning are analyzed in more detail. One algorithm, Deep Deterministic Policy Gradient (DDPG), is studied intensively on several mechanical example systems. Furthermore, the approach is compared with a classical model-based trajectory-optimization method based on the iterative linear-quadratic regulator (iLQR). All control tasks can be solved successfully with the iLQR, but the problem has to be solved anew for every new initial value. With DDPG, by contrast, a global feedback controller is learned that drives the dynamical system into the desired state from almost arbitrary initial values. Its disadvantages are poor data efficiency and, so far, a lack of applicability to highly nonlinear systems.
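As a rough, hedged illustration of the DDPG scheme mentioned above, the sketch below shows its two interleaved updates (critic regression toward a bootstrapped target, actor ascent on the critic) plus Polyak-averaged target networks. The toy dimensions, network sizes and learning rates are placeholders, not the settings used in the thesis.

```python
import copy
import torch
import torch.nn as nn

state_dim, act_dim, gamma, tau = 2, 1, 0.99, 0.005
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                      nn.Linear(32, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + act_dim, 32), nn.ReLU(),
                       nn.Linear(32, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_step(s, a, r, s2, done):
    """One DDPG update on a replay batch (all arguments are tensors)."""
    with torch.no_grad():                                  # bootstrapped target
        y = r + gamma * (1 - done) * critic_t(torch.cat([s2, actor_t(s2)], 1))
    loss_c = nn.functional.mse_loss(critic(torch.cat([s, a], 1)), y)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()     # critic regression

    loss_a = -critic(torch.cat([s, actor(s)], 1)).mean()   # ascend the critic
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    for net, tgt in ((actor, actor_t), (critic, critic_t)):
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)       # Polyak averaging

# Fake batch, just to show the call signature.
B = 64
ddpg_step(torch.randn(B, state_dim), torch.rand(B, act_dim) * 2 - 1,
          torch.randn(B, 1), torch.randn(B, state_dim), torch.zeros(B, 1))
```

Note how the learned actor, unlike an iLQR trajectory, is a state-feedback map and therefore needs no re-solving for new initial conditions, which is the contrast the abstract draws.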
3

Reinforcement Learning Strategies for a Context-Aware Adaptive Cruise Control

Joganantham, Rubina 29 April 2022
Adaptive Cruise Control (ACC), a smart combination of pre-existing cruise control and time-gap control, plays a major role in driving comfort. Currently available ACC systems let the vehicle maintain a set speed and automatically adjust it to keep a fixed distance to the vehicle ahead, where both speed and distance are set according to user preferences. Each user has their own perceptions and preferences, but existing ACC systems lack user adaptation. This thesis therefore focuses on automating the distance settings of the ACC system so that they adapt to each individual user. To incorporate user-specific distance settings, the most relevant contexts in which a change of the ACC distance is needed are identified and a standard distance setting is assigned to each; reinforcement learning strategies are then applied whereby these pre-existing distance settings are modified and adapted to the user once they start driving.
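One plausible reading of this setup, sketched below under invented assumptions, is a contextual bandit: per driving context, the agent learns which time-gap setting the driver overrides least often. The context set, gap levels and feedback signal are hypothetical stand-ins, not the thesis's actual design.

```python
import numpy as np

contexts = ["city", "rural", "highway"]   # assumed driving contexts
gaps = [1.0, 1.5, 2.0, 2.5]               # selectable time gaps in seconds
Q = np.zeros((len(contexts), len(gaps)))
eps, lr = 0.1, 0.2
rng = np.random.default_rng(1)

def driver_feedback(c, g):
    """Stand-in for the user: +1 if the chosen gap is left untouched,
    -1 if the driver overrides it (too close / too far)."""
    preferred = {"city": 2.0, "rural": 1.5, "highway": 1.0}[contexts[c]]
    return 1.0 if abs(gaps[g] - preferred) < 0.3 else -1.0

for _ in range(2000):                      # simulated driving situations
    c = rng.integers(len(contexts))        # observed context
    g = rng.integers(len(gaps)) if rng.random() < eps else int(np.argmax(Q[c]))
    # Contextual-bandit update: no successor state, so no bootstrap term.
    Q[c, g] += lr * (driver_feedback(c, g) - Q[c, g])

print({contexts[c]: gaps[int(np.argmax(Q[c]))] for c in range(len(contexts))})
```

After enough interactions, the per-context greedy choice converges to the gap the simulated driver prefers, which is the user-adaptation behaviour the abstract asks of the ACC distance setting.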
4

From Goals to Habits in Alcohol Dependence: Psychological and Computational Investigations

Sebold, Miriam Hannah 31 July 2017
Alcohol dependence (AD) manifests as a strong drive to consume alcohol despite serious adverse consequences. A popular theory in addiction research therefore suggests that AD is characterized by a shift from goal-directed to habitual control, where actions become automatic and disentangled from their outcomes. Evidence for this has mainly been drawn from experimental investigations in animals. The field of machine learning has additionally advanced new experiments that allow reinforcement learning algorithms to be applied to investigate such a shift, but these tasks had not yet been applied to human AD. To fill this gap, this thesis investigates habitual versus goal-directed control in AD patients from distinct theoretical perspectives. First, we adapted a paradigm from the animal literature that quantifies habits as cue-induced control over behavior. Second, we applied an experimental procedure inspired by machine learning that allows the balance between habitual and goal-directed control to be investigated. Third, we examined the relationship between behavior across these paradigms. Last, we investigated whether an imbalance between habitual and goal-directed control was associated with alcohol consumption in young social drinkers. Our results add further evidence that AD is associated with a shift from goal-directed to habitual control, i.e. increased cue-induced control and reduced goal-directed decision-making. Behavior across both paradigms was associated, suggesting the involvement of similar cognitive mechanisms. As non-pathological alcohol intake was not associated with an imbalance between goal-directed and habitual control, this imbalance might arise over the course of AD rather than being a trait marker of alcohol intake.
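A common computational reading of this habitual/goal-directed balance, in the spirit of the machine-learning task mentioned above (two-stage decision tasks of the Daw type), is a hybrid learner that mixes model-based and model-free action values with a weight w. The sketch below uses illustrative parameters, not values fitted to any patient data.

```python
import numpy as np

rng = np.random.default_rng(2)
lr, w = 0.3, 0.6           # learning rate; w = model-based weight
T = np.array([[0.7, 0.3],  # P(second-stage state | first-stage action)
              [0.3, 0.7]])
q_mf = np.zeros(2)         # model-free (habitual) first-stage values
v2 = np.zeros(2)           # learned values of the second-stage states

for _ in range(1000):
    q_mb = T @ v2                            # planning via the transition model
    q = w * q_mb + (1 - w) * q_mf            # hybrid valuation
    p = np.exp(3 * q) / np.exp(3 * q).sum()  # softmax choice, inverse temp. 3
    a = rng.choice(2, p=p)
    s2 = rng.choice(2, p=T[a])               # common or rare transition
    r = rng.random() < (0.8 if s2 == 0 else 0.2)
    v2[s2] += lr * (r - v2[s2])              # second-stage update
    q_mf[a] += lr * (r - q_mf[a])            # habitual TD update

print(q_mf, T @ v2)   # w -> 0 reproduces purely habitual choice
```

In this framing, a shift toward habits corresponds to a lower w, i.e. choices driven by cached q_mf values rather than by planning through T.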
5

Sustainability of empathy as driver for prosocial behavior and social closeness: insights from computational modelling and functional magnetic resonance imaging / Nachhaltigkeit von Empathie als Motiv für prosoziales Verhalten und soziale Nähe: Erkenntnisse auf Grundlage von computational modelling und funktioneller Magnetresonanztomographie

Saulin, Anne Christin January 2023
Empathy, the act of sharing another person's affective state, is a ubiquitous driver for helping others and feeling close to them. These experiences are integral parts of human behavior and society. The studies presented in this dissertation investigate the sustainability and stability of social closeness and prosocial decision-making driven by empathy and other social motives. To this end, four studies were conducted in which behavioral and neural indicators of empathy sustainability were identified using model-based functional magnetic resonance imaging (fMRI). Applying reinforcement learning, drift-diffusion modelling (DDM), and fMRI, the first two studies investigated the formation and sustainability of empathy-related social closeness (study 1) and how sustainably empathy led to prosocial behavior (study 2). Using DDM and fMRI, the last two studies investigated how empathy combined with reciprocity, the social norm to return a favor, on the one hand, and empathy combined with the motive of outcome maximization on the other, altered the behavioral and neural social decision process. The results showed that empathy-related social closeness and prosocial decision tendencies persisted even if empathy was rarely reinforced. The sustainability of these empathy effects was related to a recalibration of the empathy-related social-closeness learning signal (study 1) and the maintenance of a prosocial decision bias (study 2). Study 3 showed that empathy boosted the processing of reciprocity-based social decisions, but not vice versa. Study 4 revealed that empathy-related decisions were modulated by the motive of outcome maximization, depending on individual differences in state empathy. Together, the studies strongly support the concept of empathy as a sustainable driver of social closeness and prosocial behavior.
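Since the studies rest on drift-diffusion modelling, a minimal simulation sketch may help: noisy evidence drifts between two decision bounds, and a prosocial decision bias such as that maintained in study 2 can be expressed as a starting point shifted toward the prosocial bound. All parameter values below are illustrative, not estimates from the dissertation.

```python
import numpy as np

def ddm_trial(v=0.8, a=2.0, z=0.5, dt=1e-3, sigma=1.0, rng=None):
    """One drift-diffusion trial (Euler-Maruyama): drift v, bound
    separation a, relative starting point z in (0, 1). Returns
    (hit upper bound?, reaction time in seconds)."""
    if rng is None:
        rng = np.random.default_rng()
    x, t = z * a, 0.0
    while 0.0 < x < a:
        x += v * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (x >= a), t

rng = np.random.default_rng(3)
# Starting point biased toward the (say) prosocial upper bound.
trials = [ddm_trial(z=0.65, rng=rng) for _ in range(1000)]
print(sum(c for c, _ in trials) / 1000)   # choice proportion shifts with z
```

Raising z (the bias) increases the proportion and speed of upper-bound choices even at fixed drift, which is how a "maintained prosocial decision bias" is typically expressed in DDM terms.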
6

Möglichkeiten und Strategien der Technologieclusterentwicklung - Eine Analyse der Voraussetzungen für eine erfolgreiche Clusterbildung in der Region Mainfranken / Opportunities and Strategies of Technology Cluster Development - An Analysis of Preconditions for a Successful Cluster Formation in the Region Mainfranken

Rhönisch, Anna Franziska January 2019
This paper focuses on the development of technology clusters and, based on this, on two research questions: What are the preconditions for technology cluster development according to cluster research? And does the region Mainfranken fulfill the requirements for technology cluster formation? For this purpose, a qualitative study is conducted with reference to various theoretical concepts of cluster formation, from which the following determinants of cluster development are deduced: the traffic-infrastructure and infrastructure component, the cluster-environment component, the university component, the state component and the industrial component. The analysis of the parameter values of the separate cluster components shows that the core requirements for technology cluster development in the region Mainfranken are fulfilled. Nevertheless, the infrastructure, the availability of commercial and industrial land, and the availability of capital need to be improved to form a successful technology cluster. Within the framework of this paper, the potential for technology cluster development in the field of artificial intelligence is also analyzed.
7

From Parameter Tuning to Dynamic Heuristic Selection

Semendiak, Yevhenii 18 June 2020
The balance between exploration and exploitation plays a crucial role in solving combinatorial optimization problems. It is reached by two general techniques: using an appropriate problem solver and setting its parameters properly. Both problems have been widely studied in the past and research continues to this day. The latest studies in the field of automated machine learning propose merging both problems, solving them at design time and later strengthening the results at runtime. To the best of our knowledge, a generalized approach for solving the parameter-setting problem in heuristic solvers has not yet been proposed, and consequently the concept of merging heuristic selection and parameter control has not been introduced. In this thesis, we propose an approach for generic parameter control in meta-heuristics by means of reinforcement learning (RL). Going a step further, we suggest a technique for merging the heuristic selection and parameter control problems and solving them at runtime using an RL-based hyper-heuristic. Evaluation of the proposed parameter-control technique on the symmetric traveling salesman problem (TSP) showed its applicability: it reaches the performance of the underlying meta-heuristics tuned online and used in isolation, providing results on par with the best of them.
Contents: 1 Introduction (1.1 Motivation, 1.2 Research Objective, 1.3 Solution Overview); 2 Background and Related Work Analysis (2.1 Optimization Problems and their Solvers, 2.2 Heuristic Solvers for Optimization Problems, 2.3 Setting Algorithm Parameters, 2.4 Combined Algorithm Selection and Hyper-Parameter Tuning Problem, 2.5 Conclusion on Background and Related Work Analysis); 3 Online Selection Hyper-Heuristic with Generic Parameter Control (3.1 Combined Parameter Control and Algorithm Selection Problem, 3.2 Search Space Structure, 3.3 Parameter Prediction Process, 3.4 Low-Level Heuristics, 3.5 Conclusion of Concept); 4 Implementation Details (4.2 Search Space, 4.3 Prediction Process, 4.4 Low-Level Heuristics, 4.5 Conclusion); 5 Evaluation (5.1 Optimization Problem, 5.2 Environment Setup, 5.3 Meta-heuristics Tuning, 5.4 Concept Evaluation, 5.5 Analysis of HH-PC Settings, 5.6 Conclusion); 6 Conclusion; 7 Future Work (7.1 Prediction Process, 7.2 Search Space, 7.3 Evaluations and Benchmarks); Bibliography; A Evaluation Results (A.1 Results in Figures, A.2 Results in Numbers)
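The thesis's hyper-heuristic is considerably richer, but a minimal bandit-style sketch conveys the core idea of runtime selection among low-level heuristics: each heuristic is credited with the tour improvement it produces on a TSP instance, and selection follows the running estimates. The operators and constants below are illustrative assumptions, not the thesis's components.

```python
import numpy as np

rng = np.random.default_rng(4)
pts = rng.random((60, 2))                      # random Euclidean TSP instance

def length(tour):
    return np.sum(np.linalg.norm(pts[tour] - pts[np.roll(tour, -1)], axis=1))

def two_opt(t):                                # reverse a random segment
    i, j = sorted(rng.choice(len(t), 2, replace=False))
    t = t.copy(); t[i:j] = t[i:j][::-1]; return t

def swap(t):                                   # exchange two random cities
    i, j = rng.choice(len(t), 2, replace=False)
    t = t.copy(); t[i], t[j] = t[j], t[i]; return t

heuristics = [two_opt, swap]
value = np.zeros(2)                            # running reward per heuristic
tour = np.arange(60)
cur = length(tour)
for step in range(20000):
    # Epsilon-greedy selection of the low-level heuristic at runtime.
    h = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(value))
    cand = heuristics[h](tour)
    delta = cur - length(cand)                 # improvement = reward signal
    value[h] += 0.05 * (delta - value[h])      # bandit-style credit update
    if delta > 0:                              # greedy acceptance
        tour, cur = cand, cur - delta
print(cur, value)
```

A full selection hyper-heuristic would additionally control each heuristic's parameters online, which is exactly the merged problem the thesis addresses.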
8

Sports Scene Searching, Rating & Solving using AI

Marzilger, Robert, Hirn, Fabian, Aznar Alvarez, Raul, Witt, Nicolas 14 October 2022
This work shows the application of artificial intelligence (AI) on invasion game tracking data to realize a fast (sub-second) and adaptable search engine for sports scenes, scene ratings based on machine learning (ML) and computer-generated solutions using reinforcement learning (RL). We provide research results for all three areas. Benefits are expected for accelerated video analysis at professional sports clubs.
9

Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop / Kausale Modelle über unendlichen Grafen und deren Anwendung auf die sensomotorische Schleife - stochastische Aspekte und gradientenbasierte optimale Steuerung

Bernigau, Holger 27 April 2015
Motivation and background: The enormous range of capabilities that every human learns throughout life is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn great interest from scientists working in very different fields such as philosophy, biology, sociology, educational science, computer science and mathematics. This thesis focuses on the information-theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (for example a human, an animal, a robot, an economic institution or a state) that interacts with its environment. Common models for this interaction are Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Learning is then considered to be the maximization of the expectation of a predefined reward function. In order to formulate general principles (such as a formal definition of curiosity-driven learning or the avoidance of unpleasant situations) in a rigorous way, it is desirable to have a theoretical framework for the optimization of more complex functionals of the underlying process law. These might include the entropy of certain sensor values or their mutual information. An optimization of the latter quantity (also known as predictive information) has been investigated intensively, both theoretically and experimentally using computer simulations, by N. Ay, R. Der, K. Zahedi and G. Martius. In this thesis, we develop a mathematical theory for learning in the sensorimotor loop beyond expected reward maximization.
Approaches and results: This thesis covers four topics related to the theory of learning in the sensorimotor loop. First, we need to specify the model of an agent interacting with its environment, with or without learning. This interaction naturally results in complex causal dependencies. Since we are interested in asymptotic properties of learning algorithms, it is necessary to consider infinite time horizons, and it turns out that the well-understood theory of causal networks known from the machine-learning literature is not powerful enough for this purpose. We therefore extend important theorems on causal networks to infinite graphs and general state spaces, using analytical methods from measure-theoretic probability theory and the theory of discrete-time stochastic processes; furthermore, we prove a generalization of the strong Markov property from Markov processes to infinite causal networks. Secondly, we develop a new idea for a projected stochastic constrained optimization algorithm. A discrete gradient-ascent algorithm can generally be used to generate an iterative sequence that converges to the stationary points of a given optimization problem, but whenever the optimization takes place over a compact subset of a vector space, the iterative sequence may leave the constraint set. One way to cope with this is to project all points back onto the constraint set using Euclidean best-approximation, which is sometimes difficult to calculate; a concrete example is optimization over the unit ball in a matrix space equipped with the operator norm. Our idea consists of a back-projection using quasi-projectors different from the Euclidean best-approximation. In the matrix example there is another canonical way to force the iterative sequence to stay in the constraint set: whenever a point leaves the unit ball, it is divided by its norm. For a given target function, this procedure might introduce spurious stationary points on the boundary. We show that this problem can be circumvented by using a gradient that is tailored to the quasi-projector used for back-projection: we state a general technical compatibility condition between a quasi-projector and a metric used for gradient ascent, prove convergence of the stochastic iterative sequences, and provide an appropriate metric for the unit-ball example. Thirdly, a class of learning problems in the sensorimotor loop is defined and motivated. This class is more general than the usual expected reward maximization and is illustrated by numerous examples (expected reward maximization, maximization of the predictive information, maximization of the entropy, and minimization of the variance of a given reward function); we also provide stationarity conditions together with appropriate gradient formulas. Last but not least, we prove convergence of a stochastic optimization algorithm (as considered in the second topic) applied to a general learning problem (as considered in the third topic): the learning algorithm converges to the set of stationary points. Among other things, the proof covers the convergence of an improved version of an algorithm for the maximization of the predictive information proposed by N. Ay, R. Der and K. Zahedi. We also investigate an application to a linear Gaussian dynamic, where the policies are encoded by the unit ball in a space of matrices equipped with the operator norm.
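As a toy, hedged sketch of the back-projection idea described above: stochastic gradient ascent over the unit ball of the operator norm, with the canonical renormalization whenever an iterate escapes. The noisy linear objective is an invented stand-in, and the thesis's actual contribution (the compatibility condition between quasi-projector and gradient metric that removes spurious boundary stationary points) is deliberately omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))   # defines the toy objective <A, X>

def noisy_grad(X):
    """Stochastic gradient of the linear objective trace(A^T X)."""
    return A + 0.1 * rng.standard_normal(X.shape)

X = np.zeros((3, 3))
for k in range(1, 5001):
    X += noisy_grad(X) / k        # diminishing step sizes
    s = np.linalg.norm(X, 2)      # operator (spectral) norm of the iterate
    if s > 1.0:
        X /= s                    # quasi-projection back onto the unit ball
print(np.linalg.norm(X, 2))       # the iterate stays in the constraint set
```

With a plain Euclidean gradient, this renormalization can create the spurious boundary stationary points the abstract mentions; the thesis's tailored gradient metric is what restores convergence to the true stationary set.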
10

Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop: General Stochastic Aspects and Gradient Methods for Optimal Control

Bernigau, Holger 04 July 2015
