11

Three Essays on Exploration and Exploitation: Behavioral Insights and Individual Decision-Making

Guida, Vittorio 14 December 2022 (has links)
Since James G. March introduced the concepts of exploration and exploitation in 1991, they have become ubiquitous in research on organizations and management. According to March (1991), exploration and exploitation are two sets of activities that allow systems (i.e., agents, either organizations or individuals) to adapt to their environment. On the one hand, exploitation activities are based on pre-existing knowledge and consist of its implementation and/or refinement (e.g., production). On the other hand, exploration is based on knowledge that the system does not currently possess and, hence, refers to the activities that allow such new knowledge to be acquired (e.g., search and experimentation). Scholars have produced a large number of contributions that have expanded our knowledge of exploration and exploitation, reaching beyond the initial boundaries of the field of organizational learning. Today, this large body of contributions, developed over 30 years, appears complex and divided into a plethora of research subfields (e.g., Almahendra and Ambos, 2015). Thus, research on exploration and exploitation has reached a level of conceptual and methodological sophistication that demands considerable effort from researchers wishing to approach it. Among the multiple strands of emerging research, some scholars (such as Wilden et al., 2018) have recently begun to propose a return to a behavioral approach to the study of exploration and exploitation. The earliest behavioral approach adopted in organizational studies is that of the "Carnegie School", which included Herbert Simon, Richard Cyert, and James March himself. Such an approach centers the investigation of organizations on human behavior. In other words, adopting a behavioral approach involves studying organizations through the attitudes, cognition, rationality, motivation, relationships, and conflicts of their members, and the many other psychological, economic, and social factors that influence human behavior (see, for example, March and Simon, 1958; Cyert and March, 1963). Today, this return to the behavioral approach is also associated with the "micro-foundations of strategy" movement (e.g., Felin et al., 2015) and so-called behavioral strategy (Powell et al., 2011). In essence, while the former stresses the importance of studying organizations and strategy at a level of analysis below the collective/systemic (i.e., organizational) level, the latter includes all the elements that already characterized the behavioral approach (i.e., psychological and social factors), reinforced by insights from the behavioral economics literature and the adoption of multiple methods, including experiments. This doctoral dissertation enters this discussion and aims to investigate exploration and exploitation by adopting a behavioral approach, a "micro-foundational" perspective, and research methods that include laboratory experiments and computer simulations. The first study is a literature review paper with three purposes, each pursued in one of its three sections. First, it addresses the conceptual development of the exploration-exploitation literature that led to the emergence of the complex body of contributions mentioned above, providing a kind of "road map" of the research field based on the major literature reviews published over the past three decades. This is intended as a contribution for researchers taking their first steps in exploration-exploitation research.
At the end of this road map, the paper by Wilden et al. (2018) is presented, linking the entire field to an emerging stream of research directed toward a return to James March's behavioral approach, enhanced by contributions on "micro-foundations" (e.g., Felin et al., 2015) and behavioral strategy (Powell et al., 2011). Second, based on the approach promoted in this research stream, a review of the literature on experimental studies of exploration and exploitation is provided. Laboratory experiments are considered key methods for advancing the study of exploration and exploitation under a behavioral approach. Finally, the first essay concludes with three suggested directions for further research: improving existing conceptualizations through modeling; further refining existing experimental designs to capture features of managerial decision making that are currently beyond the scope of the state-of-the-art models underlying most experimental investigations; and adopting a multilevel approach to the study of individual exploration and exploitation, which examines the variables that influence individual behavior at different organizational levels. The second study is an experimental investigation of the role of different sources of uncertainty in individual exploration-exploitation. It builds on the rationale underlying the third research direction proposed in the first study. Although laboratory experiments are increasingly adopted in the field, it is argued here that scholars have not experimentally disentangled the effects of two types of uncertainty that emerge in the managerial and psychological literature, namely internal uncertainty and external uncertainty. The former is the inability of individuals to predict their future performance, while the latter stems from the external environment and consists of unknown information about phenomena that may affect the final outcomes of a decision. The experimental design exposes one group of participants to internal uncertainty alone, and a treatment group to the combined presence of the two sources. Findings show that the combined presence of these two sources of uncertainty may lead to the over-exploitation of initial routines and, consequently, to the inability of individuals to exploit new opportunities stemming from alternatives discovered over time. Finally, the third study focuses on imitation in relation to exploration and exploitation, and builds on an agent-based model and computer simulations. This essay follows the first research direction suggested in the first study. While prominent research has framed imitation as a less costly alternative to experimentation (i.e., exploration), the possible role of imitation in the exploration-exploitation trade-off appears under-investigated. The interplay between imitation and exploration is captured by modeling two types of agents: imitators and explorers. Unlike previous modeling studies, both agent types are explicitly modeled as Simonian "satisficers". Experimentation is modeled as random search, whereas imitation builds on research on imitative heuristics. When adapting in a competitive environment, both types of agents experience "over-crowding" effects that depend on the characteristics of their type.
The paper concludes by acknowledging the limitations of the adopted model and proposes further research paths, including calibrating the model with experimental data.
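The abstract does not specify the model's mechanics, so the following is only a rough Python sketch of the kind of agent-based setup it describes: satisficing explorers who search at random, satisficing imitators who copy the best-performing agent they observe, and payoffs diluted by over-crowding in a competitive environment. The payoff rule, the observation structure, and all parameter values are illustrative assumptions, not the dissertation's actual model.

```python
import random

random.seed(0)

N_OPTIONS = 20     # hypothetical number of alternatives in the environment
N_AGENTS = 40      # half explorers, half imitators (assumption)
ASPIRATION = 0.5   # satisficing threshold (assumption)
PERIODS = 100

base_payoff = [random.random() for _ in range(N_OPTIONS)]

class Agent:
    def __init__(self, kind):
        self.kind = kind                       # "explorer" or "imitator"
        self.option = random.randrange(N_OPTIONS)
        self.payoff = 0.0

agents = [Agent("explorer") for _ in range(N_AGENTS // 2)] + \
         [Agent("imitator") for _ in range(N_AGENTS // 2)]

for t in range(PERIODS):
    # Competitive environment: an option's payoff is shared by the agents crowding it.
    crowd = {}
    for a in agents:
        crowd[a.option] = crowd.get(a.option, 0) + 1
    for a in agents:
        a.payoff = base_payoff[a.option] / crowd[a.option]

    best = max(agents, key=lambda a: a.payoff)   # publicly observable benchmark (assumption)
    for a in agents:
        if a.payoff >= ASPIRATION:
            continue                             # satisficing: keep the current routine
        if a.kind == "explorer":
            a.option = random.randrange(N_OPTIONS)   # experimentation as random search
        else:
            a.option = best.option                   # imitative heuristic: copy the best performer

for kind in ("explorer", "imitator"):
    mean = sum(a.payoff for a in agents if a.kind == kind) / (N_AGENTS // 2)
    print(kind, round(mean, 3))
```

Running the sketch shows the over-crowding mechanism at work: agents of the same type tend to pile onto the same alternatives and dilute their own payoffs, which is the qualitative effect the abstract reports.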
12

Scenario-based strategic planning and strategic management in family firms

Brands, Christian 18 September 2013 (has links)
This cumulative dissertation covers the concepts of scenario-based strategic planning and strategic management in family firms across five articles. The first article gives an overview of the dissertation, explaining its research gap, approach, and contribution. It highlights the two research areas covered by the dissertation, with two articles focusing on scenario-based strategic planning and two on strategic management in family firms. The second article is the first of two focusing on scenario-based strategic planning. It introduces and describes a set of six tools facilitating the implementation of scenario-based strategic planning in corporate practice. The third paper adapts these tools to the financial management and controlling context in private companies, highlighting the tools' flexibility in managing uncertain and volatile environments. The fourth article is the first of two focusing on strategic management in family firms. It analyzes organizational ambidexterity as a factor explaining family firm performance. The article shows that a high level of organizational ambidexterity in family firms leads to higher family firm performance. The final paper concludes the dissertation by examining the tendency of family firms to focus on capability exploration or resource exploitation across the different generations managing the family firm.

Table of contents:
I. SCENARIO-BASED STRATEGIC PLANNING AND STRATEGIC MANAGEMENT IN FAMILY FIRMS
  1. Research question and goal of the dissertation
  2. Summary of papers
    2.1. Contribution
    2.2. Implications and further research
II. SIX TOOLS FOR SCENARIO-BASED STRATEGIC PLANNING AND THEIR APPLICATION
  1. Introducing tools one and two: The framing checklist and 360° stakeholder feedback
    1.1. The framing checklist
    1.2. Description of the framing checklist
    1.3. 360° stakeholder feedback
      1.3.1. Existing perceptions, blind spots and weak signals
      1.3.2. Description of 360° stakeholder feedback
    1.4. Evaluation of the framing checklist and 360° stakeholder feedback
  2. Applying frameworks one and two: The framing checklist and 360° stakeholder feedback in the European airline industry
    2.1. Introduction
    2.2. The framing checklist
    2.3. 360° stakeholder feedback
  3. Introducing tools three and four: The impact/uncertainty grid and the scenario matrix
    3.1. The impact/uncertainty grid
    3.2. Description of the impact/uncertainty grid
    3.3. The scenario matrix
    3.4. Description of the scenario matrix
    3.5. Evaluating the impact/uncertainty grid and the scenario matrix
  4. Applying frameworks three and four: The impact/uncertainty grid and the scenario matrix in the European airline industry
    4.1. Introduction
    4.2. The impact/uncertainty grid
    4.3. The scenario matrix
  5. Introducing tools five and six: The strategy manual and the monitoring cockpit
    5.1. Introduction
    5.2. The strategy manual
    5.3. Description of the strategy manual
    5.4. The scenario cockpit
    5.5. Description of the scenario cockpit
    5.6. Evaluating the strategy manual and the scenario cockpit
  6. Applying frameworks five and six: The strategy manual and the scenario cockpit in the European airline industry
    6.1. The strategy manual
    6.2. The scenario cockpit
III. SZENARIOBASIERTE STRATEGISCHE PLANUNG IN VOLATILEN UMFELDERN
  1. Einführung: Unternehmen agieren in einer zunehmend volatilen Umwelt
  2. Volatilität als Herausforderung für die strategische Planung
  3. Szenariobasierte strategische Planung als Lösungsansatz für Planung unter Volatilität
    3.1. Grundlagen der szenariobasierten strategischen Planung
    3.2. Prozess der szenariobasierten strategischen Planung
  4. Zusammenfassung
IV. ORGANIZATIONAL AMBIDEXTERITY AND FAMILY FIRM PERFORMANCE
  1. Introduction
  2. Theory and Hypotheses
  3. Methodology
    3.1. Research Design and Sample Generation
    3.2. Measures
  4. Analysis and Results
  5. Discussion and Conclusion
V. THE IMPACT OF SUCCESSOR GENERATION DISCOUNT IN FAMILY FIRMS: EXAMINING NONLINEAR EFFECTS ON EXPLORATION AND EXPLOITATION
  1. Introduction
  2. The RBV and the importance of exploration and exploitation
  3. The importance of exploration and exploitation in family firms
  4. The impact of generational involvement on exploration and exploitation in family firms
  5. Methodology
    5.1. Constructs
    5.2. Results
  6. Discussion
    6.1. Implications for theory and practice
    6.2. Study limitations and future research
    6.3. Conclusion
13

Large state spaces and self-supervision in reinforcement learning

Touati, Ahmed 08 1900 (has links)
L'apprentissage par renforcement (RL) est un paradigme d'apprentissage orienté agent qui s'intéresse à l'apprentissage en interagissant avec un environnement incertain. Combiné à des réseaux de neurones profonds comme approximateur de fonction, l'apprentissage par renforcement profond (Deep RL) nous a permis récemment de nous attaquer à des tâches très complexes et de permettre à des agents artificiels de maîtriser des jeux classiques comme le Go, de jouer à des jeux vidéo à partir de pixels et de résoudre des tâches de contrôle robotique. Toutefois, un examen plus approfondi de ces remarquables succès empiriques révèle certaines limites fondamentales. Tout d'abord, il a été difficile de combiner les caractéristiques souhaitables des algorithmes RL, telles que l'apprentissage hors politique et en plusieurs étapes, et l'approximation de fonctions, de manière à obtenir des algorithmes stables et efficaces dans de grands espaces d'états. De plus, les algorithmes RL profonds ont tendance à être très inefficaces en raison des stratégies d'exploration-exploitation rudimentaires que ces approches emploient. Enfin, ils nécessitent une énorme quantité de données supervisées et finissent par produire un agent étroit capable de résoudre uniquement la tâche sur laquelle il est entrainé. Dans cette thèse, nous proposons de nouvelles solutions aux problèmes de l'apprentissage hors politique et du dilemme exploration-exploitation dans les grands espaces d'états, ainsi que de l'auto-supervision dans la RL. En ce qui concerne l'apprentissage hors politique, nous apportons deux contributions. Tout d'abord, pour le problème de l'évaluation des politiques, nous montrons que la combinaison des méthodes populaires d'apprentissage hors politique et à plusieurs étapes avec une paramétrisation linéaire de la fonction de valeur pourrait conduire à une instabilité indésirable, et nous dérivons une variante de ces méthodes dont la convergence est prouvée. Deuxièmement, pour l'optimisation des politiques, nous proposons de stabiliser l'étape d'amélioration des politiques par une régularisation de divergence hors politique qui contraint les distributions stationnaires d'états induites par des politiques consécutives à être proches les unes des autres. Ensuite, nous étudions l'apprentissage en ligne dans de grands espaces d'états et nous nous concentrons sur deux hypothèses structurelles pour rendre le problème traitable : les environnements lisses et linéaires. Pour les environnements lisses, nous proposons un algorithme en ligne efficace qui apprend activement un partitionnement adaptatif de l'espace commun en zoomant sur les régions les plus prometteuses et fréquemment visitées. Pour les environnements linéaires, nous étudions un cadre plus réaliste, où l'environnement peut maintenant évoluer dynamiquement et même de façon antagoniste au fil du temps, mais le changement total est toujours limité. Pour traiter ce cadre, nous proposons un algorithme en ligne efficace basé sur l'itération de valeur des moindres carrés pondérés. Il utilise des poids exponentiels pour oublier doucement les données qui sont loin dans le passé, ce qui pousse l'agent à continuer à explorer pour découvrir les changements. Enfin, au-delà du cadre classique du RL, nous considérons un agent qui interagit avec son environnement sans signal de récompense. Nous proposons d'apprendre une paire de représentations qui mettent en correspondance les paires état-action avec un certain espace latent. 
Pendant la phase non supervisée, ces représentations sont entraînées en utilisant des interactions sans récompense pour encoder les relations à longue portée entre les états et les actions, via une carte d'occupation prédictive. Au moment du test, lorsqu'une fonction de récompense est révélée, nous montrons que la politique optimale pour cette récompense est directement obtenue à partir de ces représentations, sans aucune planification. Il s'agit d'une étape vers la construction d'agents entièrement contrôlables. Un thème commun de la thèse est la conception d'algorithmes RL prouvables et généralisables. Dans la première et la deuxième partie, nous traitons de la généralisation dans les grands espaces d'états, soit par approximation de fonctions linéaires, soit par agrégation d'états. Dans la dernière partie, nous nous concentrons sur la généralisation sur les fonctions de récompense et nous proposons un cadre d'apprentissage non-supervisé de représentation qui est capable d'optimiser toutes les fonctions de récompense. / Reinforcement Learning (RL) is an agent-oriented learning paradigm concerned with learning by interacting with an uncertain environment. Combined with deep neural networks as function approximators, deep reinforcement learning (Deep RL) has recently made it possible to tackle highly complex tasks and to enable artificial agents to master classic games like Go, play video games from pixels, and solve robotic control tasks. However, a closer look at these remarkable empirical successes reveals some fundamental limitations. First, it has been challenging to combine desirable features of RL algorithms, such as off-policy and multi-step learning, with function approximation in a way that leads to both stable and efficient algorithms in large state spaces. Moreover, Deep RL algorithms tend to be very sample-inefficient due to the rudimentary exploration-exploitation strategies these approaches employ. Finally, they require an enormous amount of supervised data and end up producing a narrow agent able to solve only the task that it was trained on. In this thesis, we propose novel solutions to the problems of off-policy learning and the exploration-exploitation dilemma in large state spaces, as well as self-supervision in RL. On the topic of off-policy learning, we provide two contributions. First, for the problem of policy evaluation, we show that combining popular off-policy and multi-step learning methods with linear value function parameterization could lead to undesirable instability, and we derive a provably convergent variant of these methods. Second, for policy optimization, we propose to stabilize the policy improvement step through an off-policy divergence regularization that constrains the discounted state-action visitation induced by consecutive policies to be close to one another. Next, we study online learning in large state spaces and we focus on two structural assumptions to make the problem tractable: smooth and linear environments. For smooth environments, we propose an efficient online algorithm that actively learns an adaptive partitioning of the joint space by zooming in on more promising and frequently visited regions. For linear environments, we study a more realistic setting, where the environment is now allowed to evolve dynamically and even adversarially over time, but the total change is still bounded. To address this setting, we propose an efficient online algorithm based on weighted least squares value iteration.
It uses exponential weights to smoothly forget data that are far in the past, which drives the agent to keep exploring to discover changes. Finally, beyond the classical RL setting, we consider an agent interacting with its environment without a reward signal. We propose to learn a pair of representations that map state-action pairs to some latent space. During the unsupervised phase, these representations are trained using reward-free interactions to encode long-range relationships between states and actions, via a predictive occupancy map. At test time, once a reward function is revealed, we show that the optimal policy for that reward is directly obtained from these representations, with no planning. This is a step towards building fully controllable agents. A common theme in the thesis is the design of provable RL algorithms that generalize. In the first and the second part, we deal with generalization in large state spaces either by linear function approximation or state aggregation. In the last part, we focus on generalization over reward functions and we propose a task-agnostic representation learning framework that is provably able to solve all reward functions.
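The weighted least-squares value iteration mentioned above is not spelled out in the abstract; as a hedged illustration of its core ingredient, the sketch below shows exponentially weighted (discounted) ridge regression, i.e., how older transitions can be smoothly forgotten when the environment drifts. The function name, toy data, and parameter values are assumptions for illustration only, not the thesis's algorithm.

```python
import numpy as np

def discounted_ridge(X, y, eta, lam=1.0):
    """Ridge regression where the t-th sample gets weight eta**(T - 1 - t).

    Older samples are smoothly forgotten, the mechanism described in the
    abstract for tracking a drifting (non-stationary) environment.
    """
    T = len(y)
    w = eta ** np.arange(T - 1, -1, -1)          # exponential forgetting weights
    A = (X * w[:, None]).T @ X + lam * np.eye(X.shape[1])
    b = (X * w[:, None]).T @ y
    return np.linalg.solve(A, b)

# Toy drifting target: the true parameter flips halfway through the stream (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_old, theta_new = np.array([1.0, 0.0, -1.0]), np.array([-1.0, 2.0, 0.5])
y = np.concatenate([X[:100] @ theta_old, X[100:] @ theta_new]) + 0.1 * rng.normal(size=200)

print(discounted_ridge(X, y, eta=1.0))   # no forgetting: blends the two regimes
print(discounted_ridge(X, y, eta=0.9))   # forgetting: tracks the recent regime
```

In a value-iteration context, the same weighting would be applied to the regression targets used to fit each stage's value function, so that the agent keeps exploring and adapts when the environment changes.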
14

S-MARL: An Algorithm for Single-To-Multi-Agent Reinforcement Learning : Case Study: Formula 1 Race Strategies

Davide, Marinaro January 2023 (has links)
A Multi-Agent System is a group of autonomous, intelligent, interacting agents sharing an environment that they observe through sensors, and upon which they act with actuators. The behaviors of these agents can be either defined upfront by programmers or learned by trial and error using Reinforcement Learning. In the latter context, the approaches proposed in the literature can be categorized as either Single-Agent or Multi-Agent. The former approaches experience more stable training at the cost of defining upfront the policies of all the agents that are not learning, with the risk of limiting the performance of the learned policy. The latter approaches do not have such a limitation but experience higher training instability. Therefore, we propose a new approach based on the transition from Single-Agent to Multi-Agent Reinforcement Learning that exploits the benefits of both approaches: higher stability at the beginning of training to learn the environment's dynamics, and unconstrained agents in the later phases. To conduct this study, we chose Formula 1 as the Multi-Agent System, a complex environment with more than two interacting agents. In doing so, we designed a realistic racing simulation environment, framed as a Markov Decision Process, able to reproduce the core dynamics of races. After that, we trained three agents based on Semi-Gradient Q-Learning with different frameworks: pure Single-Agent, pure Multi-Agent, and Single-to-Multi-Agent. The results established that, given the same initial conditions and training episodes, our approach outperforms both the Single-Agent and Multi-Agent frameworks, obtaining higher scores in the proposed benchmarks. / Ett system med flera agenter är en grupp autonoma, intelligenta, interagerande agenter som delar en miljö som de observerar med hjälp av sensorer och som de agerar på med hjälp av agenter. Beteendena hos dessa agenter kan antingen definieras i förväg av programmerare eller läras in genom försök och misstag med hjälp av förstärkningsinlärning. I det sistnämnda sammanhanget kan de metoder som föreslagits i litteraturen kategoriseras som antingen en eller flera agenter. De förstnämnda tillvägagångssätten ger en stabilare utbildning till priset av att man i förväg måste definiera politiken för alla de agenter som inte lär sig, vilket innebär en risk för att den inlärda politikens prestanda begränsas. De senare metoderna har inte en sådan begränsning men upplever en högre instabilitet i utbildningen. Därför föreslår vi en ny metod som bygger på övergången från förstärkningsinlärning med en agent till förstärkningsinlärning med flera agenter och som utnyttjar fördelarna med båda metoderna: högre stabilitet i början av utbildningen för att lära sig miljöns dynamik och agenter utan begränsningar i de senaste faserna. För att genomföra den här studien valde vi Formel 1 som ett system med flera agenter, en komplex miljö med mer än två interagerande agenter. Vi utformade därför en realistisk simulering av tävlingar som är utformad som en Markov-beslutsprocess och som kan återge den centrala dynamiken i tävlingar. Därefter tränade vi tre agenter baserat på Semi-Gradient Q-Learning med olika ramar: ren Single-Agent, ren Multi-Agent och Single-to-Multi-Agent. Resultaten visade att vår metod, med samma startvillkor och träningsepisoder, överträffar både Single-Agent- och Multi-Agent-ramarna och får högre poäng i de föreslagna riktmärkena.
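The thesis's race simulator and feature design are not described in the abstract; the following is a minimal sketch of Semi-Gradient Q-Learning with a linear value function, the base learner named above, run on a purely hypothetical toy environment. The environment interface (reset/step), the feature map, and all hyperparameters are assumptions, not the thesis's actual setup.

```python
import numpy as np

def semi_gradient_q_learning(env, phi, n_features, episodes=500,
                             alpha=0.05, gamma=0.99, epsilon=0.1):
    """Semi-gradient Q-learning with a linear value function q(s, a) = w . phi(s, a).

    `env` is assumed to expose reset() -> state and step(action) -> (state, reward, done);
    `phi(state, action)` returns a feature vector of length n_features. These interfaces
    are assumptions for illustration, not the thesis's simulator.
    """
    rng = np.random.default_rng(0)
    w = np.zeros(n_features)
    q = lambda s, a: w @ phi(s, a)

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy
            if rng.random() < epsilon:
                a = int(rng.integers(env.n_actions))
            else:
                a = int(np.argmax([q(s, b) for b in range(env.n_actions)]))
            s_next, r, done = env.step(a)
            # semi-gradient update: bootstrap from max_a q(s', a), gradient only through q(s, a)
            target = r if done else r + gamma * max(q(s_next, b) for b in range(env.n_actions))
            w += alpha * (target - q(s, a)) * phi(s, a)
            s = s_next
    return w

class ToyTrackEnv:
    """A trivial 5-state chain standing in for the race simulator (purely hypothetical)."""
    n_actions = 2
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + 1, 4) if a == 1 else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else -0.1), done

def one_hot_features(s, a, n_states=5, n_actions=2):
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

w = semi_gradient_q_learning(ToyTrackEnv(), one_hot_features, n_features=10)
print(np.round(w, 2))
```

In the single-to-multi-agent idea described above, a learner of this kind would first be trained against fixed, scripted opponents and only later placed among other learning agents; that training schedule is not shown here.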
15

Statistical Design of Sequential Decision Making Algorithms

Chi-hua Wang (12469251) 27 April 2022 (has links)
Sequential decision-making is a fundamental class of problems that motivates algorithm design in online machine learning and reinforcement learning. Arguably, the resulting online algorithms have supported modern online service industries in their data-driven, real-time automated decision making. The applications span different industries, including dynamic pricing (marketing), recommendation (advertising), and dosage finding (clinical trials). In this dissertation, we contribute fundamental statistical design advances for sequential decision-making algorithms, advancing the theory and application of online learning and sequential decision making under uncertainty, including online sparse learning, finite-armed bandits, and high-dimensional online decision making. Our work lies at the intersection of decision-making algorithm design, online statistical machine learning, and operations research, contributing new algorithms, theory, and insights to diverse fields including optimization, statistics, and machine learning.

In part I, we contribute a theoretical framework of continuous risk monitoring for regularized online statistical learning. Such a framework is desirable for modern online service industries to monitor a deployed model's performance on an online machine learning task. In the first project (Chapter 1), we develop continuous risk monitoring for the online Lasso procedure and provide an always-valid algorithm for high-dimensional dynamic pricing problems. In the second project (Chapter 2), we develop continuous risk monitoring for online matrix regression and provide new algorithms for rank-constrained online matrix completion problems. These theoretical advances stem from the interplay between non-asymptotic martingale concentration theory and regularized online statistical machine learning.

In part II, we contribute a bootstrap-based methodology for finite-armed bandit problems, termed Residual Bootstrap exploration. Such a method opens the possibility of designing model-agnostic bandit algorithms without problem-adaptive optimism-engineering and instance-specific prior-tuning. In the first project (Chapter 3), we develop residual bootstrap exploration for multi-armed bandit algorithms and show how easily it generalizes to bandit problems with complex or ambiguous reward structures. In the second project (Chapter 4), we develop a theoretical framework for residual bootstrap exploration in linear bandits with a fixed action set. These methodological advances are due to our development of non-asymptotic theory for the bootstrap procedure.

In part III, we contribute application-driven insights on the exploration-exploitation dilemma for high-dimensional online decision-making problems. Such insights help practitioners implement effective high-dimensional statistical methods to solve online decision-making problems. In the first project (Chapter 5), we develop a bandit sampling scheme for online batch high-dimensional decision making, a practical scenario in interactive marketing and sequential clinical trials. In the second project (Chapter 6), we develop a bandit sampling scheme for federated online high-dimensional decision making to maintain data decentralization and support collaborative decisions. These new insights are due to our new bandit sampling designs, which address application-driven exploration-exploitation trade-offs effectively.
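Chapter-level details of Residual Bootstrap exploration are not given in the abstract; the sketch below illustrates the general idea on a Gaussian multi-armed bandit: perturb each arm's empirical mean with the mean of a bootstrap resample of its residuals and pull the arm with the highest perturbed index. The reward model, warm start, and index form are simplifying assumptions of this illustration, not the dissertation's exact procedure.

```python
import numpy as np

def residual_bootstrap_bandit(true_means, horizon=2000, seed=0):
    """Minimal sketch of residual-bootstrap exploration for a K-armed Gaussian bandit."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    rewards = [[] for _ in range(K)]

    # warm start: pull every arm a few times so residuals are informative (assumption)
    for _ in range(3):
        for k in range(K):
            rewards[k].append(true_means[k] + rng.normal(scale=0.5))

    regret = 0.0
    for _ in range(horizon - 3 * K):
        index = np.empty(K)
        for k in range(K):
            r = np.asarray(rewards[k])
            residuals = r - r.mean()
            boot = rng.choice(residuals, size=r.size, replace=True)
            index[k] = r.mean() + boot.mean()   # bootstrap-perturbed exploration index
        k = int(np.argmax(index))
        rewards[k].append(true_means[k] + rng.normal(scale=0.5))
        regret += max(true_means) - true_means[k]
    return regret

print(round(residual_bootstrap_bandit([0.1, 0.3, 0.5, 0.7]), 1))
```

The perturbation is data-driven rather than prior- or optimism-driven, which is the model-agnostic property highlighted in the abstract; the dissertation's actual algorithms refine how the residuals are weighted to guarantee sufficient exploration.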
